efeslab / Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

not including dynamic quantization when reproducing results, why? #6

Closed priscilla-pan closed 7 months ago

priscilla-pan commented 7 months ago

I have reproduced the results of llama-7b; the WikiText2 perplexity matches Table 3 (6.16). But the code you provide does not include dynamic quantization for activations. As far as I know, dynamic quantization also causes quantization error. Why is it omitted from your code?
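For concreteness, here is a rough sketch of what I mean by dynamic activation quantization and the error it introduces. This is just for illustration (the function name and symmetric INT8 rounding are my assumptions, not Atom's code):

```python
import torch

def dynamic_quantize_per_token(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # "Dynamic" means the scale is computed at runtime from the
    # actual activation values, per token (last dim).
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized; differs from x by rounding error

x = torch.randn(4, 16)
x_hat = dynamic_quantize_per_token(x)
print((x - x_hat).abs().max())  # nonzero: the quantization error in question
```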

happierpig commented 7 months ago

Hi,

Thanks for your interest in Atom, and sorry for the confusion in our codebase. The code logic of the accuracy part strictly follows the workflow shown in our paper (see Fig. 6), including dynamic activation quantization.

To be specific, Atom fuses the dynamic quantization operator into the preceding element-wise operator, e.g., layer norm or the activation function. You can check code snippets like LayerNorm and Activate; a simplified sketch of the pattern follows.
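To illustrate the fusion idea only (this is a simplified sketch, not our actual code; the class name and the symmetric per-token fake quantization are assumptions for the example):

```python
import torch
import torch.nn as nn

class LayerNormWithQuant(nn.Module):
    """LayerNorm whose output is immediately fake-quantized.

    Fusing the two steps means dynamic quantization happens inside the
    element-wise op, rather than as a separate pass over the model.
    """
    def __init__(self, dim: int, n_bits: int = 8):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.qmax = 2 ** (n_bits - 1) - 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln(x)
        # Per-token scale computed dynamically from the normalized output.
        scale = h.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / self.qmax
        q = torch.clamp(torch.round(h / scale), -self.qmax - 1, self.qmax)
        return q * scale

# Usage: drop-in replacement for a plain LayerNorm.
y = LayerNormWithQuant(16)(torch.randn(2, 8, 16))
```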

Hope this answers your question.

priscilla-pan commented 7 months ago

In quant.py (https://github.com/efeslab/Atom/blob/e89479c5d58a9a7669175a5b186d2d171f337e49/model/quant.py#L254) you just return the input: https://github.com/efeslab/Atom/blob/e89479c5d58a9a7669175a5b186d2d171f337e49/model/quant.py#L249

So it is not actually quantized.

happierpig commented 7 months ago

That is just the default constructor of our wrapper class Quantizer. We configure all quantization functions (in fact, replace the lambda x: x) at here. Please check with this.
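Conceptually, the wrapper pattern looks like this (a simplified sketch with hypothetical method names, not our exact code; the real configuration happens in the linked lines):

```python
import torch

class Quantizer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Default: identity. On its own this quantizes nothing,
        # which is what the linked #L249/#L254 code shows.
        self.quant_func = lambda x: x

    def configure(self, quant_func):
        # Called during model setup to swap in the real quantization
        # function, replacing the identity default.
        self.quant_func = quant_func

    def forward(self, x):
        return self.quant_func(x)
```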