OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

[Llama-2-7B-chat] ppl of w4a8 is nan #51

Closed. xingchensong closed this issue 9 months ago.

xingchensong commented 9 months ago

When I perform w4a4 quantization and w4a8 quantization separately on the Llama-2-7B-chat model, w4a8 yields significantly lower loss than w4a4. However, the PPL of w4a8 comes out as NaN, while the PPL of w4a4 is 23.7.

Please see the script and log I used to quantize the model:
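(The attached script and log are not reproduced here. Roughly, the two runs follow the standard main.py flags from the OmniQuant README and differ essentially in --abits; the model path and output directories below are placeholders, not the exact script from this run.)

```bash
# w4a4 run (sketch)
CUDA_VISIBLE_DEVICES=0 python main.py \
  --model /PATH/TO/Llama-2-7b-chat \
  --epochs 20 --output_dir ./log/llama-2-7b-chat-w4a4 \
  --eval_ppl --wbits 4 --abits 4 --lwc --let

# w4a8 run (sketch) -- identical except for --abits
CUDA_VISIBLE_DEVICES=0 python main.py \
  --model /PATH/TO/Llama-2-7b-chat \
  --epochs 20 --output_dir ./log/llama-2-7b-chat-w4a8 \
  --eval_ppl --wbits 4 --abits 8 --lwc --let
```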

ChenMnZ commented 9 months ago

It is weird. I will give it a try.

xingchensong commented 9 months ago

According to this comment (https://github.com/OpenGVLab/OmniQuant/issues/25#issuecomment-1770278455), the problem might be caused by --let, so I will run a comparative experiment without it. (The issue is still peculiar, though, because w4a4 also uses --let.)

ChenMnZ commented 9 months ago

Sorry for the confusion.

Actually, the command you used is right. For LLaMa weight-only quantization, we use only --lwc. For LLaMa weight-activation quantization, we use both --lwc and --let.
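Concretely, following the README examples (a sketch; the model path is a placeholder):

```bash
# Weight-only quantization (e.g. W4A16): --lwc alone
python main.py --model /PATH/TO/LLaMa --eval_ppl --wbits 4 --abits 16 --lwc

# Weight-activation quantization (e.g. W4A4): --lwc together with --let
python main.py --model /PATH/TO/LLaMa --eval_ppl --wbits 4 --abits 4 --lwc --let
```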

xingchensong commented 9 months ago

Adding the parameters --let_lr 1e-3 and --alpha 0.75 resolved the issue for the W4A8 configuration.
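For anyone hitting the same NaN, the working W4A8 invocation looked roughly like the following sketch (paths are placeholders; --let_lr 1e-3 and --alpha 0.75 are the settings mentioned above, the remaining flags follow the README examples):

```bash
CUDA_VISIBLE_DEVICES=0 python main.py \
  --model /PATH/TO/Llama-2-7b-chat \
  --epochs 20 --output_dir ./log/llama-2-7b-chat-w4a8 \
  --eval_ppl --wbits 4 --abits 8 --lwc --let \
  --let_lr 1e-3 --alpha 0.75
```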