It is weird. I will give it a try.
According to this comment https://github.com/OpenGVLab/OmniQuant/issues/25#issuecomment-1770278455, it might be caused by `--let`, so I will run a comparative experiment that leaves it out. (The issue is still peculiar, though, because w4a4 also uses `--let`.)
Sorry for the confusion. Actually, the command you used is right. For LLaMa weight-only quantization, we only use `--lwc`. For LLaMa weight-activation quantization, we use both `--lwc` and `--let`.
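For reference, here is a minimal sketch of what the two kinds of runs look like (only `--lwc` and `--let` come from this thread; the script name, model path, epochs, and bit-width flags are placeholders for illustration):

```bash
# Weight-only quantization (e.g. W4A16): learnable weight clipping only.
python main.py \
    --model /path/to/llama-7b \
    --wbits 4 --abits 16 \
    --lwc

# Weight-activation quantization (e.g. W4A4): additionally enable the
# learnable equivalent transformation.
python main.py \
    --model /path/to/llama-7b \
    --wbits 4 --abits 4 \
    --lwc --let
```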
Adding the parameters `--let_lr 1e-3` and `--alpha 0.75` resolved the issue for the W4A8 configuration.
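As a sketch, the corrected W4A8 run would look something like this (only `--lwc`, `--let`, `--let_lr 1e-3`, and `--alpha 0.75` are taken from this thread; the other arguments are placeholders):

```bash
# W4A8: lowering the LET learning rate and setting alpha to 0.75
# is what resolved the nan PPL in this case.
python main.py \
    --model /path/to/Llama-2-7b-chat \
    --wbits 4 --abits 8 \
    --lwc --let \
    --let_lr 1e-3 --alpha 0.75
```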
When I perform w4a4 quantization and w4a8 quantization separately on the Llama-2-7B-chat model, w4a8 yields a significantly lower loss than w4a4. However, the PPL of w4a8 is `nan`, while the PPL of w4a4 is 23.7.
Please see the scripts and logs I used to quantize the model:
- w4a4
- w4a8