ModelTC / llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
https://arxiv.org/abs/2405.06001
Apache License 2.0

PPL results for AWQ are not correct? #161

Closed yc2367 closed 2 hours ago

yc2367 commented 3 hours ago

Hi, thanks for the nice work. I have a question about the AWQ evaluation for Wikitext PPL. I ran the original AWQ codebase, which gives a Wikitext PPL of 6.14 for Llama-3-8B at FP16, the same as reported in Table 2 of your paper. However, the AWQ codebase at w3g128 gives a PPL of 8.16, which is notably lower than the 8.57 in your Table 2. As another reference point, Table 3 of the EfficientQAT paper also reports an AWQ w3g128 PPL of 8.16. May I know how your Wikitext PPL evaluation script differs?

I think a similar issue occurs in Table 6 of the paper. Based on runs with their independent codebases, I would expect the average PPL of AWQ to be lower than that of GPTQ, but yours shows an AWQ PPL of 10.98, much higher than the GPTQ PPL of 10.67. For reference, a sketch of the evaluation I used is below.
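For context, this is a minimal sketch of the standard Wikitext-2 PPL evaluation used by the original GPTQ/AWQ codebases (non-overlapping 2048-token windows over the concatenated test split). The model name and loading details are illustrative assumptions, not llmc's actual script:

```python
# Sketch of the common Wikitext-2 perplexity evaluation; details are
# assumptions for illustration, not llmc's evaluation code.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Concatenate the raw test split and tokenize it as one long sequence.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seqlen = 2048
n_windows = enc.input_ids.shape[1] // seqlen
nlls = []
with torch.no_grad():
    for i in range(n_windows):
        batch = enc.input_ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
        # labels == input_ids makes HF return the mean cross-entropy loss,
        # which is rescaled to a total NLL per window.
        loss = model(batch, labels=batch).loss
        nlls.append(loss.float() * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (n_windows * seqlen))
print(f"Wikitext-2 PPL: {ppl.item():.2f}")
```

Small differences here, such as the tokenization of the concatenated text or whether windows overlap, can shift the reported PPL, which is why I am asking about the exact script.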

Harahan commented 2 hours ago

In Table 2, we disable the weight-clipping strategy and only apply the equivalent transformation (as mentioned in the footnote). In Table 6, we run the two algorithms under the same conditions, whereas by default AWQ and GPTQ employ different calibration data and pre-processing.

You can check the settings in our paper, where we also provide accuracy and PPL alignment experiments against the original implementations (under the same settings).
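To make the distinction concrete, here is a rough sketch of the two AWQ components being discussed, following the AWQ paper's formulation. The function names, the fixed `alpha`, and the single `clip_ratio` are simplifying assumptions for illustration (AWQ grid-searches both), not llmc's implementation:

```python
import torch

def equivalent_transform(weight: torch.Tensor, act_scale: torch.Tensor,
                         alpha: float = 0.5):
    """AWQ's equivalent transformation (the part kept in Table 2).

    Scales salient input channels: W' = W * diag(s), X' = X / s, so the
    FP output is unchanged, but quantizing W' preserves salient channels
    better. AWQ searches alpha on calibration data; 0.5 is a placeholder.
    """
    s = act_scale.clamp(min=1e-5).pow(alpha)  # per-input-channel scale [in]
    return weight * s, s                      # fold 1/s into the previous op

def weight_clip(weight: torch.Tensor, clip_ratio: float) -> torch.Tensor:
    """AWQ's weight clipping (the part disabled in Table 2).

    Shrinks each output channel's quantization range before rounding;
    AWQ searches clip_ratio per channel, a single ratio is used here.
    """
    max_val = weight.abs().amax(dim=1, keepdim=True) * clip_ratio  # [out, 1]
    return weight.clamp(-max_val, max_val)
```

Under this reading, Table 2 applies only the first function, while the original AWQ codebase (and the EfficientQAT numbers) applies both, which would account for the PPL gap.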

yc2367 commented 2 hours ago

Thanks a lot for the quick response. I understand now :)