OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

OPT Model Reproduction Discrepancies #63

Closed fantasysee closed 4 months ago

fantasysee commented 4 months ago

Dear authors,

Thank you for sharing your remarkable work.

I am currently focusing on replicating the evaluation results reported in your paper as part of our research. While I've successfully matched the results for llama-7b and llama-2-7b, I've encountered discrepancies in the OPT series. Please see the attached screenshots.

I'm using the OPT-1.3b model from https://huggingface.co/facebook/opt-1.3b with the following command line:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/OPT-1.3b \
--epochs 0 --output_dir ./log/test \
--eval_ppl --wbits 4 --abits 16 --group_size 128 --lwc \
--resume /PATH/TO/opt-1.3b-w4a16g128.pth

[Screenshots: WikiText2 and C4 perplexity mismatches]

Could you confirm whether the base OPT models behind your pre-trained checkpoints match the ones above? If they differ, could you specify which versions you used?

Also, if I only want to replicate the paper's results, I can skip the first three steps in the Readme's Usage section, right?

Thank you for your time and assistance.

Best regards, Chao

ChenMnZ commented 4 months ago

Yes, you can skip the first three steps in the Readme's Usage section for the reproduction.

For weight-only quantization, we only use --lwc for LLaMA, but use both --lwc and --let for OPT (see the sketch at the end of this comment for what these two flags learn).

So, to reproduce the results with our OPT checkpoints, you should activate both --lwc and --let. For example:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/OPT-1.3b \
--epochs 0 --output_dir ./log/test \
--eval_ppl --wbits 4 --abits 16 --group_size 128 --lwc --let \
--resume /PATH/TO/opt-1.3b-w4a16g128.pth

I just ran it, and here's the result: [screenshot of the reproduced perplexity]
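
For readers unfamiliar with the two flags: --lwc enables learnable weight clipping (LWC) and --let enables the learnable equivalent transformation (LET), the two sets of parameters OmniQuant trains. Below is a minimal PyTorch sketch of the two ideas, not the repository's implementation: the function names are hypothetical, and quantization is simplified to per-tensor, whereas the real code works per group (e.g. --group_size 128).

import torch

def lwc_fake_quant(W, gamma_raw, beta_raw, n_bits=4):
    # Learnable weight clipping (LWC): sigmoid-bounded factors shrink the
    # clipping range of W, and the quantization step is derived from that
    # learned range instead of the raw min/max.
    gamma, beta = torch.sigmoid(gamma_raw), torch.sigmoid(beta_raw)
    qmax = 2 ** n_bits - 1
    step = (gamma * W.max() - beta * W.min()) / qmax   # step size
    zero = -torch.round(beta * W.min() / step)         # zero point
    W_int = torch.clamp(torch.round(W / step) + zero, 0, qmax)
    return (W_int - zero) * step                       # fake-quantized weights

def let_transform(X, W, b, s, delta):
    # Learnable equivalent transformation (LET): shift/scale the activations
    # and fold the inverse into the weights and bias, so the full-precision
    # output is unchanged while the weights become easier to quantize.
    # Shapes: X (n, c_in), W (c_in, c_out), b (c_out,), s/delta (c_in,).
    X_t = (X - delta) / s          # transformed activations
    W_t = s.unsqueeze(1) * W       # scale folded into weight rows
    b_t = b + delta @ W            # shift folded into the bias
    return X_t @ W_t + b_t         # equals X @ W + b exactly

The relevance to this issue: the released OPT checkpoints carry both the LWC clipping factors and the LET scales/shifts, so resuming with --lwc alone applies only half of the learned parameters, which is presumably why the perplexity above did not match.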

fantasysee commented 4 months ago

Thank you for your prompt assistance! I have successfully reproduced the OPT model results on my end. Appreciate your help!

Best regards, Chao