OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

W4A4 in llama2-7b #70

Closed: chenzx921020 closed this issue 2 months ago

chenzx921020 commented 3 months ago

When I run W4A4 weight-activation quantization with OmniQuant on llama2-7b, the wikitext2 perplexity is unstable: 17.24 on a 3090 and 19.95 on an A6000. All settings follow the paper, but the paper reports 14.26. Why does this happen?

ChenMnZ commented 3 months ago

What command did you use?

chenzx921020 commented 3 months ago

> What command did you use?

python main.py \
--model /data01/llama2-7b-hf/ --eval_ppl \
--epochs 20 --output_dir ./log/llama--7b-w4a4 \
--wbits 4 --abits 4 --lwc --let \
--act-scales ./act_scales/llama2-7b.pt \
--act-shifts ./act_shifts/llama2-7b.pt \
--net Llama-2-7b

ChenMnZ commented 3 months ago

Please follow this script:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMa/Llama-2-7b --eval_ppl \
--epochs 20 --output_dir ./log/Llama-2-7b-w4a4 \
--wbits 4 --abits 4 --lwc --let  \
--let_lr 1e-3 --alpha 0.75
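The key difference from the original command is the explicit `--let_lr 1e-3 --alpha 0.75`. As a hedged illustration of what an alpha-style smoothing parameter typically controls in SmoothQuant-like learnable equivalent transformations (this is an assumption about the role of `--alpha`, not OmniQuant's actual implementation), the sketch below computes per-channel scales that migrate quantization difficulty from activations to weights; the function name `smoothing_scales` is hypothetical:

```python
import math

def smoothing_scales(act_absmax, weight_absmax, alpha=0.75, eps=1e-5):
    """Hypothetical SmoothQuant-style scale: s_j = |x_j|^alpha / |w_j|^(1-alpha).

    Activations are divided by s and weights multiplied by s per input
    channel, leaving the matmul output unchanged while flattening
    activation outliers. Larger alpha shifts more quantization
    difficulty onto the weights.
    """
    return [
        max(a, eps) ** alpha / max(w, eps) ** (1.0 - alpha)
        for a, w in zip(act_absmax, weight_absmax)
    ]

# Example with 4 input channels: per-channel activation and weight maxima.
act = [8.0, 2.0, 16.0, 1.0]
wgt = [0.5, 0.5, 0.5, 0.5]
scales = smoothing_scales(act, wgt, alpha=0.75)
# Channel 0: 8^0.75 / 0.5^0.25 = 2^2.5 ~ 5.657, so the outlier channel
# gets scaled down the most before activation quantization.
```

With `alpha=0.75` (as in the recommended script) more of the dynamic range is pushed into the weights, which tend to tolerate 4-bit quantization better than activations.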
chenzx921020 commented 3 months ago

ok, thank you