Closed: chenzx921020 closed this issue 2 months ago
What command did you use?
python main.py \
--model /data01/llama2-7b-hf/ --epochs 20 \
--output_dir ./log/llama--7b-w4a4 --eval_ppl \
--wbits 4 --abits 4 --lwc --let \
--act-scales ./act_scales/llama2-7b.pt \
--act-shifts ./act_shifts/llama2-7b.pt \
--net Llama-2-7b
Please follow this script:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMa/Llama-2-7b --eval_ppl \
--epochs 20 --output_dir ./log/Llama-2-7b-w4a4 \
--wbits 4 --abits 4 --lwc --let \
--let_lr 1e-3 --alpha 0.75
ok, thank you
When I run W4A4 weight-activation quantization with OmniQuant on llama2-7b, the WikiText2 perplexity is unstable: I get 17.24 on a 3090 and 19.95 on an A6000, with all settings following the paper, which reports 14.26. Why does this happen?
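Part of the gap may simply be the --let_lr 1e-3 --alpha 0.75 flags present in the script above but missing from your command. Beyond that, exact cross-GPU reproduction is generally not guaranteed: a 3090 and an A6000 dispatch different CUDA kernels with different floating-point reduction orders, so small perplexity differences between the two machines are expected. What you can control is run-to-run variance on a single machine, by pinning every RNG and forcing deterministic kernels. Below is a minimal sketch using standard PyTorch APIs (not part of the OmniQuant codebase; the helper name and seed value are illustrative), which you could call at the top of main.py:

import os
import random

import numpy as np
import torch

def set_deterministic(seed: int = 42) -> None:
    # Pin the Python, NumPy, and Torch (CPU + all CUDA devices) RNGs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN choose deterministic algorithms and disable autotuning,
    # which otherwise picks kernels per run based on timing.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Raise an error if any op still lacks a deterministic implementation.
    torch.use_deterministic_algorithms(True)

set_deterministic()

Even with all of this, expect results to match exactly only between runs on the same hardware and software stack, not between different GPU models.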