Dear authors, we have tried llama-2-7b with OmniQuant in W4A4 following the given script. Training is neither stable nor successful: the loss becomes NaN already in the first epoch. The training scheme is otherwise the same as in the repository. Are there any solutions to fix this?
We set epochs=1 and ran:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model hzq/llama/llama-2-7b-hf --eval_ppl \
--epochs 1 --output_dir ./log/Llama-2-7b-chat-w4a4 \
--wbits 4 --abits 4 --lwc --let \
--let_lr 1e-3 --alpha 0.75
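In case it helps, one workaround we are considering is to guard each training step against non-finite losses and to clip the gradients of the learnable parameters before the optimizer update, since exploding gradients are a common cause of NaN in low-bit quantization-aware training. Below is a minimal sketch; `compute_loss` and `batch` are placeholders for the corresponding objects in OmniQuant's block-wise training loop (our assumptions for illustration, not the repo's actual API):

import math
import torch

def guarded_step(optimizer, compute_loss, batch, max_grad_norm=1.0):
    optimizer.zero_grad()
    loss = compute_loss(batch)
    # Skip the update entirely if the loss has already diverged.
    if not math.isfinite(loss.item()):
        print("non-finite loss, skipping this batch")
        return None
    loss.backward()
    # Clip gradients of the learnable clipping (LWC) / transform (LET)
    # parameters collected in the optimizer's param groups.
    params = [p for group in optimizer.param_groups for p in group["params"]]
    torch.nn.utils.clip_grad_norm_(params, max_grad_norm)
    optimizer.step()
    return loss.item()

Lowering --let_lr below 1e-3 might also help, but we would prefer to know the intended fix.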