OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

OPT-30B #76

Open Arthur-Ling opened 5 months ago

Arthur-Ling commented 5 months ago

CUDA_VISIBLE_DEVICES=0 python main.py \
    --model /home/Projects/model_zoo/facebook/opt-30b \
    --epochs 20 --output_dir ./log/opt-30b-w6a6 \
    --wbits 6 --abits 6 --lwc --let --alpha 0.75 --eval_ppl \
    --net opt-30b

When I use OmniQuant to quantize OPT-30B to w6a6 with the command above, an error occurs in omniquant.py at this line: scale = (act.pow(args.alpha)/weight.pow(1-args.alpha)).clamp(min=1e-5)

" RuntimeError: The size of tensor a (7168) must match the size of tensor b (5120) at non-singleton dimension 0 "

I checked the shapes: act has shape [7168], but weight has shape [5120].
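The failing expression is element-wise, so act and weight need the same length along the channel dimension. A minimal pure-Python sketch of that arithmetic, with an explicit length check and a lookup of which OPT checkpoint a given width belongs to, may help narrow this down. It is not the OmniQuant code: `smooth_scale`, `matching_opt_models`, and `OPT_HIDDEN_SIZE` are hypothetical names introduced here. The hidden sizes are those of the published OPT model configurations. The sketch assumes strictly positive inputs.

```python
# Hidden sizes (d_model) of the published OPT checkpoints.
OPT_HIDDEN_SIZE = {
    "opt-125m": 768,
    "opt-350m": 1024,
    "opt-1.3b": 2048,
    "opt-2.7b": 2560,
    "opt-6.7b": 4096,
    "opt-13b": 5120,
    "opt-30b": 7168,
    "opt-66b": 9216,
}


def matching_opt_models(dim):
    """Return the OPT variants whose hidden size equals `dim`."""
    return [name for name, size in OPT_HIDDEN_SIZE.items() if size == dim]


def smooth_scale(act, weight, alpha=0.75, floor=1e-5):
    """Element-wise act**alpha / weight**(1 - alpha), clamped below by `floor`.

    Hypothetical helper mirroring the arithmetic of the failing line;
    it raises a descriptive error instead of a broadcasting failure.
    """
    if len(act) != len(weight):
        raise ValueError(
            f"channel mismatch: act has {len(act)} entries "
            f"(matches {matching_opt_models(len(act)) or 'no OPT model'}), "
            f"weight has {len(weight)} entries "
            f"(matches {matching_opt_models(len(weight)) or 'no OPT model'})"
        )
    return [max(a ** alpha / w ** (1 - alpha), floor) for a, w in zip(act, weight)]


print(matching_opt_models(7168))  # ['opt-30b']
print(matching_opt_models(5120))  # ['opt-13b']
```

A width of 7168 corresponds to OPT-30B and 5120 to OPT-13B. This suggests, but does not prove, that the activation statistics and the loaded weights come from different checkpoints. It may be worth checking that the directory passed to --model really contains the 30B weights.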