OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

How to enable llama3-8b int4 AWQ models #90

Open FlexLaughing opened 2 weeks ago

FlexLaughing commented 2 weeks ago

Hi, I have an AutoAWQ-quantized model (--wbits=4 --groupsize=128) and ran the following command to evaluate perplexity on a GPU:

--model /home/ubuntu/qllm_v0.2.0_Llama3-8B-Chinese-Chat_q4 --epochs 0 --eval_ppl --wbits 4 --abits 16 --lwc --net llama-7b

It fails when loading the checkpoint: the QuantLinear defined at https://github.com/OpenGVLab/OmniQuant/blob/main/quantize/int_linear.py#L26 does not appear to support the packed qweight tensors that AutoAWQ produces. Could you check the argument handling? Thanks!
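One possible workaround, sketched below, is to unpack the AWQ checkpoint back into dense fp16 weights before handing the model to OmniQuant's fake-quant QuantLinear, which expects a plain `weight` tensor rather than a packed `qweight`. This is a minimal sketch, not OmniQuant's own code: it assumes AutoAWQ's GEMM layout (int32 `qweight` of shape [in_features, out_features // 8], int32 `qzeros` of shape [in_features // group_size, out_features // 8], fp16 `scales` of shape [in_features // group_size, out_features]) and AutoAWQ's interleaved pack order [0, 2, 4, 6, 1, 3, 5, 7]; verify both against the AutoAWQ version that produced the checkpoint. The function names are hypothetical.

```python
import torch

# AutoAWQ interleaves the eight 4-bit values inside each int32 in this
# order (assumption based on AutoAWQ's GEMM packing; verify for your version).
AWQ_PACK_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]

def unpack_int32(packed: torch.Tensor) -> torch.Tensor:
    """Split each int32 into eight 4-bit values along the last dimension."""
    shifts = torch.arange(0, 32, 4, device=packed.device)
    unpacked = (packed.unsqueeze(-1) >> shifts) & 0xF    # [..., cols, 8]
    unpacked = unpacked[..., AWQ_PACK_ORDER]             # undo the interleave
    return unpacked.reshape(*packed.shape[:-1], -1)      # [..., cols * 8]

def awq_dequantize(qweight, qzeros, scales, group_size=128):
    """Reconstruct a dense fp16 weight matrix [out_features, in_features]."""
    iweight = unpack_int32(qweight)                      # [in, out]
    izeros = unpack_int32(qzeros)                        # [in // g, out]
    # Broadcast per-group zeros/scales over the group_size input rows.
    izeros = izeros.repeat_interleave(group_size, dim=0)  # [in, out]
    scales = scales.repeat_interleave(group_size, dim=0)  # [in, out]
    weight = (iweight - izeros).to(scales.dtype) * scales
    return weight.t().contiguous()                        # nn.Linear layout
```

With the weights dequantized this way into ordinary nn.Linear modules, the --wbits 4 --abits 16 evaluation path should then re-quantize them with OmniQuant's own simulated QuantLinear, so the PPL measured reflects 4-bit weights even though the input checkpoint was unpacked first.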