OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

Llama-3-8B #75

Open · hsb1995 opened this issue 3 months ago

hsb1995 commented 3 months ago

[screenshot]

I have obtained the scale/shift factors for Llama-3-8B, but I ran into a mismatch issue during the compression process.

[screenshot]

My scaling-factor code has not been changed, but a dimension error appeared when I started compressing. The parameter settings are as follows:

```
--model ${}$Llama-3-8b/ --epochs 20 --output_dir ${}$llama-3-8b-w6a6/ \
--eval_ppl --wbits 6 --abits 6 --lwc --let --net Llama-3-8b \
--tasks arc_easy,arc_challenge,boolq,hellaswag,winogrande,piqa
```
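(For context: the act_scales and act_shifts consumed by `--let` are per-channel calibration statistics over the inputs of each linear layer. Below is a minimal sketch of how such statistics can be collected; `collect_act_stats` is a hypothetical helper, a simplified stand-in for the repo's generate_act_scale_shift.py, not its exact code.)

```python
import torch

@torch.no_grad()
def collect_act_stats(model, calib_batches):
    """Record per-channel input max/min for every nn.Linear via forward
    hooks, then derive max-abs scales and mid-range shifts. A simplified
    sketch of the kind of statistics OmniQuant's generate_act_scale_shift.py
    produces, not the repo's exact code."""
    stats = {}

    def make_hook(name):
        def hook(module, inputs, output):
            x = inputs[0].detach().flatten(0, -2).float()  # [tokens, in_features]
            cur_max, cur_min = x.max(dim=0).values, x.min(dim=0).values
            if name in stats:
                stats[name]["max"] = torch.maximum(stats[name]["max"], cur_max)
                stats[name]["min"] = torch.minimum(stats[name]["min"], cur_min)
            else:
                stats[name] = {"max": cur_max, "min": cur_min}
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules()
               if isinstance(m, torch.nn.Linear)]
    for batch in calib_batches:
        model(batch)
    for h in handles:
        h.remove()

    scales = {n: torch.maximum(s["max"].abs(), s["min"].abs()) for n, s in stats.items()}
    shifts = {n: (s["max"] + s["min"]) / 2 for n, s in stats.items()}
    return scales, shifts
```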

hsb1995 commented 3 months ago

With w=16, a=16 I can obtain the uncompressed values. But once the compression values are set (w=6, a=6), problems arise:

[screenshot]

ChenMnZ commented 2 months ago

@hsb1995 LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current `--let`.
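(To make the mismatch concrete: under GQA, Llama-3-8B's k_proj and v_proj have fewer output channels, 8 KV heads × 128, than q_proj's 32 heads × 128, while a LET-style query-key smoothing scale is one tensor shared between the two sides. The sketch below reproduces the resulting shape error using Llama-3-8B's published config values; it illustrates the failure mode, not OmniQuant's exact code.)

```python
import torch

# Llama-3-8B attention shapes (from its config.json)
hidden, n_heads, n_kv_heads, head_dim = 4096, 32, 8, 128

q_proj = torch.nn.Linear(hidden, n_heads * head_dim)     # out_features = 4096
k_proj = torch.nn.Linear(hidden, n_kv_heads * head_dim)  # out_features = 1024 (GQA)

# A LET-style query-key smoothing scale sized for the query side: with plain
# multi-head attention both projections are 4096-dim, so dividing k's output
# works; with GQA the key side is only 1024-dim and the elementwise op fails.
qkt_scale = torch.ones(n_heads * head_dim)               # shape [4096]
x = torch.randn(1, hidden)
try:
    k_smoothed = k_proj(x) / qkt_scale                   # [1, 1024] vs [4096]
except RuntimeError as e:
    print("dimension mismatch:", e)
```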

hsb1995 commented 2 months ago

> @hsb1995 LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current `--let`.

Professor, thank you for all your work. I really don't know how to handle the GQA issue you mentioned.

Do I understand you correctly: I keep the original generate_act_scale_shift.py script unchanged to obtain the act_scales and act_shifts files, and then run the weight quantization on top of that? Parameter settings:

```
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-8b \
--epochs 20 --output_dir ./log/llama-8b-w6a6 \
--eval_ppl --wbits 6 --abits 6 --lwc
```

Is the above operation possible? I only removed the `--let` option.
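(Dropping `--let` while keeping `--lwc` does sidestep the GQA problem, because LWC only learns clipping parameters for each weight matrix in isolation, so the differing q/k output widths never interact. Below is a minimal sketch of the learnable-weight-clipping idea on top of a simple asymmetric uniform quantizer; the names and quantizer details are simplified illustrations, not the repo's exact implementation.)

```python
import torch

def lwc_fake_quant(w, gamma, beta, n_bits=6):
    """Learnable Weight Clipping, simplified: sigmoid(gamma)/sigmoid(beta)
    shrink each output channel's max/min before asymmetric uniform
    quantization. gamma/beta have shape [out_features, 1] and are trained
    (the real code backpropagates through round() with a straight-through
    estimator, omitted here). Only this one layer's weight is touched, so
    GQA's smaller k/v projections cause no shape conflicts."""
    wmax = w.amax(dim=1, keepdim=True) * torch.sigmoid(gamma)  # clipped per-channel max
    wmin = w.amin(dim=1, keepdim=True) * torch.sigmoid(beta)   # clipped per-channel min
    scale = (wmax - wmin).clamp(min=1e-5) / (2**n_bits - 1)
    zero = (-wmin / scale).round()
    q = (w / scale + zero).round().clamp(0, 2**n_bits - 1)
    return (q - zero) * scale  # dequantized ("fake-quantized") weight

w = torch.randn(1024, 4096)                      # e.g. a k_proj weight under GQA
gamma = torch.zeros(1024, 1, requires_grad=True)
beta = torch.zeros(1024, 1, requires_grad=True)
w_q = lwc_fake_quant(w, gamma, beta)
print("mean abs quantization error:", (w - w_q).abs().mean().item())
```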

hsb1995 commented 2 months ago

Hey, professor. I gave it a try, and it is really difficult to change. The current errors are as follows; what should I do about them?

```
[2024-04-24 17:14:17 root] (omniquant.py 50): INFO Starting ...
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /home/sam/Doctorproject/weight/llama-3-8b/LLM-Research/Llama-3-8b/ and are newly initialized:
['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq',
 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq',
 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq',
 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq',
 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq',
 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq',
 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq',
 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq',
 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq',
 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq',
 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq',
 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq',
 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq',
 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq',
 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq',
 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 419, in <module>
    main()
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 383, in main
    omniquant(
  File "/home/sam/Doctorproject/OmniQuant-main/quantize/omniquant.py", line 102, in omniquant
    raise ValueError("Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now")
ValueError: Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now
```
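(The final ValueError is a model-family check in quantize/omniquant.py that dispatches on the network name string. The sketch below is a hypothetical reconstruction of that kind of check, not the repo's verbatim code; note in particular how deriving the name from a `--model` path with a trailing slash yields an empty string that matches no family, which is one plausible cause of this error.)

```python
# Hypothetical reconstruction of a name-based family dispatch that raises
# "Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now" -- an
# illustration of the failure mode, not OmniQuant's verbatim code.
def resolve_family(model_path: str) -> str:
    net = model_path.split("/")[-1]      # "" if the path ends with "/"
    name = net.lower()
    if "llama" in name or "mixtral" in name:
        return "llama"
    if "opt" in name:
        return "opt"
    if "falcon" in name:
        return "falcon"
    raise ValueError("Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now")

print(resolve_family("/weights/Llama-3-8b"))       # -> "llama"
try:
    resolve_family("/weights/Llama-3-8b/")         # trailing slash -> ValueError
except ValueError as e:
    print("error:", e)
```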