amd / RyzenAI-SW

run_awq.py error when quantizing Qwen1.5-7B-Chat #125

Open Wikeolf opened 5 days ago

Wikeolf commented 5 days ago

    python run_awq.py --model_name Qwen/Qwen1.5-7B-Chat --task quantize
    Namespace(model_name='Qwen/Qwen1.5-7B-Chat', target='aie', profile_layer=False, task='quantize', precision='w4abf16', flash_attention_plus=False, profilegemm=False, dataset='raw', fast_mlp=False, fast_attention=False, w_bit=4, group_size=128, algorithm='awq', gen_onnx_nodes=False, mhaops='all')
    Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████| 4/4 [00:13<00:00, 3.44s/it]
    Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
    Qwen2ModelEval(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151936, 4096)
        (layers): ModuleList(
          (0-31): 32 x Qwen2DecoderLayer(
            (self_attn): Qwen2Attention(
              (q_proj): Linear(in_features=4096, out_features=4096, bias=True)
              (k_proj): Linear(in_features=4096, out_features=4096, bias=True)
              (v_proj): Linear(in_features=4096, out_features=4096, bias=True)
              (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): Qwen2RotaryEmbedding()
            )
            (mlp): Qwen2MLP(
              (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
              (act_fn): SiLU()
            )
            (input_layernorm): Qwen2RMSNorm()
            (post_attention_layernorm): Qwen2RMSNorm()
          )
        )
        (norm): Qwen2RMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=151936, bias=False)
    )
    [RyzenAILLMQuantizer] [AWQ] Calculating AWQ scales ...
    Repo card metadata block was not found. Setting CardData to empty.
    Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors
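
(Side note: the final warning comes from the Hugging Face tokenizer, because the calibration text tokenizes to 57053 tokens against the model's 32768-token limit. If that over-long sequence is what actually trips the run, the usual workaround is to chunk the token ids to the model maximum before calibration. Below is a minimal sketch of that idea; every name in it is illustrative and none of it comes from run_awq.py.)

    # Illustrative sketch only -- not taken from run_awq.py.
    # Chunk an over-long token sequence so that no piece exceeds the
    # model's maximum context length (32768 for Qwen1.5-7B-Chat).
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
    text = open("calibration.txt", encoding="utf-8").read()  # hypothetical input file

    ids = tokenizer(text)["input_ids"]
    max_len = tokenizer.model_max_length
    chunks = [ids[i : i + max_len] for i in range(0, len(ids), max_len)]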

Wikeolf commented 5 days ago

Updating transformers to 4.39.3 gives the same error.

shivani-athavale commented 2 days ago

Hi @Wikeolf, when I try to run the command:

    python run_awq.py --model_name Qwen/Qwen1.5-7B-Chat --task quantize

I get the following error message and the code exits:

    [RyzenAILLMQuantizer] [AWQ] Looking for Z:\ext\awq_cache\Qwen1.5-7B-Chat-w4-g128.pt
    [RyzenAILLMQuantizer] [AWQ] No precalculated scales available for Qwen1.5-7B-Chat w_bit:4 group_size:128

I was curious to know where you got the AWQ scales for Qwen, or how you avoid seeing this message.

Wikeolf commented 14 hours ago

> Hi @Wikeolf, when I try to run the command:
>
>     python run_awq.py --model_name Qwen/Qwen1.5-7B-Chat --task quantize
>
> I get the following error message and the code exits:
>
>     [RyzenAILLMQuantizer] [AWQ] Looking for Z:\ext\awq_cache\Qwen1.5-7B-Chat-w4-g128.pt
>     [RyzenAILLMQuantizer] [AWQ] No precalculated scales available for Qwen1.5-7B-Chat w_bit:4 group_size:128
>
> I was curious to know where you got the AWQ scales for Qwen, or how you avoid seeing this message.

I do see this message.

If you search this repo for the text of that message, you can find the context in which it is printed. The AWQ Model Zoo does not provide the file Qwen1.5-7B-Chat-w4-g128.pt, so the lookup can never succeed. The problem can be solved by modifying the code in run_awq.py:

            # set use_qscales = False in quant config to calculate new awq scales
            use_qscales = False
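
For anyone reading along, here is a minimal sketch of the lookup-or-compute pattern the log messages suggest. Only the use_qscales flag and the printed messages come from the repo; the function, its signature, and the cache path below are hypothetical stand-ins for what RyzenAILLMQuantizer appears to do internally:

    # Hypothetical sketch -- the real logic lives in RyzenAILLMQuantizer, not here.
    import os

    def get_awq_scales(model_tag, w_bit, group_size, use_qscales):
        cache = os.path.join("awq_cache", f"{model_tag}-w{w_bit}-g{group_size}.pt")
        if use_qscales:
            print(f"[AWQ] Looking for {cache}")
            if os.path.exists(cache):
                return cache  # a real implementation would load the scales here
            print(f"[AWQ] No precalculated scales available for {model_tag} "
                  f"w_bit:{w_bit} group_size:{group_size}")
            return None  # this is where the Qwen run dead-ends
        # use_qscales = False skips the lookup and recalibrates from scratch.
        print("[AWQ] Calculating AWQ scales ...")
        return "freshly-computed-scales"  # placeholder for the real computation

Under that reading, leaving use_qscales as True dead-ends at the missing .pt file, while setting it to False takes the calibration branch, which matches the "Calculating AWQ scales ..." line in the first log.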

I suspect this is a bug introduced during development: run_awq.py contains some special-case configuration for Qwen, but this variable was never set to False there, which leads to this issue.

If you set this variable to False, you should get the same result I did. Good luck.