SafeAILab / EAGLE

Official Implementation of EAGLE-1 and EAGLE-2
https://arxiv.org/pdf/2406.16858
Apache License 2.0

Qwen2 inference #117

Open kechunFIVE opened 1 month ago

kechunFIVE commented 1 month ago

When I run inference with Qwen2, I get the following error:

You are using a model of type qwen2 to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.

 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
        Unexpected key(s) in state_dict: "layers.0.self_attn.q_proj.bias", "layers.0.self_attn.k_proj.bias", "layers.0.self_attn.v_proj.bias". 

Also, it seems modeling_llama_kv.py is identical to modeling_qwen2_kv.py?

Siegfried-qgf commented 1 month ago

Modify the linear layers inside the Llama attention module in cnet.py and set bias to True.
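In code, the change amounts to something like this (a minimal sketch, not EAGLE's actual cnet.py; the dimensions are Qwen2-7B's and are assumptions for illustration):

```python
import torch.nn as nn

# Qwen2 checkpoints carry biases on the q/k/v projections
# ("self_attn.q_proj.bias", ...), while Llama's projections are bias-free.
# The draft model's attention must therefore build its linear layers with
# bias=True, or load_state_dict reports the unexpected keys shown above.
hidden_size, num_heads, num_kv_heads, head_dim = 3584, 28, 4, 128  # assumed Qwen2-7B dims

q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=True)     # was bias=False for Llama
k_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=True)  # was bias=False
v_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=True)  # was bias=False
o_proj = nn.Linear(num_heads * head_dim, hidden_size, bias=False)    # o_proj stays bias-free in Qwen2
```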

jzzzf commented 4 weeks ago

How do I run gen_data and evaluate Qwen2? The same way as for Llama3?

xiangyw99 commented 3 weeks ago

I ran the command

python -m eagle.evaluation.gen_ea_answer_vicuna --ea-model-path /data2/common/models/LLM-Research/eagle-qwen-instruct --base-model-path /data2/common/models/LLM-Research/Qwen2-7B-Instruct --model-id /0822/qwen_run1 --temperature 0.0

to test EAGLE on the Qwen2 model, but got the warning:

Some weights of LlamaForCausalLM were not initialized from the model checkpoint 
at /data2/common/models/LLM-Research/Qwen2-7B-Instruct and are newly initialized: 
['model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', ...]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You are using a model of type qwen2 to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.

Meanwhile, the inference results seem incorrect. Why does this happen? Which command should I run to test Qwen2? I believe I downloaded the correct Qwen2 model weights, because I have tested them in a demo successfully. I think the problem is that EAGLE loads the model weights incorrectly. I would appreciate any support. Thank you ;)

Siegfried-qgf commented 3 weeks ago

I ran into the same problem. You need to modify the attention module's parameters in modeling_kv.py: remove the base argument for the rotary position embedding from the constructor. The reason is presumably that Qwen2's trained weights don't contain the parameters mentioned in the error. Also note that Qwen2's RoPE base is 1000000, while Llama's is 10000.
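To illustrate the base mismatch mentioned above (the head dimension is an assumption for illustration), here is how the HF-style inverse-frequency table diverges between Llama's default base (10000) and Qwen2's (1000000):

```python
import torch

head_dim = 128  # assumed head dimension, for illustration only

def inv_freq(base: float) -> torch.Tensor:
    # inv_freq as computed in HF-style rotary embeddings
    return 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

llama_freq = inv_freq(10_000.0)
qwen2_freq = inv_freq(1_000_000.0)

# The two tables diverge everywhere except the first entry, so rotations
# applied to queries/keys no longer match the model's training setup.
print(torch.allclose(llama_freq, qwen2_freq))  # False
```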


xiangyw99 commented 3 weeks ago

Thanks for the answer. I noticed that the modeling_qwen_kv provided by EAGLE already contains the change you mentioned (passing base=1000000), but I still can't get rid of the model-loading warning, and Qwen2 still doesn't generate correct answers. Could you share the command you used to run Qwen2 inference?

Siegfried-qgf commented 3 weeks ago

Try not changing it to 1000000; instead, remove the base argument entirely and use the default value. That's how I modified it. Passing 1000000 may still cause problems; I haven't investigated the exact reason. The code is at my company, so I can't share it.


xiangyw99 commented 3 weeks ago

Do you mean commenting out this part?

def _init_rope(self):
    if self.config.rope_scaling is None:
        self.rotary_emb = LlamaRotaryEmbedding(
            self.head_dim, max_position_embeddings=self.max_position_embeddings  # [MODIFIED], base=self.config.rope_theta
        )

I tried it, but it still doesn't generate correct answers. Also, I'd like to confirm: did you run inference with this file, gen_ea_answer_vicuna.py? The command being python -m eagle.evaluation.gen_ea_answer_vicuna --ea-model-path XXXXXX --base-model-path XXXXXX. Thanks again! :)

Siegfried-qgf commented 3 weeks ago

Remove the base argument passed to the attention class's init. I modified and rewrote the code based on gen_ea_answer_vicuna.py, but the changes are small. I haven't looked at the differences between EAGLE's own modeling.py implementations for Qwen2 and Llama; for my own Qwen2 adaptation I only changed the base parameter.


Hermetist commented 2 weeks ago

@xiangyw99 @Siegfried-qgf Sorry to bother you both; I ran into the same problem. I removed the base argument in modeling_qwen2_kv.py as described, but the result is still wrong output similar to @xiangyw99's. Is this modification correct? Thanks!

The command I used is:

python3 -m eagle.evaluation.gen_ea_answer_llama2chat \
--ea-model-path ./yuhuili/EAGLE-Qwen2-7B-Instruct \
--base-model-path ./Qwen/Qwen2-7B-Instruct \
--answer-file ./tmp/result/qwen_tmp0.0.jsonl

xiangyw99 commented 2 weeks ago

Load the Qwen model with bf16 precision: append .to(torch.bfloat16) to the from_pretrained call and generation works correctly. EAGLE's documentation already notes this.
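As a minimal illustration of the cast (using a stand-in module instead of the real 7B model; in EAGLE the cast would be appended to the from_pretrained(...) call that loads the base model):

```python
import torch
import torch.nn as nn

# Stand-in for the loaded base model; the point is only the dtype cast.
model = nn.Linear(8, 8)           # created in the default float32
model = model.to(torch.bfloat16)  # same effect as appending .to(torch.bfloat16)
print(model.weight.dtype)         # torch.bfloat16
```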
