IvanDeng0 opened 1 month ago
Same issue! Have you solved it yet?
Not yet. I tried changing the architecture from 'qwen2' to 'eagle' in config.json, but ran into another error.
@IvanDeng0 I have solved this issue: you need to convert the EAGLE model so it matches what vLLM expects. You can refer to vllm/model_executor/models/eagle.py:L126. But I ran into another problem: the fc layer of EAGLE-Qwen2 has a bias, while vLLM only handles the bias-free case, so I still cannot deploy EAGLE-Qwen2 successfully. A rough sketch of the conversion is below.
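For anyone hitting the same thing, this is a minimal, untested sketch of the key renaming, not an official script. The file names and the exact target key layout are assumptions; verify them against the load_weights logic around vllm/model_executor/models/eagle.py:L126 for your vLLM version.

```python
# Sketch: rename EAGLE checkpoint keys to the layout vLLM's EAGLE wrapper
# expects (decoder weights under a `model.` prefix, fc/lm_head at top level).
# File names are assumptions -- adjust for your checkpoint.
import torch

ckpt = torch.load(
    "/mnt/model/EAGLE-Qwen2-7B-Instruct/pytorch_model.bin", map_location="cpu"
)

converted = {}
for name, tensor in ckpt.items():
    if name.startswith("layers.") or name.startswith("embed_tokens"):
        # vLLM wraps the draft transformer in a `model.` submodule,
        # so the decoder weights need the prefix added.
        converted["model." + name] = tensor
    else:
        # Top-level pieces such as fc.weight stay as-is. Note fc.bias is the
        # part stock vLLM does not load, per the problem described above.
        converted[name] = tensor

torch.save(
    converted, "/mnt/model/EAGLE-Qwen2-7B-Instruct-vllm/pytorch_model.bin"
)
```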
@crownz248 Thanks! (I tried it, but EAGLE does not seem to support TP > 1.) I noticed your other issue; have you tried converting the Qwen2 weights to Llama weights to deploy on vLLM? (Maybe refer to https://github.com/Minami-su/character_AI_open/blob/main/Qwen2_llamafy_Mistralfy/llamafy_qwen_v2.py; a rough sketch of the idea follows after this comment.)
Also, have you compared the performance (e.g. TTFT, TPOT) of the llamafied model against naive inference?
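As I understand it, the core idea of that llamafy script is that Qwen2 and Llama share the same decoder layout and state-dict key names, so the conversion mostly amounts to re-tagging the config; the main wrinkle is that Qwen2's q/k/v projections carry biases, which a Llama-style config can express via attention_bias=True in recent transformers. A simplified, untested sketch of the idea (not a copy of the script; paths are assumptions):

```python
# Rough sketch of "llamafying" Qwen2. Requires a transformers version new
# enough for LlamaConfig(attention_bias=...); dtype handling is omitted.
from transformers import AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

src = AutoModelForCausalLM.from_pretrained("/mnt/model/Qwen2-7B-Instruct")
c = src.config

llama_cfg = LlamaConfig(
    vocab_size=c.vocab_size,
    hidden_size=c.hidden_size,
    intermediate_size=c.intermediate_size,
    num_hidden_layers=c.num_hidden_layers,
    num_attention_heads=c.num_attention_heads,
    num_key_value_heads=c.num_key_value_heads,
    rms_norm_eps=c.rms_norm_eps,
    rope_theta=c.rope_theta,
    max_position_embeddings=c.max_position_embeddings,
    attention_bias=True,  # Qwen2's q/k/v projections have biases
    tie_word_embeddings=c.tie_word_embeddings,
)

dst = LlamaForCausalLM(llama_cfg)
# Key names already line up (model.layers.N.self_attn.q_proj.weight, ...).
# strict=False because attention_bias=True also gives Llama an o_proj.bias
# that Qwen2 lacks; it stays zero-initialized, which is a functional no-op.
missing, unexpected = dst.load_state_dict(src.state_dict(), strict=False)
print("missing:", missing, "unexpected:", unexpected)
dst.save_pretrained("/mnt/model/Qwen2-7B-Instruct-llamafied")
```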
Thanks for your great work. I would like to deploy Qwen2-7B-Instruct with EAGLE in vLLM; my current command is:
python3 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --trust-remote-code \
    --dtype half \
    --max-model-len 32768 \
    --port=7801 \
    --disable-log-requests \
    --model=/mnt/model/Qwen2-7B-Instruct \
    --tokenizer-mode=auto \
    --speculative-model=/mnt/model/EAGLE-Qwen2-7B-Instruct \
    --use-v2-block-manager \
    --num-speculative-tokens 2 \
    --enforce-eager \
    --tensor-parallel-size=2 \
    --gpu-memory-utilization 1
but I encountered the following error:
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 420, in load_weights param = params_dict[name] KeyError: 'layers.0.self_attn.qkv_proj.weight'
My vLLM version is 0.6.1.post2. Is there a mistake somewhere?
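For what it's worth, the KeyError looks like a name mismatch: vLLM builds Qwen2 parameters with a `model.` prefix and fuses q/k/v into `qkv_proj`, while the EAGLE checkpoint's keys apparently lack the prefix. A quick way to inspect this (the checkpoint file name is an assumption):

```python
# Diagnostic sketch: print the draft checkpoint's key names to compare them
# with the name in the KeyError above.
import torch

ckpt = torch.load(
    "/mnt/model/EAGLE-Qwen2-7B-Instruct/pytorch_model.bin", map_location="cpu"
)
for name in list(ckpt)[:8]:
    print(name)  # e.g. layers.0.self_attn.q_proj.weight -- no 'model.' prefix
```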