SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
https://arxiv.org/pdf/2406.16858
Apache License 2.0

Deploy EAGLE-Qwen2 in vllm #130

Open IvanDeng0 opened 1 month ago

IvanDeng0 commented 1 month ago

Thanks for your great work. I would like to deploy Qwen2-7B-Instruct with EAGLE in vLLM; my current command is:

```
python3 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --trust-remote-code \
    --dtype half \
    --max-model-len 32768 \
    --port=7801 \
    --disable-log-requests \
    --model=/mnt/model/Qwen2-7B-Instruct \
    --tokenizer-mode=auto \
    --speculative-model=/mnt/model/EAGLE-Qwen2-7B-Instruct \
    --use-v2-block-manager \
    --num-speculative-tokens 2 \
    --enforce-eager \
    --tensor-parallel-size=2 \
    --gpu-memory-utilization 1
```

but I encountered the following error:

```
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2.py", line 420, in load_weights
    param = params_dict[name]
KeyError: 'layers.0.self_attn.qkv_proj.weight'
```

My vLLM version is 0.6.1.post2. Is there a mistake somewhere?

crownz248 commented 1 month ago

Same issue! Have you solved it yet?

IvanDeng0 commented 1 month ago

Not yet. I tried changing the architecture from 'qwen2' to 'eagle' in config.json, but encountered another error.
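
For reference, the edit I tried was roughly the following (a sketch of my local change; whether "eagle" is the value vLLM actually expects here, and whether "architectures" also needs changing, is exactly what I am unsure about):

```python
# Sketch: switch the draft model's config.json from "qwen2" to "eagle".
# The field name is from my local config.json; treat the target value
# as a guess, not a documented vLLM requirement.
import json

path = "/mnt/model/EAGLE-Qwen2-7B-Instruct/config.json"
with open(path) as f:
    cfg = json.load(f)
cfg["model_type"] = "eagle"  # was "qwen2"
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```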

crownz248 commented 1 month ago

@IvanDeng0 I have solved this issue: you need to convert the EAGLE model to the layout vLLM expects; you can refer to vllm/model_executor/models/eagle.py:L126. But I ran into another problem: the fc layer of EAGLE-Qwen2 has a bias, while vLLM only handles the case without a bias, so I still cannot deploy EAGLE-Qwen2 successfully.
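
Roughly, the conversion I did looks like this (an untested-as-posted sketch; the renaming rules are my reading of load_weights in eagle.py and the paths are my own, so verify both against your vLLM version):

```python
# Sketch: rename the original EAGLE-Qwen2 draft weights into the layout
# vLLM's eagle.py load_weights expects, and copy lm_head from the base
# model. Renaming rules inferred from eagle.py; not a drop-in tool.
import torch
from transformers import AutoModelForCausalLM

SRC = "/mnt/model/EAGLE-Qwen2-7B-Instruct"    # original EAGLE draft checkpoint
BASE = "/mnt/model/Qwen2-7B-Instruct"         # base model, source of lm_head
DST = "/mnt/model/EAGLE-Qwen2-7B-Instruct-vllm/pytorch_model.bin"

state = torch.load(f"{SRC}/pytorch_model.bin", map_location="cpu")

converted = {}
for name, tensor in state.items():
    if name.startswith("fc."):
        # The fusion layer lives on the EAGLE wrapper itself. Note that
        # fc.bias is exactly what vLLM cannot load right now (the problem
        # described above), since it builds fc without a bias.
        converted[name] = tensor
    else:
        # Transformer weights go under the wrapped draft model,
        # e.g. layers.0... -> model.layers.0...
        converted[f"model.{name}"] = tensor

# The draft checkpoint ships no lm_head; EAGLE reuses the base model's.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto")
converted["lm_head.weight"] = base.lm_head.weight.data.clone()

torch.save(converted, DST)
# config.json for the converted model needs editing too; omitted here.
```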

IvanDeng0 commented 1 month ago

@crownz248 Thanks! (I tried, but EAGLE does not seem to support TP > 1.) I noticed your other issue; have you tried converting the Qwen2 weights to Llama weights to deploy on vLLM? (Maybe refer to https://github.com/Minami-su/character_AI_open/blob/main/Qwen2_llamafy_Mistralfy/llamafy_qwen_v2.py.) The rough idea, as I understand it, is sketched below.
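
This is only my reading of the idea, not a verified port of that script: Qwen2 and Llama share parameter names, and the main difference is that Qwen2 carries q/k/v biases, which recent Llama configs can represent via attention_bias. All paths and values below are assumptions for illustration:

```python
# Sketch: "llamafy" Qwen2 by reusing its weights under a LlamaConfig with
# attention biases enabled. Unverified; tokenizer files are not handled.
import torch
from transformers import AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

qwen = AutoModelForCausalLM.from_pretrained(
    "/mnt/model/Qwen2-7B-Instruct", torch_dtype="auto")

cfg = LlamaConfig(
    vocab_size=qwen.config.vocab_size,
    hidden_size=qwen.config.hidden_size,
    intermediate_size=qwen.config.intermediate_size,
    num_hidden_layers=qwen.config.num_hidden_layers,
    num_attention_heads=qwen.config.num_attention_heads,
    num_key_value_heads=qwen.config.num_key_value_heads,
    max_position_embeddings=qwen.config.max_position_embeddings,
    rms_norm_eps=qwen.config.rms_norm_eps,
    rope_theta=qwen.config.rope_theta,
    attention_bias=True,  # keep Qwen2's q/k/v biases
)
llama = LlamaForCausalLM(cfg).to(qwen.dtype)
missing, unexpected = llama.load_state_dict(qwen.state_dict(), strict=False)
print("missing:", missing, "unexpected:", unexpected)

# Llama's attention_bias also adds an o_proj bias, which Qwen2 lacks;
# zero it so the llamafied model matches the original.
for layer in llama.model.layers:
    layer.self_attn.o_proj.bias.data.zero_()

llama.save_pretrained("/mnt/model/Qwen2-7B-Instruct-llamafied")
```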

Also, have you compared the performance (e.g. TTFT, TPOT) of the llamafied model against naive inference?
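
For the measurement itself, I have been using a quick script like the one below against the OpenAI-compatible endpoint (a sketch: it approximates one token per streamed chunk, and base_url/model are from my own deployment, so adjust them):

```python
# Sketch: measure TTFT and TPOT by streaming from the vLLM OpenAI server.
# Assumes the openai>=1.0 client; counts one token per streamed chunk,
# which is an approximation (speculative decoding may batch tokens).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:7801/v1", api_key="EMPTY")

start = time.perf_counter()
first_token_at = None
n_tokens = 0

stream = client.completions.create(
    model="/mnt/model/Qwen2-7B-Instruct",
    prompt="Explain speculative decoding in one paragraph.",
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].text:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_tokens += 1

end = time.perf_counter()
ttft = first_token_at - start
tpot = (end - first_token_at) / max(n_tokens - 1, 1)
print(f"TTFT: {ttft * 1000:.1f} ms, TPOT: {tpot * 1000:.2f} ms/token")
```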