[Badcase]: 抢占式实例部署qwen2.5-72B成功，调用失败 - Githubissues

QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

10.57k stars 648 forks source link

[Badcase]: 抢占式实例部署qwen2.5-72B成功，调用失败 #1088

Open ZX1998-12 opened 2 weeks ago

ZX1998-12 commented 2 weeks ago

Model Series

Qwen2.5

What are the models used?

qwen2.5-72B

What is the scenario where the problem happened?

抢占式实例部署qwen2.5-72B成功，调用失败

Is this badcase known and can it be solved using avaiable techniques?

[X] I have followed the GitHub README.
[X] I have checked the Qwen documentation and cannot find a solution there.
[X] I have checked the documentation of the related framework and cannot find useful information.
[X] I have searched the issues and there is not a similar one.

Information about environment

部署指令：vllm serve /home/Qwen2.5/Qwen2.5-72B-Instruct --port 6666 --host 0.0.0.0 --tensor-parallel-size 4 --served-model-name Qwen2.5-72B --enforce-eager

部署成功但是调用失败截图 lQDPKILMLK0gXBHNAeLNAtCwCCqx7-_WCbIHIQzau6d_AA_720_482

应该是和MQLLMEngine交互数据超时了，但是不知道解决办法

Description

Steps to reproduce

This happens to Qwen2.5-xB-Instruct-xxx and xxx. The badcase can be reproduced with the following steps:

...
...

The following example input & output can be used:

system: ...
user: ...
...

Expected results

The results are expected to be ...

Attempts to fix

I have tried several ways to fix this, including:

adjusting the sampling parameters, but ...
prompt engineering, but ...

Anything else helpful for investigation

I find that this problem also happens to ...

jklj077 commented 2 weeks ago

for vllm internal errors, I advised you to raise issues at https://github.com/vllm-project/vllm/issues