QwenLM / Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Inference responses are truncated after deploying Qwen1.5 with vLLM #752

Closed · zhangshuyx closed this 11 hours ago

zhangshuyx commented 2 months ago
[screenshot of the truncated response]

The request parameters were:

{
  "model": "Qwen1_5_72B_Chat",
  "messages": [{"role": "user", "content": "请给出一篇500字的中学作文,讲述海边游玩的经历"}],
  "max_tokens": 2000,
  "stop": []
}

I also tried various parameters, but the output gets truncated no matter how long or short it is, so the answer is always incomplete. The clearest case is this response: I asked the model to list every number from 1 to 20, and the output was still cut off:

[screenshot of the truncated list output]

The request parameters were:

{
  "model": "Qwen1_5_72B_Chat",
  "messages": [{"role": "user", "content": "请列出1到20的所有数字"}],
  "max_tokens": 1200,
  "stop": ["<|im_end|>", "<|endoftext|>", "<|im_start|>"],
  "stream": false,
  "temperature": 0.7
}
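
For anyone trying to reproduce this, here is a minimal sketch that sends the same request and inspects finish_reason, which distinguishes a max_tokens cutoff ("length") from a normal stop ("stop"). The endpoint host and port are assumptions about a local vLLM OpenAI-compatible deployment:

import requests

# Assumed address of the vLLM OpenAI-compatible server; adjust to your deployment.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "Qwen1_5_72B_Chat",
    "messages": [{"role": "user", "content": "请列出1到20的所有数字"}],
    "max_tokens": 1200,
    "stop": ["<|im_end|>", "<|endoftext|>", "<|im_start|>"],
    "stream": False,
    "temperature": 0.7,
}

resp = requests.post(URL, json=payload).json()
choice = resp["choices"][0]

# finish_reason == "length" means the reply hit max_tokens; "stop" means a stop
# token ended generation. Visibly incomplete text together with "stop" points
# at tokens being lost somewhere between generation and the returned message.
print("finish_reason:", choice["finish_reason"])
print("content:", choice["message"]["content"])
print("usage:", resp["usage"])

If finish_reason comes back as "stop" while the text is clearly incomplete, the truncation is not explained by max_tokens.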
jklj077 commented 2 months ago

The completion_tokens value in usage does not match the number of tokens in choices[0].message.content; it appears some tokens are lost. Could you try reporting the issue to vLLM?
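
A quick way to check that is to re-tokenize the returned content and compare the count against usage.completion_tokens. A sketch, assuming the served model matches the Qwen/Qwen1.5-72B-Chat tokenizer on the Hugging Face Hub and reusing resp from the request sketch above:

from transformers import AutoTokenizer

# Assumed checkpoint id; use the tokenizer that matches your served model.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-72B-Chat")

content = resp["choices"][0]["message"]["content"]  # resp from the sketch above
n_in_content = len(tok.encode(content))
n_reported = resp["usage"]["completion_tokens"]

# completion_tokens counts everything the engine generated (including the
# final stop token), so it may exceed the re-tokenized count by a token or
# two; a large gap suggests generated tokens never reached the returned message.
print("tokens in content:", n_in_content)
print("completion_tokens reported:", n_reported)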

yoke233 commented 1 month ago

The latest vllm==0.5.3.post1 hits the same problem; the only workaround was to roll back to the previously used version, vllm==0.4.0.post1.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.