QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

After fine-tuning, responses differ between llama_factory's vllm deployment and Qwen's official vllm deployment #1241

Closed lxb0425 closed 1 month ago

lxb0425 commented 2 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Hi, after fine-tuning successfully with llama_factory, I deployed the model in two ways: with llama_factory's vllm and with the vllm approach recommended in Qwen's official documentation, and the responses differ. The llama_factory vllm deployment always responds normally and has never had a problem, but the deployment following Qwen's official vllm instructions always has issues: the replies are very poor, almost random answers, as shown in the screenshot below. image

What could be the cause?

Expected Behavior

Both deployments should return normal responses.

Steps To Reproduce

No response

Environment

- OS:
- Python: 3.10
- Transformers:
- PyTorch:
- CUDA: 12.2
- vllm: 0.3.3

Anything else?

No response

jklj077 commented 1 month ago

Hi, could you clarify the difference between the "vllm from llama_factory" deployment and the "vllm from Qwen's official documentation" deployment?

Based on the shared screenshot, it appears that you are using a custom frontend. Since vllm is not fully compatible with Qwen(1.0) models (it is unaware of the chat template and the stop token ids), the frontend has to at least pass stop_token_ids to the API created by vllm. Alternatively, you could use FastChat + vllm as introduced in the README. If you are using Qwen1.5, plain vllm should work fine.
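
For illustration, here is a minimal sketch of what "pass stop_token_ids" can look like when calling a vllm OpenAI-compatible completions endpoint that serves a Qwen(1.0) chat model. The server URL, served model name, and the token ids (`<|endoftext|>`=151643, `<|im_start|>`=151644, `<|im_end|>`=151645) are assumptions; verify them against your own deployment and tokenizer.

```python
# Sketch: query a vllm OpenAI-compatible server hosting a Qwen(1.0) chat model.
# Because vllm does not know Qwen 1.0's ChatML template or stop tokens, we build
# the prompt ourselves and pass stop_token_ids explicitly.
import requests

SERVER_URL = "http://localhost:8000/v1/completions"  # assumed vllm endpoint

def build_chatml_prompt(system: str, user: str) -> str:
    """Wrap a single-turn conversation in Qwen's ChatML format."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

payload = {
    "model": "qwen-sft",  # whatever name your server was started with (assumption)
    "prompt": build_chatml_prompt("You are a helpful assistant.", "你好"),
    "max_tokens": 512,
    "temperature": 0.7,
    # Without these ids the model keeps generating past <|im_end|>, which can
    # produce the garbled replies described above. Ids are assumed; check your
    # tokenizer with tokenizer.convert_tokens_to_ids(...).
    "stop_token_ids": [151643, 151644, 151645],
}

resp = requests.post(SERVER_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```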

jklj077 commented 1 month ago

As Qwen1.0 is no longer actively maintained, we kindly ask you to migrate to Qwen1.5 and direct related questions there. Thanks for your cooperation.
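
For reference, a minimal sketch of querying a Qwen1.5 chat model served with plain vllm, where the chat template and stop tokens are picked up from the tokenizer so no manual prompt formatting is needed. The base URL, API key, and served model name are placeholders for your own deployment.

```python
# Sketch: standard OpenAI-compatible chat request against a vllm server
# hosting a Qwen1.5 chat model (e.g. Qwen/Qwen1.5-7B-Chat).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed deployment

resp = client.chat.completions.create(
    model="Qwen/Qwen1.5-7B-Chat",  # assumed served model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"},
    ],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```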