Please be advised that this is the Qwen(1.0) repository, and it appears you are using Qwen1.5 models.
Firstly, it's crucial to distinguish between base models (e.g., Qwen-7B) and chat models (e.g., Qwen-7B-Chat), as they are distinct model types. Base models only support text continuation, whereas chat models conduct conversation through a specific template; in the case of Qwen, the ChatML format is adopted for its chat models. Generally, for chat models, you need to apply this template to your input at some point during the process.
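For reference, a single-turn conversation rendered in ChatML looks like the following (the system prompt and user message here are only placeholders):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Give me a short introduction to large language models.<|im_end|>
<|im_start|>assistant
```

The trailing `<|im_start|>assistant` header is what cues the model to generate the assistant's reply.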
Secondly, there are notable differences between Qwen 1.0 and Qwen 1.5, and each employs a different method to apply the chat template.
Qwen 1.0 models relied on custom code, which necessitated `trust_remote_code=True`, and the `QWenTokenizer` in this version did not support the new `apply_chat_template` method. For chat models in Qwen 1.0, input token IDs were manually structured according to the template and then passed into either `model.generate` or `llm.generate`. This approach also ensured that control token injection was avoided.
As the `transformers` library and its ecosystem have evolved, a de facto standard has emerged within the community (including vLLM, FastChat, and others). This standard involves constructing chat model inputs as text first, then encoding the text into token IDs. The `transformers` library now includes an `apply_chat_template` method in its tokenizer classes to accommodate this practice. Therefore, `Qwen2Tokenizer` adheres to this trend, which explains why the line `tokenizer.apply_chat_template` appears frequently in current implementations.
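For example, with a Qwen1.5 chat model (a minimal sketch; the model name and messages are placeholders):

```python
from transformers import AutoTokenizer

# Qwen1.5 tokenizers ship the ChatML template, so no custom code is needed
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# Step 1: render the conversation into ChatML text, ending with the
# assistant header so the model continues as the assistant
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Step 2: encode the text into token IDs, ready for model.generate
model_inputs = tokenizer([text], return_tensors="pt")
```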
In all, as of the current date, you should use `apply_chat_template` for the latest chat models to enjoy the benefits of broad community support.
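In particular, when running offline inference through vLLM, `llm.generate` consumes plain text and applies no chat template itself, so render the ChatML prompt with `apply_chat_template` first. A minimal sketch, assuming the `Qwen/Qwen1.5-7B-Chat` checkpoint and placeholder sampling parameters:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
llm = LLM(model="Qwen/Qwen1.5-7B-Chat")

messages = [{"role": "user", "content": "Tell me something about large language models."}]

# vLLM does not apply any chat template on its own: render the ChatML
# prompt first, then hand the plain text to llm.generate
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512))
print(outputs[0].outputs[0].text)
```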
Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior
In the documentation at https://qwen.readthedocs.io/en/latest/deployment/vllm.html and in the qwen-wrapper code, the input is wrapped with the ChatML template. However, in this [ModelScope article](https://developer.aliyun.com/article/1380325) and in all of the examples under https://github.com/vllm-project/vllm/tree/main/examples, inference is performed by calling vLLM's `llm.generate` directly, without applying the model's chat template.

How do I correctly use vLLM with Qwen models?
Expected Behavior
No response
Steps To Reproduce
No response
Environment

Anything else?
No response