QwenLM / Qwen2

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Qwen1.5 replaced model.chat (from Qwen1.0) with model.generate. What happened to the history parameter? My locally deployed Qwen1.5-7B-Chat cannot hold an interactive conversation and forgets earlier turns #134

Closed chengxiang123aa closed 3 months ago

chengxiang123aa commented 6 months ago

Qwen1.5 replaced model.chat (from Qwen1.0) with model.generate. What happened to the history parameter? My locally deployed Qwen1.5-7B-Chat cannot hold an interactive conversation and forgets earlier turns in the dialogue.

jklj077 commented 6 months ago

Please note that there is a misunderstanding about how Qwen models operate. The model.chat function is not a standard API provided by the transformers library; it simply formats multiple messages into a template the chat model can recognize. Comparable functionality is now implemented in the tokenizer classes through the apply_chat_template method. Relevant examples can be found in the README; kindly review it for more information.

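To illustrate the point above: apply_chat_template turns a list of messages into the ChatML-style prompt that Qwen1.5 chat models expect. The standalone function below is only a sketch of what the real tokenizer method produces (the actual template lives in the tokenizer config; this function is an illustration, not the library's implementation):

```python
# Sketch of the ChatML formatting that tokenizer.apply_chat_template performs
# for Qwen chat models. Illustrative only; in practice call the tokenizer method.

def to_chatml(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts into a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(to_chatml(messages))
```

Because the whole messages list is rendered into the prompt each time, carrying conversation history is simply a matter of keeping earlier turns in that list.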

LucienShui commented 6 months ago

Please refer to the code below. The recommended setup, per the documentation, is to start an OpenAI-API-compatible server with vLLM, llama.cpp, ollama, or similar, and then use this code.

The same approach applies to code that does not use the OpenAI API format.

```python
from openai import OpenAI

# An api_key must be provided even if the server ignores it; otherwise the client raises an error.
client = OpenAI(base_url="http://your-qwen-api-server:8000/v1", api_key="test")

def main():
    messages = []
    while True:
        # Append the new user turn; re-sending the full list is what preserves history.
        messages.append({"role": "user", "content": input(("\n" * 2 if messages else "") + "user: ")})
        stream = client.chat.completions.create(model="", messages=messages, stream=True)
        content = ''
        for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            print(delta, end="", flush=True)
            content += delta
        # Append the assistant reply so the next request includes it.
        messages.append({"role": "assistant", "content": content})

if __name__ == '__main__':
    main()
```
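The reason the loop above does not forget earlier turns is that every user turn and every assistant reply is appended to the same messages list, which is re-sent in full on each request. The stub below isolates that pattern without needing a server; fake_complete is a hypothetical stand-in for the chat completion call, not part of any real API:

```python
# Minimal sketch of the history-keeping pattern: the full messages list is
# passed on every turn. `fake_complete` is a hypothetical stub standing in
# for the OpenAI-compatible server.

def fake_complete(messages):
    # Echo-style stub reply referencing the latest user message.
    return f"You said: {messages[-1]['content']}"

def chat_turn(messages, user_text):
    messages.append({"role": "user", "content": user_text})
    reply = fake_complete(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

messages = []
chat_turn(messages, "My name is Alice.")
chat_turn(messages, "What is my name?")
# The second request carries the whole conversation, so a real model
# would be able to see turn 1 and answer from it.
print(len(messages))
```

If you drop the second append (the assistant reply), the next request omits the model's own answers and the conversation degrades, which matches the "forgetting" behavior described in the issue.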