THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs
Apache License 2.0

After merging glm4-9b-chat with a LoRA fine-tuned adapter, tool calling fails during vLLM inference. #607

Open Jimmy-L99 opened 1 month ago

Jimmy-L99 commented 1 month ago

System Info

Who can help?

@sixsixcoder @zr

Information

Reproduction

1. LoRA fine-tuning

Fine-tuned with LLaMA-Factory: a custom dataset plus a yaml config, run through llamafactory-cli train for LoRA fine-tuning.
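The issue does not include the actual training yaml; a minimal LLaMA-Factory LoRA SFT config for glm4-9b-chat typically looks like the sketch below. The dataset name, paths, and hyperparameters are placeholders, not taken from the issue.

# glm4_lora_sft.yaml -- hypothetical example; dataset, paths, and
# hyperparameters are placeholders
# run with: llamafactory-cli train glm4_lora_sft.yaml
model_name_or_path: THUDM/glm-4-9b-chat

stage: sft
do_train: true
finetuning_type: lora
lora_target: all

dataset: my_custom_dataset
template: glm4
cutoff_len: 2048

output_dir: saves/glm4-9b-chat/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0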

2. Merging glm4-9b-chat with the LoRA adapter

Used LLaMA-Factory's llamafactory-cli export to produce the merged model.
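Likewise, the export yaml is not shown in the issue; a sketch along the lines of LLaMA-Factory's merge examples, with placeholder paths:

# glm4_lora_merge.yaml -- hypothetical example; paths are placeholders
# run with: llamafactory-cli export glm4_lora_merge.yaml
model_name_or_path: THUDM/glm-4-9b-chat
adapter_name_or_path: saves/glm4-9b-chat/lora/sft
template: glm4
finetuning_type: lora

export_dir: models/glm4-9b-chat-lora-merged
export_size: 2
export_legacy_format: false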

3. Tool-calling test code

Inference runs through the officially provided openai_api_server.py on top of vLLM.

Excerpt of the tool test code:

# Assumed setup -- not shown in the original issue. The client points at the
# local openai_api_server.py endpoint; base_url, api_key, and the weather
# stub below are illustrative placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY", model="glm-4")

def weather(city: str) -> str:
    """Look up the current weather for a city (stub; real implementation omitted)."""
    return f"placeholder weather report for {city}"

tools = {
    "weather": weather,
}

# Bind the tools to the model
llm_with_tools = llm.bind_tools(list(tools.values()))

context = []

def process_query(query):
    global context
    # Append the user query to the running context
    context.append({"role": "user", "content": query})

    # Call the LLM
    response = llm_with_tools.invoke(context)
    print(response)

    if response.tool_calls:
        # The model requested a tool call: execute it
        tool_call = response.tool_calls[0]
        tool_name = tool_call["name"]
        tool = tools[tool_name]

        # Unpack the tool-call arguments and pass them to the tool function
        tool_arguments = tool_call["args"]
        tool_result = tool(**tool_arguments)

        # Append the tool result to the context
        context.append({"role": "system", "content": f"You can obtain real-time weather information through a tool. The tool returned:\n\n{tool_result}\n\nThis result is completely accurate and you may state it directly."})

        # Pass the post-tool context back to the LLM to generate the final response
        response = llm.invoke(context)

    # Append the LLM response to the context
    context.append({"role": "assistant", "content": response.content})

    return response.content

# Test
query_1 = "What's the weather like in Shenzhen today?"
response_1 = process_query(query_1)
print(response_1)

4. Model testing

Tested as follows, with the base model and two different ways of applying the LoRA weights:

- the glm4-9b-chat base model
- the merged model
- the glm4-9b-chat base model plus vLLM's lora_request parameter (see the sketch after this list)
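For context, the lora_request route means serving the unmodified base model and attaching the adapter per request through vLLM's LoRA support, rather than baking the weights in via a merge. A minimal sketch using vLLM's offline API, assuming the adapter directory saved by LLaMA-Factory; names and paths are placeholders:

# Sketch of vLLM's per-request LoRA mechanism; the adapter path and
# name are illustrative, not taken from the issue.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="THUDM/glm-4-9b-chat", enable_lora=True, trust_remote_code=True)

outputs = llm.generate(
    ["What's the weather like in Shenzhen today?"],
    SamplingParams(temperature=0.2, max_tokens=256),
    # arguments: adapter name, unique integer id, path to the adapter weights
    lora_request=LoRARequest("weather_sft", 1, "saves/glm4-9b-chat/lora/sft"),
)
print(outputs[0].outputs[0].text)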


For the glm4-9b-chat base model, the tool call succeeds:

tools result:

The current weather in Shenzhen is cloudy, temperature 26.0°C, humidity 38.0%, wind direction northeast, wind force ≤3

LLM_response: The weather in Shenzhen today is cloudy, with a temperature of around 26°C, relative humidity of 38%, a northeasterly wind, and a weak wind force of no more than level 3.

- LoRA merged model

Output:

INFO: 172.16.21.155:36244 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
  File "/root/ljm/ChatGLM4/GLM-4/api_server/openai_api_server.py", line 389, in create_chat_completion
    async for response in generate_stream_glm4(gen_params):
  File "/root/ljm/ChatGLM4/GLM-4/api_server/openai_api_server.py", line 205, in generate_stream_glm4
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1844, in apply_chat_template
    rendered_chat = compiled_template.render(
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/root/anaconda3/envs/glm4_9b-chat-128k_vLLM/lib/python3.11/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "