eosphoros-ai / DB-GPT

AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
https://docs.dbgpt.site
MIT License
12.31k stars 1.61k forks

[Bug] glm4-9b-chat: the officially provided OpenAI-style API server, used in place of OpenAI, does not return normal model output (the output does not seem to use the information supplied from the database) #1636

Open chuangzhidan opened 2 weeks ago

chuangzhidan commented 2 weeks ago

Search before asking

Operating system information

Linux

Python version information

=3.11

DB-GPT version

main

Related scenes

Installation Information

Device information

GPU, 2 × A800

Models information

embedding model: bge-large1.5; model: proxyllm style, glm4-9b-chat

What happened

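I deployed an API service for a local model that imitates the OpenAI API style, then used the chatgpt proxyllm model name with the URL pointing at my own deployment. I then asked the database "copilot": "Which tables can you see?" The output was as follows; I tried repeatedly and it never worked: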

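I'm sorry, but as an AI I cannot directly access the actual contents of external databases or file systems. I therefore have no ability to "see" any actual table data.

However, I can help you understand how to identify and describe the tables a typical database might contain, along with their basic information. Below are some examples of common database tables and their general information:

  1. Users table (Users)
    • Description: stores the personal information of all users.
    • Fields:
      • User ID (UserID): a number or string that uniquely identifies each user.
      • Name (Name): the user's name.
      • Email address (Email): the user's email.
      • Password (Password): the encrypted password.
      • Registration date (RegistrationDate): the date and time the user registered the account.

·····

  1. Suppliers table (Suppliers)
    • Description: lists information about the suppliers the company works with.
    • Fields:
      • Supplier ID (SupplierID): a number that uniquely identifies each supplier.
      • Company name (CompanyName): the supplier's company name.
      • Contact person (ContactPerson): the person responsible for contact.
      • Phone number (PhoneNumber): the contact phone number.
      • Email address (Email): the email address.

Please note that these are only example tables and fields; the table structures and fields in a real database may differ, depending on the specific business requirements and design. If you have a specific database or want to learn about a particular kind of table structure, you can provide more context and I will do my best to give a more accurate description.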

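But this is not a problem with the model's capability: when I access the service directly with curl it works normally, and the model can output the database table information. The DB-GPT backend log is as follows: llm_adapter: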

model prompt:

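system: You are a data intelligence analysis expert who can write SQL, write code, and call tools. The following information was extracted from the database tables. It describes the table names, the column names and comments/attributes, possible constraints, index key information, and a rough human-written description of each table. ['recommend(id, create_time, update_time, code_enum_type (SQL helper enum type), recommend (recommendation))', 'feedback(id, create_time, update_time, user_id (user ID), username (user name), content (feedback content), question (question), answer (answer))', 'conversation_content(id, create_time, update_time, conversation_id (conversation ID), conversation_mode (conversation mode: knowledge base--1 plugin mode--2 code parsing--3), question (question), answer (answer), is_context (whether it is context), is_prompt (whether it is a prompt), model_answer (model answer), content_type (content type), knowledge_pattern (knowledge base mode), result_flag (whether to return a result), is_async (whether the conversation is asynchronous), knowledge_addr (path))', 'conversation(id, create_time, update_time, user_id (user ID), conversation_name (conversation name))', 'prompt(id, create_time, update_time, prompt_title (prompt title), prompt_content (prompt content), is_enable (whether enabled)), and index keys: prompt_title(prompt_title)']

Based on the information you can see, answer the user's question below (think and act step by step; reason first, then give the answer): Which tables can you see? Please list all of them and tell me the information for each table

Requirements: if the answer cannot be obtained from the provided content, tell the user; requests unrelated to your task, such as small talk, must be declined, steering the user back to the task

human: Which tables can you see? Please list all of them and tell me the information for each table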

async stream output:

2024-06-14 18:17:40 gptai dbgpt.model.proxy.llms.chatgpt[1917986] INFO Send request to openai(1.32.0), payload: {'stream': True, 'model': 'gpt-3.5-turbo', 'temperature': 0.6, 'max_tokens': 1024}

messages: [{'role': 'system', 'content': "\n你是一个数据智能分析专家,会写sql,写代码,调用工具。\n以下是从数据库表中提取出来的信息。它们描述了表的名称、列的名称和注释/属性,可能的约束、索引键的信息以及人工和对各自表的大致描述。\n['recommend(id, create_time, update_time, code_enum_type (SQL辅助枚举类型), recommend (推荐))', 'feedback(id, create_time, update_time, user_id (用户ID), username (用户名称), content (反馈内容), question (提问), answer (回答))', 'conversation_content(id, create_time, update_time, conversation_id (会话ID), conversation_mode (会话模式: 知识库--1 插件模式--2 代码解析--3), question (问题), answer (答案), is_context (是否是上下文), is_prompt (是否是提示词), model_answer (模型答案), content_type (内容类型), knowledge_pattern (知识库模式), result_flag (是否返回结果), is_async (是否为异步会话), knowledge_addr (路径))', 'conversation(id, create_time, update_time, user_id (用户ID), conversation_name (会话名称))', 'prompt(id, create_time, update_time, prompt_title (提示词标题), prompt_content (提示词内容), is_enable (是否启用)), and index keys: prompt_title(prompt_title)']\n\n请根据你能看到的信息,回答下面用户的问题(一步步思考和行动,先推理后给出答案):\n你可以看到哪些表,请全部列举出来,并告诉我每张表格的信息\n\n要求:如果无法从提供的内容中获取答案,则要告知用户;对于和你的任务无关的请求如闲聊需要拒绝,引到客户到任务上来\n"}, {'role': 'user', 'content': '你可以看到哪些表,请全部列举出来,并告诉我每张表格的信息'}]

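By the way, I modified the database prompt as shown above; nothing else was changed. Also, the database contains more tables than the ones listed above, but for some reason {table_info} only includes about half of the table information.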
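To compare against the curl test, the request that DB-GPT logs above can be replayed outside DB-GPT. The following is a minimal sketch using only the Python standard library. The payload fields (`stream`, `model`, `temperature`, `max_tokens`) are copied from the log and the URL from the reporter's configuration; the function name `build_chat_request` and the placeholder system prompt are illustrative assumptions, not DB-GPT code.

```python
import json
import urllib.request

# URL taken from the reporter's configuration; adjust for your deployment.
URL = "http://10.0.18.15:8008/v1/chat/completions"


def build_chat_request(url, system_prompt, question, api_key="EMPTY"):
    """Build the same style of request that appears in the DB-GPT log."""
    payload = {
        "stream": True,
        "model": "gpt-3.5-turbo",
        "temperature": 0.6,
        "max_tokens": 1024,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(URL, "<system prompt from the log>", "你可以看到哪些表")

# To actually send it (requires the server to be reachable):
# with urllib.request.urlopen(request) as response:
#     for line in response:
#         print(line.decode("utf-8").rstrip())
```

If the replayed request produces a grounded answer while DB-GPT does not, the difference is more likely in how the response is streamed back than in the prompt itself.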

What you expected to happen

I would like a locally deployed service to work as a drop-in replacement for the OpenAI API, using the same api-key style.

How to reproduce


LANGUAGE=zh

***

PROXY_SERVER (openai interface | chatGPT proxy service), use chatGPT as your LLM.

if your server can visit openai, please set PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions

else if you have a chatgpt proxy server, you can set PROXY_SERVER_URL={your-proxy-serverip:port/xxx}

***

PROXY_API_KEY={your-openai-sk}

PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions

PROXY_API_KEY=EMPTY

PROXY_SERVER_URL=http://10.0.18.15:8008/v1/chat/completions
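Note that PROXY_SERVER_URL here is the full chat-completions endpoint, whereas OpenAI-style client libraries usually take a base URL ending in /v1. A minimal sketch of that conversion follows; `to_base_url` is a hypothetical helper written for illustration, and whether DB-GPT performs exactly this step internally is an assumption.

```python
def to_base_url(proxy_server_url: str) -> str:
    """Strip a trailing /chat/completions so the URL can serve as a client base URL."""
    suffix = "/chat/completions"
    url = proxy_server_url.rstrip("/")
    if url.endswith(suffix):
        url = url[: -len(suffix)]
    return url


base_url = to_base_url("http://10.0.18.15:8008/v1/chat/completions")
print(base_url)
```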

Additional context

No response

Are you willing to submit PR?

fangyinc commented 2 weeks ago

Please check the relevant logs in the model inference backend; the cause may be there.

chuangzhidan commented 2 weeks ago

Please check the relevant logs in the model inference backend; the cause may be there.

Unfortunately, there is nothing useful in the terminal output:

(dbgpt_env) root@gptai:/media/data/xgp/repo/GLM-4/basic_demo# python openai_api_server.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-06-14 16:17:31,563 INFO worker.py:1753 -- Started a local Ray instance.
INFO 06-14 16:17:33 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='/media/data/llm/glm-4-9b-chat', speculative_config=None, tokenizer='/mediasion=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=Lozation=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 06-14 16:17:34 tokenizer.py:126] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 06-14 16:17:45 utils.py:623] Found nccl from library libnccl.so.2
INFO 06-14 16:17:45 pynccl.py:65] vLLM is using nccl==2.20.5
(RayWorkerWrapper pid=1828154) INFO 06-14 16:17:45 utils.py:623] Found nccl from library libnccl.so.2
(RayWorkerWrapper pid=1828154) INFO 06-14 16:17:45 pynccl.py:65] vLLM is using nccl==2.20.5
INFO 06-14 16:17:45 custom_all_reduce_utils.py:179] reading GPU P2P access cache from /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
WARNING 06-14 16:17:45 custom_all_reduce.py:179] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning,
(RayWorkerWrapper pid=1828154) INFO 06-14 16:17:45 custom_all_reduce_utils.py:179] reading GPU P2P access cache from /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
(RayWorkerWrapper pid=1828154) WARNING 06-14 16:17:45 custom_all_reduce.py:179] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test ftly.
INFO 06-14 16:17:49 model_runner.py:159] Loading model weights took 8.8289 GB
(RayWorkerWrapper pid=1828154) INFO 06-14 16:17:50 model_runner.py:159] Loading model weights took 8.8289 GB
INFO 06-14 16:17:54 distributed_gpu_executor.py:56] # GPU blocks: 3288, # CPU blocks: 13107
INFO: Started server process [1822231]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://10.0.18.15:8008 (Press CTRL+C to quit)
INFO 06-14 16:19:24 metrics.py:341] Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU
INFO: 192.168.10.150:20001 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 06-14 16:19:29 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 17.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU
INFO 06-14 16:19:34 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 20.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU
INFO 06-14 16:19:39 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU
INFO 06-14 18:08:18 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU
INFO: 192.168.10.150:7181 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 06-14 18:08:23 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 24.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU
INFO 06-14 18:11:24 metrics.py:341] Avg prompt throughput: 1.1 tokens/s, Avg generation throughput: 0.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU
INFO: 192.168.10.150:20001 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 06-14 18:12:42 metrics.py:341] Avg prompt throughput: 4.4 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 1 req

fangyinc commented 2 weeks ago

Aha, can you try other scenarios, such as 'Chat Data'?

chuangzhidan commented 2 weeks ago

Aha, can you try other scenarios, such as 'Chat Data'?

I did; sadly, it didn't work either:

2024-06-14 18:23:19 gptai dbgpt.app.scene.base_chat[1917986] INFO Request: ModelRequest(model='chatgpt_proxyllm', messages=[ModelMessage(role='system', content='\n请根据用户选择的数据库和该库的部分可用表结构定义来回答用户问题.\n数据库名type (SQL辅助枚举类型), recommend (推荐))\', \'feedback(id, create_time, update_time, user_id (用户ID), username (用户名称), content (反馈内容), question (提问),d (会话ID), conversation_mode (会话模式: 知识库--1 插件模式--2 代码解析--3), question (问题), answer (答案), is_context (是否是上下文), is_prompt (是否是提示词),ult_flag (是否返回结果), is_async (是否为异步会话), knowledge_addr (路径))\', \'conversation(id, create_time, update_time, user_id (用户ID), conversation_name (会content (提示词内容), is_enable (是否启用)), and index keys: prompt_title(prompt_title)\']\n\n约束:\n 1. 请根据用户问题理解用户意图,使用给出表结构定义创建一指定了他希望获得的具体数据行数,否则始终将查询限制为最多 50 个结果。\n 3. 只能使用表结构信息中提供的表来生成 sql,如果无法根据提供的表结构中生成 sql ,请说:“弄错表和列的关系\n 5. 请检查SQL的正确性,并保证正确的情况下优化查询性能\n 6.请从如下给出的展示方式种选择最优的一种用以进行数据渲染,将类型名称放入返回要求格: response_line_chart:used to display comparative trend analysis data\nresponse_pie_chart:suitable for scenarios such as proportion and distribution statistics\nresponse_scatter_plot:Suitable for exploring relationships between variables, detecting outliers, etc.\nresponse_bubble_chart:Suitable for relationships between chart:Suitable for hierarchical structure representation, category proportion display and highlighting key categories, etc.\nresponse_area_chart:Suitable for visa change trends, etc.\nresponse_heatmap:Suitable for visual analysis of time series data, large-scale data sets, distribution of classified data, etc.\n用户问题:JSON格式回复:\n "{\n \"thoughts\": \"thoughts summary to say to user\",\n \"sql\": \"SQL Query to run\",\n \"display_type\": \"Data ound_index=0), ModelMessage(role='human', content='你可以看到哪些表,请全部列举出来,并告诉我每张表格的信息', round_index=0)], temperature=0.5, max_new_tokens=10df-45b8-a367-6d4815334dfb:74f33c95-f479-4afc-ad3b-3d3dfd5cbdad', context=ModelRequestContext(stream=False, cache_enable=False, user_name=None, sys_code=None, confd5cbdad', 
chat_mode='chat_with_db_execute', chat_param=None, extra={}, request_id=None)) 2024-06-14 18:23:19 gptai dbgpt.core.awel.runner.local_runner[1917986] INFO Begin run workflow from end operator, id: b2290465-ea47-4c1e-9da4-9b8dca4fbdc0, runne> 2024-06-14 18:23:19 gptai dbgpt.core.awel.operators.common_operator[1917986] INFO branch_input_ctxs 0 result None, is_empty: False 2024-06-14 18:23:19 gptai dbgpt.core.awel.operators.common_operator[1917986] INFO Skip node name llm_model_cache_node 2024-06-14 18:23:19 gptai dbgpt.core.awel.operators.common_operator[1917986] INFO branch_input_ctxs 1 result True, is_empty: False 2024-06-14 18:23:19 gptai dbgpt.core.awel.runner.local_runner[1917986] INFO Skip node name llm_model_cache_node, node id 05dffc68-b616-4673-b545-ebe7250b9329 2024-06-14 18:23:19 gptai dbgpt.model.adapter.base[1917986] INFO Message version is v2 2024-06-14 18:23:19 gptai dbgpt.model.cluster.worker.default_worker[1917986] INFO current generate stream function is asynchronous stream function llm_adapter:

model prompt:

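system: Please answer the user's question based on the database the user selected and the partial table structure definitions available for that database. Database name: copilot Table structure definitions: ['recommend(id, create_time, update_time, code_enum_type (SQL helper enum type), recommend (recommendation))', 'feedback(id, create_time, update_time, user_id (user ID), usecontent(id, create_time, update_time, conversation_id (conversation ID), conversation_mode (conversation mode: knowledge base--1 plugin mode--2 code parsing--3), question (question), answer (answer), type (content type), knowledge_pattern (knowledge base mode), result_flag (whether to return a result), is_async (whether the conversation is asynchronous), knowledge_addr (path))', 'conversation(id, create_time, u update_time, prompt_title (prompt title), prompt_content (prompt content), is_enable (whether enabled)), and index keys: prompt_title(prompt_title)']

Constraints:

  1. Understand the user's intent from the question and use the given table structure definitions to create a syntactically correct mysql sql; if no sql is needed, answer the user's question directly.
  2. Unless the user specifies in the question the exact number of rows they want, always limit the query to at most 50 results.
  3. Only the tables provided in the table structure information may be used to generate sql; if sql cannot be generated from the provided table structures, say: "The provided table structure information is not sufficient to generate a sql query." Fabricating information is forbidden.
  4. When generating SQL, take care not to confuse the relationships between tables and columns.
  5. Check the SQL for correctness and, provided it is correct, optimize query performance.
  6. From the display methods given below, choose the most suitable one for rendering the data and put the type name into the name parameter of the required return format; if no best fit is found, use 'Table' as the display method. Available data display methods:

response_pie_chart:suitable for scenarios such as proportion and distribution statistics
response_table:suitable for display with many display columns or non-numeric columns
response_scatter_plot:Suitable for exploring relationships between variables, detecting outliers, etc.
response_bubble_chart:Suitable for relationships between multiple variables, highlighting outliers or special situations, etc.
response_donut_chart:Suitable for hierarchical structure representation, category proportion display and highlighting key categories, etc.
response_area_chart:Suitable for visualization of time series data, comparison of multiple groups of data, analysis of data change trends, etc.
response_heatmap:Suitable for visual analysis of time series data, large-scale data sets, distribution of classified data, etc.

User question: Which tables can you see? Please list all of them and tell me the information for each table

Please think step by step and reply in the following JSON format: "{\n \"thoughts\": \"thoughts summary to say to user\",\n \"sql\": \"SQL Query to run\",\n \"display_type\": \"Data display method\"\n}" Make sure the returned JSON is correct and can be parsed by Python's json.loads method.

human: Which tables can you see? Please list all of them and tell me the information for each table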

async stream output:

2024-06-14 18:23:19 gptai dbgpt.model.proxy.llms.chatgpt[1917986] INFO Send request to openai(1.32.0), payload: {'stream': True, 'model': 'gpt-3.5-turbo', 'tempe

messages: [{'role': 'system', 'content': '\n请根据用户选择的数据库和该库的部分可用表结构定义来回答用户问题.\n数据库名:\n copilot\n表结构定义:\n [\'recommend(id, creak(id, create_time, update_time, user_id (用户ID), username (用户名称), content (反馈内容), question (提问), answer (回答))\', \'conversation_content(id, create_t件模式--2 代码解析--3), question (问题), answer (答案), is_context (是否是上下文), is_prompt (是否是提示词), model_answer (模型答案), content_type (内容类型), knnowledge_addr (路径))\', \'conversation(id, create_time, update_time, user_id (用户ID), conversation_name (会话名称))\', \'prompt(id, create_time, update_time, px keys: prompt_title(prompt_title)\']\n\n约束:\n 1. 请根据用户问题理解用户意图,使用给出表结构定义创建一个语法正确的 mysql sql,如果不需要sql,则直接回答用户多 50 个结果。\n 3. 只能使用表结构信息中提供的表来生成 sql,如果无法根据提供的表结构中生成 sql ,请说:“提供的表结构信息不足以生成 sql 查询。” 禁止随意捏造信息的情况下优化查询性能\n 6.请从如下给出的展示方式种选择最优的一种用以进行数据渲染,将类型名称放入返回要求格式的name参数值种,如果找不到最合适的则使用\'Table\'作d analysis data\nresponse_pie_chart:suitable for scenarios such as proportion and distribution statistics\nresponse_table:suitable for display with many display hips between variables, detecting outliers, etc.\nresponse_bubble_chart:Suitable for relationships between multiple variables, highlighting outliers or special son, category proportion display and highlighting key categories, etc.\nresponse_area_chart:Suitable for visualization of time series data, comparison of multipleisual analysis of time series data, large-scale data sets, distribution of classified data, etc.\n用户问题:\n 你可以看到哪些表,请全部列举出来,并告诉我每张表ghts summary to say to user\",\n \"sql\": \"SQL Query to run\",\n \"display_type\": \"Data display method\"\n}"\n确保返回正确的json并且可以被Py列举出来,并告诉我每张表格的信息'}]

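2024-06-14 18:23:20 gptai dbgpt.model.cluster.worker.default_worker[1917986] INFO is_first_generate, usage: None

I'm sorry, but as a virtual AI assistant I cannot directly access external databases or file systems to view the actual tables. I therefore have no ability to "see" any specific tables.

However, I can help you understand several types of tables commonly found in databases and the information they hold:

  1. Customers table (Customers)
    • Information: stores detailed information about customers, such as name, address, phone number, and email.
  2. Orders table (Orders)
    • Information: records all sales orders, including order number, date, total price, customer ID, and possibly a list of products.
  3. Products table (Products)
    • Information: contains data about products, such as product name, description, price, and stock quantity.
  4. Suppliers table (Suppliers)
    • Information: provides supplier-related information, such as supplier name, contact information, and the types of products supplied.
  5. Employees table (Employees)
    • Information: stores employees' personal information and job details, such as name, position, contact details, and hire date.
  6. Departments table (Departments)
    • Information: lists the different departments in the company and their heads.
  7. Payroll table (Payroll)
    • Information: records employees' salaries, bonuses, deductions, and similar information.
  8. Inventory table (Inventory)
    • Information: tracks stock levels of goods in the warehouse, including item ID, name, and current stock quantity.
  9. Sales analysis table (Sales Analysis)
    • Information: summarizes and analyzes sales data, such as revenue, profit, and sales trends.
  10. Logs table (Logs)
    • Information: records the history of various operations, such as user login times, error reports, and audit events.

If you have a specific data set or want to learn about the table structure of a particular application, you can provide more context and I will do my best to give explanations and suggestions based on the information provided.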

fangyinc commented 2 weeks ago

I can't reproduce this problem here. There are two ways to try:

  1. Use a brand new environment, clone the DB-GPT project, install the dependencies and try again.
  2. Add debug logs to basic_demo/openai_api_server.py in the glm-4 project for specific debugging. For example, add the following to the beginning of the file:
import logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('asyncio.log'), 
        logging.StreamHandler() 
    ]
)
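With a configuration like the one above in place, the server could also log each incoming request before generation, which would show whether the system prompt containing the table information actually arrives. A small sketch follows; the helper `log_incoming_messages` and the logger name are hypothetical, and the messages are assumed to be OpenAI-style role/content dictionaries. None of this is taken from the GLM-4 demo code.

```python
import logging

# Hypothetical logger name; any name works once logging.basicConfig has run.
logger = logging.getLogger("openai_api_server")


def log_incoming_messages(messages, max_chars=200):
    """Log the role, length, and a truncated preview of each message."""
    summaries = []
    for index, message in enumerate(messages):
        content = message.get("content") or ""
        preview = content[:max_chars]
        summary = (
            f"message[{index}] role={message.get('role')} "
            f"len={len(content)} preview={preview!r}"
        )
        logger.debug(summary)
        summaries.append(summary)
    return summaries


summaries = log_incoming_messages(
    [
        {"role": "system", "content": "table info: ['conversation(id, ...)']"},
        {"role": "user", "content": "你可以看到哪些表"},
    ]
)
```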
chuangzhidan commented 1 week ago


I added the logging:

from sse_starlette.sse import EventSourceResponse

if request.stream:
    print('request.stream=true')
    predict_stream_generator = predict_stream(request.model, gen_params)
    print(request.model, '\n', predict_stream_generator)
    output = await anext(predict_stream_generator)
    logger.debug(f"output:\n{output}")

    if output:
        print("输出存在")  # "output exists"
        return EventSourceResponse(predict_stream_generator, media_type="text/event-stream")

The problem lies in EventSourceResponse(predict_stream_generator, media_type="text/event-stream"). Printed output:

request.stream=true
gpt-3.5-turbo
<async_generator object predict_stream at 0x7fce2ebcecc0>
2024-06-19 10:54:33,246 - asyncio - DEBUG - output:
{"model":"gpt-3.5-turbo","id":"","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"","function_call":null},"finish_reason":null,"index":0}],"created":1718765673}
输出存在
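One detail of the snippet above can be checked in isolation: `anext` advances the generator, so the chunk it returns is no longer in the generator that is later handed to `EventSourceResponse`. In the printed output that first chunk has empty content, so nothing visible is lost. The sketch below uses a stand-in generator with made-up chunk strings, and it is not a claim about the root cause of this issue.

```python
import asyncio


async def predict_stream():
    """Stand-in for the server's streaming generator (chunks are made up)."""
    for chunk in ["<role chunk>", "Hello", " world"]:
        yield chunk


async def peek_then_stream():
    generator = predict_stream()
    # Same effect as `await anext(generator)` on Python 3.10+.
    first = await generator.__anext__()
    # Whatever consumes the generator afterwards only sees the rest.
    remaining = [chunk async for chunk in generator]
    return first, remaining


first, remaining = asyncio.run(peek_then_stream())
print(first)
print(remaining)
```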

I also tried the following approach (vLLM); it works fine in Postman but does not work in DB-GPT:

(dbgpt_env) root@gptai:/media/data/xgp/repo/GLM-4/basic_demo# python -m vllm.entrypoints.openai.api_server --model /media/data/llm/glm-4-9b-chat --tokenizer /media/data/llm/glm-4-9b-chat --served-model-name glm-4-9b-chat --max-model-len 2048 --gpu-memory-utilization 0.15 --tensor-parallel-size 2 --trust-remote-code --enforce-eager --host 10.0.18.15 --port 8008

Output shown in the platform:

LLMServer Generate Error, Please CheckErrorInfo.: Error code: 404 - {'object': 'error', 'message': 'The model gpt-3.5-turbo does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404} (error_code: 1)

Backend log:

messages: [{'role': 'system', 'content': "\n你是一个数据智能分析专家,会写sql,写代码,调用工具。\n以下是从数据库表中提取出来的信息。它们描述了表的名称、列的名称和注释/属性,可能的约束、索引键的信息以及人工和对各自表的大致描述。\n['feedback(id, create_time, update_time, user_id (用户ID), username (用户名称), content (反馈内容), question (提问), answer (回答))', 'ocap_management(id, create_time, update_time, wafer_id (wafer id), abnormal_cause (异常原因))', 'conversation_content(id, create_time, update_time, conversation_id (会话ID), conversation_mode (会话模式: 知识库--1 插件模式--2 代码解析--3), question (问题), answer (答案), is_context (是否是上下文), is_prompt (是否是提示词), model_answer (模型答案), content_type (内容类型), knowledge_pattern (知识库模式), result_flag (是否返回结果), is_async (是否为异步会话), knowledge_addr (路径))', 'recommend(id, create_time, update_time, code_enum_type (SQL辅助枚举类型), recommend (推荐))', 'conversation(id, create_time, update_time, user_id (用户ID), conversation_name (会话名称))']\n\n请根据你能看到的信息,回答下面用户的问题(一步步思考和行动,先推理后给出答案):\n你可以看到哪些表\n\n要求:如果无法从提供的内容中获取答案,则要告知用户;对于和你的任务无关的请求如闲聊需要拒绝,引到客户到任务上来\n"}, {'role': 'user', 'content': '你可以看到哪些表'}]
2024-06-19 11:00:45 gptai dbgpt.model.cluster.worker.default_worker[608334] ERROR Model inference error, detail: Traceback (most recent call last):
  File "/media/data/xgp/repo/DB-GPT/dbgpt/model/cluster/worker/default_worker.py", line 246, in async_generate_stream
    async for output in generate_stream_func(
  File "/media/data/xgp/repo/DB-GPT/dbgpt/model/proxy/llms/chatgpt.py", line 43, in chatgpt_generate_stream
    async for r in client.generate_stream(request):
  File "/media/data/xgp/repo/DB-GPT/dbgpt/model/proxy/llms/chatgpt.py", line 230, in generate_stream
    async for r in self.generate_stream_v1(messages, payload):
  File "/media/data/xgp/repo/DB-GPT/dbgpt/model/proxy/llms/chatgpt.py", line 258, in generate_stream_v1
    chat_completion = await self.client.chat.completions.create(
  File "/root/anaconda3/envs/dbgpt_env/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1214, in create
    return await self._post(
  File "/root/anaconda3/envs/dbgpt_env/lib/python3.10/site-packages/openai/_base_client.py", line 1790, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/root/anaconda3/envs/dbgpt_env/lib/python3.10/site-packages/openai/_base_client.py", line 1493, in request
    return await self._request(
  File "/root/anaconda3/envs/dbgpt_env/lib/python3.10/site-packages/openai/_base_client.py", line 1584, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'object': 'error', 'message': 'The model gpt-3.5-turbo does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404}
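The 404 in this log is consistent with the launch command: vLLM was started with `--served-model-name glm-4-9b-chat`, while the request asks for `gpt-3.5-turbo`. One way to confirm such a mismatch is to compare the requested name against the model list that an OpenAI-compatible server returns from `/v1/models`. The sketch below works on an already-fetched response; `find_model_mismatch` is a hypothetical helper, and the sample response is hand-written in the OpenAI list format rather than captured from the reporter's server.

```python
def find_model_mismatch(requested_model, models_response):
    """Return a description of the mismatch, or None if the model is served."""
    served = [entry["id"] for entry in models_response.get("data", [])]
    if requested_model in served:
        return None
    return f"model {requested_model!r} is not served; available: {served}"


# Hand-written sample in the shape of an OpenAI-style /v1/models response.
sample_response = {
    "object": "list",
    "data": [{"id": "glm-4-9b-chat", "object": "model"}],
}

problem = find_model_mismatch("gpt-3.5-turbo", sample_response)
print(problem)
```

If the names differ, either the server can be started with a served model name that matches what the client sends, or the client can be configured to send the served name. In DB-GPT that may be the PROXYLLM_BACKEND setting, which is an assumption worth verifying against the DB-GPT documentation.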