QwenLM / Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

[BUG] openai.InternalServerError: Error code: 500 - {'error': {'code': 500, 'message': 'Unsupported param: tools', 'type': 'server_error'}} #822

Closed · JavanTang closed this issue 1 month ago

JavanTang commented 2 months ago

### Is there an existing issue / discussion for this?

### Is there an existing answer for this in the FAQ?

### Current Behavior

I'm using the Qwen2-7B-Instruct-AWQ model, served from the llama.cpp Docker image through its OpenAI-style API.

I would then like to use the Phidata library, which relies on function calling; however, it seems that Qwen2-7B-Instruct-AWQ served this way does not support function calling.
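
To take Phidata out of the loop, the same 500 can be reproduced with a bare chat completion that includes the `tools` parameter. This is a sketch: the tool schema is a hypothetical stand-in for what Phidata sends, and the endpoint, model name, and API key mirror the repro steps below.

```python
# Minimal repro (sketch): any request carrying the `tools` parameter gets the
# same 500 back from the llama.cpp OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="token-abc123")
client.chat.completions.create(
    model="Qwen2-7B-Instruct-AWQ",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    tools=[{  # hypothetical stand-in for Phidata's search_knowledge_base tool
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search the knowledge base for relevant documents.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }],
)
# -> openai.InternalServerError: Error code: 500 - {'error': {'code': 500,
#    'message': 'Unsupported param: tools', 'type': 'server_error'}}
```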

Error message from the Phidata script:

(smart-meeting-nlp) (base) ➜  telecom git:(smart-meeting) ✗ /home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/bin/python /home/tangzhifeng/telecom/examples/phidata_rag.py
╭──────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Message  │ What is the capital of France?                                                                                                │
├──────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Response │ The capital of France is Paris.                                                                                               │
│ (0.2s)   │                                                                                                                               │
╰──────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
DEBUG    Debug logs enabled                                                                                                                 
DEBUG    Debug logs enabled                                                                                                                 
DEBUG    *********** Assistant Run Start: 123 ***********                                                                                   
DEBUG    Loaded memory for user: 123                                                                                                        
DEBUG    Function get_chat_history added to LLM.                                                                                            
DEBUG    Function search_knowledge_base added to LLM.                                                                                       
DEBUG    ---------- OpenAI Response Start ----------                                                                                        
DEBUG    ============== system ==============                                                                                               
DEBUG    You are a helpful Assistant called 'AutoRAG' and your goal is to assist the user in the best way possible.                         
         You must follow these instructions carefully:                                                                                      
         <instructions>                                                                                                                     
         1. Given a user query, first ALWAYS search your knowledge base using the `search_knowledge_base` tool to see if you have relevant  
         information.                                                                                                                       
         2. If you dont find relevant information in your knowledge base, use the `duckduckgo_search` tool to search the internet.          
         3. If you need to reference the chat history, use the `get_chat_history` tool.                                                     
         4. If the users question is unclear, ask clarifying questions to get more information.                                             
         5. Carefully read the information you have gathered and provide a clear and concise answer to the user.                            
         6. Do not use phrases like 'based on my knowledge' or 'depending on the information'.                                              
         7. Use markdown to format your answers.                                                                                            
         8. The current time is 2024-07-11 18:03:07.330254                                                                                  
         </instructions>                                                                                                                    
DEBUG    ============== user ==============                                                                                                 
DEBUG    What is the capital of France?                                                                                                     
⠋ Working...
Traceback (most recent call last):
  File "/home/tangzhifeng/telecom/examples/phidata_rag.py", line 95, in <module>
    assistant_1.print_response("What is the capital of France?", markdown=True)
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/assistant/assistant.py", line 1470, in print_response
    for resp in self.run(message=message, messages=messages, stream=True, **kwargs):
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/assistant/assistant.py", line 890, in _run
    for response_chunk in self.llm.response_stream(messages=llm_messages):
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/llm/openai/chat.py", line 616, in response_stream
    for response in self.invoke_stream(messages=messages):
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/llm/openai/chat.py", line 218, in invoke_stream
    yield from self.get_client().chat.completions.create(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 643, in create
    return self._post(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1266, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 942, in request
    return self._request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1031, in _request
    return self._retry_request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1079, in _retry_request
    return self._request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1031, in _request
    return self._retry_request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1079, in _retry_request
    return self._request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1046, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'code': 500, 'message': 'Unsupported param: tools', 'type': 'server_error'}}

### Expected Behavior

I would expect function calling to work properly, but I'm not sure whether the problem lies with Qwen2-7B-Instruct-AWQ or with llama.cpp.

### Steps To Reproduce

1. Start `qwen2-7b-instruct-q5_k_m.gguf`:

   ```bash
   docker run -p 8080:8080 -v /home/tangzhifeng/MODELZOOS:/models --gpus all \
     ghcr.io/ggerganov/llama.cpp:server-cuda \
     -m models/Qwen/Qwen2-7B-Instruct-GGUF/qwen2-7b-instruct-q5_k_m.gguf \
     -c 2048 --host 0.0.0.0 --port 8080
   ```
2. `pip install phidata`
3. Start `phidata/pgvector:16` (a connectivity check for this container is sketched after step 4):

   ```bash
   docker run -d \
     -e POSTGRES_DB=ai \
     -e POSTGRES_USER=ai \
     -e POSTGRES_PASSWORD=ai \
     -e PGDATA=/var/lib/postgresql/data/pgdata \
     -v pgvolume:/var/lib/postgresql/data \
     -p 5532:5432 \
     --name pgvector \
     phidata/pgvector:16
   ```
4. Run the following code:

   ```python
   from typing import Optional

   from phi.assistant import Assistant
   from phi.knowledge import AssistantKnowledge
   from phi.llm.openai import OpenAIChat
   from phi.tools.duckduckgo import DuckDuckGo
   from phi.embedder.openai import OpenAIEmbedder
   from phi.vectordb.pgvector import PgVector2
   from phi.storage.assistant.postgres import PgAssistantStorage

   db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"


   def get_auto_rag_assistant(
       base_url: str,
       llm_model: str,
       api_key: str,
       user_id: Optional[str] = None,
       run_id: Optional[str] = None,
       debug_mode: bool = True,
   ) -> Assistant:
       """Get an Auto RAG Assistant."""
       return Assistant(
           name="auto_rag_assistant",
           run_id=run_id,
           user_id=user_id,
           llm=OpenAIChat(base_url=base_url, model=llm_model, api_key=api_key),
           storage=PgAssistantStorage(
               table_name="auto_rag_assistant_openai", db_url=db_url
           ),
           knowledge_base=AssistantKnowledge(
               vector_db=PgVector2(
                   db_url=db_url,
                   collection="auto_rag_documents_openai",
                   embedder=OpenAIEmbedder(
                       model="text-embedding-3-small", dimensions=1536
                   ),
               ),
               # 3 references are added to the prompt
               num_documents=3,
           ),
           description="You are a helpful Assistant called 'AutoRAG' and your goal is to assist the user in the best way possible.",
           instructions=[
               "Given a user query, first ALWAYS search your knowledge base using the `search_knowledge_base` tool to see if you have relevant information.",
               "If you dont find relevant information in your knowledge base, use the `duckduckgo_search` tool to search the internet.",
               "If you need to reference the chat history, use the `get_chat_history` tool.",
               "If the users question is unclear, ask clarifying questions to get more information.",
               "Carefully read the information you have gathered and provide a clear and concise answer to the user.",
               "Do not use phrases like 'based on my knowledge' or 'depending on the information'.",
           ],
           # Show tool calls in the chat
           show_tool_calls=True,
           # This setting gives the LLM a tool to search the knowledge base for information
           search_knowledge=True,
           # This setting gives the LLM a tool to get chat history
           read_chat_history=True,
           markdown=True,
           # Adds chat history to messages
           add_chat_history_to_messages=True,
           add_datetime_to_instructions=True,
           debug_mode=debug_mode,
       )


   if __name__ == "__main__":
       # Plain chat without tools works fine:
       assistant = Assistant(
           llm=OpenAIChat(
               base_url="http://127.0.0.1:8000/v1",
               model="Qwen2-7B-Instruct-AWQ",
               api_key="token-abc123",
               stop=["<|im_end|>"],
               temperature=0.001,
           ),
       )
       assistant.print_response("What is the capital of France?", markdown=True)

       # The auto-RAG assistants add tools, which triggers the 500 error:
       assistant_1 = get_auto_rag_assistant(
           "http://localhost:8000/v1",
           "Qwen2-7B-Instruct-AWQ",
           "token-abc123",
           user_id="123",
           run_id="123",
       )

       assistant_2 = get_auto_rag_assistant(
           "http://localhost:8000/v1",
           "Qwen2-7B-Instruct-AWQ",
           "token-abc123",
           user_id="123",
           run_id="1234",
       )

       assistant_1.print_response("What is the capital of France?", markdown=True)
       assistant_1.print_response(
           # Chinese: "No, it should be Beijing. Let me ask you again:"
           "不对应该是北京,我重新再问问你,What is the capital of France?"
       )
       assistant_2.print_response("What is the capital of France?", markdown=True)
   ```

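Before step 4, it can also help to confirm the pgvector container from step 3 is reachable. Below is a minimal connectivity check; this is a sketch and not part of the original repro, assuming sqlalchemy and psycopg are installed (phidata's PgVector2 needs them anyway).

```python
# Optional connectivity check (sketch) for the pgvector container from step 3.
from sqlalchemy import create_engine, text

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"  # same URL as the repro code
engine = create_engine(db_url)
with engine.connect() as conn:
    # Prints the PostgreSQL version string if the container is up.
    print(conn.execute(text("select version()")).scalar())
```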

### Environment

_No response_

### Anything else?

_No response_
jklj077 commented 1 month ago

The API created by llama.cpp does not support tool calls.
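
For anyone landing here: since the server itself rejects the `tools` parameter, one common workaround (a sketch, not a recommendation made in this thread) is prompt-level tool calling: describe the available tools in the system prompt and parse a JSON tool call out of the model's reply, so `tools` never reaches llama.cpp. Frameworks like Qwen-Agent implement this pattern for Qwen models.

```python
# Prompt-level tool calling (sketch): describe tools in the system prompt and
# parse the model's JSON reply, instead of sending the unsupported `tools` param.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="token-abc123")

system = (
    "You can call one tool: search_knowledge_base(query: str). "
    'To call it, reply with JSON only: {"tool": "search_knowledge_base", "query": "..."}. '
    "Otherwise answer normally."
)
resp = client.chat.completions.create(
    model="Qwen2-7B-Instruct-AWQ",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
content = resp.choices[0].message.content
try:
    call = json.loads(content)  # the model decided to call the tool
    # ... run search_knowledge_base(call["query"]) and feed the result back ...
except (json.JSONDecodeError, TypeError):
    print(content)  # plain answer, no tool call
```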