Is there an existing issue / discussion for this?
[X] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
[X] I have searched FAQ
Current Behavior
I'm running the Qwen2-7B-Instruct-AWQ model through the llama.cpp Docker image and accessing it via its OpenAI-style API.
I would like to use it with Phidata, which relies on function calling, but it seems that Qwen2-7B-Instruct-AWQ does not support function calling.
Error message:
(smart-meeting-nlp) (base) ➜ telecom git:(smart-meeting) ✗ /home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/bin/python /home/tangzhifeng/telecom/examples/phidata_rag.py
╭──────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Message │ What is the capital of France? │
├──────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Response │ The capital of France is Paris. │
│ (0.2s) │ │
╰──────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
DEBUG Debug logs enabled
DEBUG Debug logs enabled
DEBUG *********** Assistant Run Start: 123 ***********
DEBUG Loaded memory for user: 123
DEBUG Function get_chat_history added to LLM.
DEBUG Function search_knowledge_base added to LLM.
DEBUG ---------- OpenAI Response Start ----------
DEBUG ============== system ==============
DEBUG You are a helpful Assistant called 'AutoRAG' and your goal is to assist the user in the best way possible.
You must follow these instructions carefully:
<instructions>
1. Given a user query, first ALWAYS search your knowledge base using the `search_knowledge_base` tool to see if you have relevant
information.
2. If you dont find relevant information in your knowledge base, use the `duckduckgo_search` tool to search the internet.
3. If you need to reference the chat history, use the `get_chat_history` tool.
4. If the users question is unclear, ask clarifying questions to get more information.
5. Carefully read the information you have gathered and provide a clear and concise answer to the user.
6. Do not use phrases like 'based on my knowledge' or 'depending on the information'.
7. Use markdown to format your answers.
8. The current time is 2024-07-11 18:03:07.330254
</instructions>
DEBUG ============== user ==============
DEBUG What is the capital of France?
⠋ Working...
Traceback (most recent call last):
  File "/home/tangzhifeng/telecom/examples/phidata_rag.py", line 95, in <module>
    assistant_1.print_response("What is the capital of France?", markdown=True)
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/assistant/assistant.py", line 1470, in print_response
    for resp in self.run(message=message, messages=messages, stream=True, **kwargs):
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/assistant/assistant.py", line 890, in _run
    for response_chunk in self.llm.response_stream(messages=llm_messages):
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/llm/openai/chat.py", line 616, in response_stream
    for response in self.invoke_stream(messages=messages):
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/phi/llm/openai/chat.py", line 218, in invoke_stream
    yield from self.get_client().chat.completions.create(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 643, in create
    return self._post(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1266, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 942, in request
    return self._request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1031, in _request
    return self._retry_request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1079, in _retry_request
    return self._request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1031, in _request
    return self._retry_request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1079, in _retry_request
    return self._request(
  File "/home/tangzhifeng/miniconda3/envs/smart-meeting-nlp/lib/python3.9/site-packages/openai/_base_client.py", line 1046, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'code': 500, 'message': 'Unsupported param: tools', 'type': 'server_error'}}
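The 500 response carries the message 'Unsupported param: tools', which suggests the request was rejected because of the `tools` field rather than because of anything the model generated. One way to check this without Phidata is to send the same kind of request directly, once with and once without `tools`. The following is a minimal sketch using only the standard library. It assumes the server exposes the OpenAI-style `/chat/completions` path under the base URL; the helper names `build_payload` and `post_chat` and the `get_weather` tool are hypothetical and only for illustration.

```python
import json
import urllib.error
import urllib.request


def build_payload(model: str, question: str, with_tools: bool) -> dict:
    """Build an OpenAI-style chat completion request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    if with_tools:
        # Hypothetical tool definition in the OpenAI "tools" schema.
        payload["tools"] = [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Return the weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ]
    return payload


def post_chat(base_url: str, payload: dict, api_key: str = "token-abc123") -> tuple:
    """POST the payload and return (status_code, response_body_text)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Error responses (such as the 500 above) still carry a body worth reading.
        return err.code, err.read().decode("utf-8")
```

Calling `post_chat("http://localhost:8000/v1", build_payload("Qwen2-7B-Instruct-AWQ", "What is the capital of France?", with_tools=False))` and then repeating the call with `with_tools=True` would show whether only the request that contains `tools` fails.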
Expected Behavior
I would expect function calling to work, but I'm not sure whether the problem lies with Qwen2-7B-Instruct-AWQ or with llama.cpp.
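If the server turns out to reject the `tools` parameter regardless of the model, one possible workaround is to describe the tools in the system prompt and parse a JSON tool call out of the plain-text reply. The following is only a sketch under that assumption. The prompt wording, the JSON shape, and the helper names `render_tool_prompt` and `parse_tool_call` are hypothetical, and this is not how Phidata or Qwen2 implement function calling.

```python
import json
from typing import Optional


def render_tool_prompt(tools: list) -> str:
    """Describe the available tools in plain text for the system prompt."""
    lines = [
        "You can call the following tools.",
        'To call one, reply with only a JSON object: {"name": <tool name>, "arguments": {...}}',
        "",
    ]
    for tool in tools:
        lines.append(f"- {tool['name']}: {tool['description']}")
    return "\n".join(lines)


def parse_tool_call(reply: str) -> Optional[dict]:
    """Return {"name", "arguments"} if the reply contains a JSON tool call, else None."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        candidate = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(candidate, dict):
        return None
    if not isinstance(candidate.get("name"), str):
        return None
    arguments = candidate.get("arguments", {})
    if not isinstance(arguments, dict):
        return None
    return {"name": candidate["name"], "arguments": arguments}
```

With this approach the request itself contains no `tools` field, so the client has to run the named tool and feed the result back as a normal message.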
Steps To Reproduce
Start qwen2-7b-instruct-q5_k_m.gguf with the llama.cpp server image:
docker run -p 8080:8080 -v /home/tangzhifeng/MODELZOOS:/models --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/Qwen/Qwen2-7B-Instruct-GGUF/qwen2-7b-instruct-q5_k_m.gguf -c 2048 --host 0.0.0.0 --port 8080
Then run the script (examples/phidata_rag.py):
from typing import Optional

from phi.assistant import Assistant
from phi.knowledge import AssistantKnowledge
from phi.llm.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
from phi.embedder.openai import OpenAIEmbedder
from phi.vectordb.pgvector import PgVector2
from phi.storage.assistant.postgres import PgAssistantStorage

# PgVector database used for assistant storage and for the knowledge base
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"


def get_auto_rag_assistant(
    base_url: str,
    llm_model: str,
    api_key: str,
    user_id: Optional[str] = None,
    run_id: Optional[str] = None,
    debug_mode: bool = True,
) -> Assistant:
    """Get an Auto RAG Assistant."""
    return Assistant(
        name="auto_rag_assistant",
        run_id=run_id,
        user_id=user_id,
        llm=OpenAIChat(base_url=base_url, model=llm_model, api_key=api_key),
        storage=PgAssistantStorage(
            table_name="auto_rag_assistant_openai", db_url=db_url
        ),
        knowledge_base=AssistantKnowledge(
            vector_db=PgVector2(
                db_url=db_url,
                collection="auto_rag_documents_openai",
                embedder=OpenAIEmbedder(
                    model="text-embedding-3-small", dimensions=1536
                ),
            ),
            # 3 references are added to the prompt
            num_documents=3,
        ),
        description="You are a helpful Assistant called 'AutoRAG' and your goal is to assist the user in the best way possible.",
        instructions=[
            "Given a user query, first ALWAYS search your knowledge base using the `search_knowledge_base` tool to see if you have relevant information.",
            "If you dont find relevant information in your knowledge base, use the `duckduckgo_search` tool to search the internet.",
            "If you need to reference the chat history, use the `get_chat_history` tool.",
            "If the users question is unclear, ask clarifying questions to get more information.",
            "Carefully read the information you have gathered and provide a clear and concise answer to the user.",
            "Do not use phrases like 'based on my knowledge' or 'depending on the information'.",
        ],
        # Show tool calls in the chat
        show_tool_calls=True,
        # This setting gives the LLM a tool to search the knowledge base for information
        search_knowledge=True,
        # This setting gives the LLM a tool to get chat history
        read_chat_history=True,
        markdown=True,
        # Adds chat history to messages
        add_chat_history_to_messages=True,
        add_datetime_to_instructions=True,
        debug_mode=debug_mode,
    )


if __name__ == "__main__":
    # Plain assistant without tools: this request succeeds.
    # Note: base_url must match the port the server is published on
    # (the docker command above uses 8080).
    assistant = Assistant(
        llm=OpenAIChat(
            base_url="http://127.0.0.1:8000/v1",
            model="Qwen2-7B-Instruct-AWQ",
            api_key="token-abc123",
            stop=["<|im_end|>"],
            temperature=0.001,
        ),
    )
    assistant.print_response("What is the capital of France?", markdown=True)

    # Assistants with tools enabled: the first tool-enabled request fails.
    assistant_1 = get_auto_rag_assistant(
        "http://localhost:8000/v1",
        "Qwen2-7B-Instruct-AWQ",
        "token-abc123",
        user_id="123",
        run_id="123",
    )
    assistant_2 = get_auto_rag_assistant(
        "http://localhost:8000/v1",
        "Qwen2-7B-Instruct-AWQ",
        "token-abc123",
        user_id="123",
        run_id="1234",
    )
    assistant_1.print_response("What is the capital of France?", markdown=True)
    # English: "That's wrong, it should be Beijing. Let me ask you again, What is the capital of France?"
    assistant_1.print_response(
        "不对应该是北京,我重新再问问你,What is the capital of France?"
    )
    assistant_2.print_response("What is the capital of France?", markdown=True)
The script also depends on a PgVector database started from the phidata/pgvector:16 image; db_url points to it:
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"