[BUG]: Agent calls are sent to LLM provider twice

Propheticus commented 6 months ago

How are you running AnythingLLM?

Docker (local)

What happened?

Using Docker latest image build (b3f0a413afc2bf3efec4042a158203c76d2b14279adbd29753a37239be1925fc)

"org.opencontainers.image.created": "2024-05-11T00:34:10.621Z",
 "org.opencontainers.image.description": "The all-in-one Desktop \u0026 Docker AI application with full RAG and AI Agent capabilities.",
  "org.opencontainers.image.licenses": "MIT",
  "org.opencontainers.image.ref.name": "ubuntu",
   "org.opencontainers.image.revision": "948ac8a3ddd61919b1f333c544b3b593d6b61343",

Using the agent call feature, results in 2 calls to the LLM provider and twice a function call in JSON is replied back.

This one agent call, was received in LM studio (and responded to) 2 times. LMStudio server logs of POST requests:


[2024-05-11 13:44:57.360] [INFO] Received POST request to /v1/chat/completions with body: {
  "model": "lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf",
  "temperature": 0,
  "messages": [
    {
      "content": "You are a program which picks the most optimal function and parameters to call.\n      DO NOT HAVE TO PICK A FUNCTION IF IT WILL NOT HELP ANSWER OR FULFILL THE USER'S QUERY.\n      When a function is selection, respond in JSON with no additional text.\n      When there is no relevant function to call - return with a regular chat text response.\n      Your task is to pick a **single** function that we will use to call, if any seem useful or relevant for the user query.\n\n      All JSON responses should have two keys.\n      'name': this is the name of the function name to call. eg: 'web-scraper', 'rag-memory', etc..\n      'arguments': this is an object with the function properties to invoke the function.\n      DO NOT INCLUDE ANY OTHER KEYS IN JSON RESPONSES.\n\n      Here are the available tools you can use an examples of a query and response so you can understand how each one works.\n      -----------\nFunction name: rag-memory\nFunction Description: Search against local documents for context that is relevant to the query or store a snippet of text into memory for retrieval later. Storing information should only be done when the user specifically requests for information to be remembered or saved to long-term memory. You should use this tool before search the internet for information. Do not use this tool unless you are explicity told to 'remember' or 'store' information.\nFunction parameters in JSON format:\n{\n    \"action\": {\n        \"type\": \"string\",\n        \"enum\": [\n            \"search\",\n            \"store\"\n        ],\n        \"description\": \"The action we want to take to search for existing similar context or storage of new context.\"\n    },\n    \"content\": {\n        \"type\": \"string\",\n        \"description\": \"The plain text to search our local documents with or to store in our vector database.\"\n    }\n}\nQuery: \"What is AnythingLLM?\"\nJSON: {\"action\":\"search\",\"content\":\"What is AnythingLLM?\"}\nQuery: \"What do you know about Plato's motives?\"\nJSON: {\"action\":\"search\",\"content\":\"What are the facts about Plato's motives?\"}\nQuery: \"Remember that you are a robot\"\nJSON: {\"action\":\"store\",\"content\":\"I am a robot, the user told me that i am.\"}\nQuery: \"Save that to memory please.\"\nJSON: {\"action\":\"store\",\"content\":\"<insert summary of conversation until now>\"}\n-----------\n-----------\nFunction name: document-summarizer\nFunction Description: Can get the list of files available to search with descriptions and can select a single file to open and summarize.\nFunction parameters in JSON format:\n{\n    \"action\": {\n        \"type\": \"string\",\n        \"enum\": [\n            \"list\",\n            \"summarize\"\n        ],\n        \"description\": \"The action to take. 'list' will return all files available with their filename and descriptions. 'summarize' will open and summarize the file by the a document name.\"\n    },\n    \"document_filename\": {\n        \"type\": \"string\",\n        \"x-nullable\": true,\n        \"description\": \"The file name of the document you want to get the full content of.\"\n    }\n}\nQuery: \"Summarize example.txt\"\nJSON: {\"action\":\"summarize\",\"document_filename\":\"example.txt\"}\nQuery: \"What files can you see?\"\nJSON: {\"action\":\"list\",\"document_filename\":null}\nQuery: \"Tell me about readme.md\"\nJSON: {\"action\":\"summarize\",\"document_filename\":\"readme.md\"}\n-----------\n-----------\nFunction name: web-scraping\nFunction Description: Scrapes the content of a webpage or online resource from a provided URL.\nFunction parameters in JSON format:\n{\n    \"url\": {\n        \"type\": \"string\",\n        \"format\": \"uri\",\n        \"description\": \"A complete web address URL including protocol. Assumes https if not provided.\"\n    }\n}\nQuery: \"What is useanything.com about?\"\nJSON: {\"uri\":\"https://useanything.com\"}\nQuery: \"Scrape https://example.com\"\nJSON: {\"uri\":\"https://example.com\"}\n-----------\n-----------\nFunction name: web-browsing\nFunction Description: Searches for a given query using a search engine to get better results for the user query.\nFunction parameters in JSON format:\n{\n    \"query\": {\n        \"type\": \"string\",\n        \"description\": \"A search query.\"\n    }\n}\nQuery: \"Who won the world series today?\"\nJSON: {\"query\":\"Winner of today's world series\"}\nQuery: \"What is AnythingLLM?\"\nJSON: {\"query\":\"AnythingLLM\"}\nQuery: \"Current AAPL stock price\"\nJSON: {\"query\":\"AAPL stock price today\"}\n-----------\n\n\n      Now pick a function if there is an appropriate one to use given the last user message and the given conversation so far.",
      "role": "system"
    },
    {
      "content": "@agent can you web browse the current time and date in <redacted>?",
      "role": "user"
    }
  ]
}

[2024-05-11 13:44:59.067] [INFO] [LM STUDIO SERVER] [lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf] Generated prediction: {
  "id": "chatcmpl-t4flyse8xt5kdwqai6axs",
  "object": "chat.completion",
  "created": 1715427897,
  "model": "lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I've selected the **web-browsing** function for this query.\n\nHere's the JSON response:\n\n{\"name\": \"web-browsing\", \"arguments\": {\"query\": \"current time in <redacted>\"}}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "completion_tokens": 42,
    "total_tokens": 85
  }
}
...

[2024-05-11 13:44:59.593] [INFO] Received POST request to /v1/chat/completions with body: {
  "model": "lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf",
  "temperature": 0,
  "messages": [
    {
      "content": "You are a program which picks the most optimal function and parameters to call.\n      DO NOT HAVE TO PICK A FUNCTION IF IT WILL NOT HELP ANSWER OR FULFILL THE USER'S QUERY.\n      When a function is selection, respond in JSON with no additional text.\n      When there is no relevant function to call - return with a regular chat text response.\n      Your task is to pick a **single** function that we will use to call, if any seem useful or relevant for the user query.\n\n      All JSON responses should have two keys.\n      'name': this is the name of the function name to call. eg: 'web-scraper', 'rag-memory', etc..\n      'arguments': this is an object with the function properties to invoke the function.\n      DO NOT INCLUDE ANY OTHER KEYS IN JSON RESPONSES.\n\n      Here are the available tools you can use an examples of a query and response so you can understand how each one works.\n      -----------\nFunction name: rag-memory\nFunction Description: Search against local documents for context that is relevant to the query or store a snippet of text into memory for retrieval later. Storing information should only be done when the user specifically requests for information to be remembered or saved to long-term memory. You should use this tool before search the internet for information. Do not use this tool unless you are explicity told to 'remember' or 'store' information.\nFunction parameters in JSON format:\n{\n    \"action\": {\n        \"type\": \"string\",\n        \"enum\": [\n            \"search\",\n            \"store\"\n        ],\n        \"description\": \"The action we want to take to search for existing similar context or storage of new context.\"\n    },\n    \"content\": {\n        \"type\": \"string\",\n        \"description\": \"The plain text to search our local documents with or to store in our vector database.\"\n    }\n}\nQuery: \"What is AnythingLLM?\"\nJSON: {\"action\":\"search\",\"content\":\"What is AnythingLLM?\"}\nQuery: \"What do you know about Plato's motives?\"\nJSON: {\"action\":\"search\",\"content\":\"What are the facts about Plato's motives?\"}\nQuery: \"Remember that you are a robot\"\nJSON: {\"action\":\"store\",\"content\":\"I am a robot, the user told me that i am.\"}\nQuery: \"Save that to memory please.\"\nJSON: {\"action\":\"store\",\"content\":\"<insert summary of conversation until now>\"}\n-----------\n-----------\nFunction name: document-summarizer\nFunction Description: Can get the list of files available to search with descriptions and can select a single file to open and summarize.\nFunction parameters in JSON format:\n{\n    \"action\": {\n        \"type\": \"string\",\n        \"enum\": [\n            \"list\",\n            \"summarize\"\n        ],\n        \"description\": \"The action to take. 'list' will return all files available with their filename and descriptions. 'summarize' will open and summarize the file by the a document name.\"\n    },\n    \"document_filename\": {\n        \"type\": \"string\",\n        \"x-nullable\": true,\n        \"description\": \"The file name of the document you want to get the full content of.\"\n    }\n}\nQuery: \"Summarize example.txt\"\nJSON: {\"action\":\"summarize\",\"document_filename\":\"example.txt\"}\nQuery: \"What files can you see?\"\nJSON: {\"action\":\"list\",\"document_filename\":null}\nQuery: \"Tell me about readme.md\"\nJSON: {\"action\":\"summarize\",\"document_filename\":\"readme.md\"}\n-----------\n-----------\nFunction name: web-scraping\nFunction Description: Scrapes the content of a webpage or online resource from a provided URL.\nFunction parameters in JSON format:\n{\n    \"url\": {\n        \"type\": \"string\",\n        \"format\": \"uri\",\n        \"description\": \"A complete web address URL including protocol. Assumes https if not provided.\"\n    }\n}\nQuery: \"What is useanything.com about?\"\nJSON: {\"uri\":\"https://useanything.com\"}\nQuery: \"Scrape https://example.com\"\nJSON: {\"uri\":\"https://example.com\"}\n-----------\n-----------\nFunction name: web-browsing\nFunction Description: Searches for a given query using a search engine to get better results for the user query.\nFunction parameters in JSON format:\n{\n    \"query\": {\n        \"type\": \"string\",\n        \"description\": \"A search query.\"\n    }\n}\nQuery: \"Who won the world series today?\"\nJSON: {\"query\":\"Winner of today's world series\"}\nQuery: \"What is AnythingLLM?\"\nJSON: {\"query\":\"AnythingLLM\"}\nQuery: \"Current AAPL stock price\"\nJSON: {\"query\":\"AAPL stock price today\"}\n-----------\n\n\n      Now pick a function if there is an appropriate one to use given the last user message and the given conversation so far.",
      "role": "system"
    },
    {
      "content": "@agent can you web browse the current time and date in <redacted>?",
      "role": "user"
    }
  ]
}

[2024-05-11 13:45:00.672] [INFO] [LM STUDIO SERVER] [lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf] Generated prediction: {
  "id": "chatcmpl-t5uh7zgqm2jhck23pbpro",
  "object": "chat.completion",
  "created": 1715427899,
  "model": "lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I've selected the **web-browsing** function for this query.\n\nHere's the JSON response:\n\n{\"name\": \"web-browsing\", \"arguments\": {\"query\": \"current time in <redacted>\"}}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "completion_tokens": 42,
    "total_tokens": 85
  }
}

Docker logs of AnythingLLM container:

024-05-11 13:44:59 [AgentLLM - lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf] Valid tool call found - running web-browsing.
2024-05-11 13:44:59 [AgentHandler] [debug]: @agent is attempting to call `web-browsing` tool
2024-05-11 13:45:00 [AgentLLM - lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf] Function tool with exact arguments has already been called this stack.
2024-05-11 13:45:00 [AgentLLM - lmstudio-community/Meta-Llama-3-8B-Instruct-BPE-fix-GGUF/Meta-Llama-3-8B-Instruct-Q8_0.gguf] Will assume chat completion without tool call inputs.

Are there known steps to reproduce?

No response

timothycarambat commented 6 months ago

The way that agent calling works is not 1<>1 text to action. There are possibilities where an agent can call the same action twice, call the actions forever in a loop, and other repetitions like that. In fact, on many of the agent tools we have a built-in de-duplicator that prevents this.

You even get this behavior with GPT, so its just an LLM thing. Given the temp is set to 0 this makes sense the requests look the same. That is just an artififact of tool calling, some tools dont have a de-duplicator on them and i believe web-search is one of them

Propheticus commented 6 months ago

By agent you mean the LLM assuming that role and picking actions? I'm trying to understand, but the issue here seems that the original (very first) query and system instruction telling it to assume the agent role is sent twice. It's not the LLM/agent going in a circle. It did not receive the output of the web browsing and again call a web browse, it received the first "you are a program which picks the...." system prompt again. Due to the temp being 0 it's expected the reply is the same.

fmg-tomdifulvio commented 4 months ago

Hi, is there a solution for this? Currently agent calls for Bing search are not working due to the same issue using Azure OpenAI as the LLM.

freakshock88 commented 1 month ago

It seems this bug also prevents me from using SearXNG agent in combination with Azure OpenAI.

Mintplex-Labs / anything-llm