Closed: majacinka closed this issue 9 months ago
I'm experiencing the same issues with a few models too, mostly llama2 and openhermes. It appears the local model "loses track" of what it's supposed to be doing and provides no action after the thought, but I could be wrong.
I tried running a few 13B models - Llama 2 and Vicuna. I assumed that a bigger model = better results, but that wasn't the case. I think "losing track" is the right way to describe the issue. It looks like the local model totally forgets all the prompts and starts looping.
same problem with phi-2
I'm having the same problem with Mistral and Openhermes. CrewAI stops with the following output: Task output: Agent stopped due to iteration limit or time limit.
It seems to work with OpenChat though!
Happening to me too. OpenAI's APIs run fine, but local Ollama models fail after a certain point with that exact error.
I tried the Stock Analysis example with OpenChat and it goes off the rails pretty quickly. Any suggestions?
I tried Mixtral, and it's the only one that works with crewAI somewhat consistently compared to the others. Even agents doing just code generation with codellama failed, so to my knowledge only the 8x7B model works, maybe 5 times out of 10.
Have you tried the new instagram_post example? That seems to work for me.
I've modified my model to handle num_ctx=16384 (running on a RTX 3090), no issues since.
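For reference, one way to pass a larger context window from Python, assuming the LangChain Ollama wrapper (a sketch; I actually set this on the model itself, so treat the parameter here as an assumption about your setup):

from langchain_community.llms import Ollama

# Sketch: num_ctx raises the context window Ollama allocates for this model.
# Equivalent to setting it on the model side, if you build the LLM object in Python.
llm = Ollama(model="openhermes", num_ctx=16384)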
I have not tried the instagram_post example as I have no use for it. I'm very interested in the stock analysis agent, though I still haven't had any success getting it to work well with a local model.
I've modified my model to handle num_ctx=16384 (running on a RTX 3090), no issues since.
Which model are you running?
Hi everyone, thank you all for replying and sharing your experiences. I wanted to share my observations; maybe somebody will find them helpful and save some time.
Over the last 10 days, I've experimented with 15 different models. My laptop has 16 GB RAM, and my goal for my agents was to scrape data from a particular subreddit and turn that data into a simple, short newsletter written in layman's terms.
Of those 15 models, only 2 were able to accomplish the task: GPT4 and Llama 2 13B (base model).
Models I've played with that failed were:
I tried tweaking my prompts and played with the modelfile, setting all kinds of parameters, but the only conclusion I came to is: more parameters = more reasoning.
The agents failed because they either:
I have one more theory, but I can't test it due to insufficient RAM on my laptop. I wonder if models with 7B parameters but a context window of 16K tokens would be able to perform the task. In other words, would a bigger context window = more reasoning?
Hello, I have experienced the same issue with Openhermes before, but since I configured the temperature to 0.1, it works great.
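For example, if you run it through the LangChain Ollama wrapper, a minimal sketch of that change (the wrapper and model name are assumptions about your setup):

from langchain_community.llms import Ollama

# A low temperature keeps the model closer to the expected Thought/Action format.
llm = Ollama(model="openhermes", temperature=0.1)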
I was having the looping problem before as well, but with Gemini Pro at a temperature of 0.6, all issues are gone.
Hey folks, finally catching up to this!
Indeed, smaller models do struggle with certain tools, especially more complex ones. I think there is room for us to optimize the crew prompts a bit; I'll look into that, but at the end of the day smaller models do struggle with cognition.
I'm collecting data to fine-tune these models into agentic models that will be trained to behave more like agents; this should provide way more reliability even in small models.
I think a good next action here might be to mention the best models in our new docs and do some tests on slightly changing the prompts for smaller models. I'll take a look at that; meanwhile I'm closing this one, but I'm open to re-opening it if there are requests :)
I had success running a simple crew with one function. Benchmarks of the different models and whether they worked with function calling are below. Hopefully this helps someone! All testing was done using LM Studio as the API server.
Which model are you running?
Sorry, I didn't check my email because I was on vacation.
OpenHermes.
I found another cause of this bug. When the task prompt is too strong, it changes some important (internal) keywords, like
- Action:
- Thought:
- Action Input:
The agent will then fail to parse the text.
For example, I have a task asking the agent to use a markdown header to denote headers. It transformed Action: into **Action:**
To resolve this, I add the following to all of my task prompts:
These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
because they are part of the thinking process instead of the output.
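Concretely, this is roughly what I mean, as a sketch (the Task fields depend on your crewAI version and your own agents; only the appended keyword block is the actual fix):

from crewai import Task

# Appended verbatim to every task description (the fix described above):
KEEP_KEYWORDS = (
    "These keywords must never be translated and transformed:\n"
    "- Action:\n"
    "- Thought:\n"
    "- Action Input:\n"
    "because they are part of the thinking process instead of the output.\n"
)

write_task = Task(
    description=(
        "Write the report and use a markdown header to denote headers.\n"
        + KEEP_KEYWORDS
    ),
    agent=writer,  # hypothetical agent defined elsewhere
)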
@kingychiu that worked. I'm running TheBloke/dolphin-2.2.1-mistral-7B-GGUF on LM Studio.
With the @kingychiu hack, I've got: Error executing tool. Missing exact 3 pipe (|) separated values.
I had to add: Action Input should be formatted as coworker|task|context
allow_delegation=True,
llm=Ollama(model="codellama:34b")
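Put together, roughly, this is what I mean, as a sketch (the role, goal, and backstory are illustrative; the delegation flag, the codellama model, and the pipe-format hint are the relevant bits):

from crewai import Agent
from langchain_community.llms import Ollama

# The extra line appended to task descriptions so the delegation tool
# receives parseable input (the fix described above):
PIPE_FORMAT_HINT = "Action Input should be formatted as coworker|task|context"

delegating_agent = Agent(
    role="Coordinator",  # role/goal/backstory are illustrative
    goal="Break work down and delegate to coworkers",
    backstory="You coordinate a small crew of agents.",
    allow_delegation=True,
    llm=Ollama(model="codellama:34b"),
)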
I think these are other internal keywords being overwritten? It is interesting and concerning that the task can hack the thinking system, haha.
@kingychiu - had you added this as part of the Task?
Yes. In my task, I was asking the agent to output the result in markdown format with headers. Then to fix this issue, I had to ask the agent to NOT change the format of some keywords.
@kingychiu could you please tell us where exactly you changed the above? In my case, the model fails to follow the instructions, so it never returns the output in the format the agent tool expects.
I am searching the issues after having a problem in essentially the same spot, and I had a slightly different idea. This is what I am seeing:
> Entering new CrewAgentExecutor chain...
Thought: I need to find the latest news about Columbia President Minouche Shafik.
Action: Search the internet
Action Input: {'search_query': 'Columbia University President Minouche ShafikI apologize for the mistake. Let me retry with a correct Action Input.
Thought: I need to find the latest news about Columbia President Minouche Shafik.
Action: Search the internet
Action Input: {"search_query": "Minouche Shafik Columbia University^CTraceback (most recent call last):
What I think is happening around "Action Input" is that the quotes or the closing curly brackets are getting escaped incorrectly by the less powerful (i.e. open-source) models. Looking for places that might cause this sort of failure, I found these locations in the codebase that might be at issue:
https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/tools/tool_output_parser.py
https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/agents/parser.py
or more specifically here:
https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/agents/parser.py#L43
An invalid usage of regex might be the cause of many of these issues. ChatGPT suggested that this regex might work better:
regex = (
r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
)
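A quick standalone sanity check of that pattern, outside the crewAI codebase (the sample text is made up to mirror the output above):

import re

REGEX = r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"

sample = (
    "Thought: I need to find the latest news about Columbia President Minouche Shafik.\n"
    "Action: Search the internet\n"
    'Action Input: {"search_query": "Minouche Shafik Columbia University"}'
)

match = re.search(REGEX, sample, re.DOTALL)
if match:
    print(repr(match.group(1).strip()))  # 'Search the internet'
    print(repr(match.group(2).strip()))  # '{"search_query": "Minouche Shafik Columbia University"}'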
I can see the regex is now updated to the suggested version. I am still seeing this behaviour with a custom tool to execute some code.
I printed text and action_input from parser.py and realised that, like in the example above, for some reason the closing "} was getting cut off, so I simply added this:
tool_input = tool_input.strip('"').strip('`')  # also added this as my model (llama3) was adding them at times
tool_input += '"}'
Now it works, I don't know why!
num_ctx=16384
@danielgen could you post a code sample showing exactly how you are using this? I am using a 3090 as well. Thanks!
@contractorwolf you tagged me by mistake, you meant to tag kyuumeitai based on your quote
Hey there, I did it in a Modelfile with Ollama; in fact, I did it in the Open WebUI hehe, but you can do it with commands.
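Concretely, this is the kind of Modelfile I mean (a sketch; the base model and the new tag name are just examples):

# Modelfile: raise the context window, then build a new model tag from it
FROM openhermes
PARAMETER num_ctx 16384

Then create the model with ollama create openhermes-16k -f Modelfile and point your crew at the new openhermes-16k tag.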
In the end I gave up on crewAI for my project; right now I'm using a combination of n8n and Flowise in a more manual, not-so-magical way.
Thanks @kingychiu , it worked perfectly for my sql agent with REACT type.
Hi @jaideep11061982 ,
I faced the issue with my agent created using create_sql_agent.
sql_agent = create_sql_agent(
    AGENT_MODEL,
    db=db,
    top_k=100,
    # handle_parsing_errors=True,
    agent_executor_kwargs=dict(handle_parsing_errors=True),
    prefix=SQL_AGENT_PREFIX,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
and I've modified the original SQL Prefix that was being used with some more instructions.
The original SQL Prefix is https://v01.api.js.langchain.com/variables/langchain_agents_toolkits_sql.SQL_PREFIX.html.
and regarding the solution provided by @kingychiu, I added
These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
to my SQL Prefix and the issue has been resolved.
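In other words, something like this, as a sketch (BASE_SQL_PREFIX is a placeholder for the standard prefix text linked above plus my own instructions):

# BASE_SQL_PREFIX stands in for the standard SQL agent prefix text
# (see the link above) plus any custom instructions you already use.
SQL_AGENT_PREFIX = BASE_SQL_PREFIX + """

These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
because they are part of the thinking process instead of the output.
"""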
My agent keeps running into this error with nearly every model I use locally (I tried llama2, openhermes, starling, and Mistral); the only model that didn't run into this problem is Mistral.
Very often this error is followed by another error: "Error executing tool. Missing exact 3 pipe (|) separated values. For example, coworker|task|information." Whenever either of these two errors appeared, I couldn't get any valid output. I was experimenting with simple examples like internet scraping with DuckDuckGo and a custom Reddit scraping tool. Also worth mentioning: I don't have these problems when I use OpenAI.