Closed: majacinka closed this issue 9 months ago
I'm experiencing the same issues with a few models too, mostly llama2 and openhermes. It appears the local model "loses track" of what it's supposed to be doing and provides no action after the thought, but I could be wrong.
I tried running a few 13B models - Llama 2 and Vicuna. I assumed that a bigger model = better results, but that wasn't the case. I think "losing track" is the right way to describe the issue. It looks like the local model totally forgets all the prompts and starts looping.
same problem with phi-2
I'm having the same problem with Mistral and Openhermes. CrewAI stops with the following output: Task output: Agent stopped due to iteration limit or time limit.
It seems to work with OpenChat though!
Happening to me too. OpenAI's APIs run fine, but local Ollama models fail after a certain point with that exact error.
I tried the Stock Analysis example with OpenChat and it goes off the rails pretty quickly. Any suggestions?
I tried Mixtral, and it's the only one that works with crewAI somewhat consistently compared to the others. Even agents doing just code generation with codellama failed, so to my knowledge only the 8x7B model works, maybe 5 times out of 10.
Have you tried the new instagram_post example? That seems to work for me.
I've modified my model to handle num_ctx=16384 (running on a RTX 3090), no issues since.
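For reference, one way to pass a larger context window from Python, assuming the LangChain Ollama wrapper (a sketch; I actually set this on the model itself, so treat the parameter here as an assumption about your setup):

from langchain_community.llms import Ollama

# Sketch: num_ctx raises the context window Ollama allocates for this model.
# Equivalent to setting it on the model side, if you build the LLM object in Python.
llm = Ollama(model="openhermes", num_ctx=16384)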
I have not tried the instagram_post example as I have no use for it. I'm very interested in the stock analysis agent, though I still haven't had any success getting it to work well with a local model.
I've modified my model to handle num_ctx=16384 (running on a RTX 3090), no issues since.
Which model are you running?
Hi everyone, thank you all for replying and sharing your experiences. I wanted to share my observations; maybe somebody will find them helpful and save some time.
Over the last 10 days, I've experimented with 15 different models. My laptop has 16 GB RAM, and my goal for my agents was to scrape data from a particular subreddit and turn that data into a simple, short newsletter written in layman's terms.
Of those 15 models, only 2 were able to accomplish the task: GPT4 and Llama 2 13B (base model).
Models I've played with that failed were:
I tried tweaking my prompts and played with the modelfile, setting all kinds of parameters, but the only conclusion I came to is: more parameters = more reasoning.
The agents failed because they either:
I have one more theory, but I can't test it due to insufficient RAM on my laptop. I wonder if models with 7B parameters but a context window of 16K tokens would be able to perform the task. In other words, would a bigger context window = more reasoning?
Hello, I have experienced the same issue with Openhermes before, but since I configured the temperature to 0.1, it works great.
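For example, if you run it through the LangChain Ollama wrapper, a minimal sketch of that change (the wrapper and model name are assumptions about your setup):

from langchain_community.llms import Ollama

# A low temperature keeps the model closer to the expected Thought/Action format.
llm = Ollama(model="openhermes", temperature=0.1)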
I was having the looping problem before as well, but with Gemini Pro at a temperature of 0.6, all issues are gone.
Hey folks, finally catching up to this!
Indeed, smaller models do struggle with certain tools, especially more complex ones. I think there is room for us to optimize the crew prompts a bit; I'll look into that, but at the end of the day smaller models do struggle with cognition.
I'm collecting data to fine-tune these models into agentic models that will be trained to behave more like agents; this should provide way more reliability even in small models.
I think a good next action here might be to mention the best models in our new docs and do some tests on slightly changing the prompts for smaller models. I'll take a look at that; meanwhile I'm closing this one, but I'm open to re-opening it if there are requests :)
I had success running a simple crew with one function. Benchmarks of the different models and whether they worked with function calling are below. Hopefully this helps someone! All testing was done using LM Studio as the API server.
Which model are you running?
Sorry, I didn't check my email because I was on vacation.
OpenHermes.
I found another cause of this bug. When the task prompt is too strong, it changes some important (internal) keywords, like
- Action:
- Thought:
- Action Input:
The agent will then fail to parse the text.
For example, I have a task asking the agent to use a markdown header to denote headers. It transformed Action: into **Action:**
To resolve this, I add the following to all of my task prompts:
These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
because they are part of the thinking process instead of the output.
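Concretely, this is roughly what I mean, as a sketch (the Task fields depend on your crewAI version and your own agents; only the appended keyword block is the actual fix):

from crewai import Task

# Appended verbatim to every task description (the fix described above):
KEEP_KEYWORDS = (
    "These keywords must never be translated and transformed:\n"
    "- Action:\n"
    "- Thought:\n"
    "- Action Input:\n"
    "because they are part of the thinking process instead of the output.\n"
)

write_task = Task(
    description=(
        "Write the report and use a markdown header to denote headers.\n"
        + KEEP_KEYWORDS
    ),
    agent=writer,  # hypothetical agent defined elsewhere
)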
@kingychiu that worked. I'm running TheBloke/dolphin-2.2.1-mistral-7B-GGUF on LM Studio.
With the @kingychiu hack, I've got: Error executing tool. Missing exact 3 pipe (|) separated values.
I had to add: Action Input should be formatted as coworker|task|context
allow_delegation=True,
llm=Ollama(model="codellama:34b")
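Put together, roughly, this is what I mean, as a sketch (the role, goal, and backstory are illustrative; the delegation flag, the codellama model, and the pipe-format hint are the relevant bits):

from crewai import Agent
from langchain_community.llms import Ollama

# The extra line appended to task descriptions so the delegation tool
# receives parseable input (the fix described above):
PIPE_FORMAT_HINT = "Action Input should be formatted as coworker|task|context"

delegating_agent = Agent(
    role="Coordinator",  # role/goal/backstory are illustrative
    goal="Break work down and delegate to coworkers",
    backstory="You coordinate a small crew of agents.",
    allow_delegation=True,
    llm=Ollama(model="codellama:34b"),
)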
I think these are other internal keywords being overwritten? It is interesting and concerning that the task can hack the thinking system, haha.
@kingychiu - had you added this as part of the Task?
Yes. In my task, I was asking the agent to output the result in markdown format with headers. Then to fix this issue, I had to ask the agent to NOT change the format of some keywords.
@kingychiu could you please tell us where exactly you changed the above? In my case, the model fails to follow the instructions, so it never returns the output in the format the agent tool expects.
I am searching the issues after having a problem in essentially the same spot, and I had a slightly different idea. This is what I am seeing:
> Entering new CrewAgentExecutor chain...
Thought: I need to find the latest news about Columbia President Minouche Shafik.
Action: Search the internet
Action Input: {'search_query': 'Columbia University President Minouche ShafikI apologize for the mistake. Let me retry with a correct Action Input.
Thought: I need to find the latest news about Columbia President Minouche Shafik.
Action: Search the internet
Action Input: {"search_query": "Minouche Shafik Columbia University^CTraceback (most recent call last):
What I think is happening around "Action Input" is that the quotes or the closing curly brackets are getting escaped incorrectly by the less powerful (i.e. open-source) models. Looking for places that might cause this sort of failure, I found these locations in the codebase that might be at issue:
https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/tools/tool_output_parser.py
https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/agents/parser.py
or more specifically here:
https://github.com/joaomdmoura/crewAI/blob/main/src/crewai/agents/parser.py#L43
An invalid usage of regex might be the cause of many of these issues. ChatGPT suggested that this regex might work better:
regex = (
r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
)
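A quick standalone sanity check of that pattern, outside the crewAI codebase (the sample text is made up to mirror the output above):

import re

REGEX = r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"

sample = (
    "Thought: I need to find the latest news about Columbia President Minouche Shafik.\n"
    "Action: Search the internet\n"
    'Action Input: {"search_query": "Minouche Shafik Columbia University"}'
)

match = re.search(REGEX, sample, re.DOTALL)
if match:
    print(repr(match.group(1).strip()))  # 'Search the internet'
    print(repr(match.group(2).strip()))  # '{"search_query": "Minouche Shafik Columbia University"}'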
I can see the regex is now updated to the suggested version. I am still seeing this behaviour with a custom tool to execute some code.
I printed text and action_input from parser.py and realised that, like in the example above, for some reason the closing "} was getting cut off, so I simply added this:
tool_input = tool_input.strip('"').strip('`')  # also added this as my model (llama3) was adding them at times
tool_input += '"}'
Now it works, I don't know why!
num_ctx=16384
@danielgen could you post a code sample showing exactly how you are using this? I am using a 3090 as well. Thanks!
@contractorwolf you tagged me by mistake, you meant to tag kyuumeitai based on your quote
Hey there, I did it in a Modelfile with Ollama; in fact, I did it in the Open WebUI hehe, but you can do it with commands.
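Concretely, this is the kind of Modelfile I mean (a sketch; the base model and the new tag name are just examples):

# Modelfile: raise the context window, then build a new model tag from it
FROM openhermes
PARAMETER num_ctx 16384

Then create the model with ollama create openhermes-16k -f Modelfile and point your crew at the new openhermes-16k tag.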
In the end I gave up on crewAI for my project; right now I'm using a combination of n8n and Flowise in a more manual, not-so-magical way.
Thanks @kingychiu , it worked perfectly for my sql agent with REACT type.
Hi @jaideep11061982 ,
I faced the issue with my agent created using create_sql_agent.
sql_agent = create_sql_agent(
    AGENT_MODEL,
    db=db,
    top_k=100,
    # handle_parsing_errors=True,
    agent_executor_kwargs=dict(handle_parsing_errors=True),
    prefix=SQL_AGENT_PREFIX,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
and I've modified the original SQL Prefix that was being used with some more instructions.
The original SQL Prefix is https://v01.api.js.langchain.com/variables/langchain_agents_toolkits_sql.SQL_PREFIX.html.
and regarding the solution provided by @kingychiu, I added
These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
to my SQL Prefix and the issue has been resolved.
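In other words, something like this, as a sketch (BASE_SQL_PREFIX is a placeholder for the standard prefix text linked above plus my own instructions):

# BASE_SQL_PREFIX stands in for the standard SQL agent prefix text
# (see the link above) plus any custom instructions you already use.
SQL_AGENT_PREFIX = BASE_SQL_PREFIX + """

These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
because they are part of the thinking process instead of the output.
"""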
My agent keeps running into this error with nearly every model I use locally (I tried llama2, openhermes, starling, and Mistral); the only model that didn't run into this problem is Mistral.
Very often this error is followed by another error: "Error executing tool. Missing exact 3 pipe (|) separated values. For example, coworker|task|information." Whenever either of these two errors appeared, I couldn't get any valid output. I was experimenting with simple examples like internet scraping with DuckDuckGo and a custom Reddit scraping tool. Also worth mentioning: I don't have these problems when I use OpenAI.