Crew fails to deliver the same output as ChatGPT given the same prompt and model

When I run my crew, I do not get the results I expect, even though the prompts are clearly written, with examples, chain-of-thought instructions, and explicit task, context, and expected-output descriptions. As an experiment, I asked ChatGPT (gpt-4o) to perform the same task by manually pasting in the prompt and the input data (approximately 7,500 words). Surprisingly, the output was much better. When I then pasted the next task into ChatGPT and asked it to continue working on the previous output, it followed my instructions and delivered the results I expected.

My overall task is to extract questions and answers from a text and then rewrite them for clarity. I have one agent that only extracts the Q&A from the input text (~7,000 words) and one agent that improves the wording of the Q&A (~4,000 words).

When I inspect the agents' output in the terminal, the output of the first agent and task does not seem to be up to the expected standard. Multiple Q&A are missed, as if the agent does not go through the entire text. The subsequent task does not make any changes to the Q&A, as if it did not follow its instructions at all. According to the documentation, in the sequential process, task execution follows the predefined order in the task list, with the output of one task serving as context for the next.
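To put numbers on both symptoms instead of eyeballing the terminal, something like the following could be run on the saved task outputs. This is a minimal sketch using only the Python standard library; `similarity`, `missing_items`, and the sample strings are hypothetical names and data for illustration and are not part of crewAI.

```python
from difflib import SequenceMatcher


def similarity(before: str, after: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the second text is identical to the first."""
    return SequenceMatcher(None, before, after).ratio()


def missing_items(expected: list[str], extracted: str) -> list[str]:
    """Return the expected questions that do not appear verbatim in the extraction."""
    haystack = extracted.lower()
    return [q for q in expected if q.lower() not in haystack]


# Hypothetical sample data standing in for the real task outputs.
task1_output = "Q: What is the fee?\nA: 10 EUR."
task2_output = "Q: What is the fee?\nA: 10 EUR."
# A few questions picked by hand from the source text as a spot check.
spot_check = ["What is the fee?", "When is the deadline?"]

print(similarity(task1_output, task2_output))
print(missing_items(spot_check, task1_output))
```

A similarity very close to 1.0 would support the impression that the second task passed the Q&A through unchanged, and a non-empty list from `missing_items` would confirm that the extraction skipped part of the text. The verbatim match is deliberately strict, so it only works as a spot check on questions copied directly from the source.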

At first I believed it was a prompt-engineering issue. However, since ChatGPT managed to follow my instructions, I am not convinced that the prompt is poorly written. Are there any parameter settings I might be missing? I am currently using the default settings for max_tokens etc.
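One setting that may be worth checking is whether the expected output even fits within the model's output token limit. The sketch below is a rough budget check under stated assumptions: the 0.75 words-per-token figure is a common rule of thumb for English rather than a real tokenizer, and `ASSUMED_MAX_OUTPUT_TOKENS` is a placeholder that should be replaced with the documented limit for the exact model version in use. All names here are hypothetical.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (rule of thumb, not a tokenizer)."""
    return round(word_count / words_per_token)


def fits_output_limit(expected_words: int, max_output_tokens: int) -> bool:
    """Return True if the estimated output size is within the given token limit."""
    return estimate_tokens(expected_words) <= max_output_tokens


# Assumption: placeholder value; check the model's documented output limit.
ASSUMED_MAX_OUTPUT_TOKENS = 4096

print(estimate_tokens(7500))  # input text, approximate word count from above
print(estimate_tokens(4000))  # expected Q&A output, approximate word count
print(fits_output_limit(4000, ASSUMED_MAX_OUTPUT_TOKENS))
```

If the estimate for the ~4,000-word Q&A exceeds the limit, a single response cannot contain the full rewritten output, which could look like the agent stopping early or skipping items.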

I am using a sequential process with 2 tasks and 2 agents; each agent is a specialist for its task. The model is gpt-4o, and I am running crewai 0.41.1.

crewAIInc / crewAI
