crewAIInc / crewAI

Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
https://crewai.com
MIT License

OpenAI Rate Limit Error #32

Closed: stoltzmaniac closed this issue 9 months ago

stoltzmaniac commented 10 months ago

For those with limited OpenAI access, the rate limit is hit in the "stock_analysis" example at: https://github.com/joaomdmoura/crewAI-examples?tab=readme-ov-file

**Error:**

    Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for gpt-4 in organization org-XXXX on tokens_usage_based per min: Limit 10000, Used 7760, Requested 2270. Please try again in 180ms. Visit https://platform.openai.com/account/rate-limits to learn more..
    Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for gpt-4 in organization org-XXXX on tokens_usage_based per min: Limit 10000, Used 8208, Requested 2276. Please try again in 2.904s. Visit https://platform.openai.com/account/rate-limits to learn more..
    Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for gpt-4 in organization org-XXXX on tokens_usage_based per min: Limit 10000, Used 8137, Requested 2500. Please try again in 3.822s. Visit https://platform.openai.com/account/rate-limits to learn more..
    Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for gpt-4 in organization org-XXXX on tokens_usage_based per min: Limit 10000, Used 8440, Requested 2281. Please try again in 4.326s. Visit https://platform.openai.com/account/rate-limits to learn more..

Is there a good place within crewAI to add handling for this? If so, please link to the class/function and I will give it a shot. Thanks!

joaomdmoura commented 10 months ago

There is no clean way of doing it now; it uses the default retry logic provided by langchain, but maybe we could add something to support that. What would be the ideal behavior, read the retry instruction, wait for that, and then pick it up again? Also, what version of crewAI are you using? The new v0.1.14 should use the newest version of the openai lib; I don't know if they have some handling around this themselves, but they might.

sokoow commented 10 months ago

One simple way to deal with this would be to add a random 5-10 second wait somewhere in the list of tasks that crewai runs; this would hold off until openai allows a bigger limit. The pro way to deal with it would be to catch that RateLimitError exception somehow and do exponential backoff, but as you wrote, this is langchain logic, so it might be harder. I'm also wondering if any langchain params could be tweaked.
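
A minimal sketch of that exponential-backoff idea, assuming you invoke the model yourself; the function names here are illustrative, and depending on your openai version the exception may live at openai.error.RateLimitError instead:

    import time
    import openai  # openai>=1.0 exposes RateLimitError at the top level

    def call_with_backoff(fn, max_retries=5, base_delay=2.0):
        """Call fn, retrying with exponential backoff on rate-limit errors."""
        for attempt in range(max_retries):
            try:
                return fn()
            except openai.RateLimitError:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                time.sleep(base_delay * (2 ** attempt))  # waits 2s, 4s, 8s, ...

    # Usage (illustrative): wrap whatever call actually hits the API
    # result = call_with_backoff(lambda: llm.predict("What is CrewAI?"))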

sokoow commented 10 months ago

@joaomdmoura can you see if you could use the LimitAwaitChatOpenAI class from https://github.com/alex4321/langchain-openai-limiter? Would that make sense?

joaomdmoura commented 10 months ago

> One simple way to deal with this would be to add a random 5-10 second wait somewhere in the list of tasks that crewai runs; this would hold off until openai allows a bigger limit. The pro way to deal with it would be to catch that RateLimitError exception somehow and do exponential backoff, but as you wrote, this is langchain logic, so it might be harder. I'm also wondering if any langchain params could be tweaked.

My only problem with this is that it would impact everyone, even people who don't run into rate limiting, but it could be an option. Oh, I'll check the project you shared, it def looks interesting. I was assuming that was something langchain would have by default, but we could mimic that, yeah.

paixaop commented 10 months ago

One suggestion would be to parse the error message itself and extract how long OpenAI is telling the user to wait. Example error message:

    Rate limit reached for gpt-4 in organization org-XXXXXXX on tokens_usage_based per min: Limit 10000, Used 6256, Requested 4254. Please try again in 3.06s. Visit https://platform.openai.com/account/rate-limits to learn more.

So the next request should go out after 3.06 seconds in this case.
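
For illustration, a small sketch of that parsing approach; the regex is an assumption about the current message format, which OpenAI could change at any time:

    import re
    import time

    def wait_from_error(message: str):
        """Extract the suggested wait time, in seconds, from a rate-limit message."""
        # Matches e.g. "Please try again in 3.06s." or "Please try again in 180ms."
        match = re.search(r"try again in ([\d.]+)(ms|s)", message)
        if match is None:
            return None
        value, unit = float(match.group(1)), match.group(2)
        return value / 1000.0 if unit == "ms" else value

    delay = wait_from_error("... Please try again in 3.06s. ...")
    if delay is not None:
        time.sleep(delay)  # sleeps 3.06 seconds before the next request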

sokoow commented 10 months ago

@paixaop I think that the library https://github.com/alex4321/langchain-openai-limiter and methods like LimitAwaitOpenAIEmbeddings do exactly what you're describing.

sudo-install-MW commented 10 months ago

Any workaround for the rate limiter? I am not able to run any of the examples to try this framework out :(

sartian commented 10 months ago

> @paixaop I think that the library https://github.com/alex4321/langchain-openai-limiter and methods like LimitAwaitOpenAIEmbeddings do exactly what you're describing.

I don't have time to do a deep dive into this, but it looks like this line in crewai is the default OpenAI LLM factory:

https://github.com/joaomdmoura/crewAI/blob/f102c2e7dd40e5f035baa7c2cddf9f4b84413fcf/crewai/agent.py#L56

    llm: Optional[Any] = Field(
        default_factory=lambda: ChatOpenAI(
            temperature=0.7,
            model_name="gpt-4",
        ),
        description="Language model that will run the agent.",
    )

If you go back to their docs for agent setup, they have this example:

    # Define your agents with roles and goals
    researcher = Agent(
      role='Senior Research Analyst',
      goal='Uncover cutting-edge developments in AI and data science',
      backstory="""You work at a leading tech think tank.
      Your expertise lies in identifying emerging trends.
      You have a knack for dissecting complex data and presenting
      actionable insights.""",
      verbose=True,
      allow_delegation=False,
      tools=[search_tool]
      # You can pass an optional llm attribute specifying what model you want to use.
      # It can be a local model through Ollama / LM Studio or a remote
      # model like OpenAI, Mistral, Anthropic, or others (https://python.langchain.com/docs/integrations/llms/)
      #
      # Examples:
      # llm=ollama_llm # defined earlier in the file
      # llm=ChatOpenAI(model_name="gpt-3.5", temperature=0.7)
    )

In this example they show how to manually set llm. To mirror what they are doing internally, we could do:

    llm = ChatOpenAI(model_name="gpt-4", temperature=0.7)

So, if I understand the code correctly, you might be able to wrap the LLM the agent uses. I haven't had time to test this, but maybe something like this could work (the limiter's import path is assumed from its repo):

    from langchain.chat_models import ChatOpenAI
    from langchain_openai_limiter import LimitAwaitChatOpenAI  # assumed import path

    llm = LimitAwaitChatOpenAI(
        chat_openai=ChatOpenAI(model_name="gpt-4", temperature=0.7),
        limit_await_timeout=60.0,
        limit_await_sleep=0.1,
    )
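
If that wrapper really is a drop-in replacement for a LangChain chat model (an assumption; check the limiter repo for its actual interface), it should slot into the Agent constructor the same way the docs example above passes llm:

    researcher = Agent(
        role='Senior Research Analyst',
        goal='Uncover cutting-edge developments in AI and data science',
        backstory="""You work at a leading tech think tank.""",
        llm=llm,  # the rate-limit-aware wrapper defined above
    )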

I'll give it a try later to see if it works; if not, maybe this will get some folks closer... -MM

mindwellsolutions commented 10 months ago

@sartian It would be amazing if limit_await_timeout=60 works. I need to be able to set a wait timeout, but I haven't hit my limit yet, so I can't test whether it works. Did you get any results from your tests? Many thanks in advance.

sokoow commented 10 months ago

@sartian I just tried, and it seems the limiting package is broken in that it requires an older version of the openai package; I opened an issue there: https://github.com/alex4321/langchain-openai-limiter/issues/4

sokoow commented 9 months ago

I can confirm that this is working now

joaomdmoura commented 9 months ago

I need to add new docs around this, but it's great to know it's working.