BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: 429 Too Many Requests #2707

Closed: UncleBob2 closed this issue 4 months ago

UncleBob2 commented 5 months ago

The Feature

I am currently using Claude v3 with AutoGen Studio. Is there a way to slow down the requests? My Anthropic rate limits are:

- 5 requests per minute (RPM)
- 25,000 tokens per minute (TPM)
- 300,000 tokens per day (TPD)
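
Ideally, litellm could throttle to these limits on its own. Something like the sketch below is what I have in mind (untested; I believe the Router already accepts per-deployment rpm/tpm for usage-based routing, so this may already be close):

```python
# Rough, untested sketch: hand the Anthropic limits above to litellm's Router
# so it paces requests itself instead of letting them hit a 429.
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "claude-3-haiku",
            "litellm_params": {
                # assumes ANTHROPIC_API_KEY is set in the environment
                "model": "anthropic/claude-3-haiku-20240307",
            },
            "rpm": 5,        # 5 requests per minute
            "tpm": 25_000,   # 25,000 tokens per minute
        }
    ],
    routing_strategy="usage-based-routing",  # route by current tpm/rpm usage
)

response = router.completion(
    model="claude-3-haiku",
    messages=[{"role": "user", "content": "Hello"}],
)
```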

Motivation, pitch

```
INFO:     127.0.0.1:59331 - "POST /chat/completions HTTP/1.1" 429 Too Many Requests
Traceback (most recent call last):
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/main.py", line 1164, in completion
    response = anthropic.completion(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/llms/anthropic.py", line 213, in completion
    raise AnthropicError(
litellm.llms.anthropic.AnthropicError: {"type":"error","error":{"type":"rate_limit_error","message":"Number of request tokens has exceeded your rate limit (https://docs.anthropic.com/claude/reference/rate-limits). Please reduce the the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/main.py", line 319, in acompletion
    response = await loop.run_in_executor(None, func_with_context)  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 2770, in wrapper
    raise e
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 2667, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/main.py", line 2094, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 8257, in exception_type
    raise e
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 7186, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"Number of request tokens has exceeded your rate limit (https://docs.anthropic.com/claude/reference/rate-limits). Please reduce the the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3191, in chat_completion
    responses = await asyncio.gather(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 3230, in wrapper_async
    raise e
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 3062, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/main.py", line 327, in acompletion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 8257, in exception_type
    raise e
  File "/Users/autoGenStudio/.pyenv/versions/3.11-dev/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/litellm/utils.py", line 7186, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: AnthropicException - AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"Number of request tokens has exceeded your rate limit (https://docs.anthropic.com/claude/reference/rate-limits). Please reduce the the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
INFO:     127.0.0.1:59331 - "POST /chat/completions HTTP/1.1" 429 Too Many Requests
```


thugbobby commented 5 months ago

Same question, did you solve it?

UncleBob2 commented 5 months ago

> Same question, did you solve it?

It works intermittently if I keep my messages shorter. I have purchased additional Anthropic credits; however, I believe this can be solved with throttling.

krrishdholakia commented 5 months ago

What would you want to happen here? @UncleBob2 @thugbobby

UncleBob2 commented 5 months ago

> What would you want to happen here? @UncleBob2 @thugbobby

@krrishdholakia A simple solution would be to add a 30-second delay each time a message is sent to the Anthropic API.
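
For anyone who needs a workaround right now, this is roughly what I mean as a client-side wrapper instead of editing litellm itself (a minimal sketch; the 30-second delay and retry count are arbitrary):

```python
# Minimal workaround sketch: wait and retry whenever Anthropic returns a 429,
# instead of patching litellm's source.
import time

import litellm
from litellm.exceptions import RateLimitError  # the exception raised in the logs above

def completion_with_delay(max_retries=5, delay=30, **kwargs):
    for attempt in range(max_retries):
        try:
            return litellm.completion(**kwargs)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay)  # back off before retrying

response = completion_with_delay(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hello"}],
)
```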

Manouchehri commented 5 months ago

@UncleBob2: Which Claude model(s) are you using?

If you're using claude-3-sonnet-20240229 or claude-3-haiku-20240307, you can load balance across 4 different AWS Bedrock regions, 2 Vertex AI regions, and Anthropic itself, for a total of 7 different deployments.

For claude-3-opus-20240229, you're currently limited to using just Anthropic and Vertex AI (so 2 in total).
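
A rough sketch of what that looks like with litellm's Router, where several deployments share one model_name and traffic is spread across them (the region names and model ids below are illustrative; check the litellm docs for the exact Bedrock/Vertex identifiers and credentials your account needs):

```python
# Illustrative sketch of load balancing one alias across multiple providers.
from litellm import Router

router = Router(
    model_list=[
        {   # AWS Bedrock; add one entry per region you have access to
            "model_name": "claude-3-haiku",
            "litellm_params": {
                "model": "bedrock/anthropic.claude-3-haiku-20240307-v1:0",
                "aws_region_name": "us-east-1",
            },
        },
        {   # Vertex AI
            "model_name": "claude-3-haiku",
            "litellm_params": {
                "model": "vertex_ai/claude-3-haiku@20240307",
                "vertex_location": "us-central1",
            },
        },
        {   # Anthropic directly (assumes ANTHROPIC_API_KEY in the environment)
            "model_name": "claude-3-haiku",
            "litellm_params": {
                "model": "anthropic/claude-3-haiku-20240307",
            },
        },
    ]
)

# Every request targets the shared alias; the Router picks a deployment,
# so the load is spread across each provider's separate rate limits.
response = router.completion(
    model="claude-3-haiku",
    messages=[{"role": "user", "content": "Hello"}],
)
```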

UncleBob2 commented 4 months ago

@Manouchehri I am using claude-3-haiku because it is cheaper, and I manually edited the litellm code to add a delay. You may want to consider giving users the same option.
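
For reference, litellm's completion call already exposes a built-in retry knob that may avoid patching the source (a minimal sketch; num_retries re-attempts failed calls such as 429s):

```python
# Sketch using litellm's built-in retries instead of editing the source.
import litellm

response = litellm.completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hello"}],
    num_retries=3,  # retry up to 3 times on transient failures such as 429s
)
```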