anthropics / anthropic-sdk-python

MIT License

rate_limit_error (429) on paid account for only a dozen async requests #496

Closed baogorek closed 5 months ago

baogorek commented 6 months ago

The async example below works if you cut the number of messages down to, say, 4. I've even had it work with 6. But with 12 async calls to the Haiku model, I get the 429 error:

RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of concurrent connections has exceeded your rate limit. Please try again later or contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}

I'm on a paid plan and know my limit is at least 1,000 calls per minute (it might even be 4,000).

This has been replicated with another user's account. I'm not especially good at async programming, but I believe I should be able to send out as many concurrent queries as my rate limit allows.

from anthropic import AsyncAnthropic
import asyncio

anthropicAsync = AsyncAnthropic(
    api_key=api_key,  # your API key
)

async def send_message(content):
    response = await anthropicAsync.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=300,
        messages=[{"role": "user", "content": content}]
    )
    return response

async def runAsyncLLM():
    message1 = "How does a court case get to the Supreme Court?"
    message2 = "What is the role of a Supreme Court justice?"

    # 12 concurrent requests: message1 at positions 0 and 6, message2 elsewhere
    contents = ([message1] + [message2] * 5) * 2
    responses = await asyncio.gather(*(send_message(c) for c in contents))
    return responses

responses = asyncio.run(runAsyncLLM())

print(responses[0].content[0].text.strip())
print("\n---------------\n")
print(responses[1].content[0].text.strip())
rattrayalex commented 5 months ago

Anthropic enforces several rate limits, not just calls per minute; this is standard for APIs. This particular limit is on the number of concurrent requests. You are correct that sending fewer requests at once will resolve the problem.
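One common pattern is to throttle the fan-out yourself with an asyncio.Semaphore, so a burst of tasks never opens more than N connections at once. A minimal sketch (gather_limited is a hypothetical helper, not part of the SDK; the stand-in fake_call would be replaced by send_message(...) from the snippet above):

```python
import asyncio

# Hypothetical helper (not part of the anthropic SDK): cap the number of
# in-flight coroutines with a semaphore so a burst of requests never
# exceeds the concurrent-connection limit.
async def gather_limited(limit, coros):
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:  # at most `limit` coroutines pass this point at once
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))

# Demo with a stand-in coroutine; in real use, pass send_message(...)
# coroutines instead of fake_call(...).
async def fake_call(i):
    await asyncio.sleep(0.01)
    return i

results = asyncio.run(gather_limited(3, [fake_call(i) for i in range(12)]))
```

asyncio.gather preserves input order, so the responses line up with the requests even though only a few run at a time.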

baogorek commented 5 months ago

Thanks for the clarification. Just note that this has failed with as few as 4 requests sent out asynchronously, and nothing like it happens with the OpenAI equivalent. I would suggest documenting this limit, or at least adding a warning somewhere.
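In the meantime, a workaround is to retry on 429 with exponential backoff and jitter. A rough sketch (RateLimited is a stand-in exception for the demo; with the real SDK you would catch anthropic.RateLimitError, and the client may also retry some errors itself via its max_retries option):

```python
import asyncio
import random

# Stand-in for anthropic.RateLimitError so the sketch is self-contained.
class RateLimited(Exception):
    pass

async def with_backoff(make_coro, max_attempts=5, base=0.5):
    """Retry a coroutine factory on rate-limit errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await make_coro()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # double the delay each attempt, plus a little jitter
            delay = base * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Demo with a flaky stand-in that fails twice, then succeeds.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"

result = asyncio.run(with_backoff(flaky, base=0.01))
```

Note that with_backoff takes a factory (a callable returning a fresh coroutine) rather than a coroutine object, since a coroutine can only be awaited once.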