Rate Limiting - Feature Request - use other models for basic questions?

ggoosen commented 2 weeks ago

Any chance you can build in better handling for rate limiting? Example below.

Error in tool response: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}

The tool can surely work out how many tokens it sends on each request, keep a running total, and slow its responses down as the total approaches the limit. I used Claude to modify the script to do this and it worked well.
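A minimal sketch of that running-total idea, not the actual modification described above. It assumes the caller knows its per-minute token limit and records the token count of each request (the Messages API response reports `usage.input_tokens` and `usage.output_tokens`). The `TokenThrottle` class and its method names are hypothetical.

```python
import time
from collections import deque


class TokenThrottle:
    """Tracks tokens used in the last `window` seconds (hypothetical helper)."""

    def __init__(self, tokens_per_minute, window=60.0, clock=time.monotonic):
        self.limit = tokens_per_minute  # assumed limit; check your own account
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs, oldest first
        self.total = 0

    def _expire(self, now):
        # Drop entries that have fallen out of the sliding window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.total -= tokens

    def wait_time(self, tokens):
        """Seconds to wait until `tokens` more fit in the window (0.0 if they fit now)."""
        now = self.clock()
        self._expire(now)
        if self.total + tokens <= self.limit:
            return 0.0
        freed = 0
        for ts, used in self.events:
            freed += used
            if self.total - freed + tokens <= self.limit:
                return max(0.0, ts + self.window - now)
        # The request alone exceeds the limit; the caller should shrink it.
        return self.window

    def record(self, tokens):
        """Add a completed request's token count to the running total."""
        now = self.clock()
        self._expire(now)
        self.events.append((now, tokens))
        self.total += tokens
```

Before each request the script would call `time.sleep(throttle.wait_time(estimated_tokens))`, and after each response `throttle.record(...)` with the reported usage.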

In addition, I added a question up front to get the daily usage and establish whether the script had already run that day, so it can stop when it knows the limit has been reached.

One other idea: I'm not sure if the rate limits are model-specific, but maybe the script can use different models for different functions, e.g. basic checks can go to Haiku, document updates can go to Opus, etc. That way we can get better usage out of Sonnet 3.5 within the daily limit?
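A rough sketch of what per-task model selection could look like. The task names and the `pick_model` helper are hypothetical, and the mapping is only an illustration; it helps only if limits are counted per model, which this thread has not confirmed.

```python
# Hypothetical task categories mapped to model IDs; adjust to taste.
MODEL_BY_TASK = {
    "basic_check": "claude-3-haiku-20240307",
    "code_edit": "claude-3-5-sonnet-20240620",
    "document_update": "claude-3-opus-20240229",
}
DEFAULT_MODEL = "claude-3-5-sonnet-20240620"


def pick_model(task_type):
    """Return the model for a task type, falling back to the default."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```

The returned string would be passed as the `model` argument of each API call, so cheap checks never touch the Sonnet quota.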

SupercaliG commented 2 weeks ago

This is what RouteLLM integration could do. It has the ability to route general questions and planning to a local model while using Claude for complex tasks. This is a great idea though. Rate limiting is kicking our asses.

ggoosen commented 2 weeks ago

Just had a quick look at RouteLLM. It does seem to be a good fit, with a lot of work done there already, although I suspect there would need to be a major rewrite to leverage multiple LLMs, as the code base is built to interpret Claude.

On a side note, in testing today I see my rate limit has been raised to 2.5M tokens per day; I don't know if that's everyone or just my account. I also see there are some rudimentary updates to implement waiting on rate limits, although that needs to be extended to differentiate between per-minute and per-day rate limits.
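One possible way to tell the two cases apart, sketched under assumptions. The "per-minute" wording comes from the error quoted at the top of this issue; the "daily"/"per-day" wording for the other case is a guess, as are the function names and the 60-second and 90-second defaults. `retry_after` stands for the value of the `retry-after` response header, if the caller has it.

```python
def classify_rate_limit(message):
    """Guess which limit was hit from the error message text."""
    text = message.lower()
    if "per-minute" in text:
        return "minute"
    if "daily" in text or "per-day" in text:  # assumed wording
        return "day"
    return "unknown"


def plan_retry(message, retry_after=None, max_wait=90.0):
    """Return ("wait", seconds) for short limits, ("stop", seconds_or_None) otherwise."""
    if classify_rate_limit(message) == "day":
        # Waiting a minute will not help; stop until the quota resets.
        return ("stop", None)
    wait = float(retry_after) if retry_after is not None else 60.0
    if wait > max_wait:
        return ("stop", wait)
    return ("wait", wait)
```

A caller would sleep and retry on `"wait"`, and on `"stop"` save its state and exit instead of looping.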

seoplanotx commented 2 weeks ago

I have run into the same issue. When our projects get complex, it's easy to send 2.5 million tokens to Sonnet in a day.

Rate limits, etc.: https://docs.anthropic.com/en/api/rate-limits

desy0305 commented 2 weeks ago

10 interactions and the tokens are gone, with a local RAG database. This 429 error pops up even when the query contains only small chunks; it seems the whole conversation history is part of every prompt sent. My day starts with 700k tokens consumed without a single prompt submitted, "nice".
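If the full history really is resent on every call, one mitigation is to cap it before each request. This is only a sketch: `estimate_tokens` uses a crude 4-characters-per-token guess, the function names are hypothetical, and messages are assumed to be dicts with string `role` and `content` fields.

```python
def estimate_tokens(text):
    """Very rough token estimate (assumes about 4 characters per token)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget):
    """Keep the most recent messages whose estimated tokens fit in `budget`."""
    kept = []
    used = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        # Always keep the newest message, even if it alone exceeds the budget.
        if used + cost > budget and kept:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    # Drop leading non-user turns so the history starts with a user message.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Older turns are simply dropped here; summarizing them instead would preserve more context at the cost of an extra call.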

Doriandarko commented 2 weeks ago

Tomorrow you will be able to set the model you want for different actions

Doriandarko / claude-engineer

Rate Limiting - Feature Request - use other models for basic questions? #62