TheMemeticist opened this issue 2 weeks ago
Thank you for bringing this up!
To address the rate-limiting issue, we could implement a queue handler that retries the last failed request after the rate limit cools down. Using the rate-limit headers provided by the API can help in setting up a structured retry mechanism. Here’s a brief outline of how it would work:
1. Rate-limit information: headers such as `retry-after` provide the details needed to set up a structured retry mechanism.
2. Queue handler logic: the handler would queue any request that fails due to rate limiting and check these headers to determine when to retry. Specifically, it could use the `retry-after` value as a timer to trigger the next attempt, ensuring no additional rate-limit errors occur.
This setup would allow OpenHands to automatically reattempt failed requests without manual intervention, improving stability when handling bursts of requests.
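A minimal sketch of the idea (illustrative Python only; the class and method names are assumptions, and extracting the `retry-after` value from the failed response is left to the caller):

```python
import time
from collections import deque
from typing import Any, Callable


class RateLimitRetryQueue:
    """Queues requests that failed with a rate-limit error and retries
    each one after its server-provided cooldown has elapsed."""

    def __init__(self) -> None:
        # Each entry pairs the request callable with the earliest
        # monotonic time at which it may be retried.
        self._queue: deque[tuple[Callable[[], Any], float]] = deque()

    def enqueue(self, request: Callable[[], Any], retry_after: float) -> None:
        """Schedule a failed request; retry_after is the cooldown in
        seconds taken from the `retry-after` response header."""
        self._queue.append((request, time.monotonic() + retry_after))

    def drain(self) -> list[Any]:
        """Retry queued requests in order, sleeping out each cooldown."""
        results = []
        while self._queue:
            request, ready_at = self._queue.popleft()
            delay = ready_at - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            results.append(request())
        return results
```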
Rate limits are a problem; some workaround should be made for this, imo.
OpenHands has user-configurable retries for rate limits. Please take a look at the `config.template.toml` file; the relevant settings are in the `[llm]` section:
```toml
# Number of retries to attempt when an operation fails with the LLM.
# Increase this value to allow more attempts before giving up
#num_retries = 8

# Maximum wait time (in seconds) between retry attempts
# This caps the exponential backoff to prevent excessively long waits
#retry_max_wait = 120

# Minimum wait time (in seconds) between retry attempts
# This sets the initial delay before the first retry
#retry_min_wait = 15

# Multiplier for exponential backoff calculation
# The wait time increases by this factor after each failed attempt
# A value of 2.0 means each retry waits twice as long as the previous one
#retry_multiplier = 2.0
```
You can customize them in the `config.toml` file, or, if you're running with the docker app, you can add them with `-e` and the corresponding env var (uppercase, and with the `LLM_` prefix). For example, `-e LLM_RETRY_MIN_WAIT=20`.
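For instance, to make retries more patient, you could uncomment and adjust the settings like this (values are illustrative, not recommendations):

```toml
[llm]
num_retries = 10
retry_min_wait = 30
retry_max_wait = 300
retry_multiplier = 2.0
```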
What we don't currently do is read the API headers and adapt to them; we just do what the user configures there. Personally, I had to make the values more lenient for Anthropic...
I think you're right that we should, and we have a PR for it, but we haven't got it ready yet. 😅
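For intuition, the delay those settings produce can be modeled roughly like this (a simplified sketch of capped exponential backoff, not the exact implementation OpenHands uses):

```python
def backoff_wait(attempt: int,
                 retry_min_wait: float = 15,
                 retry_max_wait: float = 120,
                 retry_multiplier: float = 2.0) -> float:
    """Wait time (seconds) before retry number `attempt` (0-indexed),
    growing geometrically and capped at retry_max_wait."""
    return min(retry_max_wait, retry_min_wait * retry_multiplier ** attempt)


# With the defaults above: 15s, 30s, 60s, 120s, 120s, ...
```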
**Feature Request: User-Side Rate Limiter for Rate Limit Management in API Requests**
**Feature Summary:** Implement a user-side rate limiter to manage and control the frequency of requests sent to the Anthropic API. This feature would help prevent rate limit errors (e.g., `litellm.RateLimitError: AnthropicException - {"type":"rate_limit_error"}`) by dynamically adjusting the rate of requests based on the current usage and limit thresholds provided in the API response headers.

**Problem Statement:** Currently, the application may encounter rate limit errors when the number of request tokens exceeds the daily limit set by Anthropic. This results in unexpected interruptions to agent functionality, causing users to experience downtime and delays. Users are unable to continue their workflows smoothly and are often unaware of their current usage until the error occurs.
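For context, the failure surfaces from litellm roughly like this (the model name is illustrative):

```python
import litellm

try:
    response = litellm.completion(
        model="anthropic/claude-3-5-sonnet-20241022",  # illustrative model name
        messages=[{"role": "user", "content": "Hello"}],
    )
except litellm.RateLimitError as err:
    # A user-side rate limiter would prevent most of these by throttling
    # requests before the provider's limit is hit.
    print(f"Rate limited: {err}")
```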
**Proposed Solution:** Implement a user-side rate limiter that throttles outgoing requests based on the usage and limit information returned in the API response headers (a minimal sketch follows at the end of this outline).
**Feature Details:**

**Benefits:**

**Potential Challenges:**
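As a starting point, such a limiter could look something like the sketch below (illustrative Python; the class name is hypothetical, and the `anthropic-ratelimit-requests-remaining` header should be checked against Anthropic's current documentation):

```python
import time


class HeaderAwareRateLimiter:
    """Client-side limiter that spaces out requests using rate-limit
    response headers, falling back to a fixed interval (sketch)."""

    def __init__(self, min_interval: float = 1.0, cooldown: float = 60.0) -> None:
        self.min_interval = min_interval  # floor between consecutive requests
        self.cooldown = cooldown          # fallback wait when the budget is exhausted
        self._next_allowed = 0.0          # monotonic time of the next permitted request

    def acquire(self) -> None:
        """Block until the next request is permitted."""
        delay = self._next_allowed - time.monotonic()
        if delay > 0:
            time.sleep(delay)

    def update_from_headers(self, headers: dict) -> None:
        """Adjust pacing based on the latest response headers."""
        now = time.monotonic()
        if "retry-after" in headers:
            # The server told us exactly how long to wait.
            self._next_allowed = now + float(headers["retry-after"])
        elif int(headers.get("anthropic-ratelimit-requests-remaining", "1")) <= 0:
            # Request budget exhausted; back off until it should have reset.
            self._next_allowed = now + self.cooldown
        else:
            self._next_allowed = now + self.min_interval
```

The agent would call `acquire()` before each API request and `update_from_headers()` with each response, so pacing adapts to what the server reports instead of relying only on static retry settings.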