ruuda opened this issue 4 years ago
I like it! As you mentioned, we run into this often.
This is a great idea! There is another case where we need to do retries, but where Opnieuw is not well suited. We occasionally need to wait for a file to become available at a destination URL. This can take as much as 20 minutes and results in 404 responses until the resource has arrived. We could use a similar mechanic for that as we have for rate limits. We could call the exception to sleep on something like `BackoffAndRetryException`.
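A minimal sketch of how that could look, assuming a hypothetical `BackoffAndRetryException` that carries a sleep hint (all names here are made up for illustration, none of them exist in Opnieuw):

```python
import time


class BackoffAndRetryException(Exception):
    """Hypothetical exception asking the caller to sleep before retrying."""

    def __init__(self, sleep_seconds: float) -> None:
        super().__init__(f"Resource not ready, retry in {sleep_seconds} s")
        self.sleep_seconds = sleep_seconds


def wait_for_resource(fetch, max_attempts: int = 40):
    """Keep calling `fetch` until it stops raising, honoring the sleep hint.

    With e.g. a 30 s hint per attempt, 40 attempts covers roughly the
    20 minute window mentioned above.
    """
    for _ in range(max_attempts - 1):
        try:
            return fetch()
        except BackoffAndRetryException as exc:
            time.sleep(exc.sleep_seconds)
    return fetch()  # Last attempt: let the exception propagate.
```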
In terms of implementation, I think the differences between these two are pretty subtle:
```python
@throttle
@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
```

```python
@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
@throttle
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
```
Perhaps it makes sense to expose the behavior for both retries and sleeps in a single decorator as well. That way end users don't have to think about decorator ordering.
> In terms of implementation I think differences between these two are pretty subtle:
That’s a good point! You definitely want the throttle to go on the outside; otherwise the retry window for connection errors expires quickly. We could have a combined decorator that internally uses `@retry` for the inner retries and then applies the rate limiting on top.
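A sketch of such a combined decorator, composing the two in the safe order. The `retry` and `throttle` here are reduced stand-ins written for this sketch (the real inner decorator would be Opnieuw's `@retry`), and `retry_and_throttle` is an assumed name:

```python
import functools
import time


class RateLimitError(Exception):
    """Carries the server's suggested back-off, as proposed in this thread."""

    def __init__(self, sleep_seconds: float) -> None:
        super().__init__(f"Rate limited, retry in {sleep_seconds} s")
        self.sleep_seconds = sleep_seconds


def retry(*, retry_on_exceptions, max_attempts: int = 3):
    """Stand-in for Opnieuw's @retry, reduced to what the sketch needs."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(max_attempts - 1):
                try:
                    return func(*args, **kwargs)
                except retry_on_exceptions:
                    pass
            return func(*args, **kwargs)

        return wrapper

    return decorate


def throttle(func):
    """Stand-in throttle: sleep as instructed, then call again."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        while True:
            try:
                return func(*args, **kwargs)
            except RateLimitError as exc:
                time.sleep(exc.sleep_seconds)

    return wrapper


def retry_and_throttle(*, retry_on_exceptions):
    """Hypothetical combined decorator: retry on the inside, throttling on
    top, so waiting out a rate limit does not consume the retry window."""

    def decorate(func):
        return throttle(retry(retry_on_exceptions=retry_on_exceptions)(func))

    return decorate
```

With this ordering a `RateLimitError` passes straight through the inner retry wrapper and is handled by the outer throttle, while connection errors are retried inside without the throttle noticing.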
To me this issue has a pretty clear solution direction, except for:
> I think it would even be possible to give the decorator some local state, so it can automatically measure success rate across multiple calls to `call_external_service`.
@ruuda What would you want it to be able to measure? Also, how should it report these measurements? Do we want it to simply log messages about these statistics or do we define some protocol so we can add callbacks that write things to the database?
So it “learns” to throttle itself to avoid rate limits, even when the exact way the limiting works is a black box. I think it could be useful, but this might be too much magic; maybe it is better to have some kind of limiter class that can be used in conjunction with a retry decorator to estimate a delay, rather than building it into the decorator itself.
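Such a limiter class might look like the following sketch. The class name and the back-off constants are assumptions for illustration, not anything Opnieuw provides:

```python
import time


class AdaptiveLimiter:
    """Hypothetical limiter kept outside the retry decorator: it observes
    successes and rate-limit errors and estimates how long to wait
    before the next call."""

    def __init__(self, initial_interval_s: float = 0.0) -> None:
        # Estimated minimum gap between calls, in seconds.
        self._interval_s = initial_interval_s
        self._last_call = None

    def record_success(self) -> None:
        self._last_call = time.monotonic()
        # Cautiously speed up again after a success (factor assumed).
        self._interval_s *= 0.9

    def record_rate_limit(self) -> None:
        self._last_call = time.monotonic()
        # Back off: assume calls must be spaced further apart.
        self._interval_s = max(self._interval_s * 2.0, 0.1)

    def suggested_delay(self) -> float:
        """Seconds to sleep before the next call, based on what we learned."""
        if self._last_call is None:
            return 0.0
        elapsed = time.monotonic() - self._last_call
        return max(0.0, self._interval_s - elapsed)
```

A caller would sleep for `suggested_delay()` before each request and feed the outcome back via `record_success` or `record_rate_limit`, keeping the learning logic out of the decorator itself.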
Oh, that is a very cool approach to learning undocumented rate limits! We could definitely use something like that as a general case for endpoints where we run into occasional rate limit errors. It is curious that your example fails here:
```
Call 9 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 10 failed: Too soon, only one call per second allowed.
```
I think two cases where we see a lot of problems now are not covered very well with this approach:
> It is curious your example fails here
I think it’s because I print the values rounded to one decimal, but the actual time it slept is slightly below 1 s.
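That rounding effect is easy to reproduce: a measured gap of 0.98 s prints as 1.0 s at one decimal, so the log looks safe while the sleep is still too short. A small safety margin (value assumed here, not from the thread) would sidestep the edge case:

```python
observed_gap_s = 0.98  # Actual measured time between successful calls.

# Rounded to one decimal for the log, this reads as a full second...
print(f"Sleeping {observed_gap_s:.1f} s to avoid rate limit.")

# ...but sleeping the raw value keeps us just under a 1 call/s limit.
# Adding a small safety margin on top of the estimate avoids that:
SAFETY_MARGIN_S = 0.05  # Assumed value.
sleep_s = observed_gap_s + SAFETY_MARGIN_S  # Safely past the limit.
```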
> I think two cases where we see a lot of problems now are not covered very well with this approach
Yeah, after thinking about it some more, this definitely should not go in the decorator; at best it can be a module to help determine how long to wait, but in almost all of our use cases we need special handling for the particular way of rate limiting anyway.
I was writing an example for our new Delta API, and 1/3 of the code is related to slowing down on rate limiting, which clutters the example a lot. I will extract that part and open a pull request here.
In places where we use `@retry`, we often also use it to retry on rate limit errors, usually by detecting the rate limiting and raising a `RetryException`. This mostly works, but it is suboptimal for a few reasons. I think Opnieuw would be a good place to add such functionality. Then we could decorate our calls like:
So a new decorator (I called it `@throttle` here, better names are welcome) would take care of rate limiting, and it would trigger on `RateLimitError`. `RateLimitError` could communicate any information about when the limit will reset back to the decorator. I think it would even be possible to give the decorator some local state, so it can automatically measure success rate across multiple calls to `call_external_service`.

@jochemb @wesleybowman what do you think?