channable / opnieuw

One weird trick to make your code more reliable
https://tech.channable.com/posts/2020-02-05-opnieuw.html
BSD 3-Clause "New" or "Revised" License

Add special support for rate limiting #10

Open ruuda opened 4 years ago

ruuda commented 4 years ago

In places where we use @retry, we often also use it to retry on rate limit errors, usually by detecting the rate limiting and raising a RetryException. This mostly works, but it is suboptimal for a few reasons:

I think Opnieuw would be a good place to add such functionality. Then we could decorate our calls like:

@throttle
@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
    return result

So a new decorator (I called it @throttle here, better names are welcome) would take care of rate limiting, and it would trigger on RateLimitError.
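A minimal sketch of what such a decorator could look like (the `RateLimitError` type, its `sleep_seconds` attribute, and the `max_sleep_seconds` cap are hypothetical names for this issue, not existing Opnieuw API):

```python
import functools
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception carrying a server-suggested sleep time."""

    def __init__(self, sleep_seconds: float) -> None:
        super().__init__(f"rate limited, retry after {sleep_seconds} s")
        self.sleep_seconds = sleep_seconds


def throttle(max_sleep_seconds: float = 300.0) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Sleep and retry whenever the wrapped function raises RateLimitError,
    up to a total sleep budget of max_sleep_seconds."""

    def decorator(f: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(f)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            slept = 0.0
            while True:
                try:
                    return f(*args, **kwargs)
                except RateLimitError as exc:
                    # Give up by re-raising once the sleep budget is exhausted.
                    if slept + exc.sleep_seconds > max_sleep_seconds:
                        raise
                    time.sleep(exc.sleep_seconds)
                    slept += exc.sleep_seconds

        return wrapper

    return decorator
```

The budget makes the worst-case latency of a decorated call bounded, which a bare "sleep and retry forever" loop would not be.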

I think it would even be possible to give the decorator some local state, so it can automatically measure success rate across multiple calls to call_external_service.

@jochemb @wesleybowman what do you think?

wesleybowman commented 4 years ago

I like it! As you mentioned, we run into this often.

jochemb commented 4 years ago

This is a great idea! There is another case where we need retries but where Opnieuw is not well suited: we occasionally need to wait for a file to become available at a destination URL. This can take as much as 20 minutes, and requests result in a 404 until the resource has arrived. We could use a similar mechanic for that as we have for rate limits, and call the exception that triggers the sleep something like BackoffAndRetryException.

In terms of implementation I think differences between these two are pretty subtle:

@throttle
@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
    return result

@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
@throttle
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
    return result

Perhaps it makes sense to expose the behavior for both retries and sleeps in a single decorator as well. That way end users don't have to think about decorator ordering.

ruuda commented 4 years ago

In terms of implementation I think differences between these two are pretty subtle:

That’s a good point! You definitely want the throttle to go on the outside: otherwise the time spent sleeping on rate limits counts against the retry window for connection errors, so that window expires quickly. We could have a combined decorator that internally uses @retry for the inner retries and then applies the rate limiting on top.
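A rough self-contained sketch of such a combined decorator (all names here are hypothetical; the inner loop stands in for what Opnieuw's @retry would do, and RateLimitError with its sleep_seconds attribute is the exception proposed above):

```python
import functools
import time
from typing import Any, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception carrying a server-suggested sleep time."""

    def __init__(self, sleep_seconds: float) -> None:
        super().__init__(f"rate limited, retry after {sleep_seconds} s")
        self.sleep_seconds = sleep_seconds


def retry_and_throttle(
    retry_on_exceptions: Tuple[Type[Exception], ...],
    max_calls_total: int = 3,
    max_throttle_sleep_seconds: float = 300.0,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Combined decorator: plain retries on the inside, rate-limit sleeps on
    the outside, so rate-limit sleeps never consume the inner retry budget.
    End users then do not have to think about decorator ordering."""

    def decorator(f: Callable[..., T]) -> Callable[..., T]:
        def with_retries(*args: Any, **kwargs: Any) -> T:
            # Inner layer: in Opnieuw this role would be played by @retry.
            for attempt in range(max_calls_total):
                try:
                    return f(*args, **kwargs)
                except retry_on_exceptions:
                    if attempt == max_calls_total - 1:
                        raise
            raise AssertionError("unreachable")

        @functools.wraps(f)
        def with_throttle(*args: Any, **kwargs: Any) -> T:
            # Outer layer: sleep as suggested on RateLimitError, up to a budget.
            slept = 0.0
            while True:
                try:
                    return with_retries(*args, **kwargs)
                except RateLimitError as exc:
                    if slept + exc.sleep_seconds > max_throttle_sleep_seconds:
                        raise
                    time.sleep(exc.sleep_seconds)
                    slept += exc.sleep_seconds

        return with_throttle

    return decorator
```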

jochemb commented 4 years ago

To me this issue has a pretty clear solution direction, except for:

I think it would even be possible to give the decorator some local state, so it can automatically measure success rate across multiple calls to call_external_service.

@ruuda What would you want it to be able to measure? Also, how should it report these measurements? Do we want it to simply log messages about these statistics or do we define some protocol so we can add callbacks that write things to the database?

ruuda commented 4 years ago
This is very roughly what I had in mind:

```python
import time
from typing import Callable, List, NamedTuple

F = Callable[[int], int]


class Call(NamedTuple):
    begin_second: float
    success: bool


def estimate_delay_seconds(calls: List[Call]) -> float:
    """You can go very deep statistically on this, but this will do for the proof of concept."""
    delay_seconds_good: List[float] = []
    delay_seconds_bad: List[float] = [0.0]
    last_success_second = calls[0].begin_second
    for call in calls[1:]:
        if call.success:
            delay_seconds_good.append(call.begin_second - last_success_second)
            last_success_second = call.begin_second
        else:
            delay_seconds_bad.append(call.begin_second - last_success_second)

    if len(delay_seconds_good) > 1:
        return 0.5 * max(delay_seconds_bad) + 0.5 * min(delay_seconds_good)
    else:
        return 0.1 + max(delay_seconds_bad) * 2.0


def throttle(f: F) -> F:
    calls: List[Call] = []

    def apply(x: int) -> int:
        if len(calls) >= 2:
            successes = [call for call in calls if call.success]
            last_success_second = successes[-1].begin_second if len(successes) > 0 else calls[0].begin_second
            delay_seconds = estimate_delay_seconds(calls)
            next_call_second = last_success_second + delay_seconds
            sleep_seconds = max(0.0, next_call_second - time.monotonic())
            if sleep_seconds > 0.0:
                print(f'Observed success rate: {1.0 / delay_seconds:.1f} Hz.')
                print(f'Sleeping {sleep_seconds:.1f} s to avoid rate limit.')
                time.sleep(sleep_seconds)

        begin_second = time.monotonic()
        try:
            result = f(x)
            calls.append(Call(begin_second, success=True))
            return result
        except:
            calls.append(Call(begin_second, success=False))
            raise

    return apply


last_call_second: float = time.monotonic()


@throttle
def succ_limited(x: int) -> int:
    global last_call_second
    now_second = time.monotonic()
    if now_second - last_call_second > 1.0:
        last_call_second = now_second
        return x + 1
    else:
        raise Exception('Too soon, only one call per second allowed.')


for i in range(20):
    try:
        succ_limited(i)
        print(f'Call {i} succeeded')
    except Exception as exc:
        print(f'Call {i} failed: {exc}')
```

Output:

```
Call 0 failed: Too soon, only one call per second allowed.
Call 1 failed: Too soon, only one call per second allowed.
Observed success rate: 10.0 Hz.
Sleeping 0.1 s to avoid rate limit.
Call 2 failed: Too soon, only one call per second allowed.
Observed success rate: 3.3 Hz.
Sleeping 0.2 s to avoid rate limit.
Call 3 failed: Too soon, only one call per second allowed.
Observed success rate: 1.4 Hz.
Sleeping 0.4 s to avoid rate limit.
Call 4 failed: Too soon, only one call per second allowed.
Observed success rate: 0.7 Hz.
Sleeping 0.8 s to avoid rate limit.
Call 5 succeeded
Observed success rate: 0.7 Hz.
Sleeping 1.5 s to avoid rate limit.
Call 6 succeeded
Observed success rate: 0.9 Hz.
Sleeping 1.1 s to avoid rate limit.
Call 7 succeeded
Observed success rate: 1.1 Hz.
Sleeping 0.9 s to avoid rate limit.
Call 8 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.1 s to avoid rate limit.
Call 9 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 10 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 11 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 12 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 13 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 14 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 15 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 16 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 17 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 18 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 19 succeeded
```

So it “learns” to throttle itself to avoid rate limits, even when the exact way the limiting works is a black box. I think it could be useful, but it might be too much magic. Maybe it is better to have some kind of limiter class that can be used in conjunction with a retry decorator to estimate a delay, rather than building it into the decorator itself.

jochemb commented 4 years ago

Oh, that is a very cool approach to learning undocumented rate limits! We could definitely use something like this as a general case for endpoints where we run into occasional rate limit errors. It is curious that your example fails here:

Call 9 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 10 failed: Too soon, only one call per second allowed.

I think two cases where we see a lot of problems now are not covered very well with this approach:

ruuda commented 4 years ago

It is curious your example fails here

I think it’s because I print the values rounded to one decimal, but the actual time it slept is slightly below 1s.
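This is easy to check in isolation (0.96 is just an illustrative value, not taken from the run above):

```python
# A sleep duration slightly below one second is still printed as "1.0",
# because the f-string rounds to one decimal place.
sleep_seconds = 0.96  # illustrative value
print(f'Sleeping {sleep_seconds:.1f} s to avoid rate limit.')
# prints: Sleeping 1.0 s to avoid rate limit.
```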

I think two cases where we see a lot of problems now are not covered very well with this approach

Yeah, after thinking about it some more, this definitely should not go in the decorator. At best it can be a module that helps determine how long to wait, but in almost all of our use cases we need special handling for the particular way of rate limiting anyway.
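Such a module could be little more than a class that records call outcomes and answers "how long should I wait?", leaving the sleeping and exception handling to the caller. A sketch (the DelayEstimator name is made up; the heuristic is the same one as in the proof of concept above):

```python
import time
from typing import List, NamedTuple, Optional


class Call(NamedTuple):
    begin_second: float
    success: bool


class DelayEstimator:
    """Hypothetical standalone helper: record the outcome of every call, then
    estimate the delay between calls that avoids the (unknown) rate limit."""

    def __init__(self) -> None:
        self.calls: List[Call] = []

    def record(self, success: bool, now_second: Optional[float] = None) -> None:
        t = time.monotonic() if now_second is None else now_second
        self.calls.append(Call(t, success))

    def estimate_delay_seconds(self) -> float:
        # Not enough data yet: do not wait at all.
        if len(self.calls) < 2:
            return 0.0
        delay_seconds_good: List[float] = []
        delay_seconds_bad: List[float] = [0.0]
        last_success_second = self.calls[0].begin_second
        for call in self.calls[1:]:
            if call.success:
                delay_seconds_good.append(call.begin_second - last_success_second)
                last_success_second = call.begin_second
            else:
                delay_seconds_bad.append(call.begin_second - last_success_second)

        if len(delay_seconds_good) > 1:
            # Aim between the longest failing delay and the shortest good one.
            return 0.5 * max(delay_seconds_bad) + 0.5 * min(delay_seconds_good)
        else:
            # No successes observed yet: back off exponentially.
            return 0.1 + max(delay_seconds_bad) * 2.0
```

The caller would then sleep for `estimate_delay_seconds()` before the next call and feed each outcome back via `record`, with any retry decorator layered on top as usual.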

ruuda commented 4 years ago

I was writing an example for our new Delta API, and 1/3 of the code is related to slowing down on rate limiting, which clutters the example a lot. I will extract that part and open a pull request here.