Closed Mr0grog closed 8 months ago
Most of the work here has been done. Rate limits are now stored internally as a dict indexed by URL path prefix (so they are pretty flexible). I need to decide whether to update the public API to take rate limits set that way as well — see discussion in https://github.com/edgi-govdata-archiving/wayback/issues/137#issuecomment-1845935565:
```python
WaybackSession(rate_limits={
    '/web/timemap': _utils.RateLimit.make_limit(timemap_calls_per_second),
    '/cdx': _utils.RateLimit.make_limit(search_calls_per_second),
    '/': _utils.RateLimit.make_limit(memento_calls_per_second),
})
```
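To illustrate what "indexed by URL path prefix" could mean in practice, here is a minimal sketch of a lookup over such a dict. The helper name `find_limit` and the longest-prefix-wins rule are assumptions made for illustration; the matching rule the library actually uses is not described here. Plain strings stand in for the limit objects.

```python
def find_limit(rate_limits, path):
    """Return the value whose key is the longest prefix of `path`, or None.

    Hypothetical helper: longest-prefix matching is an assumption, not
    necessarily how the library resolves overlapping prefixes.
    """
    matches = [prefix for prefix in rate_limits if path.startswith(prefix)]
    if not matches:
        return None
    return rate_limits[max(matches, key=len)]


# Strings stand in for rate limit objects in this sketch.
limits = {
    '/web/timemap': 'timemap limit',
    '/cdx': 'search limit',
    '/': 'memento limit',
}
```

With this rule, `'/'` acts as a catch-all: any path that does not match a more specific prefix falls through to the memento limit.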
This significantly refactors how we manage rate limiting with the goal of making it more clear, flexible, and accurate going forward. It encompasses a few major changes:
- [x] Set limits based on actual values from Internet Archive sysadmins. We previously determined rate limits via a combination of ad-hoc conversations with Internet Archive staff, our own testing, and feedback from users. I’ve made a special effort to reach out to Internet Archive staff and get explicit feedback about both the hard limits and their desired default behavior (80% of hard limits).

- [x] Replace `rate_limited()` with a `RateLimit` class. This allows separate sessions to share limits or operate with independent limits in a clearer way. Previously, multiple sessions with different limits would still interact in complicated ways. Limits previously acted more like “how long do I wait after the previous request (from any session, even if it had a different limit).” Now sessions with different limit objects operate completely independently, and sessions that need to share limits can share instances of `RateLimit`.

  The default limits are globally shared `RateLimit` objects, so multiple sessions without explicitly set limits will still coordinate their limiting together just like they used to.

- [x] Move rate limit application from the client to the session. We previously applied the rate limits from the client inside the `search()` and `get_memento()` methods. Instead, we’ll apply them inside `WaybackSession.send()`. This ensures that they are always applied (harder to make mistakes) and also fixes an issue where we did not rate limit the requests made while following redirects in mementos. The rate limits also belong to the session, so the session is really the right place to apply them anyway.

- [x] Update release notes draft.
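The sharing behavior described in the list above can be sketched roughly as follows. This is not the library's `RateLimit` implementation: `SketchRateLimit`, `SketchSession`, and the injectable `clock` and `sleep` parameters are hypothetical names used only to show the idea that each limit object tracks its own "last call" time, and that the wait happens inside the session's `send()`.

```python
import threading
import time


class SketchRateLimit:
    """Hypothetical stand-in for a rate limit object (not the real class)."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        # Minimum spacing between calls; 0 disables limiting.
        self._min_interval = 1.0 / per_second if per_second else 0.0
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the last call."""
        with self._lock:
            now = self._clock()
            if self._last_call is not None:
                remaining = self._min_interval - (now - self._last_call)
                if remaining > 0:
                    self._sleep(remaining)
                    now = self._clock()
            self._last_call = now


class SketchSession:
    """Hypothetical session that applies its limit on every send()."""

    def __init__(self, limit):
        self.limit = limit

    def send(self, request):
        self.limit.wait()  # Applied here, so no caller can skip it.
        return f'sent {request}'


# Sessions A and B coordinate through one object; C is independent.
shared = SketchRateLimit(per_second=2)
session_a = SketchSession(shared)
session_b = SketchSession(shared)
session_c = SketchSession(SketchRateLimit(per_second=2))
```

In this sketch, a request sent through `session_b` right after one through `session_a` has to wait, because both consult the same object, while `session_c` is unaffected. Putting the wait in `send()` rather than in higher-level methods means every request routed through the session is limited.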
Fixes #137.
Some of this also hopefully paves the way for an even bigger refactor of `WaybackSession` in #58.