edgi-govdata-archiving / wayback

A Python API to the Internet Archive Wayback Machine
https://wayback.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Winter cleaning for rate limiting #147

Closed: Mr0grog closed this 8 months ago

Mr0grog commented 9 months ago

This significantly refactors how we manage rate limiting, with the goal of making it clearer, more flexible, and more accurate going forward. It encompasses a few major changes:

Fixes #137.

Some of this also hopefully paves the way for an even bigger refactor of WaybackSession in #58.

Mr0grog commented 9 months ago

Most of the work here is done. Rate limits are now stored internally as a dict indexed by URL path prefix, which makes them quite flexible. I still need to decide whether the public API should also accept rate limits in that form; see the discussion in https://github.com/edgi-govdata-archiving/wayback/issues/137#issuecomment-1845935565:

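from wayback import WaybackSession, _utils

# Proposed form of the public API: each key is a URL path prefix and each
# value is the rate limit for requests whose path starts with that prefix.
WaybackSession(rate_limits={
    '/web/timemap': _utils.RateLimit.make_limit(timemap_calls_per_second),
    '/cdx': _utils.RateLimit.make_limit(search_calls_per_second),
    '/': _utils.RateLimit.make_limit(memento_calls_per_second),
})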
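For illustration, here is a rough sketch of how a dict of rate limits keyed by URL path prefix might be looked up and enforced. This is a hypothetical example, not the code in this PR: `MinIntervalLimit` and `limit_for_path` are made-up names, the limiter simply keeps calls a minimum interval apart, and the "longest matching prefix wins" rule is an assumption. Some rule like it is needed so the catch-all `'/'` entry doesn't shadow the more specific `/cdx` and `/web/timemap` entries.

```python
import threading
import time
from typing import Dict, Optional


class MinIntervalLimit:
    """Hypothetical limiter that keeps calls at least 1/rate seconds apart."""

    def __init__(self, calls_per_second: float) -> None:
        # In this sketch, a non-positive rate means "no limit".
        self.interval = 1.0 / calls_per_second if calls_per_second > 0 else 0.0
        self._last_call = float('-inf')
        self._lock = threading.Lock()

    def wait(self) -> None:
        # Sleep just long enough that this call starts at least `interval`
        # seconds after the previous one; the lock makes it thread-safe.
        with self._lock:
            now = time.monotonic()
            delay = self._last_call + self.interval - now
            if delay > 0:
                time.sleep(delay)
                now = time.monotonic()
            self._last_call = now


def limit_for_path(path: str,
                   limits: Dict[str, MinIntervalLimit]) -> Optional[MinIntervalLimit]:
    """Return the limit registered under the longest prefix matching `path`."""
    matches = [prefix for prefix in limits if path.startswith(prefix)]
    if not matches:
        return None
    return limits[max(matches, key=len)]
```

With illustrative (not real) rates such as `{'/web/timemap': MinIntervalLimit(2), '/cdx': MinIntervalLimit(1), '/': MinIntervalLimit(10)}`, a request to `/cdx/search/cdx` would call `limit_for_path(...).wait()` on the `/cdx` limit, while a memento URL like `/web/20200101000000/https://example.com` falls through to `'/'`. Note that plain `startswith` matching means `/cdxfoo` would also match `/cdx`; a real implementation might want to match on path-segment boundaries.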