WaybackSession was originally a subclass of requests.Session, and that turns out to have been a poor decision (see https://github.com/edgi-govdata-archiving/wayback/pull/152#issuecomment-1858800671). We wanted to make customizing HTTP behavior relatively straightforward, but using a subclass-able version of requests.Session wound up baking problematic parts of requests into our API. In particular, requests is not thread-safe and this setup makes it hard to wrap with thread-safe code. We also unintentionally broke the promise of easy subclassing when we added complex retry and rate limiting behavior.
This is a major, breaking change designed to largely remove requests from our public API:
WaybackSession is now its own, standalone class instead of a subclass of requests.Session. It handles stuff like rate limiting, retries, etc.
WaybackHttpAdapter handles making a single HTTP request and is the interface through which we use the requests library.
It has a very small API (2 methods: .request() sends an HTTP request; .close() closes all its network connections).
The .request() method returns a WaybackHttpResponse object, which models just the parts of the response we need. The WaybackHttpResponse class is an abstract base class that represents the API that the rest of Wayback needs, and internally we have a WaybackRequestsResponse subclass that wraps a response object from Requests in a thread-safe way.
Ideally, the small API will make it easy for us to build an alternate implementation with urllib3, httpx, etc, or just to add thread-safe handling of requests. It might also be a better way to let users customize HTTP behavior than the old approach, although you can’t yet supply a custom subclass to WaybackClient or WaybackSession.
❗ Open question: I don’t think that WaybackSession should continue to exist. It makes the API complex and it lets the details of requests leak out a little bit through exception handling. Any feedback on this is welcome (cc @danielballan). Some ideas:
Move its functionality directly into WaybackHttpAdapter.
Upside: this gets all the requests stuff — including exception handling — tucked away into one place.
Downside: Safely customizing WaybackHttpAdapter gets more complex because you have to work around all that machinery. It’s not as straightforward as just overriding the .request() method. Some ways to address that:
Add a WaybackHttpAdapter._request() method (not the underscore) that you can override if you just want to customize the sending of a single HTTP request.
Re-architect that functionality into some kind of service/helper object that you can use from a customized WaybackHttpAdapter subclass. I think this might be a lot more work than it's worth, though.
Move its functionality into WaybackClient.
Upside: keeps all the things that are special about dealing with HTTP requests at a high level out of the lower-level adapter tooling. Keeps adapters as simple as possible.
Downside: Lets some exception handling that is specific to requests leak out into the rest of the package. Makes it harder to change that behavior. What if httpx or some other tool incorporates rate limiting in the future? It would be nice to let an adapter delegate those tasks to the lower-level tool it is acting as glue for.
I think I’m leaning towards some version of (1) here.
This paves the way towards fixing #58, but does not actually solve the issue (WaybackHttpAdapter is not yet thread-safe, and needs to use some of the tricks from #23).
WaybackSession
was originally a subclass ofrequests.Session
, and that turns out to have been a poor decision (see https://github.com/edgi-govdata-archiving/wayback/pull/152#issuecomment-1858800671). We wanted to make customizing HTTP behavior relatively straightforward, but using a subclass-able version ofrequests.Session
wound up baking problematic parts of requests into our API. In particular, requests is not thread-safe and this setup makes it hard to wrap with thread-safe code. We also unintentionally broke the promise of easy subclassing when we added complex retry and rate limiting behavior.This is a major, breaking change designed to largely remove requests from our public API:
WaybackSession
is now its own, standalone class instead of a subclass ofrequests.Session
. It handles stuff like rate limiting, retries, etc.WaybackHttpAdapter
handles making a single HTTP request and is the interface through which we use the requests library.It has a very small API (2 methods:
.request()
sends an HTTP request;.close()
closes all its network connections).The
.request()
method returns aWaybackHttpResponse
object, which models just the parts of the response we need. TheWaybackHttpResponse
class is an abstract base class that represents the API that the rest of Wayback needs, and internally we have aWaybackRequestsResponse
subclass that wraps a response object from Requests in a thread-safe way.Ideally, the small API will make it easy for us to build an alternate implementation with urllib3, httpx, etc, or just to add thread-safe handling of requests. It might also be a better way to let users customize HTTP behavior than the old approach, although you can’t yet supply a custom subclass to
WaybackClient
orWaybackSession
.❗ Open question: I don’t think that
WaybackSession
should continue to exist. It makes the API complex and it lets the details of requests leak out a little bit through exception handling. Any feedback on this is welcome (cc @danielballan). Some ideas:Move its functionality directly into
WaybackHttpAdapter
.WaybackHttpAdapter
gets more complex because you have to work around all that machinery. It’s not as straightforward as just overriding the.request()
method. Some ways to address that:WaybackHttpAdapter._request()
method (not the underscore) that you can override if you just want to customize the sending of a single HTTP request.WaybackHttpAdapter
subclass. I think this might be a lot more work than it's worth, though.Move its functionality into
WaybackClient
.I think I’m leaning towards some version of (1) here.
This paves the way towards fixing #58, but does not actually solve the issue (
WaybackHttpAdapter
is not yet thread-safe, and needs to use some of the tricks from #23).