fabiobatalha / crossrefapi

A python library that implements the Crossref API.
BSD 2-Clause "Simplified" License

Add a way to respect API rate limits and timeouts #66

Open AntonLydike opened 1 month ago

AntonLydike commented 1 month ago

According to the API docs, responses may contain the following headers, which ask clients to self-limit their request rate:

X-Rate-Limit-Limit: 50
X-Rate-Limit-Interval: 1s
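To make the arithmetic concrete: a limit of 50 requests per 1 s interval means leaving at least 1/50 = 0.02 s between requests. Below is a minimal sketch of deriving that spacing from a response's headers. The helper name is hypothetical, and accepting `m`/`h` suffixes is an assumption, since only `1s` appears above.

```python
def min_request_spacing(headers):
    """Hypothetical helper: seconds to leave between requests, derived from
    X-Rate-Limit-Limit and X-Rate-Limit-Interval, or None if they are absent."""
    # HTTP header names are case-insensitive, so compare them lowercased
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = lowered.get("x-rate-limit-limit")
    interval = lowered.get("x-rate-limit-interval")
    if limit is None or interval is None or int(limit) <= 0:
        return None
    # Assumption: only "s" appears in the docs excerpt; "m"/"h" are guesses
    units = {"s": 1, "m": 60, "h": 3600}
    interval = interval.strip()
    if interval and interval[-1] in units:
        seconds = float(interval[:-1]) * units[interval[-1]]
    else:
        seconds = float(interval)  # bare number: treat as seconds
    return seconds / int(limit)


print(min_request_spacing({"X-Rate-Limit-Limit": "50", "X-Rate-Limit-Interval": "1s"}))  # 0.02
```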

It would be neat if this library supported a mode that self-limits requests to conform to these headers, or otherwise provided a way to signal these limits to its user.

Happy to submit a patch, if this is a welcome feature.

Also, if something like this is already implemented here, please let me know, and I'm happy to write some documentation for it!

fabiobatalha commented 1 month ago

Hello @AntonLydike

There is a polite mode in the API. In fact, this library makes its requests synchronously, so it usually never sends many requests at once. One improvement intended to increase performance would be to make requests in parallel, using multiprocessing or something similar, while iterating over pages.
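A minimal sketch of the parallel-paging idea, using a thread pool rather than multiprocessing since the work is I/O-bound. `fetch_page` and `fetch_all` are hypothetical stand-ins, not crossrefapi functions; `fetch_page` fakes its results so the sketch runs offline, where a real version would issue one request with the page's `offset` and `rows`.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(offset, rows):
    """Hypothetical stand-in for one API call with offset/rows parameters.
    It returns placeholder identifiers so the sketch runs without a network."""
    return [f"item-{i}" for i in range(offset, offset + rows)]


def fetch_all(total, rows=20, max_workers=4):
    """Fetch `total` items in pages of `rows`, keeping up to `max_workers` requests in flight."""
    offsets = range(0, total, rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so pages stay in offset order
        pages = pool.map(lambda off: fetch_page(off, min(rows, total - off)), offsets)
        return [item for page in pages for item in page]


items = fetch_all(total=95, rows=20)
print(len(items), items[0], items[-1])  # 95 item-0 item-94
```

Offset-based pages can be fetched independently like this; cursor-based deep paging cannot, because each cursor comes from the previous response. The Crossref docs also cap `offset` (at 10,000, as far as I know), so this mainly helps for moderately sized result sets.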

You can review the polite mode.
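For reference, Crossref routes requests that identify the caller (a `mailto` query parameter, or a contact address in the `User-Agent` header) to its "polite" pool. As I understand it, crossrefapi's polite mode does the latter through the `Etiquette` object passed to `Works`. A standard-library-only sketch of what such a request looks like; `polite_request`, the app name, and the address are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request


def polite_request(path, params, app="MyApp/0.1", mailto="me@example.org"):
    """Hypothetical helper: build (but do not send) a Crossref request that
    identifies the caller. App name and address are placeholders."""
    query = urlencode({**params, "mailto": mailto})
    url = f"https://api.crossref.org{path}?{query}"
    # Put the same contact info in the User-Agent header as well
    headers = {"User-Agent": f"{app} (mailto:{mailto})"}
    return Request(url, headers=headers)


req = polite_request("/works", {"query": "rate limit", "rows": 5})
print(req.full_url)
# urllib stores header names capitalized, hence "User-agent"
print(req.get_header("User-agent"))
```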

fabiobatalha commented 1 month ago

I took a look at the implementation and it seems to be broken; it should be improved for better performance.

Take a look at: https://github.com/fabiobatalha/crossrefapi/blob/53a0c773d022ee83e7bc86433798715ed5c891fe/crossref/restful.py#L58

AntonLydike commented 1 month ago

Basically, I'm sharing a single Works object between multiple threads. I implemented rate limiting on top of that, but I have to guess the current limits, which seem to vary daily: some days I get away with more requests per second than others.

It would be cool to have an internal mechanism in the library that handles this rate limiting, even in a multi-threaded workload. (There's no need for multiprocessing here, as Python's multithreading works fine for I/O-bound workloads like this one.)
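A minimal sketch of the kind of internal mechanism requested here, assuming one limiter object shared by all threads that use the same `Works` instance. `SharedRateLimiter`, `wait()` and `update()` are hypothetical names, not crossrefapi API. The idea is that the library would call `wait()` before each HTTP request and `update()` with the `X-Rate-Limit-*` values after each response, so the spacing follows the server's current limit instead of a guess.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SharedRateLimiter:
    """Hypothetical thread-safe limiter (not part of crossrefapi).

    Hands out request slots interval/limit seconds apart to every thread
    sharing it; update() lets the spacing follow the latest
    X-Rate-Limit-Limit / X-Rate-Limit-Interval values.
    """

    def __init__(self, limit=50, interval=1.0):
        self._lock = threading.Lock()
        self._spacing = interval / limit
        self._next_slot = time.monotonic()

    @property
    def spacing(self):
        with self._lock:
            return self._spacing

    def update(self, limit, interval):
        # Call after each response with the limits the server just advertised
        with self._lock:
            self._spacing = interval / limit

    def wait(self):
        # Reserve the next free slot while holding the lock...
        with self._lock:
            slot = max(time.monotonic(), self._next_slot)
            self._next_slot = slot + self._spacing
        # ...then sleep outside it, so other threads can reserve their slots
        time.sleep(max(0.0, slot - time.monotonic()))


limiter = SharedRateLimiter(limit=50, interval=1.0)
stamps = []


def fake_request(i):
    limiter.wait()
    stamps.append(time.monotonic())  # a real HTTP call would go here
    return i


start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_request, range(10)))
elapsed = max(stamps) - start  # 10 slots 0.02 s apart: at least ~0.18 s
```

Reserving the slot under the lock but sleeping outside it keeps threads from queuing on the lock while they wait. The slots themselves are spaced by the current limit, but a thread that wakes late can still bunch up with the next one, so staying slightly below the advertised limit may be wise.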