fabiobatalha / crossrefapi

A python library that implements the Crossref API.
BSD 2-Clause "Simplified" License

Support for rate limiting #40

Closed strogonoff closed 2 years ago

strogonoff commented 2 years ago
  1. Does this library automatically throttle requests to comply with Crossref rate limits? As an extreme example, if a non-Plus caller invokes doi() 100 times a second, would crossrefapi throttle outgoing requests and make the caller wait so as not to exceed the Crossref API limits?

  2. If not, is there a way for the caller to programmatically find out the rate limit currently in effect and throttle its doi() invocations accordingly?

See also: https://api.crossref.org/swagger-ui/index.html. It seems that Crossref signals the current rate limits using HTTP response headers (X-Rate-Limit-Limit and X-Rate-Limit-Interval).

Presumably, staying within these limits is preferable and should avoid running into any further limiting.
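
For illustration, here is a minimal sketch of how a caller might turn those headers into a per-request delay. It assumes the header names `X-Rate-Limit-Limit` and `X-Rate-Limit-Interval` and an interval value such as `1s`. The function names, the fallback defaults (50 requests per `1s`), and the `m`/`h` suffix handling are hypothetical and not taken from crossrefapi.

```python
# Seconds per unit suffix. Only "s" is seen in Crossref examples;
# "m" and "h" are handled here as an assumption.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_rate_limit(headers, default_limit=50, default_interval="1s"):
    """Return (max_requests, interval_in_seconds) from response headers."""
    limit = int(headers.get("X-Rate-Limit-Limit", default_limit))
    raw_interval = headers.get("X-Rate-Limit-Interval", default_interval)
    value, unit = int(raw_interval[:-1]), raw_interval[-1]
    return limit, value * _UNIT_SECONDS[unit]


def min_delay(headers):
    """Smallest pause between requests that stays within the signaled limit."""
    limit, interval = parse_rate_limit(headers)
    return interval / limit
```

With a `requests` response, `min_delay(response.headers)` would give the number of seconds to wait before the next call.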

strogonoff commented 2 years ago

From my own digging, it looks like the library applies rate limiting by calling sleep() [0], so it does respect Crossref’s signals.

However, I believe this implies that, to respect rate limits, the library must be called from a single thread only: sleep() pauses only the calling thread, so several threads together could still exceed the limit. (I’m also not sure whether it will play nicely in an async environment. I think it should, but I’m not familiar enough with async Python.)
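
To make the multi-threading concern concrete, here is a rough sketch of a throttle that several threads could share. `SharedThrottle` and its parameters are hypothetical and not part of crossrefapi. Each caller reserves the next free time slot under a lock and then sleeps outside the lock until that slot arrives.

```python
import threading
import time


class SharedThrottle:
    """Spaces out calls from any number of threads by at least min_interval seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def wait(self):
        """Block until this caller's reserved slot and return the time slept."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            # Reserve the following slot for whoever calls next.
            self._next_slot = slot + self._min_interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
        return delay
```

Every thread would call `throttle.wait()` on the same instance right before issuing a request.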

It looks like callers that want to implement their own throttling could do this:

  1. Subclass HTTPRequest and override its _update_rate_limits method and possibly its do_http_request method. It seems possible to use an async task queue and/or interact with a shared throttling timer in there.
  2. Subclass Works (or whichever other Endpoint subclass is in use) and, in its constructor, replace do_http_request with the one from the HTTPRequest subclass above.
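
The two steps above could look roughly like the sketch below. To keep it runnable without the library, `HTTPRequest` and `Works` here are simplified stand-ins that only mirror the names mentioned in this thread; the real classes take more arguments and may behave differently. `ThrottledHTTPRequest`, `ThrottledWorks`, and the `wait` callable are hypothetical.

```python
class HTTPRequest:
    """Stand-in for the library's HTTPRequest (assumed, simplified shape)."""

    def _update_rate_limits(self, headers):
        self.last_headers = headers

    def do_http_request(self, method, endpoint, **kwargs):
        # The real method performs the HTTP call; this stub only echoes its input.
        return {"method": method, "endpoint": endpoint}


class Works:
    """Stand-in for the library's Works endpoint (assumed, simplified shape)."""

    def __init__(self):
        self.do_http_request = HTTPRequest().do_http_request

    def doi(self, doi):
        return self.do_http_request("get", "works/" + doi)


class ThrottledHTTPRequest(HTTPRequest):
    """Step 1: delegate throttling to a caller-supplied wait() callable."""

    def __init__(self, wait):
        super().__init__()
        self._wait = wait

    def _update_rate_limits(self, headers):
        # Intentionally a no-op: pacing is handled by wait() instead.
        pass

    def do_http_request(self, method, endpoint, **kwargs):
        self._wait()  # e.g. a shared timer or a queue hand-off
        return super().do_http_request(method, endpoint, **kwargs)


class ThrottledWorks(Works):
    """Step 2: swap in the throttled request method in the constructor."""

    def __init__(self, wait, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.do_http_request = ThrottledHTTPRequest(wait).do_http_request
```

A caller would then build `ThrottledWorks(wait=some_shared_wait_function)` once per thread while sharing the same `wait` callable across all of them.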

[0] https://github.com/fabiobatalha/crossrefapi/blob/master/crossref/restful.py#L95