fabiobatalha / crossrefapi

A Python library that implements the Crossref API.
BSD 2-Clause "Simplified" License

Add cache functionality #50

Open pdvass opened 1 year ago

pdvass commented 1 year ago

The proposed changes follow the recommendation in Crossref's REST API documentation to cache responses.

NOTE: The cache created by the unit tests is persistent. To deal with that, I wrote .bat and .sh scripts that run develop and test from setup.py and then delete the cache they create. I can provide them too if needed.

fabiobatalha commented 1 year ago

Hello @pdvass

Thanks for the commit.

From my point of view, what we should cache are the HTTP requests made to the Crossref API.

So I think any cache implementation should live in the HTTPRequest class, specifically in the do_http_request method.

Besides that, I'm not sure about the benefit of including a caching implementation in this library, since it is mostly used for data harvesting and the chances of reusing or benefiting from a cache are almost nil, but I could be wrong. Even so, I'm fine with including a cache layer on the terms described above.
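
For illustration only, caching at the request layer could look roughly like the sketch below. It is not the library's code: `CachedRequester`, `cache_key`, and the injected `fetch` callable are hypothetical names, and the `do_http_request` signature is simplified rather than copied from crossrefapi. The idea is to derive a stable key from the method, endpoint, and parameters, and to reuse the stored response for repeated GET requests.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


def cache_key(method: str, endpoint: str,
              params: Optional[Dict[str, Any]] = None) -> str:
    """Build a stable key from the parts that identify a request."""
    # sort_keys makes the key independent of parameter ordering.
    payload = json.dumps([method.upper(), endpoint, params or {}],
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedRequester:
    """Hypothetical wrapper (not part of crossrefapi) around a fetch callable."""

    def __init__(self, fetch: Callable[..., Any]) -> None:
        self._fetch = fetch              # assumed to perform the real HTTP call
        self._cache: Dict[str, Any] = {}

    def do_http_request(self, method: str, endpoint: str,
                        params: Optional[Dict[str, Any]] = None) -> Any:
        # Only idempotent reads are cached; everything else passes through.
        if method.upper() != "GET":
            return self._fetch(method, endpoint, params)
        key = cache_key(method, endpoint, params)
        if key not in self._cache:
            self._cache[key] = self._fetch(method, endpoint, params)
        return self._cache[key]
```

Because the cache sits below the query-building code, every caller that goes through the request method benefits without changes elsewhere. This sketch keeps responses in memory only, so nothing survives a restart.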

pdvass commented 1 year ago

Hello @fabiobatalha

Thank you for your feedback.

The main motivation behind this PR is that I had to create a dataset whose final size I didn't know in advance. My approach was to wait for the responses, convert them, and then save them, but that conversion was expensive for a larger dataset. With caching, I was able to save the responses and later add new ones, without repeating the existing ones, when needed. I also find a cache easier to transfer from device to device, which saves time because nothing has to be rerun.

As for the implementation details, this is how I managed to get it working, because I needed a way to choose a backend.
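
The thread does not show the PR's actual code, so the following is only a sketch of what a selectable cache backend might look like. `MemoryBackend`, `SQLiteBackend`, `make_backend`, and the default file name are all hypothetical. It uses only the standard library; the SQLite variant stores responses in a single file, which would also make the cache easy to copy between devices.

```python
import sqlite3
from typing import Dict, Optional


class MemoryBackend:
    """Hypothetical in-process backend; contents are lost on exit."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class SQLiteBackend:
    """Hypothetical persistent backend that keeps responses in one file."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, body TEXT)"
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT body FROM responses WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set(self, key: str, value: str) -> None:
        # Overwrite any existing entry for the same key.
        self._conn.execute(
            "INSERT OR REPLACE INTO responses (key, body) VALUES (?, ?)",
            (key, value),
        )
        self._conn.commit()


def make_backend(name: str = "memory", path: str = "crossref_cache.sqlite"):
    """Pick a backend by name; both expose the same get/set interface."""
    if name == "memory":
        return MemoryBackend()
    if name == "sqlite":
        return SQLiteBackend(path)
    raise ValueError(f"unknown cache backend: {name!r}")
```

Since both classes share `get` and `set`, the calling code does not need to know which backend is in use. Tests could pass `"memory"` (or the SQLite path `":memory:"`) to avoid leaving a persistent cache behind.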