BridgesUNCC / bridges-python

Python client library for Bridges
http://bridgesuncc.github.io
MIT License

Cache data sources upon access #54

Open acbart opened 4 years ago

acbart commented 4 years ago

Hi, I was trying out some of the data sources, and I noticed that some of them can take a while to run and also require an active internet connection. I know this suggestion introduces further headaches, but perhaps you should consider setting up a cache for the non-real-time datasets?

For the requests-based datasets, this would be as trivial as adding in requests-cache:

import requests_cache
requests_cache.install_cache('bridges_datasets')

That could go along with some kind of helpful expire_cache() call for students to use if the remote data changes for whatever reason.

The SPARQLWrapper stuff would probably be a bit messier, since that's using urllib under the hood. But it probably wouldn't be too hard to just make a little decorator for it. Heck, you could probably even reuse the architecture for requests_cache and keep it all in one place.
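A minimal sketch of the decorator idea, using only the standard library. The cache directory, the `disk_cached` name, and the pickle-based storage are all assumptions for illustration, not anything Bridges or requests-cache provides; `expire_cache()` borrows the name suggested above.

```python
import functools
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; a real library would likely pick a per-user directory.
CACHE_DIR = Path(tempfile.gettempdir()) / "bridges_datasets_cache"


def disk_cached(func):
    """Cache func's return value on disk, keyed by function name and arguments."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a stable key from the call; arguments must be JSON-serializable.
        raw_key = json.dumps([func.__name__, args, kwargs], sort_keys=True)
        digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
        path = CACHE_DIR / (digest + ".pkl")
        if path.exists():
            # Cache hit: skip the (slow, network-bound) call entirely.
            return pickle.loads(path.read_bytes())
        result = func(*args, **kwargs)
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        path.write_bytes(pickle.dumps(result))
        return result

    return wrapper


def expire_cache():
    """Delete every cached result so the next call fetches fresh data."""
    if CACHE_DIR.exists():
        for entry in CACHE_DIR.glob("*.pkl"):
            entry.unlink()
```

Whatever function actually issues the SPARQLWrapper query could be wrapped with `@disk_cached`, as long as its arguments are simple values and its result can be pickled. Since pickle files are only safe to load when you wrote them yourself, a JSON-based store might be the better choice for anything shipped to students.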

If this seems worthwhile, I'm willing to turn it into a Pull Request. But first I wanted to get a sense of whether this is a direction worth pursuing.

AlecGoncharow commented 4 years ago

This is a good idea, thank you. We will need to do a bit of exploration before we can say it is something we can use without unintended side effects.

As it stands we are already caching some of the OSM data internally. Can you elaborate on which ones were slow on your end so we can investigate a bit further?

krs-world commented 4 years ago

Cory, we are indeed caching some of the larger datasets, like OpenStreetMap and the NOAA elevation map. Is there something more and better we should be doing?

acbart commented 4 years ago

It's a little tough to tell exactly what I was working on then, but I believe it was the WikiData dataset.

My perspective was that all datasets should be cached, with some clever mechanism for easily letting students clear out that local cache. I'm a little less worried about speed than I am about internet stability; students shouldn't have to worry about being connected and such. I was expecting something like what Sinbad does. There are headaches and issues, but it seemed like a worthwhile fight to me.
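To make the stability point concrete, here is a rough sketch of a network-first fetch that falls back to the last saved copy when the connection fails. The `fetch_with_fallback` name, the storage directory, and the JSON file format are all hypothetical; this is not how Bridges or Sinbad actually does it.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location for offline copies of each dataset.
OFFLINE_DIR = Path(tempfile.gettempdir()) / "bridges_offline_copies"


def fetch_with_fallback(name, fetch):
    """Return fetch(); if the network fails, return the last saved copy instead."""
    path = OFFLINE_DIR / f"{name}.json"
    try:
        data = fetch()
    except OSError:
        # urllib and requests network errors both derive from OSError.
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        raise  # no saved copy yet, so surface the original error
    # Success: refresh the offline copy (data must be JSON-serializable).
    OFFLINE_DIR.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

This version always tries the network first, so it helps with flaky connections but not with speed. Clearing the local copies is just a matter of deleting the files in that directory.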