esgf2-us / intake-esgf

Programmatic access to the ESGF holdings
https://intake-esgf.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Should we maintain a local database? #4

Open nocollier opened 10 months ago

nocollier commented 10 months ago

Currently we do not populate a local database when files are downloaded. Once to_dataset_dict() is called, we query the index node for file information and use the directory_format_template_ and other information in the response to build up a local path. Essentially, we are using the remote index as our database, which is only useful for checking whether a specific file is present.
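As a rough illustration of the lookup described above, here is a minimal sketch of turning one file record into a local path. It is not the intake-esgf implementation. It assumes the record has already been flattened to plain strings, that directory_format_template_ uses Python-style `%(name)s` placeholders, and that the file name is stored under a `title` key. The function name `build_local_path` and all record values are hypothetical.

```python
from pathlib import Path


def build_local_path(record: dict, root: Path) -> Path:
    """Build the expected local path for one file record (illustrative only)."""
    # Assumption: the template looks like "%(root)s/%(mip_era)s/..." and every
    # placeholder other than "root" is a key in the record.
    template = record["directory_format_template_"]
    directory = template % {**record, "root": str(root)}
    # Assumption: the file name is stored under "title".
    return Path(directory) / record["title"]


# Hypothetical record with made-up values, flattened to plain strings.
record = {
    "directory_format_template_": "%(root)s/%(mip_era)s/%(source_id)s/%(variable_id)s/%(version)s",
    "mip_era": "CMIP6",
    "source_id": "SOME-MODEL",
    "variable_id": "tas",
    "version": "v1",
    "title": "tas_example.nc",
}

local_path = build_local_path(record, Path("/data/esgf"))
# Without a local database, this existence check is the only local "query".
already_downloaded = local_path.exists()
```

Note that the record itself still has to come from the index node, so nothing here can be answered offline.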

I chose this originally because no database means nothing to keep clean, nothing to maintain, no additional complexity. As downloads fail or are canceled by the user, the database could become corrupt and we simply avoid this by not having one. However, I am now experiencing some drawbacks. Having the local database could:

It strikes me now that the additional complexity is worth the added benefit. It is really not acceptable that you cannot find a file that exists locally on your system. So then the question is: which format makes the most sense?

Probably SQLite is a better choice given that we will be writing to the database in parallel. @mgrover1 thoughts?
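To make the SQLite option concrete, here is a minimal sketch using the standard-library `sqlite3` module. The table layout, column names, and helper functions are all hypothetical and not part of intake-esgf. It assumes each download worker opens its own connection. SQLite still serializes writers, so the `timeout` makes a writer wait for the lock rather than fail immediately, and WAL mode lets readers proceed while a write is in progress.

```python
import sqlite3

# Hypothetical schema: one row per downloaded file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path TEXT PRIMARY KEY,
    dataset_id TEXT NOT NULL,
    variable_id TEXT,
    checksum TEXT
)
"""


def connect(db_path: str) -> sqlite3.Connection:
    """Open the database, creating the table on first use."""
    # Wait up to 30 seconds for a lock held by another writer.
    conn = sqlite3.connect(db_path, timeout=30.0)
    # WAL applies to on-disk databases; an in-memory database ignores it.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def record_download(conn, path, dataset_id, variable_id, checksum):
    """Record a file only after its download has completed."""
    # The connection context manager commits on success and rolls back on
    # error, so a failed insert leaves no partial row behind.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (path, dataset_id, variable_id, checksum),
        )


def find_files(conn, variable_id):
    """Return local paths for a variable without contacting the index node."""
    rows = conn.execute(
        "SELECT path FROM files WHERE variable_id = ? ORDER BY path",
        (variable_id,),
    )
    return [row[0] for row in rows]
```

Writing the row only once the download has finished addresses the corruption concern from canceled or failed downloads, since an interrupted transfer never reaches `record_download`.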

mgrover1 commented 10 months ago

I like the idea of maintaining a SQLite database! I think this offers the features we need and satisfies the requirements here.

nocollier commented 8 months ago

I have restructured the queries to get file information and they may not take nearly as long as before. Hold off on this until we see how much of a pain point it remains.

bouweandela commented 8 months ago

You could also consider just using a cache instead of a full-blown database. For example, esgf-pyclient uses requests_cache to cache searches. That also speeds up the searches that go over the internet, takes very little effort to maintain, and can simply be deleted if it becomes corrupted for some reason.

nocollier commented 8 months ago

Thanks for this suggestion! Will take a look.