climate-resource / bookshelf

A bookshelf of useful preprocessed data
https://climate-resource.github.io/bookshelf/
MIT License

Data backend for downloaded results #99

Open lewisjared opened 11 months ago

lewisjared commented 11 months ago

The problem

Some files are hard to download, either because they have to be downloaded manually or because the download frequently fails. Typically I've been storing these results in an S3 bucket so that they are easier to download. This is a manual process and also publicly exposes the data (if you know where to look).
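To make the current workaround concrete, here is a minimal sketch of the "try upstream, then fall back to a mirror" pattern. It is not the bookshelf API: `fetch_url`, `download_with_fallback`, and the idea of passing an ordered list of URLs are hypothetical names and assumptions made for illustration only.

```python
import urllib.request
from typing import Callable, Iterable


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
) -> tuple[str, bytes]:
    """Try each URL in order; return (url, content) for the first that works.

    Assumption: the upstream source is listed first and any mirror
    (for example an S3 bucket) is listed after it.
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            errors.append(f"{url}: {exc}")
    raise RuntimeError("All sources failed:\n" + "\n".join(errors))
```

The `fetch` argument is injectable so the fallback logic can be exercised without network access. Note that this sketch does nothing about the manual upload step or the public exposure of the mirror.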

Another option to evaluate is datalad. Datalad is used by the PRIMAP team for archiving previously downloaded files and looks like it could be useful here too.

Definition of "done"

Additional context

https://gitlab.com/climate-resource/bookshelf/bookshelf/-/merge_requests/45#note_1653967157

lewisjared commented 11 months ago

In GitLab by @mikapfl on Nov 22, 2023, 20:26

A long-form tutorial for datalad is the Datalad Handbook, but it is a lot to take in. A good starting point for this issue might be the handbook's provenance tracking use case. I think it tells you all the commands you need to run.

If you are confused about what datalad is and how it relates to git and git-annex, there's a table in the docs which helps to understand it.

lewisjared commented 11 months ago

In GitLab by @zebedee.nicholls on Nov 23, 2023, 04:19

We could use datalad more widely too (for more than just input data checking). Let's see how we go.