Open lewisjared opened 11 months ago
In GitLab by @mikapfl on Nov 22, 2023, 20:26
The Datalad Handbook is a long-form tutorial for datalad, but it is a lot to take in. A good starting point for this use case might be the handbook's provenance tracking use case. I think it covers all the commands you need to run.
If you are confused about what datalad is and how it relates to git and git-annex, there's a table in the docs which helps to understand it.
In GitLab by @zebedee.nicholls on Nov 23, 2023, 04:19
We could use datalad more widely too (for more than just input data checking). Let's see how we go.
The problem
Some files are hard to download, either because they have to be downloaded manually or because the download frequently fails. Typically I've been storing these files in an S3 bucket so that they are easier to download. This is a manual process, and it also publicly exposes the data (if you know where to look).
Another option to evaluate is datalad. Datalad is used by the PRIMAP team for archiving previously downloaded files and looks like it could be useful here too.
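As a rough idea of what the datalad route could look like, here is a minimal sketch that assembles the two CLI calls involved in archiving a single download: `datalad create` to set up a dataset and `datalad download-url` to fetch a file and record where it came from. The helper names (`build_datalad_commands`, `archive_download`), the dataset directory and the URL are hypothetical and only for illustration; this is not how bookshelf or PRIMAP actually do it.

```python
import shutil
import subprocess
from pathlib import Path


def build_datalad_commands(dataset_dir, url, filename):
    """Return the datalad CLI calls that would archive one download.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    dataset = Path(dataset_dir)
    target = dataset / filename
    return [
        # Create a new, empty dataset (a git + git-annex repository).
        ["datalad", "create", str(dataset)],
        # Download the file into the dataset and save it with a commit
        # message; datalad records the source URL for the file.
        ["datalad", "download-url", "-d", str(dataset),
         "-m", f"Add {filename}", "-O", str(target), url],
    ]


def archive_download(dataset_dir, url, filename):
    """Run the commands above. Requires datalad to be installed."""
    if shutil.which("datalad") is None:
        raise RuntimeError("datalad is not on PATH")
    for cmd in build_datalad_commands(dataset_dir, url, filename):
        subprocess.run(cmd, check=True)
```

Calling `archive_download("raw-data", "https://example.com/file.csv", "file.csv")` (placeholder values) would create the dataset and store the file in it. Where the dataset is then published, and who gets access to it, is a separate decision that this sketch does not cover.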
Definition of "done"
Additional context
https://gitlab.com/climate-resource/bookshelf/bookshelf/-/merge_requests/45#note_1653967157