PTB-MR / mrpro

MR image reconstruction and processing.
https://ptb-mr.github.io/mrpro/
Apache License 2.0

Hosting data for example scripts #236

Open schuenke opened 7 months ago

schuenke commented 7 months ago

During the hackathon we decided to put the phantom data for the example scripts into a single Zenodo DOI / dataset (or maybe split it into two datasets: one for Siemens sequences and one for pulseq sequences).

However, I would suggest that we create one Zenodo DOI / dataset for each example script. Maybe I'm just incapable, but when trying to add the seq-files for the pulseq .h5 files, I accidentally deleted all the available ISMRMRD files. I'm not sure if there is an option to simply ADD new files, or if a new version of a dataset always requires uploading all files again, which is very error-prone because you can easily forget a file...

Furthermore, using separate DOIs / datasets for the individual example scripts would allow us to switch back to using `zenodo_get()` instead of `requests.get()`, which is more cumbersome IMO (we need to create a tmp file, make the request, and write the data) and is also susceptible to timeouts (at least at PTB).
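For context, the manual pattern complained about above might look roughly like this (a minimal sketch using only the standard library instead of `requests`; `download_to_tmp` is a hypothetical name, not an existing mrpro function):

```python
import tempfile
import urllib.request
from pathlib import Path


def download_to_tmp(url: str, timeout: float = 30.0) -> Path:
    """Download `url` into a fresh temporary file and return its path.

    Mirrors the manual three-step pattern: create a tmp file,
    make the request, write the data.
    """
    # Step 1: create the temporary file
    fd, name = tempfile.mkstemp(suffix=Path(url).suffix)
    target = Path(name)
    # Steps 2 and 3: make the request and stream the body to disk
    with urllib.request.urlopen(url, timeout=timeout) as response, open(fd, "wb") as f:
        while chunk := response.read(8192):
            f.write(chunk)
    return target
```

Every notebook that downloads data has to repeat some variant of these steps, and a single slow Zenodo response past `timeout` makes the whole notebook fail.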

Maybe simply react with 👍 if you agree to switch to separate DOIs for each example script.

ckolbPTB commented 7 months ago

I think it is a good idea to have smaller "chunks" of data with their own DOI, but I am not sure we can separate the data between notebooks that easily. It is more likely that we will partially reuse data across different notebooks. If each file has its own DOI, things might become quite messy.

In the short term I would simply use `!curl -o filename url` to download individual files from a Zenodo dataset. In the long term I think we will need a separate `example_utils.py` where we can collect general functions for downloading, viewing, etc. the data needed by all notebooks.
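Such an `example_utils.py` helper could be sketched as follows (the function name `download_data`, the retry count, and the caching behaviour are illustrative assumptions, not existing mrpro code):

```python
import time
import urllib.error
import urllib.request
from pathlib import Path


def download_data(url: str, target: Path, retries: int = 3, timeout: float = 30.0) -> Path:
    """Download `url` to `target` unless it is already cached locally.

    Retries with exponential backoff to smooth over intermittent
    Zenodo timeouts; reuses the file on disk if it already exists,
    so data shared between notebooks is only fetched once.
    """
    target = Path(target)
    if target.exists():  # simple cache hit: nothing to do
        return target
    last_error = None
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                data = response.read()
            target.write_bytes(data)
            return target
        except (urllib.error.URLError, TimeoutError) as error:
            last_error = error
            time.sleep(2**attempt)  # back off: 1 s, 2 s, 4 s, ...
    raise RuntimeError(f"could not fetch {url} after {retries} attempts") from last_error
```

The on-disk cache also sidesteps the "one DOI per notebook vs. shared data" question: two notebooks can point at the same file and only the first one pays the download cost.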

fzimmermann89 commented 6 months ago

I still get a bit annoyed by tests randomly failing due to Zenodo.

What about using LFS and the public PTB GitLab? https://docs.gitlab.com/ee/topics/git/lfs/

So: create a repository for the example data, enable LFS, and store the data there? It seems like LFS is enabled in our instance (at least there is a toggle to disable it; I have not tested whether it actually works...).