abcd-j / data-catalog

https://data.abcd-j.de

Scripts to generate a dataset file list #38

Closed tmheunis closed 3 months ago

tmheunis commented 3 months ago

This PR adds two scripts for creating file lists in the tabby format; subsequent scripts use these lists to add file metadata to a catalog. The main script is create_tabby_filelist.py, which generalises the steps laid out in this issue: https://github.com/abcd-j/data-catalog/issues/22. It takes an argument that selects how the file list is generated, e.g. glob for non-datalad datasets on a local filesystem, or tree, which uses the datalad tree command, for datalad datasets.
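
For readers unfamiliar with the glob approach, here is a minimal sketch of what such a file-list generator could look like. It is not the code of create_tabby_filelist.py: the function names (collect_files, md5sum, write_filelist) are hypothetical, the column headers path[POSIX], size[bytes] and checksum[md5] are an assumption about the tabby file table, and pathlib's recursive glob stands in for whatever globbing the script actually performs.

```python
import csv
import hashlib
from pathlib import Path

# Assumed column headers for a tabby-style file table (not confirmed by this PR).
HEADER = ["path[POSIX]", "size[bytes]", "checksum[md5]"]


def collect_files(root):
    """Return sorted POSIX paths, relative to root, of all regular files."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")  # recursive glob over the dataset directory
        if p.is_file()
    )


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_filelist(root, out_tsv):
    """Write one tab-separated row per file under root and return the rows."""
    root = Path(root)
    rows = []
    for rel in collect_files(root):
        full = root / rel
        rows.append([rel, full.stat().st_size, md5sum(full)])
    with open(out_tsv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)
    return rows
```

Calling write_filelist("path/to/dataset", "files.tsv") would produce a TSV with one line per file. Writing the output outside the dataset directory keeps the list itself out of later runs.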

The second script is a more custom script that adds URLs to the file list generated by the first script. The URLs will differ between datasets, but this specific script can serve as a basis for future changes that could perhaps be generalised over time.
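
As an illustration of the URL step, the sketch below appends a url column to an existing tab-separated file list by joining a base URL with each file path. It is only a sketch under stated assumptions: the function name add_urls, the column names path[POSIX] and url, and the idea that every file lives under a single base URL are all hypothetical, and real datasets will likely need their own mapping.

```python
import csv
from urllib.parse import quote


def add_urls(in_tsv, out_tsv, base_url,
             path_column="path[POSIX]", url_column="url"):
    """Copy a TSV file list, adding a URL built from base_url and each path.

    Column names are assumptions; adjust them to the actual file list.
    """
    with open(in_tsv, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    if url_column not in fieldnames:
        fieldnames.append(url_column)

    prefix = base_url.rstrip("/")
    for row in rows:
        # Percent-encode the path but keep "/" as the separator.
        row[url_column] = prefix + "/" + quote(row[path_column])

    with open(out_tsv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Keeping the URL logic in a separate function from the file listing mirrors the split described above: the listing step stays generic, while the URL rule can be swapped per dataset.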

The PR also includes a documentation update.

This PR can close https://github.com/abcd-j/data-catalog/issues/22.