Public-nEUro / DataCatalogue

Lists datasets available in the PublicnEUro brain imaging repository.
https://publicneuro-catalogue.netlify.app/
License: Creative Commons Zero v1.0 Universal

mapping BIDS files #2

Open · CPernet opened this issue 2 months ago

CPernet commented 2 months ago

@jsheunis is there a magic datalad tool to map files from a BIDS directory to the JSON file schema, allowing one to 'automatically' (or almost) add all the file-level JSON lines to the dataset-level one?

jsheunis commented 2 months ago

Not exactly, no. In the past I have used the bids_dataset extractor from datalad-neuroimaging, together with datalad-metalad, to extract dataset-level information from a BIDS-compliant dataset.
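
For reference, here is a minimal sketch of driving that dataset-level extraction from Python. It assumes datalad, datalad-metalad and datalad-neuroimaging are installed, that the extractor is registered as bids_dataset (adjust the name if your installation differs), and that meta-extract prints the extracted record as JSON on stdout:

import json
import subprocess


def extract_dataset_metadata(dataset_path: str, extractor: str = "bids_dataset") -> dict:
    """Run `datalad meta-extract` at the dataset level and parse its output."""
    result = subprocess.run(
        ["datalad", "meta-extract", "-d", dataset_path, extractor],
        capture_output=True, text=True, check=True,
    )
    # assumption: meta-extract writes the extracted metadata record to stdout as JSON;
    # if your result rendering adds extra lines, filter them before parsing
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(json.dumps(extract_dataset_metadata("."), indent=2))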

When it comes to file-level metadata, I have used datalad-metalad and the metalad_core extractor on the file level to get an array of file-specific metadata objects, together with the meta-conduct command to do this for all files in a datalad dataset.
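
As a concrete illustration of the slow-but-simple alternative to a meta-conduct pipeline, here is a rough sketch that loops over the files of a dataset and calls meta-extract with metalad_core on each one. Only the commands named above are assumed; .git and .datalad internals are skipped:

import json
import subprocess
from pathlib import Path


def extract_file_metadata(dataset_path: str) -> list:
    """Collect one metalad_core metadata record per file in the dataset."""
    root = Path(dataset_path).resolve()
    records = []
    for path in sorted(root.rglob("*")):
        # skip directories, git/datalad internals, and annexed files without local content
        if not path.is_file() or ".git" in path.parts or ".datalad" in path.parts:
            continue
        relpath = path.relative_to(root).as_posix()
        proc = subprocess.run(
            ["datalad", "meta-extract", "-d", str(root), "metalad_core", relpath],
            capture_output=True, text=True,
        )
        if proc.returncode == 0 and proc.stdout.strip():
            # assumption: the extracted record is printed to stdout as JSON
            records.append(json.loads(proc.stdout))
    return records


if __name__ == "__main__":
    print(json.dumps(extract_file_metadata("."), indent=2))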

Also, there would be essentially no difference between getting file-level metadata for a BIDS dataset vs any other collection of files.

More recently, we have developed a bunch of iterators in datalad-next, as well as custom scripts that serve as helpers in scenarios like yours. Here's an example script: https://github.com/abcd-j/data-catalog/blob/filelist/code/create_tabby_filelist.py

I suspect that script can do what you want with just a few changes, since you need the following output format per file:

{
    "type": "file",
    "dataset_id": "1234",
    "dataset_version": "abcd",
    "path": "file/path/relative/to/dataset/root",
    "url": "download-url-of-file-if-available",
    "contentbytesize": "size-of-file-if-available",
    "metadata_sources": {
        "sources": [
            {
                "source_name": "custom-source-name",
                "source_version": "custom-source-version"
            }
        ]
    }
}
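
For orientation, a minimal plain-Python sketch of producing records in exactly that shape (this is not the linked script, which uses the datalad-next iterators; dataset_id, dataset_version and the metadata source fields are placeholders to be replaced with the real values):

import json
from pathlib import Path

DATASET_ID = "1234"          # placeholder: catalog id of the dataset
DATASET_VERSION = "abcd"     # placeholder: catalog version of the dataset


def file_record(root: Path, path: Path) -> dict:
    """Build one catalog record for a single file, matching the format above."""
    return {
        "type": "file",
        "dataset_id": DATASET_ID,
        "dataset_version": DATASET_VERSION,
        "path": path.relative_to(root).as_posix(),
        # "url" is optional; add a download URL here if one is available
        "contentbytesize": path.stat().st_size,
        "metadata_sources": {
            "sources": [
                {
                    "source_name": "custom-source-name",
                    "source_version": "custom-source-version",
                }
            ]
        },
    }


def build_filelist(dataset_path: str) -> list:
    root = Path(dataset_path).resolve()
    return [
        file_record(root, p)
        for p in sorted(root.rglob("*"))
        # skip directories, git/datalad internals, and annexed files without local content
        if p.is_file() and ".git" not in p.parts and ".datalad" not in p.parts
    ]


if __name__ == "__main__":
    # one JSON object per line (JSON Lines), ready to be fed to the catalog
    for record in build_filelist("."):
        print(json.dumps(record))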

CPernet commented 2 months ago

Nice! I'll get onto that next then: mapping all the files from the phantom dataset 🙏 Every command runs smoothly and returns good error messages .. it's just me being slow to catch up.