distribits / distribits-2024-hackathon


Sprinkle more datalad (copy-file, registry, datacat) in the cohort_creator #4

Open Remi-Gau opened 7 months ago

Remi-Gau commented 7 months ago

The cohort creator is primarily a tool for getting a specific subset of participants from several BIDS raw and derivatives datalad datasets and combining them into a new datalad dataset.

To function, the cohort_creator regularly indexes a large number of open datasets to keep track of some metadata (for example: what does a dataset contain? Does it have derivatives datasets and, if so, where are they?).

goals

Goal 1: use datalad copy-file to simplify some of the implementation of the cohort creator.

Goal 2: see if some of the indexing done in the CI of the cohort_creator could be simplified by making use of the datalad registry.

Goal 3: see if the datalad catalog can help users get an overview of the cohort data they have created.
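
For Goal 1, a minimal sketch of what a switch to `datalad copy-file` could look like. It only builds the command line and does not run it. The helper name `copy_file_command` and the commit message are hypothetical, the paths are taken from the example output tree in this issue, and the only `datalad copy-file` options used are `-d` (dataset), `-t` (target directory) and `-m` (message). This is not how the cohort_creator currently copies files.

```python
from pathlib import PurePosixPath


def copy_file_command(source_files, study_dataset, target_dir, message=None):
    """Build the argv for one `datalad copy-file` call (nothing is executed)."""
    # -d: dataset that receives the files, -t: directory the files are copied into
    cmd = ["datalad", "copy-file", "-d", str(study_dataset), "-t", str(target_dir)]
    if message:
        cmd += ["-m", message]
    # With -t given, every remaining positional argument is a source file.
    cmd += [str(f) for f in source_files]
    return cmd


# Example paths borrowed from the output tree shown in this issue.
source = PurePosixPath("sourcedata/ds000001/sub-03/anat/sub-03_T1w.nii.gz")
study = PurePosixPath("study-ds000001")
cmd = copy_file_command([source], study, study / "sub-03" / "anat",
                        message="Add sub-03 anat")
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where datalad is installed and both datasets exist.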

links

Remi-Gau commented 7 months ago

Inputs

Dataset listing:

DatasetID   PortalURI
ds000001    https://github.com/OpenNeuroDatasets/ds000001.git
ds000002    https://github.com/OpenNeuroDatasets-JSONLD/ds000002.git
ds000200    https://github.com/OpenNeuroDatasets/ds000200
ds001226    https://github.com/OpenNeuroDatasets-JSONLD/ds001226
ds002799    https://github.com/OpenNeuroDatasets/ds002799

Participant listing (empty cells are left blank):

DatasetID  SubjectID  Age  Sex     Diagnosis  SessionID   SessionPath                    NumSessions  Modality
ds000001   sub-03     26   Female  PD                     n/a                            1            [T1,T2,bold]
ds000001   sub-17     25   Female  CTL                    n/a                            1            [T1,T2,bold]
ds000002   sub-12     25   Female                         n/a                            1            [T1,T2,bold]
ds000002   sub-13     25   Female                         n/a                            1            [T1,T2,bold]
ds000200   sub-2001   n/a  n/a                            n/a                            1            [T1,bold]
ds000200   sub-2008   n/a  n/a                            n/a                            1            [T1,bold]
ds001226   sub-CON03  53   Female  CON        ses-preop   ds001226/sub-CON03/ses-preop   1            [T1,dwi,bold]
ds001226   sub-CON03  53   Female  CON        ses-postop  ds001226/sub-CON03/ses-postop  1            [T1,dwi,bold]
ds001226   sub-CON07  48   Female  CON        ses-preop   ds001226/sub-CON07/ses-preop   1            [T1,dwi,bold]
ds002799   sub-292                            ses-preop   ds002785/sub-292/ses-preop
ds002799   sub-294                            ses-postop  ds002785/sub-294/ses-postop
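
To make the structure of the participant listing concrete, here is a small sketch that groups subjects and sessions per dataset. It assumes the listing is a tab-separated file and uses only three of its columns. The function name `group_participants` and the inline sample are illustrative and not part of the cohort_creator.

```python
import csv
import io
from collections import defaultdict

# A few rows from the participant listing above, reduced to three columns.
LISTING = (
    "DatasetID\tSubjectID\tSessionID\n"
    "ds000001\tsub-03\t\n"
    "ds000001\tsub-17\t\n"
    "ds001226\tsub-CON03\tses-preop\n"
    "ds001226\tsub-CON03\tses-postop\n"
)


def group_participants(tsv_text):
    """Map each DatasetID to {SubjectID: [SessionID, ...]}.

    Empty or "n/a" session cells are skipped, so subjects without
    sessions end up with an empty list.
    """
    cohort = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        sessions = cohort[row["DatasetID"]].setdefault(row["SubjectID"], [])
        session = (row.get("SessionID") or "").strip()
        if session and session != "n/a":
            sessions.append(session)
    return dict(cohort)


cohort = group_participants(LISTING)
print(cohort)
```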

CLI

cohort_creator install [-h] -d DATASET_LISTING [DATASET_LISTING ...]
                       [-p PARTICIPANT_LISTING] [-o OUTPUT_DIR]
                       [--dataset_types {raw,mriqc,fmriprep} [{raw,mriqc,fmriprep} ...]]
                       [--verbosity {0,1,2,3}]
                       [--generate_participant_listing]

cohort_creator get [-h] -d DATASET_LISTING [DATASET_LISTING ...]
                   [-p PARTICIPANT_LISTING] [-o OUTPUT_DIR]
                   [--dataset_types {raw,mriqc,fmriprep} [{raw,mriqc,fmriprep} ...]]
                   [--verbosity {0,1,2,3}]
                   [--datatypes {anat,func,fmap} [{anat,func,fmap} ...]]
                   [--space SPACE] [--task TASK]
                   [--bids_filter_file BIDS_FILTER_FILE] [--jobs JOBS]

cohort_creator copy [-h] -d DATASET_LISTING [DATASET_LISTING ...]
                    [-p PARTICIPANT_LISTING] [-o OUTPUT_DIR]
                    [--dataset_types {raw,mriqc,fmriprep} [{raw,mriqc,fmriprep} ...]]
                    [--verbosity {0,1,2,3}]
                    [--datatypes {anat,func,fmap} [{anat,func,fmap} ...]]
                    [--space SPACE] [--task TASK]
                    [--bids_filter_file BIDS_FILTER_FILE] [--skip_group_mriqc]
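
The three subcommands share most of their options, so a typical run could be scripted roughly as below. This is a sketch under assumptions: the file names `datasets.tsv` and `participants.tsv`, the output directory `outputs`, and the install, get, copy ordering are illustrative. Only flags that appear in the usage text above are used, and nothing is executed.

```python
def cohort_creator_steps(dataset_listing, participant_listing, output_dir,
                         dataset_types=("raw",), datatypes=("anat",)):
    """Return the argv lists for an install -> get -> copy sequence."""
    # Options accepted by all three subcommands.
    common = ["-d", dataset_listing, "-p", participant_listing,
              "-o", output_dir, "--dataset_types", *dataset_types]
    # --datatypes is only listed for `get` and `copy` in the usage text.
    filters = ["--datatypes", *datatypes]
    return [
        ["cohort_creator", "install", *common],
        ["cohort_creator", "get", *common, *filters],
        ["cohort_creator", "copy", *common, *filters],
    ]


steps = cohort_creator_steps("datasets.tsv", "participants.tsv", "outputs",
                             dataset_types=("raw", "mriqc"))
for argv in steps:
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` once the cohort_creator is installed.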

Outputs

├── sourcedata
│   ├── ds000001
│   ├── ds000001-fmriprep
│   ├── ds000001-mriqc
│   ├── ds000002
│   ├── ds000002-fmriprep
│   ├── ds000002-mriqc
│   ├── ds000200
│   ├── ds001226
│   ├── ds001226-fmriprep
│   └── ds001226-mriqc
│
├── study-ds000001
│   ├── derivatives
│   │   ├── fmriprep-21.0.1
│   │   │   ├── sub-03
│   │   │   │   └── anat
│   │   │   │       ├── sub-03_space-MNI152NLin2009cAsym_res-2_desc-preproc_T1w.json
│   │   │   │       └── sub-03_space-MNI152NLin2009cAsym_res-2_desc-preproc_T1w.nii.gz
│   │   │   ├── dataset_description.json
│   │   │   └── README.md
│   │   └── mriqc-0.16.1
│   │       ├── sub-03
│   │       │   └── anat
│   │       │       └── sub-03_T1w.json
│   │       ├── dataset_description.json
│   │       └── README.md
│   ├── sub-03
│   │   └── anat
│   │       └── sub-03_T1w.nii.gz
│   ├── dataset_description.json
│   ├── participants.tsv
│   └── README
│
├── study-ds000002
│   ├── derivatives
│   ├── sub-12
│   ├── sub-13
│   ├── dataset_description.json
│   ├── participants.tsv
│   └── README
│
├── study-ds000200
│   ├── sub-2001
│   ├── dataset_description.json
│   ├── participants.tsv
│   └── README
│
├── study-ds001226
│   ├── derivatives
│   ├── sub-CON03
│   ├── sub-CON07
│   ├── dataset_description.json
│   ├── participants.tsv
│   └── README
│
├── dataset_description.json
├── README.md
├── studies.json
└── studies.tsv
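
The layout above follows a simple rule: raw files sit directly under `study-<DatasetID>`, and derivatives sit under `derivatives/<pipeline>-<version>` inside it. A minimal sketch of that mapping follows. The helper `study_path` is hypothetical, and the pipeline folder name is passed in rather than looked up.

```python
from pathlib import PurePosixPath


def study_path(dataset_id, relative_file, pipeline=None):
    """Where a file lands in the output tree.

    `pipeline` is the derivatives folder name (for example "mriqc-0.16.1");
    leave it as None for raw data.
    """
    root = PurePosixPath(f"study-{dataset_id}")
    if pipeline is not None:
        root = root / "derivatives" / pipeline
    return root / relative_file


raw = study_path("ds000001", "sub-03/anat/sub-03_T1w.nii.gz")
deriv = study_path("ds000001", "sub-03/anat/sub-03_T1w.json",
                   pipeline="mriqc-0.16.1")
print(raw)    # study-ds000001/sub-03/anat/sub-03_T1w.nii.gz
print(deriv)  # study-ds000001/derivatives/mriqc-0.16.1/sub-03/anat/sub-03_T1w.json
```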