Add DatasetType="project" and rework existing "layout" example into a proper BIDS dataset

yarikoptic commented 1 week ago

Rationale 1 (major): The BIDS standard already provides a reasonable structure for organizing the components of a neuroscientific data project: where to place code, original (source) data, derivative data, README, and CHANGES. Many projects (e.g. nipoppy, YODA) propose similar, and often perhaps even "inspired", templates. If we explicitly allow the BIDS standard to prescribe study-level organization, IMHO it would help many people and projects decide how to organize their studies/projects.
Rationale 2: IMHO the BIDS standard should describe only what the standard prescribes and not recommend potential "non-standardized" layouts. That is why I "reworked" that example into a legitimate BIDS dataset merely by adding dataset_description.json.
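
A minimal sketch of that last step, assuming the file only needs the two fields BIDS requires in dataset_description.json (Name and BIDSVersion) plus the DatasetType value proposed in this PR. The helper name `write_dataset_description` is hypothetical, and "project" is the proposed value, not one the released standard defines.

```python
import json
from pathlib import Path


def write_dataset_description(root, name, bids_version, dataset_type="project"):
    """Write a minimal dataset_description.json at the dataset root.

    Name and BIDSVersion are the required fields of dataset_description.json.
    DatasetType="project" is the value proposed in this PR (an assumption here),
    not part of the released standard.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    description = {
        "Name": name,
        "BIDSVersion": bids_version,
        "DatasetType": dataset_type,
    }
    path = root / "dataset_description.json"
    path.write_text(json.dumps(description, indent=2) + "\n", encoding="utf-8")
    return path
```

The BIDS version is passed in by the caller on purpose, so the sketch does not hard-code a release that may not match the schema being validated against.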

TODOs:

- [x] provide more relation to existing approaches (attn @snastase)
- [ ] (started) craft example(s) for bids-examples: https://github.com/bids-standard/bids-examples/pull/451
- [ ] ensure bids-validator with modified schema passes its validation

snastase commented 1 week ago

Here are a few examples of "project"- (or "study"-) level directory structures emerging from different initiatives. The fact that several smaller initiatives are arriving at similar but different solutions is strong motivation to provide a unifying solution. BIDS is the most widely accepted of these proposed solutions and is therefore well positioned to provide the "project-level" standard. The recursive structure of BIDS datasets (e.g. where a BIDS derivatives dataset is stored within a BIDS dataset) is already well suited to this purpose.
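
To make the recursive structure concrete, here is a rough sketch that lists every nested dataset under a project root. It assumes that the presence of a dataset_description.json file marks a directory as a BIDS dataset, which is a heuristic for illustration rather than a validator; `find_bids_datasets` is a hypothetical helper name.

```python
from pathlib import Path


def find_bids_datasets(root):
    """Return every directory under root (root included) that contains a
    dataset_description.json, ordered by path relative to root.

    Assumption: a dataset_description.json marks a (nested) BIDS dataset.
    """
    root = Path(root)
    found = {
        p.parent for p in root.rglob("dataset_description.json") if p.is_file()
    }
    # Sort by path components so a parent dataset precedes datasets nested in it.
    return sorted(found, key=lambda p: p.relative_to(root).parts)
```

Run on a project-level dataset, this would return the root itself followed by any datasets stored under sourcedata/ or derivatives/ that carry their own dataset_description.json.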

Example 1: Nipoppy

Nipoppy provides one solution to this problem, but introduces some structures that diverge from BIDS. Converting this project-level Nipoppy directory to BIDS format requires only minor changes: (1) move proc into BIDS code; (2) move tabular data directory into BIDS sourcedata; (3) nest the BIDS derivatives inside the bids directory; (4) include additional project-level metadata files.

Nipoppy (original)

```bash
├── proc/
│   └── global_config.json
├── tabular
│   ├── manifest.csv
│   ├── demographics/
│   ├── assessments/
│   └── bagel.csv
├── sourcedata/
├── bids/
│   └── sub-001/
│       └── ses-A/
└── derivatives/
    ├── fmriprep/
    │   ├── 20.2.7/
    │   └── 23.1.3/
    ├── mriqc/
    │   └── 23.1.0/
    └── bagel.csv
```

BIDS (minimal)

```bash
project-nipoppy/
├── code/
│   └── global_config.json
├── sourcedata/
│   ├── tabular/
│   │   ├── manifest.csv
│   │   ├── demographics/
│   │   ├── assessments/
│   │   └── bagel.csv
│   └── raw/
│       └── sub-001/
│           └── ses-A/
├── derivatives/
│   ├── fmriprep/
│   │   ├── 20.2.7/
│   │   └── 23.1.3/
│   ├── mriqc/
│   │   └── 23.1.0/
│   └── bagel.csv
├── README
├── dataset_description.json
└── CHANGES
```

BIDS (optimal)

```bash
project-nipoppy/
├── code/
│   └── global_config.json
├── sourcedata/
│   ├── tabular/
│   │   ├── manifest.csv
│   │   ├── demographics/
│   │   ├── assessments/
│   │   └── bagel.csv
│   └── raw/
│       └── sub-001/
│           └── ses-A/
├── derivatives/
│   ├── fmriprep-20.2.7/
│   ├── fmriprep-23.1.3/
│   ├── mriqc-23.1.0/
│   └── neurobagel-0.0.1/
│       └── bagel.csv
├── README
├── dataset_description.json
└── CHANGES
```
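
The directory moves in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it follows the "BIDS (minimal)" tree, in which proc/ becomes code/, tabular/ moves under sourcedata/, bids/ becomes sourcedata/raw/, and derivatives/ stays at the top level. The function name `nipoppy_to_bids_minimal` is hypothetical and is not part of Nipoppy or any BIDS tool.

```python
import shutil
from pathlib import Path

# Project-level files shown at the root of the "BIDS (minimal)" tree.
PROJECT_FILES = ("README", "dataset_description.json", "CHANGES")


def nipoppy_to_bids_minimal(root):
    """Rearrange a Nipoppy-style directory in place (hypothetical helper).

    Returns the project-level files that are still missing afterwards, so
    they can be written by hand rather than created empty.
    """
    root = Path(root)
    sourcedata = root / "sourcedata"
    sourcedata.mkdir(exist_ok=True)

    moves = [
        (root / "proc", root / "code"),              # processing config -> code/
        (root / "tabular", sourcedata / "tabular"),  # tabular data -> sourcedata/
        (root / "bids", sourcedata / "raw"),         # BIDS dataset -> sourcedata/raw/
    ]
    for src, dst in moves:
        # Skip moves whose source is absent or whose target already exists.
        if src.exists() and not dst.exists():
            shutil.move(str(src), str(dst))

    return [name for name in PROJECT_FILES if not (root / name).exists()]
```

Returning the missing files instead of creating placeholders is a deliberate choice: an empty dataset_description.json would not be valid, so the sketch leaves that step to the user.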

Example 2: The Princeton Handbook for Reproducible Neuroimaging

In the Princeton Handbook for Reproducible Neuroimaging we pre-populate a project-level directory structure for code, data, etc., which will typically contain one or more BIDS datasets. This directory structure would be converted to a BIDS-compliant version by moving dicom and other data directories into a sourcedata directory and adding the accompanying top-level metadata files.

Princeton (original)

```bash
new_study_template/
├── code/
│   ├── analysis/
│   ├── preprocessing/
│   └── task/
└── data/
    ├── behavioral/
    ├── bids/
    │   ├── sub-001/
    │   ├── sub-002/
    │   ├── sub-003/
    │   └── derivatives/
    │       ├── deface/
    │       ├── fmriprep/
    │       ├── freesurfer/
    │       └── mriqc/
    ├── dicom/
    └── work/
```

BIDS (minimal)

```bash
project-princeton/
├── code/
│   ├── analysis/
│   ├── preprocessing/
│   └── task/
├── sourcedata/
│   ├── behavior/
│   ├── raw/
│   │   ├── sub-001/
│   │   ├── sub-002/
│   │   ├── sub-003/
│   │   └── derivatives/
│   │       ├── deface/
│   │       ├── fmriprep/
│   │       ├── freesurfer/
│   │       └── mriqc/
│   ├── dicom/
│   └── work/
├── README
├── dataset_description.json
└── CHANGES
```

BIDS (optimal)

```bash
project-princeton/
├── code/
│   ├── analysis/
│   ├── preprocessing/
│   └── task/
├── sourcedata/
│   ├── behavior/
│   ├── raw/
│   │   ├── sub-001/
│   │   ├── sub-002/
│   │   └── sub-003/
│   └── dicom/
├── derivatives/
│   ├── deface/
│   ├── fmriprep/
│   ├── freesurfer/
│   ├── mriqc/
│   └── work/
├── README
├── dataset_description.json
└── CHANGES
```

Example 3: YODA

YODA introduces a set of principles for best practices for data analysis. Here, we nest several of the top-level example directories (ci, docs, and envs) into the code directory. None of these changes interfere with the YODA principles. A critical principle of YODA is that source data are referenced from within a derivative dataset. This recursive structure is now the default for BIDS Apps like fMRIPrep (as of version 20.2.1).

YODA (original)

```bash
├── ci/
│   └── .travis.yml
├── code/
│   ├── tests/
│   │   └── test_myscript.py
│   └── myscript.py
├── docs/
│   ├── build/
│   └── source/
├── envs/
│   └── Singularity
├── inputs/
│   └── data/
│       ├── dataset1/
│       │   └── datafile_a
│       └── dataset2/
│           └── datafile_a
├── important_results/
│   └── figures/
├── CHANGELOG.md
├── HOWTO.md
└── README.md
```

BIDS (minimal)

```bash
project-yoda/
├── code/
│   ├── ci/
│   │   └── .travis.yml
│   ├── tests/
│   │   └── test_myscript.py
│   ├── envs/
│   │   └── Singularity
│   ├── docs/
│   │   ├── build/
│   │   └── source/
│   ├── myscript.py
│   └── HOWTO.md
├── sourcedata/
│   └── data/
│       ├── dataset-1/
│       │   └── datafile-a
│       └── dataset-2/
│           └── datafile-a
├── derivatives/
│   └── results-important/
│       └── figures/
├── CHANGES
├── dataset_description.json
└── README
```

Example 4: BIDS-MEGA

The proposed top-level directory structure for BIDS-MEGA BEP035 is already nearly BIDS-compliant. The only substantive change is to nest the study-* directories within sourcedata.

BIDS-MEGA (original)

```bash
my_megaanalysis/
├── dataset_description.json
├── studies.json
├── studies.tsv
├── derivatives/
│   ├── nimare-0.0.10
│   :
│
├── study-doe2012/
│   ├── dataset_description.json
│   ├── participants.json
│   ├── participants.tsv
│   ├── derivatives/
│   ├── sub-001/
│   ├── sub-002/
│   :
│
├── study-mustermann2017/
│   ├── dataset_description.json
│   ├── participants.json
│   ├── participants.tsv
│   ├── derivatives/
│   ├── sub-001/
│   ├── sub-002/
│   :
│
├── study-smith2015/
:   ├── dataset_description.json
    ├── participants.json
    ├── participants.tsv
    ├── derivatives/
    ├── sub-001/
    ├── sub-002/
    :
```

BIDS (minimal)

```bash
project-megaanalysis/
├── code/
│   ├── studies.json
│   └── studies.tsv
├── sourcedata/
│   ├── study-doe2012/
│   │   ├── dataset_description.json
│   │   ├── participants.json
│   │   ├── participants.tsv
│   │   ├── derivatives/
│   │   ├── sub-001/
│   │   └── sub-002/
│   ├── study-mustermann2017/
│   │   ├── dataset_description.json
│   │   ├── participants.json
│   │   ├── participants.tsv
│   │   ├── derivatives/
│   │   ├── sub-001/
│   │   └── sub-002/
│   └── study-smith2015/
│       ├── dataset_description.json
│       ├── participants.json
│       ├── participants.tsv
│       ├── derivatives/
│       ├── sub-001/
│       └── sub-002/
├── derivatives/
│   └── nimare-0.0.10/
├── CHANGES
├── dataset_description.json
└── README
```
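
Nesting the study-* directories within sourcedata is a single glob-and-move pass. The sketch below assumes the study directories sit directly under the project root and share the `study-` prefix, as in the tree for this example; `nest_studies_in_sourcedata` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def nest_studies_in_sourcedata(root):
    """Move every top-level study-* directory into sourcedata/ (hypothetical helper).

    Returns the names of the directories that were moved.
    """
    root = Path(root)
    sourcedata = root / "sourcedata"
    sourcedata.mkdir(exist_ok=True)

    moved = []
    for study in sorted(root.glob("study-*")):
        target = sourcedata / study.name
        # Only move directories, and never overwrite or nest into an existing target.
        if study.is_dir() and not target.exists():
            shutil.move(str(study), str(target))
            moved.append(study.name)
    return moved
```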

nikhil153 commented 7 hours ago

Hi @yarikoptic, @snastase, @Remi-Gau, @jbpoline, @michellewang,

Here is a revised nipoppy layout that conforms to the BIDS (minimal) proposal, based on our discussions. The key motivations of nipoppy are as follows:

- We consider nipoppy primarily as a protocol for study coordinators / data managers dealing with iterative data capture, curation, processing, and tracking tasks. This widens our scope beyond typical BIDSification to include several additional files and processing support (e.g. Boutiques).
- The protocol starts with the creation of two project/study-level files, nipoppy_manifest and nipoppy_config, during the data capture stage, irrespective of data types and modalities. These files provide a starting point (i.e. ground truth) for all subsequent nipoppy protocol stages, and are therefore intuitively placed at the root of the dataset.
- We expect phenotypic (i.e. tabular) and imaging data to be collected and organized via independent workflows and people. Thus we prefer the aggregated phenotypic mode and place this subdirectory at the same level as the imaging subdirectory within sourcedata. This simplifies access control and tracking (i.e. bagels) of data availability.

Hope this makes sense! Let me know your thoughts on the sample layout below. We would like to finalize it soon as we are training several collaborators and deploying it for their studies in the coming weeks.

Thanks!


<DATASET_ROOT>/
├── nipoppy_config.json (goes into bidsignore)
├── nipoppy_manifest.json (goes into bidsignore)
├── scratch/ (goes into bidsignore)
├── sourcedata/
│   ├── downloads/ 
│   ├── raw_imaging/
│   │   ├── unorg/ (possibly this can be moved into downloads...) 
│   │   └── org/
│   ├── imaging/
│   │   ├── participants.tsv
│   │   ├── sub-01/
│   │   │   ├── anat
│   │   │   └── func
│   │   └── sub-02/
│   │       ├── anat
│   │       └── dwi
│   └── tabular/
│       ├── demographics/
│       ├── assessments/
│       └── bagel.csv
├── code/
│   ├── proc/
│   │   ├── invocations/
│   │   ├── descriptors/
│   │   ├── tracker_configs/
│   │   └── pybids/
│   │       ├── bids_db/
│   │       └── ignore_patterns/
│   ├── utils/
│   │   ├── generate_manifest.py
│   │   └── download_dicoms.py
│   └── analysis/
│       ├── run_func_connectivity.py
│       └── run_my_fancy_ML_model.py
├── derivatives/
│   ├── fmriprep/
│   │   ├── 20.2.7/
│   │   └── 23.1.3/
│   ├── mriqc/
│   │   └── 23.1.0/
│   └── bagel.csv
├── README 
├── dataset_description.json 
├── CHANGES 
└── .bidsignore
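
As a rough sketch of the "goes into bidsignore" annotations above, the snippet below writes the three entries into a .bidsignore file at the dataset root. It assumes .bidsignore takes one .gitignore-style pattern per line, as the bids-validator documentation describes; `write_bidsignore` and `NIPOPPY_IGNORES` are hypothetical names used only for illustration.

```python
from pathlib import Path

# Entries annotated "goes into bidsignore" in the layout above.
NIPOPPY_IGNORES = ("nipoppy_config.json", "nipoppy_manifest.json", "scratch/")


def write_bidsignore(root, patterns=NIPOPPY_IGNORES):
    """Append any missing patterns to <root>/.bidsignore (hypothetical helper).

    Existing lines are kept as-is, so calling this repeatedly is safe.
    """
    path = Path(root) / ".bidsignore"
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    merged = existing + [p for p in patterns if p not in existing]
    path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```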

bids-standard / bids-specification