Some files in the bucket "disappeared" (they might just have been moved), so their URLs are no longer valid and they were not included in this dataset; if desired, they could be brought back: https://github.com/OpenNeuroLab/metasearch/issues/15 .
Every file in the dataset also has git-annex metadata assigned from the values in the spreadsheet (I didn't check what happens with conflicting ones ;-)). That allows exploring a neat git-annex feature: quickly creating views of the dataset, which re-lay out the entire dataset according to a specification. E.g.
```shell
$> ls
adhd-combined/ adhd-hyperactive/ adhd-inattentive/ autism/ control/
```
if you happen to like navigating it that way. This "view" is just a branch, so within seconds you can get back to the original "view" with `git annex vpop`
```shell
$> git annex vpop
vpop 1
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
ok
$> ls
_stats/ acpi/ corr/ indi/ rocklandsample/
abide_initiative/ adhd200/ gsp/ ixi/ tumordetect/
```
or get back to the diagnosis-based one
```shell
$> git checkout views/diagnosis=_\;sex=_
Checking out files: 100% (14200/14200), done.
Switched to branch 'views/diagnosis=_;sex=_'
$> ls
adhd-combined/ adhd-hyperactive/ adhd-inattentive/ autism/ control/
```
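Since each view is just a branch under `views/` (as the checkout above shows), plain git can tell you which views already exist in a clone. A minimal sketch; the helper name `list_views` is made up for illustration, and it assumes view branches keep the `views/` prefix seen in the transcript:

```shell
# Hypothetical helper (not part of git-annex): list the view branches
# that exist in the current repository, one per line.
list_views() {
    git branch --list 'views/*' --format='%(refname:short)'
}

# Possible usage inside the dataset:
#   list_views
#   git checkout "$(list_views | head -n 1)"   # switch to the first listed view
```

Quoting the branch name matters when checking it out, because names such as `views/diagnosis=_;sex=_` contain a `;` that the shell would otherwise treat as a command separator.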
Having this dataset, it would be nice if any action on it (or creation of derivative results) were done using the `datalad run`/`datalad rerun` commands, to maintain in the VCS a record/provenance of those changes/results.
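A rough sketch of what that could look like. The wrapper `recorded`, the `DRY_RUN` switch, the commit message, and the example command and output path are all assumptions made up for illustration; only `datalad run -m` and `datalad rerun` are actual DataLad commands:

```shell
# Hypothetical wrapper: record a command with provenance via `datalad run`.
# DRY_RUN=1 (the default here) only prints the datalad call;
# set DRY_RUN=0 inside the dataset to actually execute it.
recorded() {
    msg=$1
    shift
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf 'datalad run -m "%s" "%s"\n' "$msg" "$*"
    else
        datalad run -m "$msg" "$*"
    fi
}

# Possible usage (message, command, and output path are made up):
#   DRY_RUN=0 recorded "List NIfTI files" "find . -name '*.nii.gz' > _stats/nifti-files.txt"
#   datalad rerun    # re-execute the command recorded in the most recent commit
```

The dry-run default is only there so the call can be inspected before anything is committed to the dataset.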
https://github.com/ReproNim/openneurolab-metasearch-dataset
According to `git annex info`, it contains 8016 unique files, ~46 GB in total.
Depending on the analysis to pursue, some data cleaning might be desired: https://github.com/OpenNeuroLab/metasearch/issues/17
```shell
$> cd openneurolab-metasearch-dataset
$> ls
_stats/ acpi/ corr/ indi/ rocklandsample/
abide_initiative/ adhd200/ gsp/ ixi/ tumordetect/
$> find
.
./ixi
./ixi/sub-573
./ixi/sub-573/ses-1
./ixi/sub-573/ses-1/IXI573-IOP-1155-T1_rep-0.nii.gz
....
$> git annex view diagnosis=* sex=*
...
$> find | head
.
./autism
./autism/Male
./autism/Male/T1rep-0%abide_initiative%sub-50806%ses-1%.mgz
...
$> ls
adhd-combined/ adhd-hyperactive/ adhd-inattentive/ autism/ control/
```