ReproNim / simple_workflow-02


DataLad dataset of metasearch "database" is available #7

yarikoptic opened this issue 6 years ago (status: open)

https://github.com/ReproNim/openneurolab-metasearch-dataset

Output of `git annex info`:

```shell
$> git annex info
repository mode: indirect
trusted repositories: 0
semitrusted repositories: 3
        00000000-0000-0000-0000-000000000001 -- web
        00000000-0000-0000-0000-000000000002 -- bittorrent
        9ed025be-5276-4e8a-a1fc-d82a04514147 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/labs/openneurolab/metasearch [here]
untrusted repositories: 0
transfers in progress: none
available local disk space: 9.66 terabytes (+1 megabyte reserved)
temporary object directory size: 1.26 megabytes (clean up with git-annex unused)
local annex keys: 7925
local annex size: 46.75 gigabytes
annexed files in working tree: 8016
size of annexed files in working tree: 47.68 gigabytes
bloom filter size: 32 mebibytes (1.6% full)
backend usage:
        MD5E: 8016
```

So it contains 8016 annexed files totalling ~47.7 GB; they are backed by 7925 annex keys (~46.8 GB), since some files share identical content.
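The difference between annexed files and annex keys reported above can be cross-checked from the working tree alone. A minimal sketch, assuming the dataset is in indirect mode (as `git annex info` reports), where every annexed file is a symlink into `.git/annex/objects` whose last path component is the key. The helper name `annex_key_stats` is hypothetical, and file names containing newlines are not handled.

```shell
# Hypothetical helper: count annexed files and the distinct annex keys they
# point to. Assumes indirect mode, i.e. annexed files are symlinks into
# .git/annex/objects whose last path component is the key.
annex_key_stats() (
  cd "${1:-.}" || exit 1
  # One key per annexed file; symlinks that do not point into the annex are ignored.
  keys=$(find . -path ./.git -prune -o -type l -print |
    while IFS= read -r f; do
      target=$(readlink "$f")
      case $target in
        */annex/objects/*) printf '%s\n' "${target##*/}" ;;
      esac
    done)
  if [ -z "$keys" ]; then
    echo "files=0 keys=0"
  else
    files=$(printf '%s\n' "$keys" | wc -l | tr -d ' ')
    unique=$(printf '%s\n' "$keys" | sort -u | wc -l | tr -d ' ')
    echo "files=$files keys=$unique"
  fi
)

# Usage (from the directory containing the dataset):
#   annex_key_stats openneurolab-metasearch-dataset
```

If the assumptions hold, `files` should match "annexed files in working tree" and `keys` the number of distinct contents those files reference.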

```shell
$> cd openneurolab-metasearch-dataset
$> ls
_stats/            acpi/     corr/  indi/  rocklandsample/
abide_initiative/  adhd200/  gsp/   ixi/   tumordetect/

$> find
.
./ixi
./ixi/sub-573
./ixi/sub-573/ses-1
./ixi/sub-573/ses-1/IXI573-IOP-1155-T1_rep-0.nii.gz
....
```
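Since every image in this listing sits at `<site>/sub-*/ses-*/<file>`, a per-site tally is a single `find` pipeline. A minimal sketch with a hypothetical helper name (`count_per_site`); it matches symlinks as well as regular files because, in indirect mode, annexed files are symlinks, including dangling ones whose content has not been fetched yet.

```shell
# Hypothetical helper: number of files per top-level site directory,
# assuming the default <site>/sub-<id>/ses-<ses>/<file> layout.
count_per_site() (
  cd "${1:-.}" || exit 1
  # -type l: annexed files are (possibly dangling) symlinks in indirect mode
  find . -path ./.git -prune -o \( -type f -o -type l \) -path './*/sub-*/ses-*/*' -print |
    cut -d/ -f2 | sort | uniq -c
)

# Usage:
#   count_per_site openneurolab-metasearch-dataset
```

Nothing needs to be downloaded for this: the symlinks alone describe the layout.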

So the default layout is `site/sub-<id>/ses-<ses>/...`, but we can instruct git-annex to present the files differently:

```shell
$> git annex view diagnosis=* sex=*
...
$> find | head
.
./autism
./autism/Male
./autism/Male/T1rep-0%abide_initiative%sub-50806%ses-1%.mgz
...

$> ls
adhd-combined/  adhd-hyperactive/  adhd-inattentive/  autism/  control/
```
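In the view, each file name also records where the file lives in the original layout: `T1rep-0%abide_initiative%sub-50806%ses-1%.mgz` carries the directories `abide_initiative`, `sub-50806` and `ses-1` between `%` signs. The sketch below maps such a name back to a path. It assumes the `<stem>%<dir1>%...%<dirN>%<extension>` pattern seen in this one example (inferred from the listing, not taken from git-annex documentation), does not handle names that themselves contain `%`, and the function name is hypothetical.

```shell
# Hypothetical helper: turn a view file name back into a path in the
# original layout, ASSUMING the <stem>%<dir1>%...%<dirN>%<ext> pattern.
view_name_to_path() (
  name=$1
  case $name in
    *%*%*)
      stem=${name%%\%*}   # part before the first '%'
      ext=${name##*\%}    # part after the last '%', e.g. ".mgz"
      dirs=${name#*\%}    # strip "<stem>%"
      dirs=${dirs%\%*}    # strip "%<ext>"
      printf '%s/%s%s\n' "$(printf '%s' "$dirs" | tr '%' '/')" "$stem" "$ext"
      ;;
    *)
      printf '%s\n' "$name"   # not an encoded name: leave unchanged
      ;;
  esac
)

view_name_to_path 'T1rep-0%abide_initiative%sub-50806%ses-1%.mgz'
# prints abide_initiative/sub-50806/ses-1/T1rep-0.mgz
```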

Such a view is handy in case you like to navigate the data by diagnosis and sex. This "view" is just a git branch, so within seconds you can get back to the original layout with `git annex vpop`:

```shell
$> git annex vpop
vpop 1
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
ok

$> ls
_stats/            acpi/     corr/  indi/  rocklandsample/
abide_initiative/  adhd200/  gsp/   ixi/   tumordetect/
```

or get back to the diagnosis-based one:

```shell
$> git checkout views/diagnosis=_\;sex=_
Checking out files: 100% (14200/14200), done.
Switched to branch 'views/diagnosis=_;sex=_'

$> ls
adhd-combined/  adhd-hyperactive/  adhd-inattentive/  autism/  control/
```
With this dataset available, it would be nice if any action on it (e.g. creating derivative results) were done with `datalad run`/`datalad rerun`, so that the record (provenance) of those changes and results is kept under version control.
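A minimal sketch of what that could look like, run in a small throwaway dataset rather than the real one: `datalad run` executes a command and commits its declared outputs together with a record of the command, and `datalad rerun` re-executes the command recorded in the most recent commit. The `derivatives/` directory, the file names and the stand-in image are hypothetical; the script simply exits if `datalad` is not installed.

```shell
set -e
command -v datalad >/dev/null 2>&1 || { echo "datalad not found, skipping demo" >&2; exit 0; }

# Throwaway dataset with one stand-in "image" (hypothetical paths)
ds="$(mktemp -d)/demo-ds"
datalad create "$ds" >/dev/null
cd "$ds"
mkdir -p ixi/sub-573/ses-1
echo "not a real image" > ixi/sub-573/ses-1/T1.nii.gz
datalad save -m "Add stand-in image" >/dev/null

# Record a derivative: the command, its declared output and the result
# end up in one commit. {outputs} is replaced by the paths given with -o.
datalad run -m "List IXI images" -o derivatives/ixi_images.txt \
  "mkdir -p derivatives && find ixi -name '*.nii.gz' | sort > {outputs}"

# Re-execute the command recorded in the most recent commit
datalad rerun
```

In the real dataset, declaring inputs with `-i` would additionally make `datalad run` fetch their content (via `datalad get`) before the command runs.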