perhaps we can chat about this early this week, specifically with respect to the examples in the following plan:
Introduction to reproducible neuroimaging: motivations
David Kennedy, University of Massachusetts, United States
8:30-10:00
FAIR Data - BIDS datasets
Jeffrey Grethe [presenting] and Maryann Martone, UCSD, United States
talk 1: Intro to FAIR
exercise: 16 attributes of FAIR - e.g., is there a clear license? what is a PID? what is meant by metadata? …
link attributes for 2 modules below
talk 2: Standardization and BIDS
exercise: DICOM to BIDS conversion - basic conversion (tie in w/ ReproIn in next section)
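A minimal sketch of what this conversion exercise could look like, assuming heudiconv with its reproin heuristic and dcm2niix are available (paths and the subject label are placeholders):

```sh
# Convert a DICOM directory into a BIDS dataset using the ReproIn heuristic
# /data/dicoms and /data/bids are placeholder paths
heudiconv \
    --files /data/dicoms \
    -s 01 \
    -f reproin \
    -c dcm2niix \
    -b \
    -o /data/bids
```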
talk 3: FAIR Metadata: searching and using SciCrunch
exercise: BIDS metadata - participants.tsv and semantic annotation
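A possible sketch for this exercise: inspect participants.tsv, then add a participants.json sidecar that describes its columns (semantic term URLs could be layered on once we settle on vocabularies; the columns and descriptions below are illustrative only):

```sh
# Look at the phenotypic table shipped with the BIDS dataset
head /data/bids/participants.tsv

# Describe its columns in a participants.json sidecar
cat > /data/bids/participants.json <<'EOF'
{
  "age": {
    "Description": "Age of the participant at the time of scanning",
    "Units": "years"
  },
  "sex": {
    "Description": "Self-reported sex of the participant",
    "Levels": {"M": "male", "F": "female"}
  }
}
EOF
```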
talk 4: Brief Intro to NIDM
exercise: NIDM conversion tool to create sidecar file
10:00-10:15 coffee break
10:15-11:45
Computational basis
Yaroslav Halchenko, Dartmouth College, United States, and Michael Hanke, Magdeburg, Germany
talk 1: ReproIn: More on this?
Exercise:
talk 2: Git/GitAnnex/DataLad:
Exercise:
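Since this exercise is still a placeholder, one possible minimal sketch for the Git/git-annex/DataLad basics (the dataset name, file contents, and source URL are placeholders):

```sh
# Create a new DataLad dataset (a git repo with git-annex set up)
datalad create my-analysis
cd my-analysis

# Add some content and record it in the dataset history
echo "raw notes" > notes.txt
datalad save -m "Add first notes"

# Clone an existing dataset and fetch its file content on demand
datalad install -s <url-of-a-public-dataset> inputs
datalad get -r inputs
```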
talk 3: Everything Else
Exercise:
12:00-13:00 Lunch
13:00-14:30 Neuroimaging Workflows
Dorota Jarecka and Satrajit Ghosh, MIT, United States; Camille Maumet, INRIA, France
talk 1: ReproFlow: Reusable scripts and environments, PROV
Exercise: Run, rinse, and repeat
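One way to flesh out "run, rinse, and repeat" with provenance capture; the smoothing command and file names are placeholders:

```sh
# Execute a command under DataLad's provenance tracking:
# inputs/outputs are declared and the exact call is recorded in git history
datalad run \
    -m "Smooth the anatomical image" \
    --input  "sub-01/anat/sub-01_T1w.nii.gz" \
    --output "derivatives/sub-01_T1w_smoothed.nii.gz" \
    "some_smoothing_command {inputs} {outputs}"

# "Rinse and repeat": re-execute the last recorded command from its commit
datalad rerun
```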
talk 2: ReproEnv: Virtual machines/containers; ReproPaper, NIDM components
Exercise: Create different environments
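A possible sketch for building a distinct environment, assuming Docker; the base image and installed packages are placeholders, not final choices:

```sh
# Write a minimal Dockerfile (software choices are placeholders)
cat > Dockerfile <<'EOF'
FROM neurodebian:stretch
RUN apt-get update && apt-get install -y --no-install-recommends python3
EOF

# Build and tag the environment, then check what is inside it
docker build -t repro-env:0.1 .
docker run --rm repro-env:0.1 python3 --version
```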
[talk 3: ReproTest: Variability sources (analysis models, operating systems, software versions)]
Exercise: Run analysis with different environments
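And a sketch for the comparison step, assuming a second image variant (repro-env:0.2) built the same way with different software versions; my_analysis is a placeholder command:

```sh
# Run the same analysis in two environments and compare the outputs
for tag in 0.1 0.2; do
    docker run --rm -v "$PWD/data:/data" repro-env:"$tag" \
        my_analysis /data/input.nii.gz /data/output-"$tag".nii.gz
done
md5sum data/output-*.nii.gz   # differing checksums expose environment effects
```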
14:30-14:45 Break
14:45-16:00 Statistics for reproducibility
Celia Greenwood, McGill University, Canada and Jean-Baptiste Poline, McGill University, Canada
Assumes we have a CSV file with, say, 100 subjects and columns like "age, sex, pheno1, pheno2, …"
talk 1: evil p-values: what they are - and are not
Exercise: test with
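To make the p-value point concrete with the CSV above (numbers are illustrative): under H0 the p-values are uniform, so

$$\Pr(p \le \alpha \mid H_0) = \alpha \quad\Rightarrow\quad \mathrm{E}[\#\text{false positives}] = m\,\alpha ,$$

i.e. with m = 20 null phenotype columns tested at alpha = 0.05 we expect about one spurious "significant" hit by chance alone; a Bonferroni correction would test each at 0.05/20 = 0.0025.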
talk 2:
Exercise:
talk 3:
Exercise:
16:00-16:30 Conclusion & Getting Feedback
Nina Preuss, Preuss Enterprises, United States
i think we need to clarify and enhance each exercise within the next week or two, and have multiple people go through the exercises well before the session.
with respect to images, perhaps we can do either:
a. several small images for each task (the granularity of the task can be established separately)
b. one single image for everything
(a) is my current preference since it associates a small reusable component with each task and allows easier maintenance of the images as software pieces change.
@satra I was aiming to fill the void of Yarik's and my exercises first.
re images: I am going for small ones (i.e. A). I see no advantage of B.
@mih - sounds good to me. i think there is some amount of redoing across exercises. just wanted to make sure we have a coherent picture.
we will try to finish the exercises for section 3 this coming week together with the talk outlines.
I see one disadvantage of A - we might end up with people who are running multiple containers at the same time and executing things in the wrong one.
@djarecka If you take a look at the latest demo script, you can see how much container selection people would have to do in the datalad world:
https://github.com/mih/ohbm2018-training/blob/master/fsl_glm_w_amazing_datalad.sh
Pretty much none. One step, one dataset, one container.
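To illustrate (this is NOT the content of the linked script, just a generic sketch of the pattern, assuming the datalad-container extension; the container name, image URL, and script path are placeholders):

```sh
# Register a container with the dataset, then run the analysis through it;
# DataLad picks the container, the user only triggers one step
datalad containers-add mycontainer --url <container-image-url>
datalad containers-run -n mycontainer "code/fsl_glm_script.sh"
```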
@mih - ok, I didn't realize that people will run one script with docker run inside. I will read it carefully and test it this week!
https://myyoda.github.io/module-datalad/03-01-reproin/
Pretty much done.
Just FYI, stop me if this is all wrong.
Stuff is coming in via #4