ReproNim / ohbm2018-training

http://www.reproducibleimaging.org/ohbm2018-training

VirtualBox list #15

Open satra opened 6 years ago

satra commented 6 years ago

@mjtravers - replacing issue #12 - just check off all the things you already have.

Core VM

general (see each section for additional local installs, listed under "install")

FAIR Data - BIDS datasets

Computational basis

Neuroimaging Workflows

Statistics for reproducibility

Others

djarecka commented 6 years ago

@satra - where are the instructions for the fsl & heudiconv container? I was thinking that building an image might be nice (not only creating a Dockerfile), but I can "cheat" and use existing layers.

satra commented 6 years ago

@djarecka - here you go:

# section 1
docker run --rm kaczmarj/neurodocker:master generate singularity \
  --base neurodebian:latest --pkg-manager apt \
  --install graphviz git wget \
  --miniconda \
    conda_install="python=3 pytest graphviz pip reprozip reprounzip \
       requests rdflib fuzzywuzzy python-levenshtein pygithub pandas" \
    pip_install="owlready2 pybids duecredit \
     https://github.com/incf-nidash/PyNIDM/archive/a90b3f47dbdafb9504f13a3a8d85fdff931cc45c.zip" \
    create_env="section1" \
    activate=true \
  --run-bash "cd /opt && \
    git clone https://github.com/incf-nidash/PyNIDM.git" > Singularity

# section 2/3
docker run --rm kaczmarj/neurodocker:master generate singularity  \
  --base neurodebian:stretch-non-free   --pkg-manager apt   \
  --install fsl-5.0-core fsl-mni152-templates \
  --install make gcc sqlite3 libsqlite3-dev python3-dev \
    libc6-dev python3-pip python3-setuptools python3-wheel \
  --run "pip3 install --system reprozip reprounzip" \
  --add-to-entrypoint "source /etc/fsl/5.0/fsl.sh" > Singularity

docker run --rm kaczmarj/neurodocker:master generate singularity \
  --base neurodebian:latest --pkg-manager apt \
  --install pigz python3-pip python3-traits python3-scipy  \
     python3-setuptools python3-wheel python3-networkx dcm2niix \
  --install make gcc sqlite3 libsqlite3-dev python3-dev libc6-dev \
  --run "pip3 install --system nipype \
    https://github.com/mvdoc/dcmstack/archive/bf/importsys.zip \
    https://github.com/nipy/heudiconv/archive/master.zip \
    reprozip reprounzip" > Singularity
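
For orientation, the file these commands write is a Singularity recipe. A hand-written sketch of its rough shape (not actual neurodocker output; package lists and paths simplified):

```
Bootstrap: docker
From: neurodebian:latest

%post
    apt-get update -qq
    apt-get install -y graphviz git wget
    # ... miniconda bootstrap plus the conda/pip installs generated by neurodocker ...

%environment
    export PATH="/opt/conda/bin:$PATH"
```

With a recent Singularity it can then be built with something like `singularity build section1.simg Singularity` (image name illustrative).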

djarecka commented 6 years ago

I would add one Docker image to compare and use in the container lesson, e.g. the second image, so the neurodocker command is:

docker run --rm kaczmarj/neurodocker:master generate docker  \
  --base neurodebian:stretch-non-free   --pkg-manager apt   \
  --install fsl-5.0-core fsl-mni152-templates \
  --install make gcc sqlite3 libsqlite3-dev python3-dev \
    libc6-dev python3-pip python3-setuptools python3-wheel \
  --run "pip3 install --system reprozip reprounzip" \
  --add-to-entrypoint "source /etc/fsl/5.0/fsl.sh" > Dockerfile

And one T1w image would be great, e.g. ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz, but could be really any image, just want to use as an example for bet command

kaczmarj commented 6 years ago

i will be cutting a new neurodocker release this week after i add more examples.

by the way, in general i recommend running neurodocker with docker without -i/--interactive or -t/--tty. i run it with docker run --rm kaczmarj/neurodocker:master ....

kaczmarj commented 6 years ago

another minor point, pre-compiled reprozip wheels can be installed with pip now (see https://github.com/ViDA-NYU/reprozip/issues/224).

satra commented 6 years ago

@kaczmarj - it did not work the day before yesterday when i tried. pip complained about compiling, which is why a bunch of those additional dependencies were added.

@mjtravers, @yarikoptic - any chance you can take a look at this issue today? it would be good to cut a VM today or tomorrow if possible to have people play with it before we ask students to download.

satra commented 6 years ago

@kaczmarj - i've updated the commands above without the -i (@djarecka - it may be useful to cover the utility of -i and -t for docker in additional slides).

mjtravers commented 6 years ago

I am running a VM build now that incorporates the section 1, 3, and 4 instructions above and from Al. Will send out a notice when it is posted for downloading and review.

mjtravers commented 6 years ago

I have posted an updated VM: https://training.repronim.org/repronim-training-v0.2.ova

There are 2 conda environments set up, named "section1" and "section4".

For section3, the kaczmarj/neurodocker:master image has been pulled into Docker and the following files are in the home directory:

djarecka commented 6 years ago

Thanks @mjtravers. I could have misunderstood @satra, but I thought that we would include the singularity/docker images inside, so people don't spend time building them

djarecka commented 6 years ago
[screenshot: 2018-06-06 at 13:13:25]
djarecka commented 6 years ago

@mjtravers - I'll wait for the next version of VM and will test the new conda environments

mjtravers commented 6 years ago

The next version will be out later tonight. I discovered a couple of issues after I ran the build. The build is also taking a bit longer. I will message when ready

mjtravers commented 6 years ago

The updated VM is now available. I believe this version has everything for sections 1, 3, and 4. Section 2 is still in the works.

Download: https://training.repronim.org/repronim-training.ova The VM size is now ~10GB

satra commented 6 years ago

@mjtravers - that seems large for what it contains. i'll see if i can download and check it out.

for conda are you clearing out the environments post install? example: https://github.com/kaczmarj/neurodocker/blob/master/examples/nipype_tutorial/Dockerfile#L117

also for any apt-get are you using eatmydata or some such? example: https://github.com/kaczmarj/neurodocker/blob/master/examples/nipype_tutorial/Dockerfile#L26

also is the vagrant file somewhere? it may be slightly easier for me to build it than download it on my flight :)

mjtravers commented 6 years ago

@satra No, I'm not doing any slimming of the file, so I am sure there are a few GBs we could shave off. I am using packer and building off a baseline Ubuntu Desktop OVA. I can post the files to this git repo and the baseline OVA to training.repronim.org. Let me set it up

djarecka commented 6 years ago

don't know if that helps, but just checked the size of the environments:

(section4) vagrant@nitrcce:~$ du -hs /home/vagrant/miniconda2/envs/*
361M    /home/vagrant/miniconda2/envs/section1
65M /home/vagrant/miniconda2/envs/section3
1.3G    /home/vagrant/miniconda2/envs/section4

so together around 1.7 GB
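
Incidentally, du can report the combined total itself via its -c flag. A stand-in sketch (directory names made up to mirror the VM layout; on the VM the real paths live under ~/miniconda2/envs):

```shell
# Stand-in env directories populated with dummy data
mkdir -p envs/section1 envs/section4
dd if=/dev/zero of=envs/section1/pkg bs=1024 count=64 2>/dev/null
dd if=/dev/zero of=envs/section4/pkg bs=1024 count=256 2>/dev/null
du -shc envs/*   # the final line, labelled "total", is the combined size
```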

satra commented 6 years ago

as soon as @jbpoline shares the notebooks, we can bring down the size of section 4. i'm sure all he needs is jupyter, pandas, seaborn, scipy (and their dependencies) and maybe statsmodels :)

let me know when the packer file is available. i'll try to build it on our cluster remotely. it's going to take the rest of my flight to download that ova!

kaczmarj commented 6 years ago

you can also save about 200 MB by installing jupyter notebook via the notebook package instead of the entire jupyter package. and if you need jupyterlab, you can install that from conda-forge as jupyterlab.

the jupyter package installs big dependencies like qt5, which probably are not necessary for the vm.

djarecka commented 6 years ago

I've tested the section1 by running:

cd workspace/Indiv_Diffs_ReadingSkill/
~/PyNIDM/bin/BIDSMRI2NIDM.py -d ~/workspace/Indiv_Diffs_ReadingSkill
cd ~/nidm-training/
python rdf-age-query.py -nidm ~/workspace/Indiv_Diffs_ReadingSkill/nidm.ttl

and I got:

sub-12 - 2.096 - http://purl.org/nidash/nidm#_998c5d57-6a64-11e8-9b22-080027d6419f
sub-14 - 3.176 - http://purl.org/nidash/nidm#_998c5d5f-6a64-11e8-9b22-080027d6419f
sub-01 - 1.726 - http://purl.org/nidash/nidm#_998c5d2b-6a64-11e8-9b22-080027d6419f
sub-21 - -2.364 - http://purl.org/nidash/nidm#_998c5d7b-6a64-11e8-9b22-080027d6419f
...

So I didn't get any error, but I'm not sure if this is the proper output; it should be "subject IDs, age of each subject, and the assessment ID" (the ages look pretty low to me for reading skills, but I didn't read anything about the experiment)

For the section 4, I only tested a few things: importing pandas, numpy, opening jupyter notebook and lab. It seems to be working fine now.

mjtravers commented 6 years ago

I have a pull request in containing the VM build scripts.

I will take a look at those tutorials later today and see how much we can slim down the file.

There is a cleanup.sh script in place to add size-reducing code. Now that the code is out there, feel free to edit.

dbkeator commented 6 years ago

@djarecka: My last pull request for PyNIDM had some changes to BIDSMRI2NIDM, but I didn't explicitly copy the tool from its development location (https://github.com/incf-nidash/PyNIDM/tree/master/nidm/experiment/tools) to the bin folder, so those copies may be out of date.

I'll take a look...

djarecka commented 6 years ago

@dbkeator - so the output should be different?

jgrethe commented 6 years ago

Hi Dorota, the output from the query looks fine based on the data that is in the included participants.tsv. The students will be generating their own participants.tsv file as part of the hands-on work…


dbkeator commented 6 years ago

@djarecka @mjtravers Hi Folks, so I tried to replicate what Dorota did with the latest OVA:

   cd workspace/Indiv_Diffs_ReadingSkill/
   ~/PyNIDM/bin/BIDSMRI2NIDM.py -d ~/workspace/Indiv_Diffs_ReadingSkill
   cd ~/nidm-training/
   python rdf-age-query.py -nidm ~/workspace/Indiv_Diffs_ReadingSkill/nidm.ttl

In the OVA, first, I received an error that PyNIDM wasn't installed. So I issued the following command: python ~/PyNIDM/setup.py install

Then I received an error that pybids wasn't installed. So, I issued the following commands:

   cd ~
   git clone https://github.com/INCF/pybids.git
   cd pybids
   python setup.py install

Then, I received an error that urllib.parse doesn't have a module named quote. This was a curious error because urllib comes with python... which led me to the biggest problem: we have installed python 2.7 via miniconda. PyNIDM was written for python 3.x, hence the current problem with the urllib package and likely many other downstream problems.

So, I couldn't test the things Dorota did.

@djarecka How did you test section1 with the current OVA file given the python 2.7 install?

Thanks!

djarecka commented 6 years ago

@dbkeator - I did everything using the conda environment created by Matt, so first thing I did was source activate section1. Sorry, should have included it in my post.

That is the environment that is purely for this part, so it should have everything; if not, @mjtravers should know.

djarecka commented 6 years ago

@jgrethe - thank you for the confirmation!

dbkeator commented 6 years ago

@djarecka Got it, that worked. Appears the query is also working. I didn't realize the ages were funky....

yarikoptic commented 6 years ago

BTW, one personally annoying issue for me is that the Win+number shortcuts are configured to launch various heavy applications such as libreoffice

satra commented 6 years ago

i'm having some trouble importing the ova on our older virtualbox on our cluster. can someone verify the md5sum below?

$ md5sum repronim-training.ova 
bad86aace872ed38c46f9e30c9d86d62  repronim-training.ova

(i can't test on osx as it will take me 4 days to download under my current connection)
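
For others verifying their download, md5sum -c checks a file against a recorded checksum. A stand-in demonstration (sample.ova is a made-up name; substitute the real repronim-training.ova and its published checksum):

```shell
# Create a small stand-in file; the real OVA is several GB
printf 'demo contents' > sample.ova
# Record the checksum, then verify it; a match prints "sample.ova: OK"
md5sum sample.ova > sample.ova.md5
md5sum -c sample.ova.md5
```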

satra commented 6 years ago

@djarecka verified that the above md5sum is correct.

satra commented 6 years ago

and in case others have the same import appliance issue on some flavor of linux/virtualbox combo, this link helps solve it: http://installfights.blogspot.com/2018/05/how-to-fix-virtualbox-error-when-you.html

jbpoline commented 6 years ago

@satra @mjtravers @kaczmarj
yes - absolutely - there is way too much stuff there - working on this right now and should have an update on what exactly is needed (Satra's list sounds right). @cmtgreenwood has uploaded two R scripts that we should test in the VM as well

djarecka commented 6 years ago

@jbpoline - i'm not really an R user, so I might be doing something wrong, but I tried to open the script multiTesting.Rmd in R studio and run it, and it returns errors, starting with "there is no package called 'knitr'". I believe @mjtravers didn't have the list of required R packages.

jbpoline commented 6 years ago

@djarecka you are right: I think we need the R libraries knitr, rmarkdown, mvtnorm, and ggplot2.

@cmtgreenwood do you confirm? I suppose we can always extract the R code from the Rmd files, which should run on the VM - but we do need mvtnorm and ggplot2, right? @mjtravers: would it be hard to include these R libraries in the VM?

djarecka commented 6 years ago

@jbpoline - my understanding was that r-studio (which is installed) can open Rmd and can run the specific script cells. This is what I tried and that's how I got the package errors.

jbpoline commented 6 years ago

hum - I am no R person but it looks like we need these R "libraries" (equivalent of python packages)

djarecka commented 6 years ago

@jbpoline yes, i only tried to say that we don't need to "extract the R code from Rmd".

mjtravers commented 6 years ago

@djarecka @jbpoline I am able to load those R packages onto the system and have them added to the VM build process. Kinda sure I am loading them right.

I have a build going right now that will include the above R packages in r-base... plus the section 3 stuff for Yarik.

The build will be done and posted for download later this evening.

mjtravers commented 6 years ago

@satra .... plus I added the clean up for apt and conda referred to above. Will see if it reduces the VM size any with the in-progress build

satra commented 6 years ago

@mjtravers - thank you - let's see what this does. it would be nice if packer did some kind of pre-post step assessment of size. may give us an indication of where things are piling up. my internal calculations indicate this VM should not exceed much more than 5-6G as an ova.

my wifi here is quite insufficient, so i'm trying to figure out how to run things remotely on our cluster.

mjtravers commented 6 years ago

@satra I'll give this a try: http://www.netreliant.com/news/8/17/Compacting-VirtualBox-Disk-Images-Linux-Guests.html

mjtravers commented 6 years ago

@satra Success on compacting the OVA. Your calculations are correct, the size of the file is 5.25GB.

I have posted it to the training website. Note the new name (resulting from needing to clone the original OVA file as part of the compaction process):

https://training.repronim.org/reprotraining.ova

This file has the R libraries and section 2 python packages included.

This file does not have the section 3 edits posted this morning. I have pulled those changes and they'll go in the next build.

djarecka commented 6 years ago

Thanks @mjtravers! This image is still not expected to have datalad inside, is that right?

mjtravers commented 6 years ago

@djarecka Datalad is installed in conda env section2

version 0.10.0-rc5

djarecka commented 6 years ago

@jbpoline @cmtgreenwood I opened rstudio again and tried to run the scripts. multiTesting.Rmd returns an error since it has some path set to C:/CeliaFiles.... The type1 script didn't return any errors, but I did not even try to validate whether the output/plots are good.

kaczmarj commented 6 years ago

@mjtravers @djarecka @satra - i released neurodocker version 0.4.0.

docker pull kaczmarj/neurodocker:0.4.0

cmtgreenwood commented 6 years ago

Yes, the knitr and markdown packages are needed to assemble a nice report.
However, the R script parts can be run without them, so it may be much easier if I rewrite the scripts as plain text.

So I will fix the path information probably tomorrow, and upload another version.

I will create plain-text versions (i.e. *.R files) at the same time, which do not need knitr and markdown
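
As an aside, knitr::purl() is the standard way to pull the R code out of an Rmd. Where R isn't handy, a rough awk approximation works on the chunk fences (file names and contents here are made up for illustration):

```shell
# Write a tiny Rmd file; the chunk fences are emitted via printf for compactness
printf '%s\n' 'Some text.' '```{r}' 'x <- 1 + 1' '```' 'More text.' > demo.Rmd
# Keep only the lines between the ```{r} and ``` chunk delimiters
awk '/^```[{]r/{inchunk=1; next} /^```/{inchunk=0; next} inchunk' demo.Rmd > demo.R
cat demo.R   # prints: x <- 1 + 1
```

This ignores chunk options and inline R, so purl() is preferable whenever knitr is available.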

jbpoline commented 6 years ago

@cmtgreenwood Celia: I moved your scripts into section4/section41 and my notebook into section4/section42. I think Matt has included Rstudio and the libraries needed, so I'm not sure we need to extract the R - but I haven't checked yet!

mjtravers commented 6 years ago

@jbpoline @cmtgreenwood Yes, the following R packages were installed on the VM:

... and R-Studio