MRtrix3 / containers

Developer tools for generating minified containers for MRtrix3

Strip down external dependencies? #2

Closed by Lestropie 4 years ago

Lestropie commented 4 years ago

Unlike other more general neuroimaging containers, which may include full versions of multiple relevant software packages, in the case of this particular container the ANTs and FSL packages are only installed due to the dependence of specific MRtrix3 scripts on their availability.

As such, instead of installing and retaining the full gamut of those software packages, one option would be to strip those external dependencies of all content save for the specific functionalities required, potentially greatly reducing the size of the container.

The disadvantage of doing so is that it may remove commands from those packages that plug current gaps in MRtrix3 capabilities, e.g. inter-modal registration. On the other hand, the goal of this container should really be only to provide MRtrix3 functionalities; commands from other packages can be executed using those packages directly (or indeed dedicated containers for those packages), and there are alternative projects for providing containers encapsulating multiple neuroimaging software packages, e.g. VNM.

kaczmarj commented 4 years ago

Hi @Lestropie - if you provide the mrtrix3 commands that require external dependencies, I can run those and prune the ANTs and FSL directories to include only what is required by mrtrix3. I agree that it is not the purpose of the MRtrix3 Docker image to package ANTs and FSL.

I will "minify" the Docker image using the neurodocker-minify command in Neurodocker.

Lestropie commented 4 years ago

That minify operation does indeed look interesting; curious to know just how clever it is in determining dependencies if you can link me to code or documentation or something that won't take too much of your time.

The list IIRC is:

ANTs:

FSL:

There's probably also a fair bit of supplementary data in FSL that's not required. The FIRST atlases are pretty big and those are required, but I've not looked into how much else there is.

kaczmarj commented 4 years ago

neurodocker-minify is mostly a wrapper that uses ReproZip under the hood. ReproZip runs a command and uses ptrace to trace every syscall made. Then it looks into the /proc/PID/maps file to get all of the files opened by that process. neurodocker-minify runs a ReproZip trace and then deletes every file not caught by ReproZip (optionally restricted to a specific directory). So I would prune the ANTs and FSL installation directories, create a tarball of those minified directories, upload it somewhere, and then download it as part of the MRtrix3 Docker build.
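To make the "delete everything the trace did not touch" step concrete, here is a minimal sketch of the idea in Python. It is not neurodocker-minify's actual implementation: the function name `prune_untraced` and its parameters are hypothetical, and it assumes the list of traced files has already been obtained from a trace by some other means.

```python
import os
from typing import Iterable, List


def prune_untraced(root: str, traced: Iterable[str], dry_run: bool = True) -> List[str]:
    """List (and optionally delete) files under `root` that are absent from `traced`.

    Hypothetical helper for illustration only; `traced` stands in for the
    set of file paths that a trace recorded as having been accessed.
    """
    # Resolve symlinks on both sides so the comparison is path-spelling independent.
    keep = {os.path.realpath(p) for p in traced}
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.realpath(path) in keep:
                continue  # the trace saw this file, so it stays
            removed.append(path)
            if not dry_run:
                os.remove(path)
    return sorted(removed)
```

Running it with `dry_run=True` first shows what would be removed, which is a cheap sanity check before pruning something as large as an FSL installation.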

Lestropie commented 4 years ago

Awesome; that's the kind of robustness I was looking for. Such a download will hopefully help me with my CI woes as well (Lestropie/mrtrix3#1). Only other thing I would want included in such a download is FreeSurfer's FreeSurferColorLUT.txt, as there's a couple of scripts that read from that based on FREESURFER_HOME.
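For reference, the lookup in question amounts to something like the sketch below. This is an illustration rather than the code in the MRtrix3 scripts themselves: the function name `find_freesurfer_lut` is hypothetical, and it assumes the file sits directly under the directory named by FREESURFER_HOME.

```python
import os


def find_freesurfer_lut(env=None):
    """Return the path to FreeSurferColorLUT.txt under FREESURFER_HOME.

    Hypothetical helper; `env` defaults to the process environment and
    exists only so the lookup can be exercised without touching os.environ.
    """
    env = os.environ if env is None else env
    home = env.get("FREESURFER_HOME")
    if not home:
        raise KeyError("FREESURFER_HOME is not set")
    path = os.path.join(home, "FreeSurferColorLUT.txt")
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```

So a stripped-down download only needs to ship that one text file in a directory, with FREESURFER_HOME pointing at it.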

justbennet commented 4 years ago

@Lestropie Are you aware of the work that Paul McCarthy has done on creating subsets of the FSL suite for Singularity containers?

https://git.fmrib.ox.ac.uk/paulmc/fsl-singularity/-/tree/master

There is also a 'stripped' version intended to be used on clusters/grids where size is important. See

https://github.com/cbinyu/fsl6-core/blob/master/Dockerfile

There may be something useful in one or both of those for you.

Singularity, if you are not familiar with it, enables you to create a base 'sandbox' image that appears as a normal file system into which you can copy files or from which you can remove files; you can get an interactive shell from the sandbox installation. You can convert the sandbox installation into a single-file installation quite easily. Those capabilities might be useful for development.

I have some interest in an MRtrix3 installation with FSL and CUDA support, so I am working on something parallel to this. FSL can be hard to trace, sometimes, because so much of it is actually shell scripts.

Lestropie commented 4 years ago

@justbennet No I was not aware; though they look like they are only minimal in terms of the non-FSL container OS contents, rather than FSL itself. The second link is probably closer to what I'm looking for, i.e. removing unneeded FSL content. Though I'd probably prefer to go through the FSL installation script and delete unwanted content after the fact, rather than doing a direct download and throwing exclusions on the tar extraction.

Singularity sandbox is indeed super useful. I barely ever use Docker myself partly for that reason, but I'm starting with making modifications to the Dockerfile in this repo first since that's the only one that's there currently; I'll likely add a dedicated Singularity recipe file once I'm done with PR #1. A sandbox image would indeed be the fastest way to test addition and removal of content; but given its description I now have high hopes for the minify script. Thanks for raising though, always worth a shot at saving someone a lot of time :+1:

I have a Singularity container that works for me at least for FSL eddy_cuda; see this branch of MRtrix3_connectome. In the end it didn't actually need anything to be done in the container recipe: all that was needed on the HPC system I'm using was in the SLURM script to load the CUDA module, pass through a requisite LD_LIBRARY_PATH via SINGULARITYENV, and bind /usr/local/cuda/9.1. So you may well find that it's a sysadmin that's needed to get CUDA working for you?
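Roughly, the invocation side of that looks like the sketch below. The function name `build_singularity_call`, the image name and the command are placeholders; the CUDA module is assumed to have been loaded already by the job script, so that LD_LIBRARY_PATH on the host is populated before this runs.

```python
import os


def build_singularity_call(image, command, cuda_dir="/usr/local/cuda/9.1", host_env=None):
    """Build argv and environment for running `command` inside `image`.

    Hypothetical helper for illustration; `host_env` defaults to os.environ.
    """
    host_env = dict(os.environ if host_env is None else host_env)
    env = dict(host_env)
    # Variables prefixed with SINGULARITYENV_ are set inside the container
    # with the prefix stripped, so this forwards the host's LD_LIBRARY_PATH.
    env["SINGULARITYENV_LD_LIBRARY_PATH"] = host_env.get("LD_LIBRARY_PATH", "")
    # Binding the host CUDA directory makes it visible at the same path inside.
    argv = ["singularity", "exec", "--bind", cuda_dir, image] + list(command)
    return argv, env
```

The result can be handed to `subprocess.run(argv, env=env, check=True)`; the equivalent lines in a SLURM script would export the same variable and call `singularity exec` with the same `--bind` argument.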

Lestropie commented 4 years ago

Closed through #11.