MRtrix3 / containers

Developer tools for generating minified containers for MRtrix3

Reduce container size using neurodocker-minify #4

Closed: Lestropie closed this issue 4 years ago

Lestropie commented 4 years ago

Tagging @kaczmarj.

Made a little bit of progress on this, but only in draft form for now.

Lestropie commented 4 years ago

Todo: Install ANTs from source rather than through apt-get. This should permit specifying the installation location, which should in turn enable specifying this location as a directory to prune in the neurodocker-minify call. Currently the ANTs installation puts files in locations that can't be pruned, e.g. /usr/bin.
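The pruning constraint can be shown with a small Python sketch. `unprunable_paths` is a hypothetical helper, not part of neurodocker. The file lists are illustrative and are not the real contents of the ANTs package. The helper reports every installed file that does not sit beneath one of the directories handed to the pruner, which is what happens to files placed in e.g. /usr/bin.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def unprunable_paths(installed: Iterable[str], prune_dirs: Iterable[str]) -> List[str]:
    """Return the installed files that lie outside every directory to be pruned.

    Hypothetical helper: a file can only be removed by the minification step
    if it sits beneath one of the directories nominated for pruning.
    """
    roots = [PurePosixPath(d) for d in prune_dirs]
    outside = []
    for path in installed:
        p = PurePosixPath(path)
        # A file is prunable if some prune root is the file itself or one of its ancestors.
        if not any(root == p or root in p.parents for root in roots):
            outside.append(path)
    return outside


# Hypothetical file lists for the two installation strategies.
apt_install = ["/usr/bin/antsRegistration", "/usr/lib/ants/libITKCommon.so"]
source_install = ["/opt/ants/bin/antsRegistration", "/opt/ants/lib/libITKCommon.so"]

print(unprunable_paths(apt_install, ["/opt"]))     # both paths reported
print(unprunable_paths(source_install, ["/opt"]))  # []
```

A source build installed under a single prefix is fully covered once that prefix (or /opt) is passed to the pruner. An apt-get installation instead scatters files across system directories that cannot safely be pruned wholesale.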

Lestropie commented 4 years ago

minify got the container from 19GB to 1.8GB. Most of that will be the FIRST atlases, I think.

That process does however seem to be wiping the environment variables set within the container recipe file...

kaczmarj commented 4 years ago

minify got the container from 19GB to 1.8GB

I'm sure Docker never envisioned 19GB "microservices" :)

The minification will indeed wipe the environment variables, so those will have to be reset. Also it might make more sense to have two separate Dockerfiles -- one full Dockerfile to be used for minification that includes the entire FSL and ANTs installations, and a second Dockerfile that installs the minified versions of FSL and ANTs.

Lestropie commented 4 years ago

The minification will indeed wipe the environment variables, so those will have to be reset.

Do you have an established recommendation for how to address this? I'd expected to go straight from minification to DockerHub upload; utilising a second Dockerfile just to set some environment variables seems clunky. Better to just write them explicitly to e.g. ~/.profile?
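One possible approach is sketched below under some assumptions. Suppose the minified container is turned into an image with `docker export CONTAINER | docker import - NEW_IMAGE`. That route keeps only the filesystem and drops image metadata such as `ENV`. `docker import` does accept `--change` instructions, though. The original image's `Env` list, as printed by `docker image inspect --format '{{json .Config.Env}}' IMAGE`, could therefore be replayed as `ENV` changes during the import. The helper names and image names below are hypothetical.

```python
import json
from typing import List


def env_change_args(env_json: str) -> List[str]:
    """Turn a JSON list of "KEY=VALUE" strings into `docker import --change` args.

    Hypothetical helper; env_json is the output of
    `docker image inspect --format '{{json .Config.Env}}' IMAGE`.
    """
    args: List[str] = []
    for entry in json.loads(env_json) or []:  # Env may be null
        key, _, value = entry.partition("=")
        # Double-quote the value (Dockerfile ENV syntax) so spaces and quotes survive.
        quoted = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        args += ["--change", f"ENV {key}={quoted}"]
    return args


def import_command(env_json: str, new_image: str) -> List[str]:
    """Build a `docker import` command that re-applies the saved environment.

    The exported filesystem is read from stdin ("-"), e.g. piped from
    `docker export CONTAINER`.
    """
    return ["docker", "import", *env_change_args(env_json), "-", new_image]


saved = '["PATH=/opt/mrtrix3/bin:/usr/bin", "FSLDIR=/opt/fsl"]'
print(import_command(saved, "mrtrix3-minified"))
```

Writing the variables to ~/.profile would only affect login shells, so commands launched directly with `docker run IMAGE command` would not see them. Restoring them as image-level `ENV` avoids that problem.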

Also it might make more sense to have two separate Dockerfiles -- one full Dockerfile to be used for minification that includes the entire FSL and ANTs installations, and a second Dockerfile that installs the minified versions of FSL and ANTs.

How would you envision this working? Would it simply be the case that, after following the instructions currently being drafted in the README, the minified container would then be uploaded to DockerHub, and a second one-liner Dockerfile would be defined that simply downloads such? Or are you thinking of something different?

kaczmarj commented 4 years ago

I was thinking of having one Dockerfile with the full installations of ANTs, FSL, and MRtrix3. The file could be full.Dockerfile for example. This image would be used for minification. neurodocker-minify would prune the ANTs and FSL directories based on various MRtrix3 commands. Then, these pruned directories would be extracted, compressed into a tar.gz or similar, and then uploaded somewhere (even a GitHub release on this repo).
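The packaging step in this proposal could look roughly like the following Python sketch. It uses the standard `tarfile` module to bundle pruned directories into a single gzip-compressed archive. Those directories would be copies of e.g. /opt/ants and /opt/fsl taken from the minified container. The function name and paths are hypothetical, and the demo uses throwaway stand-in directories.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Iterable, List


def pack_pruned(dirs: Iterable[str], archive: str) -> List[str]:
    """Bundle each pruned directory into one .tar.gz and return the member names.

    Each directory is stored under its base name (e.g. "ants/", "fsl/") so the
    archive can later be unpacked into /opt of another image.
    """
    with tarfile.open(archive, "w:gz") as tar:
        for d in dirs:
            path = Path(d)
            tar.add(str(path), arcname=path.name)  # adds subdirectories recursively
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Stand-ins for directories copied out of the minified container,
    # e.g. with `docker cp CONTAINER:/opt/fsl ./fsl`.
    for name in ("ants", "fsl"):
        (root / name / "bin").mkdir(parents=True)
        (root / name / "bin" / "tool").write_text("placeholder\n")
    names = pack_pruned([str(root / "ants"), str(root / "fsl")],
                        str(root / "pruned-deps.tar.gz"))

print(sorted(names))
```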

There would be a second Dockerfile, for example slim.Dockerfile, that installs MRtrix3 and the pruned ANTs and FSL from the tarball online. This second Docker image could be made very slim by excluding compilers and development libraries.
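In the slim image, the corresponding step would fetch that archive and unpack it under /opt. Inside a Dockerfile this would most likely be a `curl` download piped to `tar`. The same idea is shown below as a hedged Python sketch. `unpack_into` is a hypothetical helper that refuses any member path that would escape the target prefix before it extracts anything. The demo builds a tiny stand-in archive rather than downloading a real one.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def unpack_into(archive: str, prefix: str) -> None:
    """Extract a .tar.gz of pruned dependencies beneath prefix (e.g. /opt)."""
    dest = Path(prefix).resolve()
    with tarfile.open(archive, "r:*") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Reject absolute paths or "../" entries that would land outside dest.
            if target != dest and dest not in target.parents:
                raise ValueError(f"refusing to extract {member.name!r} outside {dest}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter also blocks unsafe links and modes.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    archive = root / "pruned-deps.tar.gz"
    # Build a tiny stand-in archive containing fsl/bin/tool.
    payload = b"placeholder\n"
    with tarfile.open(str(archive), "w:gz") as tar:
        info = tarfile.TarInfo("fsl/bin/tool")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    prefix = root / "opt"
    prefix.mkdir()
    unpack_into(str(archive), str(prefix))
    extracted = (prefix / "fsl" / "bin" / "tool").read_bytes()

print(extracted)
```

Because the slim image only needs `tar` and a downloader at this stage, it can skip compilers and development libraries entirely, which is where most of the size saving would come from.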

I can make a full example later today.

kaczmarj commented 4 years ago

@Lestropie - I wrote my ideas down in the branch https://github.com/MRtrix3/containers/tree/kaczmarj-minify

Can you please take a look? The Dockerfiles there use multi-stage builds, and with BuildKit (which ships with Docker nowadays), those stages can be built in parallel. I have outlined that in the README of that branch. Those Dockerfiles aren't complete yet, but they're close.

Lestropie commented 4 years ago

OK, there's some ideas in there I've not seen before, and your slim recipe is different to what I had been thinking, though yours potentially has greater overlap with MRtrix3/mrtrix3#2134. There might be a third option that merges ideas from both and provides a good solution for both. I suspect you've more experience with containers than I have, but I'll at least write down my thinking in full and we'll see if it makes sense.


What I had initially intended was:

  1. Complete the instructions for minification I've documented thus far.

  2. Upload the resulting image to DockerHub; this would be the image that users would access.

  3. The Dockerfile for building the container pre-minification would be renamed; a second Dockerfile would then be defined, which would simply contain something like:

    FROM mrtrix3/mrtrix3:latest

    So it would just be essentially a proxy for any service that grabs software containers based on the presence of a Dockerfile in the repository.

The problems with this solution are:


Now what I'm thinking instead is:

  1. Use the solution I have here currently, but at the end of tests.sh, erase the contents of /opt/mrtrix3/ from the container.

  2. Instead of your suggestion of tarballing the contents of e.g. /opt/ants/ and /opt/fsl/, upload the whole container image (with MRtrix3 scrubbed) to DockerHub. This becomes a template intended for building MRtrix3, with all of the dependencies for building & running in place, but without MRtrix3 itself.

  3. In this repo (MRtrix3/containers), a second Dockerfile pulls this template container from DockerHub, builds MRtrix3 master as a release build, and then removes the compilation dependencies and the unnecessary contents of /opt/mrtrix3/.

  4. In MRtrix3/script_test_action, pull the same template container from DockerHub, build the nominated MRtrix3 commitish as a release build but with asserts enabled, and skip any subsequent container cleanup, as the image is only used for running CI tests.
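Two pieces of this plan are sketched in Python below. The first is step 1's scrubbing of /opt/mrtrix3/ so that only the dependencies remain in the template. The second is the different build configurations used by steps 3 and 4. Both helper names are hypothetical. The `-assert` option is an assumption about MRtrix3's `./configure`, so check the script's own usage text before relying on it. Removing the compilation dependencies in step 3 is not shown.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def scrub_contents(directory: str) -> None:
    """Delete everything inside directory but keep the (now empty) directory.

    Mirrors step 1: strip /opt/mrtrix3/ from the template image while leaving
    the build and runtime dependencies in place.
    """
    for entry in Path(directory).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def configure_args(purpose: str) -> List[str]:
    """Configure invocation for the two consumers of the template image.

    Assumption: MRtrix3's ./configure builds in release mode by default and
    accepts -assert to keep assertions in that optimised build.
    """
    if purpose == "release":  # step 3: image built in MRtrix3/containers
        return ["./configure"]
    if purpose == "ci":       # step 4: MRtrix3/script_test_action
        return ["./configure", "-assert"]
    raise ValueError(f"unknown purpose: {purpose!r}")


with tempfile.TemporaryDirectory() as tmp:
    mrtrix = Path(tmp) / "mrtrix3"
    (mrtrix / "bin").mkdir(parents=True)
    (mrtrix / "bin" / "mrconvert").write_text("stand-in\n")
    scrub_contents(str(mrtrix))
    remaining = list(mrtrix.iterdir())
    still_exists = mrtrix.is_dir()

print(remaining, still_exists, configure_args("ci"))
```

Keeping the directory itself, rather than deleting /opt/mrtrix3/ outright, means later Dockerfiles can build straight into the same location without recreating it.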


From what you've got there, there are definitely optimisations for the initial build that could be introduced as a standalone build optimisation changeset. However, I think my latest idea of having the full container available as a template to pull is cleaner than tarballing the dependencies and then starting the container build from scratch post-minify.

Lestropie commented 4 years ago

Okay, I think I have a working version: http://hub.docker.com/repository/docker/mrtrix3/mrtrix3 Instructions in the README of this branch are all up-to-date. This seems like a decent solution to me, and I think should help sort out the issues I was having with the Python script CI testing; but it's all open to reasonable criticism if there's a better alternative. Also very

@jdtournier: If you want to make a Docker account, I can then add you as a member of the DockerHub organisation. Since this is a free account, we only get up to three members, and I can't make the base repository private.

jdtournier commented 4 years ago

OK, I've just created an account on DockerHub as jdtournier. Thanks!

Lestropie commented 4 years ago

Closed in favour of #11.