The reason for using only one layer was that more layers usually means larger Docker images, which slows down our GitHub Actions.
I played around with Docker caching, but the problem was that it is only sensitive to changes in the Dockerfile itself. So changes in, e.g., your pyproject.toml would not result in a rebuild of the Docker container. One would need to detect relevant changes by hand in order to use the caching safely.
My suggestion would be to have (sketched below):

- one layer for the basic stuff (Python, git, etc.)
- one layer for our dependencies
- one layer for mrpro
- one layer for Binder stuff (only for the Binder Dockerfile)
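A minimal sketch of what that layering could look like, assuming a `python:3.12-slim` base image and that the dependency list can be read straight out of `pyproject.toml` (both are assumptions, not our current setup):

```dockerfile
# Layer 1: basics (base image choice is an assumption)
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

# Layer 2: dependencies. Copy only pyproject.toml so this layer's cache
# survives source changes and is invalidated only when the dependency
# list itself changes.
WORKDIR /opt/mrpro
COPY pyproject.toml .
RUN python -c "import tomllib; \
    print('\n'.join(tomllib.load(open('pyproject.toml', 'rb'))['project']['dependencies']))" \
    | xargs -d '\n' python -m pip install --no-cache-dir

# Layer 3: mrpro itself, rebuilt whenever the source changes
COPY . .
RUN python -m pip install --no-cache-dir --no-deps .

# Layer 4 (Binder Dockerfile only): notebook tooling
# RUN python -m pip install --no-cache-dir notebook
```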
AFAIK, layers only significantly increase file size if they overwrite each other's files. Otherwise, the ability to cache and reuse layers will most likely be a net positive.
If I remember correctly, the cache is invalidated on a COPY/ADD of a file with a different checksum. So a pip install or git clone does not invalidate it, but copying in a changed repo does.
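Concretely (the file names here are just illustrative):

```dockerfile
# A RUN step is cached by its instruction text alone: it is NOT re-run just
# because PyPI has a newer torch, only if the command itself (or an earlier
# layer) changes.
RUN python -m pip install --no-cache-dir torch

# A COPY/ADD step is cached by the checksum of what it copies: editing
# pyproject.toml invalidates this layer and every layer after it.
COPY pyproject.toml .
```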
As we now have a release schedule, we could just update the base Docker image with pre-installed torch and requirements on each release (and on changes to pyproject.toml).
This would also save us from a dependency update breaking tests in an unrelated PR.
We would then just have to check that all updated dependencies still work every two weeks when we release.
Maybe we can discuss this at some point in person.
I am wondering if it would make sense to split the `install_mrpro.sh` used in our Dockerfile such that two layers are created: one layer up to https://github.com/PTB-MR/mrpro/blob/d396143f62bfbaed5dd9b66b86be3182f3467192/docker/install_mrpro.sh#L1-L36 and one layer for https://github.com/PTB-MR/mrpro/blob/d396143f62bfbaed5dd9b66b86be3182f3467192/docker/install_mrpro.sh#L37, as the latter will change more often than the former. We should be able to use caching for the layers that stay the same -- we just need a way to trigger a rebuild when a new PyTorch version is released. Maybe we can specify the version in the Dockerfile (related #244), so a bump would result in a change of the Dockerfile and thus a rebuild?
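A sketch of how that split could look, assuming the first part of the script is moved into a hypothetical `install_deps.sh` (the script names, base image, and pinned version here are all assumptions):

```dockerfile
FROM python:3.12-slim

# Bumping this default changes the Dockerfile and the value seen by the RUN
# below, so the dependency layer is rebuilt on a new PyTorch release (cf. #244).
ARG PYTORCH_VERSION=2.3.1

# Layer A: the slow-changing part (former L1-L36 of install_mrpro.sh), moved
# into a hypothetical install_deps.sh; cached while the script and the ARG
# value stay the same.
COPY docker/install_deps.sh /tmp/install_deps.sh
RUN PYTORCH_VERSION="${PYTORCH_VERSION}" bash /tmp/install_deps.sh

# Layer B: the fast-changing part (former L37), i.e. installing mrpro itself.
# The COPY checksum invalidates this layer whenever the source changes.
COPY . /opt/mrpro
RUN bash /opt/mrpro/docker/install_mrpro.sh
```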