anaconda / ae5-tools

A command-line tool for scripting AE5 actions
https://www.anaconda.com/enterprise/
BSD 3-Clause "New" or "Revised" License

Idea: multi-stage docker build for a leaner result #64

Open mcg1969 opened 4 years ago

mcg1969 commented 4 years ago

@AlbertDeFusco check this out. I was experimenting with using Docker multi-stage builds to produce a slimmed-down Docker image.

In short, what you do is you build the Docker image the way that it is currently done. Then you build a second image that copies over nothing but /home/anaconda from the first. So the final image does not have the original Miniconda installation. Multi-stage builds also mean you don't have to be so darn careful about cleaning up after yourself.

The one trick, of course, is that you no longer have anaconda-project run. So what do you replace it with? Well, you grab the command itself from anaconda-project list-commands and stick that in a launch script, along with manual activation of the environment.

When I tried it, this is what I got—and it works! Now, the one wrinkle is that it currently only works with the default command, and it only works if the environment doesn't have post-activate scripts to run. But both of these issues can be addressed.

# The base image is flexible; it simply needs to be able
# to support Anaconda-built glibc binaries.
FROM centos:7

# Miniconda is a minimal Python environment, consisting only of Python
# and the conda package manager. Here it is downloaded directly from
# repo.anaconda.com by the ADD statement below; it could instead be hosted
# in the same directory as this Dockerfile and copied in. The only additional
# package we install in the environment is anaconda-project.
ADD https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh miniconda.sh

ENV LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8
COPY condarc project.tar.gz /src/
RUN yum install -y bzip2 && \
    bash miniconda.sh -b -p /opt/conda && \
    cp /src/condarc /opt/conda/.condarc && \
    source /opt/conda/bin/activate && \
    conda config --set auto_update_conda False --system && \
    conda install anaconda-project --yes && \
    useradd -M anaconda && \
    mkdir /home/anaconda && \
    chown anaconda:anaconda /home/anaconda
USER anaconda
WORKDIR /home/anaconda
RUN source /opt/conda/bin/activate && \
    tar xfz /src/project.tar.gz --strip-components=1 && \
    anaconda-project --verbose prepare && \
    printf '#!/bin/bash\nbin=$(compgen -G envs/*/bin)\nexport PATH=$bin:$PATH\n' > .launch.sh && \
    anaconda-project list-commands | grep -A1 '=' | tail -n 1 | sed 's@^[^ ]*@exec@' >> .launch.sh && \
    chmod +x .launch.sh

# The base image is flexible; it simply needs to be able
# to support Anaconda-built glibc binaries.
FROM centos:7
RUN useradd anaconda
USER anaconda
WORKDIR /home/anaconda
COPY --from=0 /home/anaconda .
CMD ./.launch.sh
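
To make the launch-script step easier to follow, here is a small sketch of what the grep/tail/sed pipeline in the first stage does. The `sample_list_commands` function is a hypothetical stand-in for `anaconda-project list-commands`; its output layout and the `python main.py` command are assumptions chosen to match what the pipeline expects, not captured output.

```shell
#!/bin/bash
# Hypothetical stand-in for `anaconda-project list-commands`.
# The layout (a header, a "====" ruler, then one line per command) is an assumption.
sample_list_commands() {
  cat <<'EOF'
Commands for project: /home/anaconda

Name     Description
====     ===========
default  python main.py
EOF
}

# Same pipeline as in the Dockerfile: keep the line right after the ruler,
# then replace the leading command name with "exec".
launch_line=$(sample_list_commands | grep -A1 '=' | tail -n 1 | sed 's@^[^ ]*@exec@')
echo "$launch_line"   # prints: exec  python main.py
```

With this layout, `grep -A1` keeps only the first line after the ruler, so only the first listed command ends up in the script. That is consistent with the single-command limitation noted above.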

mcg1969 commented 4 years ago

Honestly this should go into anaconda-project, not ae5-tools. cc: @jbednar @jsignell

AlbertDeFusco commented 4 years ago

That's cool. Does this work with environment variables set in the anaconda-project.yml file?

mcg1969 commented 4 years ago

Probably not. I suspect that what I really need is some sort of anaconda-project feature to generate the startup commands.
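
As a rough illustration of what such a generator might produce, the sketch below writes a launch script that exports variables with defaults before running the command. Everything here is hypothetical: `write_launch_script` is not an anaconda-project feature, the NAME=default pairs are passed by hand rather than read from anaconda-project.yml, and values are assumed to be simple strings without quotes, `$`, or braces.

```shell
#!/bin/bash
# Hypothetical helper: write a launch script for a single command.
# Usage: write_launch_script OUTFILE "COMMAND" [NAME=default ...]
write_launch_script() {
  local out=$1 cmd=$2 pair
  shift 2
  {
    printf '#!/bin/bash\n'
    # Manual activation, as in the Dockerfile: put the env's bin dir on PATH.
    printf 'bin=$(compgen -G "envs/*/bin")\n'
    printf 'export PATH=$bin:$PATH\n'
    for pair in "$@"; do
      # ${NAME:=default} assigns the default only if NAME is unset or empty,
      # so a value supplied at run time (e.g. docker run -e) takes precedence.
      printf ': "${%s:=%s}"; export %s\n' "${pair%%=*}" "${pair#*=}" "${pair%%=*}"
    done
    printf 'exec %s\n' "$cmd"
  } > "$out"
  chmod +x "$out"
}

# Example (hypothetical command and variable):
#   write_launch_script .launch.sh 'python main.py' PORT=8086
```

Post-activate scripts are not handled here; they would still need a separate step.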

jbednar commented 4 years ago

> Well, you grab the command itself from anaconda-project list-commands

What happens when there are multiple commands? I'd prefer to have anaconda-project available in the final Docker image, so that someone who has the Docker image hasn't lost any functionality compared to the .zip file, only gained it. But sure, it makes sense to eliminate anything else specific only to the build process, not the final result.

> I suspect that what I really need is some sort of anaconda-project feature to generate the startup commands

It seems to me that all of the docker-image generation code could be part of anaconda-project rather than ae5-tools; it seems like an alternative way to package up a project, similar to a zip file but with other affordances...

AlbertDeFusco commented 4 years ago

I'm on board with developing anaconda-project dock

mcg1969 commented 4 years ago

> What happens when there are multiple commands?

If you want to be able to use the same Docker container to run all commands, that's a different use case. And it effectively requires installing the full anaconda-project environment inside it.

But I don't think that's what people should do. Rather, I think there should be a separate Docker image for each command. If the Dockerfile is designed correctly, you can have these different Docker images share the same environment layer (assuming they use the same environment for each command). Still, a Docker container is supposed to have a somewhat immutable function, which suggests to me that it needs to "focus", if you will, on each command.
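
One way this could look, as a sketch only: a shared runtime stage holds the environment, and each command gets a thin stage on top that differs only in its CMD. The helper name `emit_runtime_stages`, the stage names, and the per-command `.launch-<name>.sh` scripts are all assumptions; the Dockerfile earlier in this thread writes a single `.launch.sh`.

```shell
#!/bin/bash
# Hypothetical helper: print Dockerfile stages to append after the build stage.
# Usage: emit_runtime_stages NAME [NAME ...]
emit_runtime_stages() {
  # Shared stage: the environment is copied in once.
  cat <<'EOF'
FROM centos:7 AS runtime
RUN useradd anaconda
USER anaconda
WORKDIR /home/anaconda
COPY --from=0 /home/anaconda .
EOF
  local name
  for name in "$@"; do
    # Assumes the build stage wrote one launch script per command,
    # named .launch-<name>.sh, and that <name> is a valid lowercase stage name.
    printf '\nFROM runtime AS cmd-%s\nCMD ["./.launch-%s.sh"]\n' "$name" "$name"
  done
}

# Example (hypothetical command names and image tag):
#   emit_runtime_stages default test >> Dockerfile
#   docker build --target cmd-test -t myproject:test .
```

Images built with different `--target` values from the same Dockerfile can then reuse the layers of the `runtime` stage, provided the build cache is available.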

> It seems to me that all of the docker-image generation code could be part of anaconda-project rather than ae5-tools

Agreed. However, note that as currently constituted, the ae5-tools approach doesn't require anaconda-project to be a dependency of ae5-tools. Rather, it is installed into the docker containers themselves.

jbednar commented 4 years ago

> I think there should be a separate Docker image for each command.

We're probably imagining different scenarios here. I'm imagining something like the various projects on https://examples.pyviz.org, where each project is designed to be some reproducible content, with some commands allowing the project to be tested, some allowing it to be deployed, and some allowing it to be opened as a notebook for the user to explore. See e.g. https://github.com/pyviz-topics/examples/blob/master/attractors/anaconda-project.yml . I'm imagining someone being able to pass around a Docker image that by default does one thing, but which can also be tested by running other commands. Having separate Docker images doesn't seem like it would work, because part of the point is to test that the (first) Docker image is complete and runnable.

mcg1969 commented 4 years ago

That sounds like a reasonable workflow for anaconda-project, but for ae5-tools the deployment model is more constrained.