Closed fernandogelin closed 4 years ago
I think this looks great! The only things that are missing are probably to be part of future PRs (which you mentioned above):

* Tagging strategy: If we keep using the strategy of a fixed tag per semester (say `fall-2020`) so that we don't need to run a terraform update if there is an image update, then we could use something like the label of the branch being merged as a tag or something like that. See [PR Labeler](https://github.com/TimonVS/pr-labeler-action). If we want to create tags on commit hash, then we would need to pair with a `terraform apply` - I'm just always terrified to do that in production. We could also first tag as `dev-fall2020`, test it in the dev hub, and then change it to the appropriate class... We probably need to chat about this point.
We can have a branch named as the semester (e.g. `fall2020`), and we push updates to it throughout the semester. In the action workflow we can pass ~~`${GITHUB_REF##*/}`~~ `${{ github.ref }}` as a tag, which will get the branch name from the environment. We can also pass ~~`${GITHUB_SHA}`~~ `${{ github.sha }}` if we want to tag with the commit sha. Then when the semester is over we can freeze and tag a release, and start a new branch for the next semester (or just rename the current branch and make changes there).
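As a small sketch of that idea (the image path and the hard-coded values are made up for illustration; in a real workflow `GITHUB_REF` and `GITHUB_SHA` come from the Actions environment), deriving the tag from the branch or the commit could look like:

```shell
# GITHUB_REF looks like "refs/heads/fall2020" in a push-triggered workflow;
# strip everything up to the last "/" to get the bare branch name.
GITHUB_REF="refs/heads/fall2020"   # normally provided by GitHub Actions
BRANCH_TAG="${GITHUB_REF##*/}"     # -> "fall2020"
SHA_TAG="${GITHUB_SHA:-abc1234}"   # commit sha, placeholder fallback here

# Hypothetical registry path -- substitute the real GCR project/image.
echo "would push: gcr.io/our-project/class-image:${BRANCH_TAG}"
echo "would push: gcr.io/our-project/class-image:${SHA_TAG}"
```

The `${GITHUB_REF##*/}` expansion and the `${{ github.ref }}` expression are not equivalent: the former yields just the branch name, the latter the full `refs/heads/...` ref, which matters if the tag must be a valid Docker tag.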
That PR Labeler action is only adding labels to PRs. Did you mean a different action?
> * Steps for installing Julia and other plugins

I'll work on this one, but what I'm thinking is to use Docker's multi-stage builds. Then in the `docker-compose.yml` we can set the target; we can pass the target as an environment variable to each class workflow.
If I'm understanding correctly, I'm not sure that multistage builds are a good solution for plugins. Each container will start clean: if you want Julia and Python and you're using a multistage build, you'll have to copy the Python artifacts to Julia or vice versa. I'd instead suggest using build arguments to specify what goes into the Docker images.
The labeler was just in case we didn't want to use branches, and instead wanted to use labels on a PR based on its name. Branches are more flexible and cover more scenarios, but a bit more bookkeeping. I'm okay trying branches or releases as a first pass.
What I'm thinking is to have something like this:

```Dockerfile
FROM image as base
# install all base things needed for Jupyter
# install python packages

# this stage has python and julia
FROM base as julia
# install Julia packages

# this stage has python and R
FROM base as r_lang
# install R packages

# this stage has python, julia, and R
FROM julia as julia_r
# install R packages
```

Then in the docker-compose for each class, we pass the `target` as an env variable.
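For concreteness, this is roughly the build command that docker-compose would issue for one stage (stage names follow the sketch above; the env variable name and image tag are hypothetical, and the `echo` makes it a dry run):

```shell
# Pick the stage for this class from an env variable, defaulting to "base".
TARGET="${JH_TARGET:-base}"

# Dry run: print the equivalent docker build invocation.
# Drop the "echo" to actually build against a running Docker daemon.
echo docker build --target "${TARGET}" -t "class-image:${TARGET}" .
```

In compose terms this corresponds to setting `target: ${JH_TARGET}` under the service's `build` key.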
Oh, I get it. I wasn't thinking about rolling the resulting container down the file. It feels a little weird to me. Is there an advantage to using multistage builds over build arguments?
This looks rad! To create a new class, I just need to add a workflow file and a requirements file?
yes!
I'm not sure. How are you envisioning the build arguments will work with this process?
I guess I was thinking something like:

```Dockerfile
FROM image
ARG WITH_JULIA=false
ARG WITH_R=false
# Install base requirements
RUN if [ "${WITH_JULIA}" = "true" ]; then \
        apt-get update && apt-get install -y julia-stuff-here; \
    fi
RUN if [ "${WITH_R}" = "true" ]; then \
        r-install-commands-here; \
    fi
```

(Note the semicolons before each `fi` — the shell needs them once the lines are joined, and comments can't sit inside a `RUN` continuation.)
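With that Dockerfile, each class workflow would just toggle the pieces it needs at build time. A sketch (flag values and the image name are illustrative; the `echo` makes it a dry run):

```shell
# A class that needs Julia but not R.
WITH_JULIA=true
WITH_R=false

# Dry run: print the build command. Remove the "echo" to actually build.
echo docker build \
  --build-arg WITH_JULIA="${WITH_JULIA}" \
  --build-arg WITH_R="${WITH_R}" \
  -t class-image:latest .
```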
oh I see, I like this too. Not sure what the pros and cons are for ARGs vs multi-stage.
Choose whatever method does a better job at caching!
I'm not sure how the intermediate containers are handled with cache. I think they're not cached? I'm gonna have to look that up
https://pythonspeed.com/articles/faster-multi-stage-builds/
That was a helpful read. By default intermediate stages are not cached; you need to tag them and push them separately and explicitly ask docker to use that as part of its cache. This could slow down your build times if you don't push the intermediate stages.
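The workaround the article describes amounts to pushing each stage under its own tag and feeding it back with `--cache-from`. A dry-run sketch (registry and image names are made up):

```shell
REGISTRY="gcr.io/our-project"

# Dry run: drop the "echo"s against a real daemon/registry.
# 1. Build and push the intermediate stage under its own tag.
echo docker build --target base -t "${REGISTRY}/jh-base:latest" .
echo docker push "${REGISTRY}/jh-base:latest"

# 2. On the next build, reuse the pushed stage as a cache source.
echo docker build \
  --cache-from "${REGISTRY}/jh-base:latest" \
  --target julia -t "${REGISTRY}/jh-julia:latest" .
```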
ah, good to know! Thanks for that.
But that could be more reusable than with the ARGs, because in the end different classes may have different permutations of the stages... does that sound right?
I think ARGs are more easily reusable in this case, because we can just pass the ARGs needed for that specific class in the workflow. With multi-stage we would end up creating more stages if a class depends on multiple stages but not all. I guess we can experiment when we add more complex classes.
I think neither is going to cache well. Intermediates aren't cached by default, and the ARGs cache will be invalidated each time someone changes `WITH_JULIA=false` to `WITH_JULIA=true`. If I had to guess (and that's all it is right now), I'd guess that the naive ARGs solution would cache better, but you could do complicated magic with the multistage builds to get better performance in the long term.
maybe also have the class-specific, optional files of:

* `classname/Project.toml` (or `JuliaProject.toml`)
* `classname/RInstall.R` (just an R file with all the install package commands needed)
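If those optional files existed, the image build could pick them up conditionally. A dry-run sketch (the `classname` placeholder is from the list above; the install commands are standard Julia/R CLI usage, echoed rather than executed here):

```shell
CLASS_DIR="classname"   # placeholder for the per-class directory

# Dry run: print the install steps only if the optional files exist.
# Inside a real Dockerfile RUN step you would drop the "echo"s.
if [ -f "${CLASS_DIR}/Project.toml" ] || [ -f "${CLASS_DIR}/JuliaProject.toml" ]; then
  echo julia --project="${CLASS_DIR}" -e 'using Pkg; Pkg.instantiate()'
fi
if [ -f "${CLASS_DIR}/RInstall.R" ]; then
  echo Rscript "${CLASS_DIR}/RInstall.R"
fi
```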
This is all awesome!
With the multi-stage/build args, would it be possible to do something like `COPY --from=julia:1.5 <the things that are a julia install>` or similarly lean on those images in a multi-stage build?

> maybe also have the class-specific, optional files of: `classname/Project.toml` (or `JuliaProject.toml`), `classname/RInstall.R` (just an R file with all the install package commands needed)
For Julia, yes, and it's in the other PR. But for R, no; the R packages are installed with conda.
https://www.docker.com/blog/advanced-dockerfiles-faster-builds-and-smaller-images-using-buildkit-and-multistage-builds/ this also looks relevant.
Overview
We use Github Actions and Docker Compose to create the environments, build, and push the Docker images. This new process will replace the previous process outlined in the `docker-stacks` repository. By moving the creation of the Docker images to Github Actions, we cut down the time we spend building and waiting for these images to build. Often when we do this on our personal computers, the images end up taking too much disk space and eventually Docker will complain. This will not be a problem with the new process.

The GH Actions way
To create an image to be used in JupyterHub for a particular class, we need these components:
Shared Components

* `Dockerfile`: the base dockerfile to create the image. The file currently being used is from Berkeley.
* `docker-compose.yml`: docker compose file with two services. One creates the conda environment files using `brownccv/jh_sandbox:0.1.0`. This step will generate and write the conda environment files. These files will be uploaded as artifacts so students can use them to reproduce the JH environment. The second step uses the environment file generated in step one to build the image and push it to GCR (Google Container Registry).
* `scripts/`: contains the scripts needed by the image. Currently it has scripts needed by the Berkeley image and the ones needed for the official Jupyter images.

Exclusive Components
Each class has the following exclusive components:
* `className/requirements.txt`: the requirements file with the packages needed to create the conda environment.
* `.github/workflow/className.yml`: the Github Action workflow. One workflow per class will make the environment file artifacts easier to find. In addition, it allows us to run the workflow conditionally on changes related to a single class.

General Notes
If the Dockerfile for a class needs extra steps, these should be added as stages in the same Dockerfile and an extra service should be added to `docker-compose.yml` with a `target` key. The actions running on push will allow for streamlined development; however, I would suggest that we tag releases for the images that are officially being used in production. The release workflow is not part of this PR and still needs to be created.
This process can also be moved to the same repo as the actual JupyterHub deployment code.
The secrets needed for this action were added to the Organization level, so they can easily be reused in case we create different repos.