zonca closed this issue 4 years ago.
@pibion I reorganized the branches on the repo:
cdms_roberts
cdms_zonca
The CDMS JupyterLab image is huge and has a very large number of layers; see its Dockerfile: https://gitlab.com/supercdms/CompInfrastructure/cdms-jupyterlab/blob/master/Dockerfile
Pulling fails on Jetstream:
sudo docker pull supercdms/cdms-jupyterlab:1.8b
I think it could be the storage driver. I am now testing overlay2 instead of devicemapper.
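To confirm which storage driver the Docker daemon is actually using before and after the switch, one can look for the `Storage Driver:` line in `docker info` output. A minimal sketch, assuming that line is present (it is in the Docker versions shown on these nodes' era of tooling, but treat that as an assumption); the sample text is illustrative and was not captured from the Jetstream node.

```python
from typing import Optional


def storage_driver(info_text: str) -> Optional[str]:
    """Return the value of the "Storage Driver:" line in `docker info` output.

    Returns None when no such line is present.
    """
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Storage Driver":
            return value.strip()
    return None


# Illustrative excerpt only (not captured from the Jetstream node). On a real
# node the text would come from something like:
#   subprocess.run(["sudo", "docker", "info"], capture_output=True, text=True).stdout
sample = "Logging Driver: json-file\n Storage Driver: devicemapper\n"
print(storage_driver(sample))  # devicemapper
```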
Actually, I increased the Docker volume size from 10GB to 20GB; testing now.
20GB is not sufficient; testing with 50GB.
OK, this works. The image itself is 20GB, and with the other images from Kubernetes, /var/lib/docker is 21GB in total. I think we can safely lower the volume to 40GB.
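Since the 20GB volume was not enough to pull a 20GB image, a quick free-space check on the Docker data directory before pulling can save a failed attempt. A minimal sketch using only the standard library; the default path `/var/lib/docker` comes from the comment above, while the `margin` safety factor is a hypothetical parameter, not a measured requirement.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes, the unit `docker images` reports sizes in


def has_room_for(
    size_bytes: int,
    path: str = "/var/lib/docker",
    margin: float = 1.0,
) -> bool:
    """Return True if the filesystem holding `path` has size_bytes * margin free.

    `margin` is a safety factor: a 20GB volume was not enough to pull a 20GB
    image, so a value above 1.0 is prudent.
    """
    free = shutil.disk_usage(path).free
    return free >= size_bytes * margin
```

For example, `has_room_for(20 * GB, margin=2.0)` could be run on a node before pulling; the factor of 2 is an arbitrary illustration.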
It takes about 8 minutes to pull the supercdms image:
Normal Pulling 11m kubelet, k8s-4qtmvqk6gv47-minion-0 pulling image "supercdms/cdms-jupyterlab:1.8b"
Normal Pulled 3m16s kubelet, k8s-4qtmvqk6gv47-minion-0 Successfully pulled image "supercdms/cdms-jupyterlab:1.8b"
This is probably the dominant factor in how long it takes to bring up new worker nodes. I think in production we should use User Placeholders: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders
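The 8-minute figure comes from the two event ages above: both are measured back from the moment the events were listed, so the pull time is their difference. A small sketch of that arithmetic, assuming kubectl's compact age format (`11m`, `3m16s`, `2h5m`); larger ages such as `11m` are shown only to the minute, so the result is approximate.

```python
import re

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age: str) -> int:
    """Convert a kubectl event age such as '11m' or '3m16s' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def pull_seconds(pulling_age: str, pulled_age: str) -> int:
    """Both ages count back from 'now', so the pull took their difference."""
    return age_to_seconds(pulling_age) - age_to_seconds(pulled_age)


secs = pull_seconds("11m", "3m16s")
print(f"{secs // 60}m{secs % 60}s")  # 7m44s
```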
[fedora@k8s-4qtmvqk6gv47-minion-0 ~]$ sudo docker images
REPOSITORY                                            TAG      IMAGE ID       CREATED         SIZE
docker.io/supercdms/cdms-jupyterlab                   1.8b     764e36e089da   5 weeks ago     20 GB
docker.io/jupyterhub/k8s-image-awaiter                0.8.2    938cb370f906   10 months ago   4.15 MB
docker.io/jupyterhub/k8s-network-tools                0.8.2    02576979bd59   13 months ago   5.62 MB
gcr.io/kubernetes-helm/tiller                         v2.11.0  ac5f7ee9ae7e   16 months ago   71.8 MB
gcr.io/google_containers/kubernetes-dashboard-amd64   v1.8.3   0c60bcf89900   23 months ago   102 MB
docker.io/coredns/coredns                             1.0.1    58d63427cdea   2 years ago     45.1 MB
gcr.io/google_containers/pause                        3.0      99e59f495ffa   3 years ago     747 kB
REPOSITORY                                                       TAG      IMAGE ID       CREATED         SIZE
quay.io/kubernetes-ingress-controller/nginx-ingress-controller   0.24.1   98675eb54d0e   9 months ago    631 MB
k8s.gcr.io/defaultbackend-amd64                                  1.5      b5af743e5984   16 months ago   5.13 MB
docker.io/k8scloudprovider/openstack-cloud-controller-manager    v0.2.0   5b5ea0c144e8   18 months ago   39.4 MB
gcr.io/google_containers/heapster-amd64                          v1.4.2   d4e02f5922ca   2 years ago     73.4 MB
gcr.io/google_containers/pause                                   3.0      99e59f495ffa   3 years ago     747 kB
We don't have the single-user image on the master node, but the master node is schedulable (just for 1 user); I need to configure this.
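As a cross-check of the 21GB figure for /var/lib/docker, the SIZE column of the minion listing can be summed. A minimal sketch, assuming Docker's decimal (SI) size units; it also accepts the form without a space (e.g. `20GB`). Keep in mind that `docker images` counts layers shared between images once per image, so such a sum can overestimate actual disk usage, and the two numbers are measured differently.

```python
import re
from decimal import Decimal

# `docker images` prints sizes with decimal (SI) multipliers: 747 kB, 4.15 MB, 20 GB.
_MULTIPLIERS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a size such as "4.15 MB" (or "4.15MB") to a whole number of bytes."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(B|kB|MB|GB|TB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    # Decimal avoids float rounding noise such as 4150000.000000001.
    return int(Decimal(number) * _MULTIPLIERS[unit])


# SIZE column of the `docker images` listing from the minion node above.
minion_sizes = ["20 GB", "4.15 MB", "5.62 MB", "71.8 MB", "102 MB", "45.1 MB", "747 kB"]
total = sum(parse_size(size) for size in minion_sizes)
print(f"{total / 10**9:.2f} GB")  # 20.23 GB
```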
A 20GB image? Yikes, no wonder the student who updates the Docker image is running out of space on his hard drive.
Would it be helpful if we looked into ways to make the image smaller? I know that part of the issue is that we have a lot of dependencies (ROOT, BLAS, and Boost among them). But I could talk with my Docker-knowledgeable students and see if there are any clear ways to shrink the image.
Yes, @pibion, that would help a lot. Ideally you would start from an Alpine base image with Python and add the dependencies to that. Maybe use a two-stage build (https://medium.com/capital-one-tech/multi-stage-builds-and-dockerfile-b5866d9e2f84)?
Anyway, I can proceed with the image as it is for now.
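Before restructuring the Dockerfile into a multi-stage build, it can help to see which build steps produce the biggest layers. A sketch that ranks layers from `docker history` output, assuming it is run with `--human=false` so sizes come out as plain byte counts; the sample input is made up for illustration and is not taken from the CDMS image.

```python
from typing import List, Tuple

# Expected input: one "<bytes><TAB><command>" line per layer, as printed by
#   docker history --human=false --no-trunc \
#       --format "{{.Size}}\t{{.CreatedBy}}" IMAGE


def largest_layers(history_text: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` biggest layers as (size_in_bytes, command) pairs."""
    layers = []
    for line in history_text.splitlines():
        if not line.strip():
            continue
        size, _, command = line.partition("\t")
        layers.append((int(size), command.strip()))
    # Biggest first: these build steps are the first candidates to move into a
    # separate build stage whose leftovers are not copied into the final image.
    layers.sort(key=lambda layer: layer[0], reverse=True)
    return layers[:top]


# Made-up input for illustration; it is not taken from the CDMS image.
sample = (
    "0\t/bin/sh -c #(nop)  WORKDIR /opt\n"
    "3200000000\t/bin/sh -c make install\n"
    "450000000\t/bin/sh -c pip install numpy scipy\n"
)
for size, command in largest_layers(sample, top=2):
    print(f"{size / 10**9:5.2f} GB  {command}")
```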
I think SLAC IT may have taken the User Placeholders approach. Compute Canada, I think, has made much lighter containers and installed the software on CVMFS.
I'll ask SLAC if they have any comments on this.
@pibion, if using the software from CVMFS is a promising option, I think we can bring the investigation in #4 forward. Sorry, let's move that discussion into #4.
@pibion I get this error:
Normal Pulled 10s (x4 over 50s) kubelet, k8s-4qtmvqk6gv47-minion-0 Container image "supercdms/cdms-jupyterlab:1.8b" already present on machine
Normal Created 10s (x4 over 50s) kubelet, k8s-4qtmvqk6gv47-minion-0 Created container
Warning Failed 9s (x4 over 49s) kubelet, k8s-4qtmvqk6gv47-minion-0 Error: failed to start container "notebook": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"jupyterhub-singleuser\": executable file not found in $PATH"
I could try to debug the issue myself, but it's better if I focus my time on Kubernetes. @pibion, can you have someone look into this? We need to make sure that the jupyterhub package is installed in the container and that it puts jupyterhub-singleuser on the PATH. Once you have news, please add details to this issue. Thanks.
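The error means the container's start command could not find `jupyterhub-singleuser` on `$PATH`. A quick way to check an image is to run a small standard-library script inside it, for example with `docker run --rm`, assuming the image has a Python interpreter on its PATH and the script is copied or mounted in. The helper names below are hypothetical.

```python
import os
import shutil
from typing import Dict, Iterable, Optional


def locate(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on $PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def report(names: Iterable[str]) -> str:
    """Human-readable summary: the current PATH plus where each name resolves."""
    lines = [f"PATH={os.environ.get('PATH', '')}"]
    for name, path in locate(names).items():
        lines.append(f"{name}: {path or 'NOT FOUND'}")
    return "\n".join(lines)


# Run inside the image, e.g. print(report(["jupyterhub-singleuser", "jupyterhub"]))
```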
We'll look into this and report back. I suspect this is tied up with how JupyterHub is deployed at SLAC; we may need to figure out how to manage different needs for Dockerfiles on different systems.
The jupyterhub package should be installed in the image.
SLAC uses jupyter-labhub rather than jupyterhub-singleuser. They pointed me to their launch file: https://github.com/slaclab/slac-jupyterlab/blob/master/scripts/runlab.sh.
At SLAC they use a different image: https://github.com/slaclab/slac-jupyterlab/blob/master/Dockerfile
Do you want to use this one instead of the CDMS image?
I think we do need the CDMS image.
We "inherit" from the SLAC image https://hub.docker.com/r/slaclab/slac-jupyterhub/tags, but we need to build our own software on top of that.
Building our software in a Docker container is one of the main reasons we started exploring JupyterHub: the installation is tricky even for experienced users. And if you don't have the CDMS-specific software, you can't look at data.
I cannot get the CDMS image to work on the deployment. I also tried to start the container using /opt/slac/jupyterlab/runlab.sh.
Is there anyone else deploying the CDMS container in JupyterHub? Can they send me the configuration of their deployment?
I've reached out to SLAC IT; right now they're the only ones deploying this CDMS image. Compute Canada uses a totally different method: they've got a lightweight container and use CVMFS.
I wonder if I should start looking at creating a separate image? I'm not exactly sure where to start with this, but it sounds like installing the jupyterhub package and adding jupyterhub-singleuser to the PATH are a place to start?
@pibion, so SLAC IT deploys this CDMS image in their JupyterHub environment and it works fine? Can they send me their JupyterHub configuration? Maybe there is some special setting I need to enable.
I would wait before creating a new image. I think using CVMFS would be the best solution; I'll work on #4 to assess how difficult that is.
An update from SLAC IT:
the hub deployment is at https://github.com/slaclab/slac-jupyterhub and the lab/Jupyter side is at https://github.com/slaclab/slac-jupyterlab
Compute Canada staff member here with hopefully some useful advice. Is the Compute Canada CVMFS use case related to ATLAS? Before I comment in your other issue about CVMFS usage: you can definitely get the image size down. That Dockerfile is super bloated. Take a look at Docker multistage builds, which let you use different images for building what you need, then throw out all the bloat and copy only what you need into a JupyterLab environment.
@netscruff I'll look into Docker multistage builds, thanks!
@pibion I think we can now rely on CVMFS to get the CDMS software stack (#4)
Given this, what image would you like to use for the Jupyter Notebooks? Or should we start from one of the barebones images from the Jupyter team and build up from there as needed?
I'm inclined towards using one of the barebones images from Jupyter and building ours from there - but I'm open either way. Did you have an image in mind?
Would it be helpful for us to compile a list of all the "stuff" we want in our Jupyter environment? We've got a set of Python packages, and then there are other things; for example, we use JupyterLab and find extensions like the git extension very useful.
You can start from one of the Jupyter images: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html
Once you choose one, I'll create a Docker image that derives from it and autobuilds on Docker Hub; then you can make a pull request to add all the other packages you need.
It looks like jupyter/tensorflow-notebook is the best fit.
ok, I have a base image here:
OK, I am preparing a deployment for testing; I'll log the steps here:
Potential issue with using CVMFS for the CDMS image: the version of Boost that CERN provides in their LCG system, which the CDMS repo builds against, is partially broken in that it doesn't ship the boost_numpy library. https://sft.its.cern.ch/jira/browse/SPI-1560
We need that library to build one of the Python packages (scdmsPyTools). I'm not sure if there's a sensible workaround at this point. E.g., we could compile Boost locally in the Docker image, but that would probably break the rest of the CDMS stack unless we used exactly the same version and flags.
The issue is fixed in the latest LCG nightly build, so maybe there will be a fixed stable release before this jupyter image is production ready...
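To check whether a given LCG release ships the library, one can list its library directory for `libboost_numpy*` files. A sketch assuming the usual `lib<name>.so[.version]` shared-library naming; `lib_dir` is a hypothetical parameter that must point at the release's library directory, whose exact CVMFS path is not given in this thread.

```python
import fnmatch
import os
from typing import Iterable, List

# Usual shared-library naming, for example libboost_numpy37.so.1.70.0
PATTERN = "libboost_numpy*.so*"


def boost_numpy_libs(filenames: Iterable[str]) -> List[str]:
    """Return the filenames that look like a boost_numpy shared library."""
    return sorted(name for name in filenames if fnmatch.fnmatch(name, PATTERN))


def boost_numpy_in(lib_dir: str) -> List[str]:
    """List boost_numpy libraries in lib_dir; empty if none or if the dir is missing."""
    if not os.path.isdir(lib_dir):
        return []
    return boost_numpy_libs(os.listdir(lib_dir))
```

An empty result for a release's library directory would be consistent with the missing library reported in SPI-1560.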
@bloer yes, at this point let's wait for a fix
@bloer, do you recommend we use CentOS 7 as a base for the Jupyter Notebook image? Is that a requirement to set up the CDMS environment via CVMFS?
@zonca What base do you have in mind? The CVMFS CDMS repo officially supports CentOS 7 and SLC6. I am reasonably certain that any RedHat-derived flavor should work, though (in particular, I know RHEL6 works).
There is no support from CERN for newer OSes yet. There is Ubuntu support; I am not currently building a version of the CDMS image off of that, but probably could if there were a good reason to do so.
Also worth pointing out: only bash is supported right now.
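Since the CVMFS CDMS stack is only supported on RedHat-derived systems, an image's start-up script could check the base OS first. A sketch that reads the standard `/etc/os-release` format and treats a system as RedHat-derived when `ID` or `ID_LIKE` mentions `rhel` or `centos`; this rule is an assumption, not part of the CDMS tooling, and older releases such as SLC6 may not have `/etc/os-release` at all.

```python
import shlex
from typing import Dict

# Assumed rule, not part of the CDMS tooling: treat a system as RedHat-derived
# if its ID or ID_LIKE field mentions one of these.
REDHAT_IDS = {"rhel", "centos"}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse /etc/os-release style KEY=value lines; values may be quoted."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        tokens = shlex.split(value)  # strips the surrounding quotes
        info[key] = tokens[0] if tokens else ""
    return info


def is_redhat_derived(info: Dict[str, str]) -> bool:
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    return bool(ids & REDHAT_IDS)


# Usage on a real system:
#   with open("/etc/os-release") as f:
#       print(is_redhat_derived(parse_os_release(f.read())))
```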
@bloer, I was testing with Debian, which is the base of the Jupyter stack. Is there documentation about the CVMFS CDMS repo somewhere, where I can find more details?
@zonca, the software stack provided by CERN has some decent documentation here: http://lcgdocs.web.cern.ch/lcgdocs/lcgreleases/introduction/
The only documentation on the CDMS-specific stack is here: https://confluence.slac.stanford.edu/display/CDMS/Using+CDMS+software+releases and the source for the build tool: http://titus.stanford.edu:8080/git/summary/?r=CompInfrastructure/ReleaseBuilder.git
ReleaseBuilder is password protected.
OK, my plan for now is to try slaclab/slac-jupyterhub, which is CentOS 7 and not too big. But first I need to fix #10.
OK, I got a temporary fix for #10, but couldn't make slac-jupyterlab work.
So I rebuilt the whole Jupyter Docker stacks on CentOS 7 instead of Ubuntu.
I have a first deployment available for testing, see documentation at:
https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/tree/cdms_zonca
it has dummy authentication for now, so any username will work.
@pibion @bloer I emailed you the URL, you are welcome to share it, just do not post it anywhere public. Please let me know what is not working or in general any feedback.
OK, I merged the pull request, so the documentation about the repository is now on the homepage:
https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream
For now we are done with this issue, will open new ones if we identify problems.
Now that the issue with volumes is fixed (zonca#23), I'll start working on this and log progress here.