det-lab / jupyterhub-deploy-kubernetes-jetstream

CDMS JupyterHub deployment on XSEDE Jetstream

Customize JupyterHub with the CDMS environment #3

Closed: zonca closed this issue 4 years ago

zonca commented 4 years ago

Now that the issue with volumes is fixed (zonca#23), I'm starting work on this and will log progress here.

zonca commented 4 years ago

@pibion I reorganized the branches on the repo:

zonca commented 4 years ago

The CDMS jupyterlab image is huge and has a very large number of layers. https://gitlab.com/supercdms/CompInfrastructure/cdms-jupyterlab/blob/master/Dockerfile

Pulling fails on Jetstream:

sudo docker pull supercdms/cdms-jupyterlab:1.8b

I think it could be the storage driver. I am now testing overlay2 instead of devicemapper.
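To confirm which storage driver a node is actually running before and after the switch, the active driver can be read from `docker info`. This is a minimal sketch under assumptions: it parses the `Storage Driver: <name>` line that `docker info` usually prints, and the helper names `storage_driver` and `query_storage_driver` are hypothetical.

```python
import subprocess


def storage_driver(docker_info_text):
    """Return the storage driver named in `docker info` output, or None if absent."""
    for line in docker_info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Storage Driver":
            return value.strip()
    return None


def query_storage_driver():
    """Ask the local Docker daemon for its driver (needs permission, e.g. via sudo)."""
    result = subprocess.run(["docker", "info"], capture_output=True, text=True, check=True)
    return storage_driver(result.stdout)
```

Keeping the parsing separate from the `subprocess` call means the parser can be checked against saved `docker info` output from each node without a running daemon.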

zonca commented 4 years ago

Actually, I increased the Docker volume size from 10GB to 20GB; testing now.

zonca commented 4 years ago

20GB is not sufficient; testing with 50GB.

zonca commented 4 years ago

OK, this works. The image itself is 20GB, but there are also other images from Kubernetes, so /var/lib/docker is 21GB. I think we can safely lower this to 40GB.

zonca commented 4 years ago

it takes 8 minutes to pull the supercdms image:

  Normal   Pulling                 11m    kubelet, k8s-4qtmvqk6gv47-minion-0  pulling image "supercdms/cdms-jupyterlab:1.8b"
  Normal   Pulled                  3m16s  kubelet, k8s-4qtmvqk6gv47-minion-0  Successfully pulled image "supercdms/cdms-jupyterlab:1.8b"
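The 8-minute figure comes from subtracting the two event ages ("time ago" values) in the log. A small sketch of that arithmetic, assuming kubectl-style ages made of `d`/`h`/`m`/`s` components; the helper names are hypothetical.

```python
import re

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age):
    """Convert a kubectl-style age such as '11m' or '3m16s' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def pull_duration(pulling_age, pulled_age):
    """Seconds between the 'Pulling' and 'Pulled' events; both ages count back from now."""
    return age_to_seconds(pulling_age) - age_to_seconds(pulled_age)


seconds = pull_duration("11m", "3m16s")
print(f"{seconds // 60}m{seconds % 60}s")  # 7m44s
```

kubectl typically drops the seconds from ages above ten minutes, so "11m" may hide up to a minute; the real pull time is therefore somewhere around 8 minutes rather than exactly 7m44s.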

this is probably the dominant factor in creating new worker nodes. I think in production we should use User Placeholders: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders
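With the zero-to-jupyterhub Helm chart, user placeholders are switched on through the chart values. The sketch below builds such a values fragment as JSON (which Helm accepts as YAML). It assumes the 0.8-era key names `scheduling.podPriority.enabled`, `scheduling.userPlaceholder.enabled/replicas` and `prePuller.continuous.enabled`; the replica count and the helper name are illustrative only, so check them against the chart version actually deployed.

```python
import json


def placeholder_values(replicas=1, continuous_prepull=True):
    """Build a Helm values fragment enabling user placeholders (assumed z2jh 0.8 keys)."""
    return {
        "scheduling": {
            # Placeholder pods run at low priority so real user pods can evict them.
            "podPriority": {"enabled": True},
            "userPlaceholder": {"enabled": True, "replicas": replicas},
        },
        # Keep the large single-user image pulled on every node, including new ones.
        "prePuller": {"continuous": {"enabled": continuous_prepull}},
    }


print(json.dumps(placeholder_values(replicas=2), indent=2))
```

The continuous pre-puller is included because placeholders only help if the 20GB image is already on the node a user lands on; otherwise the 8-minute pull still happens at login.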

zonca commented 4 years ago

Images pulled to the nodes

Worker node

[fedora@k8s-4qtmvqk6gv47-minion-0 ~]$ sudo docker images
REPOSITORY                                            TAG                 IMAGE ID            CREATED             SIZE
docker.io/supercdms/cdms-jupyterlab                   1.8b                764e36e089da        5 weeks ago         20 GB
docker.io/jupyterhub/k8s-image-awaiter                0.8.2               938cb370f906        10 months ago       4.15 MB
docker.io/jupyterhub/k8s-network-tools                0.8.2               02576979bd59        13 months ago       5.62 MB
gcr.io/kubernetes-helm/tiller                         v2.11.0             ac5f7ee9ae7e        16 months ago       71.8 MB
gcr.io/google_containers/kubernetes-dashboard-amd64   v1.8.3              0c60bcf89900        23 months ago       102 MB
docker.io/coredns/coredns                             1.0.1               58d63427cdea        2 years ago         45.1 MB
gcr.io/google_containers/pause                        3.0                 99e59f495ffa        3 years ago         747 kB
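Summing the SIZE column of the worker-node listing is a quick check on the disk budget discussed above. This sketch assumes Docker's human-readable sizes use decimal (SI) multipliers, as the lowercase `kB` suggests; `size_to_bytes` is a hypothetical helper.

```python
import re

# Decimal (SI) multipliers, matching the units Docker prints in `docker images`.
_MULTIPLIER = {"B": 1, "kB": 1e3, "KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def size_to_bytes(size):
    """Convert a SIZE cell such as '20 GB' or '747 kB' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([kKMGT]?B)\s*", size)
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return float(number) * _MULTIPLIER[unit]


# SIZE column of the worker-node listing above.
worker_sizes = ["20 GB", "4.15 MB", "5.62 MB", "71.8 MB", "102 MB", "45.1 MB", "747 kB"]
total_gb = sum(size_to_bytes(s) for s in worker_sizes) / 1e9
print(f"{total_gb:.2f} GB")  # 20.23 GB
```

The remaining gap to the 21GB seen under /var/lib/docker presumably comes from container layers, metadata and the difference between decimal and binary units.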

Master node

REPOSITORY                                                       TAG                 IMAGE ID            CREATED             SIZE
quay.io/kubernetes-ingress-controller/nginx-ingress-controller   0.24.1              98675eb54d0e        9 months ago        631 MB
k8s.gcr.io/defaultbackend-amd64                                  1.5                 b5af743e5984        16 months ago       5.13 MB
docker.io/k8scloudprovider/openstack-cloud-controller-manager    v0.2.0              5b5ea0c144e8        18 months ago       39.4 MB
gcr.io/google_containers/heapster-amd64                          v1.4.2              d4e02f5922ca        2 years ago         73.4 MB
gcr.io/google_containers/pause                                   3.0                 99e59f495ffa        3 years ago         747 kB

We don't have the single-user image on the master node, but the master node is schedulable (for just 1 user), so we need to configure this.

pibion commented 4 years ago

> ok, this works, the image itself is 20GB, but there are other images from kubernetes, so /var/lib/docker is 21GB. I think we can safely lower this to 40GB.

Yikes, no wonder the student who updates the docker image is running out of space on his hard drive.

Would it be helpful if we looked into ways to make the image smaller? I know that part of the issue is that we have a lot of dependencies, among them ROOT, BLAS, and Boost. But I could talk with my Docker-knowledgeable students and see if there are any clear ways to shrink the image.

zonca commented 4 years ago

Yes, @pibion, that would help a lot. Ideally you would start from an Alpine base image with Python and add the dependencies to that one. Maybe use a 2-stage build (https://medium.com/capital-one-tech/multi-stage-builds-and-dockerfile-b5866d9e2f84)?

Anyway, I can proceed with the image as it is now.

pibion commented 4 years ago

> it takes 8 minutes to pull the supercdms image:
>
>   Normal   Pulling                 11m    kubelet, k8s-4qtmvqk6gv47-minion-0  pulling image "supercdms/cdms-jupyterlab:1.8b"
>   Normal   Pulled                  3m16s  kubelet, k8s-4qtmvqk6gv47-minion-0  Successfully pulled image "supercdms/cdms-jupyterlab:1.8b"
>
> this is probably the dominant factor in creating new worker nodes. I think in production we should use User Placeholders: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders

I think SLAC IT may have taken this approach. Compute Canada has made much lighter containers and installed the software on CVMFS, I think.

I'll ask SLAC if they have any comments on this.

zonca commented 4 years ago

@pibion if using the software from CVMFS is a promising option, I think we can bring forward the investigation in #4. Sorry, let's move that discussion into #4.

zonca commented 4 years ago

@pibion I get this error:

  Normal   Pulled                  10s (x4 over 50s)  kubelet, k8s-4qtmvqk6gv47-minion-0  Container image "supercdms/cdms-jupyterlab:1.8b" already present on machine
  Normal   Created                 10s (x4 over 50s)  kubelet, k8s-4qtmvqk6gv47-minion-0  Created container
  Warning  Failed                  9s (x4 over 49s)   kubelet, k8s-4qtmvqk6gv47-minion-0  Error: failed to start container "notebook": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"jupyterhub-singleuser\": executable file not found in $PATH"

I could try to debug the issue myself, but it's better if I focus my time on Kubernetes. @pibion, can you have someone look into this? We need to make sure that the jupyterhub package is installed in the container and that jupyterhub-singleuser is on the PATH. Once you have news, please add details to this issue. Thanks.
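One way to check both conditions is to run a tiny script with the image's own Python interpreter (for example through `docker run`). This is a sketch, not a tested recipe for the CDMS image: the script and helper names are hypothetical, and it only uses `shutil.which` and `importlib.util.find_spec` from the standard library.

```python
import importlib.util
import shutil

REQUIRED_EXECUTABLES = ["jupyterhub-singleuser"]


def missing_executables(names, path=None):
    """Return the names not found as executables on `path` (default: $PATH)."""
    return [name for name in names if shutil.which(name, path=path) is None]


def has_package(module_name):
    """True if `module_name` is importable by this Python interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run inside the container so PATH and site-packages are the image's own.
    print("jupyterhub importable:", has_package("jupyterhub"))
    print("missing executables:", missing_executables(REQUIRED_EXECUTABLES))
```

If `jupyterhub` imports but the executable is missing, the package was probably installed into a different environment than the one on the container's PATH.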

pibion commented 4 years ago

We'll look into this and report back. I suspect this is tied up with how JupyterHub is deployed at SLAC; we may need to figure out how to manage different needs for Dockerfiles on different systems.

pibion commented 4 years ago

The jupyterhub package should be installed in the image.

SLAC uses jupyter-labhub rather than jupyterhub-singleuser. They pointed me to their launch file: https://github.com/slaclab/slac-jupyterlab/blob/master/scripts/runlab.sh.
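In a zero-to-jupyterhub deployment the start command of the notebook pod can be overridden through the `singleuser.cmd` value, alongside `singleuser.image.name` and `singleuser.image.tag`. The sketch below builds that fragment; whether `jupyter-labhub` on its own is enough for this image is an assumption (SLAC wraps it in runlab.sh), and the helper name is hypothetical.

```python
import json


def singleuser_values(image_name, tag, cmd):
    """Helm values fragment choosing the single-user image and its start command."""
    return {
        "singleuser": {
            "image": {"name": image_name, "tag": tag},
            # Replaces the default `jupyterhub-singleuser` command for the notebook pod.
            "cmd": cmd,
        }
    }


print(json.dumps(singleuser_values("supercdms/cdms-jupyterlab", "1.8b", "jupyter-labhub"), indent=2))
```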

zonca commented 4 years ago

At SLAC they use a different image: https://github.com/slaclab/slac-jupyterlab/blob/master/Dockerfile

do you want to use this one instead of CDMS?

pibion commented 4 years ago

I think we do need the CDMS image.

We "inherit" from the SLAC image https://hub.docker.com/r/slaclab/slac-jupyterhub/tags, but we need to build our own software on top of that.

Building our software in a Docker container is one of the main reasons we started exploring JupyterHub; the installation is tricky even for experienced users. And if you don't have the CDMS-specific software you can't look at data.

zonca commented 4 years ago

I cannot get the CDMS image to work on the deployment. I also tried to start the container using /opt/slac/jupyterlab/runlab.sh. Is there anyone else deploying the CDMS container in JupyterHub? Can they send me the configuration of their deployment?

pibion commented 4 years ago

I've reached out to SLAC IT; right now they're the only ones deploying this CDMS image. Compute Canada uses a totally different method, they've got a lightweight container and use CVMFS.

I wonder if I should start looking at creating a separate image? I'm not exactly sure where to start with this, but it sounds like installing the jupyterhub package and adding jupyterhub-singleuser to the PATH are a place to start?

zonca commented 4 years ago

@pibion so SLAC IT deploys this CDMS image in their JupyterHub environment and it works fine? Can they send me their JupyterHub configuration? maybe there is some special setting I need to enable.

I would wait before creating a new image. I think using CVMFS would be the best solution, I'll work on #4 to assess how difficult that is.

pibion commented 4 years ago

An update from SLAC IT:

The hub deployment is at https://github.com/slaclab/slac-jupyterhub, and the lab/Jupyter image is at https://github.com/slaclab/slac-jupyterlab

netscruff commented 4 years ago

Compute Canada staff member here, with hopefully some useful advice. Is the Compute Canada CVMFS use case related to ATLAS? Before I comment in your other issue about CVMFS usage: you can definitely get the image size down. That Dockerfile is super bloated. Take a look at Docker multi-stage builds, which let you use different images to build what you need, then throw out all the bloat and copy only what you need into a JupyterLab environment.

pibion commented 4 years ago

@netscruff I'll look into Docker multistage builds, thanks!

zonca commented 4 years ago

@pibion I think we can now rely on CVMFS to get the CDMS software stack (#4)

Given this, what image would you like to use for the Jupyter Notebooks? Or should we start from one of the barebones images from the Jupyter team and build up from there as needed?

pibion commented 4 years ago

I'm inclined towards using one of the barebones images from Jupyter and building ours from there - but I'm open either way. Did you have an image in mind?

Would it be helpful for us to compile a list of all the "stuff" we want in our Jupyter environment? We've got a set of Python packages, and then there are other things: for example, we use JupyterLab and find extensions like the git extension very useful.

zonca commented 4 years ago

you can start from one of the Jupyter images: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html

Once you choose one, I'll create a Docker image that derives from it and autobuilds on Docker Hub; then you can make a pull request and add all the other packages you need.

pibion commented 4 years ago

It looks like jupyter/tensorflow-notebook is the best fit.

zonca commented 4 years ago

ok, I have a base image here:

zonca commented 4 years ago

ok, I am preparing a deployment for testing, I'll log here the steps:

bloer commented 4 years ago

Potential issue with using CVMFS for the CDMS image. The version of boost that CERN provides in their LCG system, which the CDMS repo builds against, is partially broken in that it doesn't ship the boost_numpy library. https://sft.its.cern.ch/jira/browse/SPI-1560

We need that library to build one of the Python packages (scdmsPyTools). I'm not sure if there's a sensible workaround at this point. For example, we could compile Boost locally in the Docker image, but that would probably break the rest of the CDMS stack unless we used exactly the same version and flags.
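A quick way to tell whether a given LCG release ships the library is to look for `libboost_numpy*` in the release's `lib` directory. In this sketch the CVMFS path is a hypothetical placeholder (`LCG_XX` and the platform string must be replaced with what the CDMS stack actually uses), and `boost_numpy_libs` is a hypothetical helper.

```python
import glob
import os

# Hypothetical LCG view; substitute the release and platform the CDMS stack really uses.
LCG_LIB_DIR = "/cvmfs/sft.cern.ch/lcg/views/LCG_XX/x86_64-centos7-gcc8-opt/lib"


def boost_numpy_libs(lib_dir):
    """Return the names of libboost_numpy* files in `lib_dir` (empty list if none)."""
    pattern = os.path.join(lib_dir, "libboost_numpy*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if os.path.isdir(LCG_LIB_DIR):
    found = boost_numpy_libs(LCG_LIB_DIR)
    print(found or "boost_numpy missing from this LCG release")
```

Running the same check against the nightly and the next stable release would show when the fix mentioned below has landed.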

The issue is fixed in the latest LCG nightly build, so maybe there will be a fixed stable release before this jupyter image is production ready...

zonca commented 4 years ago

@bloer yes, at this point let's wait for a fix

zonca commented 4 years ago

@bloer do you recommend we use CentOS 7 as a base for the Jupyter Notebook image? Is that a requirement to setup the CDMS environment via CVMFS?

bloer commented 4 years ago

@zonca What base do you have in mind? The CVMFS CDMS repo officially supports CentOS 7 and SLC6. I am reasonably certain that any Red Hat-derived flavor should work though (in particular I know RHEL6 works).

There is no support from CERN for newer OSes yet. There is Ubuntu support; I am not building a version of the CDMS image off of that currently, but probably could if there was a good reason to do so.

Also worth pointing out: only bash is supported right now.
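When choosing a base image, a candidate can be screened by reading its `/etc/os-release`. The sketch below is a heuristic under assumptions: it treats a distribution as Red Hat-derived if its `ID` or `ID_LIKE` contains `rhel` or `centos`, and older releases such as SLC6 may not ship `/etc/os-release` at all. The helper names are hypothetical.

```python
def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value.strip("\"'")
    return fields


def looks_redhat_derived(text):
    """Heuristic: True if ID or ID_LIKE names rhel or centos."""
    fields = parse_os_release(text)
    ids = {fields.get("ID", "")} | set(fields.get("ID_LIKE", "").split())
    return bool(ids & {"rhel", "centos"})
```

With this heuristic, a CentOS 7 image passes while the Ubuntu- and Debian-based Jupyter stack images do not.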

zonca commented 4 years ago

@bloer I was testing with Debian, which is the base of the Jupyter stack. Is there documentation about the CVMFS CDMS stack somewhere where I can find more details?

bloer commented 4 years ago

@zonca the software stack provided by CERN has some decent documentation here: http://lcgdocs.web.cern.ch/lcgdocs/lcgreleases/introduction/

The only documentation on the CDMS-specific stack is here: https://confluence.slac.stanford.edu/display/CDMS/Using+CDMS+software+releases and the source for the build tool: http://titus.stanford.edu:8080/git/summary/?r=CompInfrastructure/ReleaseBuilder.git

zonca commented 4 years ago

ReleaseBuilder is password-protected.

zonca commented 4 years ago

ok, my plan for now is to try slaclab/slac-jupyterhub, which is CentOS 7 and not too big. But first I need to fix #10.

zonca commented 4 years ago

ok, got a temporary fix for #10, couldn't make slac-jupyterlab work.

So I rebuilt the whole Jupyter Docker Stacks on CentOS 7 instead of Ubuntu.

I have a first deployment available for testing, see documentation at:

https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/tree/cdms_zonca

it has dummy authentication for now, so any username will work.

@pibion @bloer I emailed you the URL, you are welcome to share it, just do not post it anywhere public. Please let me know what is not working or in general any feedback.

zonca commented 4 years ago

ok, I merged the pull request, so that the documentation about the repository is on the homepage:

https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream

For now we are done with this issue, will open new ones if we identify problems.