2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org

A docker-on-laptops on-ramp for 2i2c #1705

Open jameshowison opened 2 years ago

jameshowison commented 2 years ago

Context

In my discussions with faculty and IT there is great enthusiasm for teaching with the infrastructure that 2i2c provides. Yet I run into three concerns:

  1. Budget is not allocated per course, so cost management and potential overruns are a big concern for adjuncts (and for everyone, but especially for less powerful instructors)
  2. Preparation happens just in time for the semester. Coming to know and trust outside engineers and approaches in that timeframe is difficult.
  3. Autonomy in managing the infrastructure is important to being, and being seen to be, a responsive professor. Replying "I'm talking with the service providers" or "I'm not quite sure how it works" is not an answer faculty want to give their students.

Proposal

A "serverless" hub for individual courses. This can be immediately useful for individual faculty, and form an easy on-ramp to migrate to cloud setups.

My hunch is that a way to distribute class images to individual student laptops and have students run their image via local laptop Docker can provide this on-ramp, while re-using many of the approaches required to establish things for use on 2i2c infrastructure.

I think that the 2i2c workflow of generating and updating images and building and pushing them to quay.io would stay the same; the addition would be generating docker-compose.yml files for each student to check out and run on their laptops. Perhaps this means creating a repo per student, which could also be configured to mount shared and individual data spaces.
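To make that concrete, here is a minimal sketch of what such a generated docker-compose.yml could look like. This is only an illustration of the idea: the image name and tag, service name, and volume paths are hypothetical placeholders, not existing 2i2c artifacts.

    # docker-compose.yml (hypothetical per-student sketch)
    services:
      class-notebook:
        image: quay.io/2i2c/example-class-image:2023.01   # placeholder image and tag
        command: jupyter notebook --ip=0.0.0.0 --port=9000
        ports:
          - "9000:9000"
        volumes:
          - ./my-work:/home/jovyan/my-work            # individual work, persisted on the laptop
          - ./shared-data:/home/jovyan/shared-data    # shared class data checked out with the repo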

Classes would need some equivalent of nbgitpuller, so that the git pull eventually happens on the laptop images. Perhaps this is as simple as the instructor creating an nbgitpuller link that goes to a webpage (or a page in the shared class repo) showing generated instructions for the students: "Go to the Jupyter terminal on your laptop-hub, run cd xyz and git pull." By teaching the instructor to generate nbgitpuller links, things would "just work" if the hub were later moved to the cloud.
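As a rough illustration, those generated instructions could boil down to a couple of commands run in the Jupyter terminal on the laptop; the repository URL and directory name here are placeholders:

    # First session: clone the class materials
    git clone https://github.com/example-org/class-materials.git

    # Later sessions: update the materials in place
    cd class-materials && git pull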

What I'm getting at is filling this out into zero-cost, full-autonomy classes that still use the 2i2c approaches, and then offering on-ramps to on-prem hosting and eventually to commercial cloud.

Seems like this might also offer a route for "Individual right to replicate" https://github.com/2i2c-org/infrastructure/issues/975 which was about helping students get their space down to their laptops for post-course or post-graduation work.


colliand commented 2 years ago

Thanks @jameshowison. We will reflect on the proposal to use docker images as on-ramps. Some related conversations involving Binder and JupyterLite are ongoing.

With other universities, 2i2c has addressed these three concerns with an approach modeled on other education technology platforms like the LMS. The LMS is a central service that is valuable to most (if not all) classes. Interactive computing is emerging as an important set of collaborative tools that can be used in most (if not all) classes. U. Texas could work with 2i2c to launch a persistent Jupyter service, accessible to all students through SSO, for use across many classes. 2i2c can also assist with launching and managing customized Jupyter services designed for particular classes, with access restricted to the students and instructors in those classes. Our developing shared responsibility model aims to define how the universities we serve and 2i2c can work together to provide support.

yuvipanda commented 2 years ago

Thank you very much for this well-thought-through issue, @jameshowison!

A useful action item here, to me, sounds like: "Write documentation on how an end user can replicate the hub experience (including the nbgitpuller link) on their local computer".

So let's say that we have the following assumptions:

  1. The end user (student) has Docker Desktop installed
  2. They already have an nbgitpuller link pointed at the hub

How do they run this locally instead?

Would a workflow for that support your use case, @jameshowison?

yuvipanda commented 2 years ago

If you run the following code on any 2i2c hub:

import os
# Print a docker command that reproduces this hub's environment on a local machine
print(f'docker run -p 9000:9000 -it {os.environ["JUPYTER_IMAGE"]} jupyter notebook --ip=0.0.0.0 --port=9000')

It should give you output like:

docker run -p 9000:9000 -it quay.io/2i2c/2i2c-hubs-image:69b1f9dff7c7  jupyter notebook --ip=0.0.0.0 --port=9000

If you run that command in the terminal of any machine with Docker Desktop installed, it should pull the image and start a server, printing output that ends with lines like:


    To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-1-open.html
    Or copy and paste one of these URLs:
        http://e1047b107688:9000/?token=a13704b36331f073c869941e82104b0cd92c23f0bc41926d
     or http://127.0.0.1:9000/?token=a13704b36331f073c869941e82104b0cd92c23f0bc41926d

The first two options are lies and won't work. The third one, however, will! And you'll get a notebook (classic notebook is what it starts in) that matches the exact environment you have in the hub.

Next step is the nbgitpuller link. Say the link is https://staging.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdata-8%2Fmaterials-x22&urlpath=lab%2Ftree%2Fmaterials-x22%2F (pointing to the staging 2i2c hub). You can modify it by removing everything before /git-pull and writing http://127.0.0.1:9000 in its place. So the above link becomes:

http://127.0.0.1:9000/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdata-8%2Fmaterials-x22&urlpath=lab%2Ftree%2Fmaterials-x22%2F

If you go here, it should pull the repo (in this case, https://github.com/data-8/materials-x22 as an example) and open it in lab.
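If you'd rather not edit the URL by hand, here is a small sketch of the same rewrite using only Python's standard library; the hub link is just the example from above, and the function name is made up for illustration:

    from urllib.parse import urlsplit, urlunsplit

    def localize_nbgitpuller_link(hub_link, local_base="http://127.0.0.1:9000"):
        """Point a hub nbgitpuller link at a local notebook server instead."""
        hub = urlsplit(hub_link)
        local = urlsplit(local_base)
        # Keep only the path from /git-pull onwards, plus the original query string
        path = hub.path[hub.path.index("/git-pull"):]
        return urlunsplit((local.scheme, local.netloc, path, hub.query, ""))

    link = ("https://staging.2i2c.cloud/hub/user-redirect/git-pull"
            "?repo=https%3A%2F%2Fgithub.com%2Fdata-8%2Fmaterials-x22"
            "&urlpath=lab%2Ftree%2Fmaterials-x22%2F")
    print(localize_nbgitpuller_link(link))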

This workflow is still missing a few things:

  1. No persistent storage. All user modifications are thrown out if they restart
  2. Additional things (like the postgres db we set up for utexas hub) won't show up here, as that becomes a more complex multi-container setup.

But, to a first approximation, is this the kinda workflow you are looking for?

jameshowison commented 2 years ago

Yes, @yuvipanda that is spot on. I think this shows the flexibility of 2i2c's approach. That way an instructor can have students up and running with two links, give or take. So the instructions for setup in class become:

  1. Install Docker Desktop
  2. Find terminal.app or the Windows equivalent
  3. Copy and execute docker run -p 9000:9000 -it quay.io/2i2c/2i2c-hubs-image:69b1f9dff7c7 jupyter notebook --ip=0.0.0.0 --port=9000
  4. Copy http://127.0.0.1:9000/?token=a13704b36331f073c869941e82104b0cd92c23f0bc41926d to your browser
  5. Click on an nbgitpuller link the instructor provides on the daily webpage for the class (localhost-targeted)
  6. Get on with class.

Persistent storage is an interesting question. Most universities have some shared filespace (such as Box) which can be mounted via something like rclone (after some browser auth that drops a token on their laptop). I'm guessing that can be done so that the mount ends up on the docker container. Or students have a github repo for the class (for their own work) and they pull/push to that. Or perhaps there is some way to have cloud/on-prem storage and deployment of github volumes? Something using https://github.com/gluster/gluster-kubernetes or similar? Can you have a docker volume stored in something like Box?
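For the simplest case, a plain bind mount may be enough to keep student work on the laptop across container restarts. This is a sketch of standard Docker behaviour rather than anything 2i2c generates today; the work directory name is a placeholder, and the image tag is just the example from above:

    # Persist a work directory on the laptop by bind-mounting it into the container
    docker run -p 9000:9000 \
      -v "$PWD/classwork:/home/jovyan/classwork" \
      -it quay.io/2i2c/2i2c-hubs-image:69b1f9dff7c7 \
      jupyter notebook --ip=0.0.0.0 --port=9000

Anything saved under classwork then survives a container restart, though environment changes (installed packages, config) still do not.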

Adapting for additional software (like postgres) means either shifting to a docker-compose approach, with multiple images running in Docker on the laptop, or just installing all the needed software into the class image (an approach I suspect might be simpler when considering this as an on-ramp to 2i2c's infrastructure).
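For the multi-container route, here is a hedged sketch of what a class docker-compose.yml with a postgres service could look like; the image versions, credentials, and service names are placeholders, not the actual utexas setup:

    services:
      notebook:
        image: quay.io/2i2c/2i2c-hubs-image:69b1f9dff7c7   # class image, example tag from above
        command: jupyter notebook --ip=0.0.0.0 --port=9000
        ports:
          - "9000:9000"
        depends_on:
          - db
      db:
        image: postgres:15                         # placeholder version
        environment:
          POSTGRES_PASSWORD: class-example-secret  # placeholder credential
        volumes:
          - pgdata:/var/lib/postgresql/data        # keeps the database across restarts
    volumes:
      pgdata: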

jameshowison commented 2 years ago

Each of the steps above might be simplified more:

  1. Install Docker Desktop (ok, that one can't ;)
  2. (and 3) Find terminal.app or equivalent, run docker run command.

Surely there is some way to run a docker run command via Docker Desktop that doesn't involve the command line. Some double-clickable file that Docker Desktop opens? Maybe there is a docker:// URL scheme that Docker Desktop could register as the handler for, so it could just start up the container? Acknowledging there is the general "don't make it too easy for remote code to execute on the laptop" security issue here.

  4. Copy the http://127.0.0.1:9000/?token= link. In theory this could be a link the students click on a webpage, if the token were known in advance. An alternative might be to use a class password (or fixed token) rather than the random token, as sketched below. Then the students just click on a class link to http://127.0.0.1:9000/ and enter the shared password. If the password is shared in a class venue (e.g., Canvas) then perhaps it is not too insecure.
  5. The nbgitpuller link targeting localhost already seems as simple as possible :)
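One way to avoid the random token (a sketch of standard Jupyter configuration, not something the current 2i2c images do; the token value is a placeholder) is to pass a fixed token when starting the server, so the class page can link straight to it:

    # Start the local server with a fixed, class-wide token instead of a random one
    docker run -p 9000:9000 -it quay.io/2i2c/2i2c-hubs-image:69b1f9dff7c7 \
      jupyter notebook --ip=0.0.0.0 --port=9000 --NotebookApp.token='class-shared-secret'

Every student could then use the same link: http://127.0.0.1:9000/?token=class-shared-secret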

You might be wondering why I am trying to make this simpler than the already simple. Literally any time I give a class a sequence of actions that involves switching between multiple apps and copying and pasting things, the class quickly gets desynchronized and I end up trying to help 15 students who are all at different stages. When I create guides with screenshots, they never cover all the variations of OSes and apps, and the software changes so much that I have to update the setup guide multiple times during the semester, let alone across semesters :)