2i2c-org / features

Temporary location for feature requests sent to 2i2c
BSD 3-Clause "New" or "Revised" License

Proto-goal: Allow hub admins to modify profileLists #26

Open yuvipanda opened 1 year ago

yuvipanda commented 1 year ago

Context

The current jupyterhub-configurator was built to reduce toil for the 2i2c engineers, allowing some config changes to be done directly by JupyterHub admins. In particular, the primary change was to allow them to change the image used - this removes the need for 2i2c engineers to be involved in image changes. JupyterHub admins can build their own images, test them in staging, and move them to production without any effort from us.

Problem

Alas, the configurator does not support KubeSpawner's profileList feature! This feature is now heavily used to let end users select different machine types and images without having to bother the admins, and any hub using it cannot use the configurator. This leads to toil for the 2i2c engineers and an extra step for hub admins, sometimes with back-and-forth PRs (https://github.com/2i2c-org/infrastructure/pull/2556, https://github.com/2i2c-org/infrastructure/pull/2560, https://github.com/2i2c-org/infrastructure/pull/2567, https://github.com/2i2c-org/infrastructure/pull/2550, https://github.com/2i2c-org/infrastructure/pull/2551). Even without back and forth, it causes extra toil for support and hub admins, as in https://github.com/2i2c-org/infrastructure/pull/2547.

Problem with the configurator

I built the current configurator a few years ago, with the goal of it being usable by all JupyterHub installations. As such, it was quite generic - it rendered a JSON Schema into a frontend form, and theoretically allowed any kubespawner traitlet to be configured via it. However, in practice, I fundamentally believe this is a dead end now.

  1. It is too generic. There isn't really a way to allow people to easily pick from a curated list of images, nor to validate that images specified are actually images. We even run into problems with trailing spaces here.
  2. Some of these could be fixed with custom JS. However, this requires investment in frontend React + JS work, which is not a skillset that is particularly prevalent in our community.
  3. It's structured as a modern JS app with a Python backend. This provides a lot of flexibility in terms of UX, but comes at a cost in complexity when developing the application. Most changes would require touching both Python and JS, which adds quite a bit of complexity.
  4. There is absolutely no version control here, so JupyterHub admins cannot 'revert' to a known working configuration in any way. This makes changes riskier.

So while we theoretically could continue with the current configurator and put effort into 'fixing it up', I believe it's an architectural dead end and we should explore other options.

Proposed solution

Approach + Prototype

Django is an extremely popular and well-supported web framework with built-in admin functionality. This allows us to define complex models in Python code and have Django automatically generate a UI for them. This UI won't be as polished as what we could build ourselves, but it works pretty well and is easy to do. I spent a few hours prototyping this today, and here are some results:

"Add an Image" screen, allowing JupyterHub admins to manage different images available

"Add a profile" screen, allowing JupyterHub admins to add different profile options

Built-in validation of uniqueness constraints, preventing a profile from listing the same image twice

Version control of config changes, with ability to roll back (via django-reversion)

[Screenshots omitted; the first is a screen capture dated 2023-05-23]

Profiles can pick from several node options provided by infrastructure admins as config

If you only associate one nodegroup with a profile, users will not be offered a choice in the profile selection. If a profile has multiple nodegroups associated with it, users will be offered a dropdown choice.

JSON Output that can be fed directly to kubespawner

When only one image is available

{
  "profile_list": [
    {
      "display_name": "testing-something",
      "slug": "testing-something",
      "default": false,
      "kubespawner_override": {
        "image": "pangeo/pangeo-notebook:2015-05-03"
      }
    }
  ]
}

When multiple images are available

{
  "profile_list": [
    {
      "display_name": "testing-something",
      "slug": "testing-something",
      "default": false,
      "kubespawner_override": {},
      "profile_options": {
        "image": {
          "display_name": "Image",
          "choices": {
            "pangeo-notebook": {
              "display_name": "Pangeo Notebook",
              "kubespawner_override": {
                "image": "pangeo/pangeo-notebook:2015-05-03"
              },
              "default": false
            },
            "test": {
              "display_name": "Test",
              "kubespawner_override": {
                "image": "ts:tress"
              },
              "default": false
            }
          }
        }
      }
    }
  ]
}
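The single-image and multi-image shapes can be produced by one small function. The following is a minimal sketch of that idea, not the prototype's actual code: `ImageChoice` and `build_profile` are hypothetical names, and the sketch assumes that choices are nested under the `choices` key, as KubeSpawner's `profile_options` expects.

```python
import json
from typing import NamedTuple


class ImageChoice(NamedTuple):
    """Hypothetical stand-in for an image record managed in the admin UI."""
    slug: str
    display_name: str
    image: str


def build_profile(slug, display_name, images, default=False):
    """Build one entry of KubeSpawner's profile_list.

    With exactly one image, the image is set directly in kubespawner_override,
    so users see no choice. With several, a dropdown is offered through
    profile_options.
    """
    profile = {
        "display_name": display_name,
        "slug": slug,
        "default": default,
        "kubespawner_override": {},
    }
    if len(images) == 1:
        profile["kubespawner_override"]["image"] = images[0].image
    elif len(images) > 1:
        profile["profile_options"] = {
            "image": {
                "display_name": "Image",
                "choices": {
                    choice.slug: {
                        "display_name": choice.display_name,
                        "kubespawner_override": {"image": choice.image},
                        "default": False,
                    }
                    for choice in images
                },
            }
        }
    return profile


if __name__ == "__main__":
    images = [
        ImageChoice("pangeo-notebook", "Pangeo Notebook", "pangeo/pangeo-notebook:2015-05-03"),
        ImageChoice("test", "Test", "ts:tress"),
    ]
    profile = build_profile("testing-something", "testing-something", images)
    print(json.dumps({"profile_list": [profile]}, indent=2))
```

The same branching would presumably apply to node groups: one associated node group means no dropdown, several mean a dropdown.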

Validate that the image actually exists

We use skopeo to check if the image exists. This restricts us to public images for now, but that's the status quo too.
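As a rough illustration of such a check (a sketch under assumptions, not the prototype's code), one could shell out to `skopeo inspect` and treat a zero exit status as "the image exists". The function name `image_exists` and the injectable `run` parameter are hypothetical; the sketch assumes `skopeo` is on the PATH and that the image is public. It also strips surrounding whitespace, since trailing spaces have caused problems before.

```python
import subprocess


def image_exists(image, run=subprocess.run, timeout=30):
    """Return True if `skopeo inspect` can find the given public image.

    `run` is injectable so the function can be exercised without skopeo
    installed; by default it is subprocess.run.
    """
    ref = image.strip()  # tolerate leading/trailing whitespace from form input
    if not ref or any(ch.isspace() for ch in ref):
        # Empty, or whitespace inside the reference: cannot be a valid image
        return False
    cmd = ["skopeo", "inspect", f"docker://{ref}"]
    try:
        result = run(cmd, capture_output=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # Treat a registry that does not answer in time as "not verified"
        return False
    return result.returncode == 0
```

Fully qualified references (including the registry host) are likely the safer input here, since how short names resolve depends on the tool's defaults.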

The code for this is here: https://github.com/yuvipanda/z2jh-configurator. It's an initial prototype, but it currently supports:

  1. Logging in via JupyterHub auth, and allowing only JupyterHub admins to access the Django admin interface. This is possible via python-social-auth and its good support for Django.
  2. Running via uvicorn, the same web server we use to run the configurator right now.
  3. Pythonic model definitions that are the source of truth for the UI. Note that these are all z2jh specific, and allow us to do far more useful things than something that is broadly applicable to all of JupyterHub.
  4. A log of changes made to the models via the admin UI already exists, and with django-reversion we can also provide version control + reverts.
  5. A SQLite-based database for storing this config, very similar to what JupyterHub does.
  6. Unauthenticated JSON output that can be fed directly into kubespawner's profile_list.
  7. Validation that the image exists in the registry.
  8. The list of node groups that profiles can be spawned onto is provided as config by infrastructure admins, to match what is available in the underlying Kubernetes cluster.
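On the hub side, consuming the unauthenticated JSON output could look roughly like the sketch below. KubeSpawner's `profile_list` may be set to a callable that receives the spawner, so the hub can fetch the current list on each spawn. The URL, the function name `fetch_profile_list` and the `opener` parameter are assumptions made for illustration; the prototype's real endpoint path may differ.

```python
import json
import urllib.request

# Hypothetical address of the sidecar's JSON endpoint inside the hub pod
CONFIGURATOR_URL = "http://localhost:8000/profile-list.json"


def fetch_profile_list(url=CONFIGURATOR_URL, opener=urllib.request.urlopen, timeout=5):
    """Fetch the JSON document and return its "profile_list" entry."""
    with opener(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["profile_list"]


# In jupyterhub_config.py (where `c` is the config object), something like:
#
#   c.KubeSpawner.profile_list = lambda spawner: fetch_profile_list()
```

This call blocks while it waits for the sidecar, so a real integration would probably prefer an async callable; the blocking version just keeps the sketch short.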

Right to Replicate Concerns

This project will be specific to z2jh, not to 2i2c infrastructure. We will run this as a sidecar in the hub pod (similar to how we run the configurator right now), so when communities decide to leave they can continue to use the same UI they are used to.

Upstreaming concerns

We develop this on its own repo (not as part of 2i2c-org/infrastructure), with the aim of being tied only to z2jh and no 2i2c dependencies. This overlaps with the R2R concerns, and primarily means we have an instance of this per hub, rather than a global multitenant instance. I think the primarily-python nature of this will also help with upstreaming and broader adoption.

Definition of 'done'

The goal is to have zero PRs made to this repo for profileList changes.

Possible technical implementation steps

Looking through our profileLists, I think the most common things we provide are:

  1. memory & cpu guarantees / limits
  2. Multiple choice of images
  3. Multiple choice of node selectors
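A minimal sketch of how the first and third items could map onto a `kubespawner_override`, assuming the admin UI stores them as plain values. The helper name `resource_override` is hypothetical, and the example values are illustrative only. `mem_guarantee`, `mem_limit`, `cpu_guarantee`, `cpu_limit` and `node_selector` are existing KubeSpawner options.

```python
def resource_override(mem_guarantee=None, mem_limit=None,
                      cpu_guarantee=None, cpu_limit=None,
                      node_selector=None):
    """Build a kubespawner_override dict, leaving out anything not set.

    Memory values use KubeSpawner's byte specification (for example "4G");
    CPU values are numbers of cores; node_selector is a dict of node labels.
    """
    candidates = {
        "mem_guarantee": mem_guarantee,
        "mem_limit": mem_limit,
        "cpu_guarantee": cpu_guarantee,
        "cpu_limit": cpu_limit,
        "node_selector": node_selector,
    }
    # Omit unset keys so the hub-wide defaults still apply to them
    return {key: value for key, value in candidates.items() if value is not None}


if __name__ == "__main__":
    # Illustrative values only; the instance type is a made-up example
    override = resource_override(
        mem_guarantee="3G",
        mem_limit="4G",
        cpu_guarantee=0.5,
        node_selector={"node.kubernetes.io/instance-type": "example-type"},
    )
    print(override)
```

Leaving unset values out, rather than writing them as null, avoids accidentally overriding defaults configured elsewhere.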

I think we should set the goal of making these options modifiable in this admin UI, and roll it out to our end users. In addition, the current set of things supported by the configurator (the image for the non-profileList use case, as well as the 'default UI') should also be supported.

Future work

This can also help us in the future provide a curated set of images for folks to choose from with an appropriate cadence of updates without having to trouble us at all.

yuvipanda commented 1 year ago

Note that while I prototyped this, that doesn't necessarily mean I will lead the actual implementation.

jmunroe commented 1 year ago

I was just trying to compare some different images for use on researchdelight. What I wanted was to be able to have different images available at the same time. If this goal gets implemented, it will be far better than trying to make PRs on our infrastructure repo to update the profile.

consideRatio commented 1 year ago

"We develop this on its own repo (not as part of 2i2c-org/infrastructure), with the aim of being tied only to z2jh and no 2i2c dependencies."

:100:

I suggest the aim is made stricter, to further reduce coupling to 2i2c as an open source project. I'm thinking of, for example, committing to not writing docs that couple to 2i2c's basehub helm chart, and overall writing content in the git repo as if it lived independently of 2i2c, for example in the jupyterhub GitHub organization.