2i2c-org / features

Temporary location for feature requests sent to 2i2c
BSD 3-Clause "New" or "Revised" License

Launch user sessions in multiple clusters from a single hub #7

Open choldgraf opened 2 years ago

choldgraf commented 2 years ago

Description of problem and opportunity to address it

Problem description

When communities have datasets or resources spread across multiple cloud locations (across data centers, cloud providers, etc.), they currently must deploy one JupyterHub per location to provide access to the resources in each one. This creates a few problems.

Proposed solution

We should make it possible for a single hub to launch interactive sessions in multiple cloud locations, not only in the location where the hub is running.

This would allow communities to use a single hub as a "launch pad" for other kinds of infrastructure that is out there. It would reduce the complexity of running multiple hubs at once, and it is potentially a way for communities to divide their interactive sessions across billing accounts.

Implementation guide and constraints

Tech implementation

One likely candidate to make this possible is to define a new JupyterHub Spawner that knows how to talk to other Kubernetes clusters, along with some kind of process that can live on those clusters and "listen" for requests to launch interactive sessions. Then the spawner would request a session on a remote cluster, and direct the person there.
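To make that more concrete, here is a minimal sketch of what such a spawner could look like. This is not the actual implementation: the `MultiClusterSpawner` class, its `clusters` config option, and the `launch_on_cluster` helper are hypothetical placeholders for the remote-launch protocol described above; only the base JupyterHub `Spawner` interface (`start`/`poll`/`stop`) is real API.

```python
# A hedged sketch, not a real implementation: MultiClusterSpawner, its
# `clusters` option, and `launch_on_cluster` are hypothetical. Only the
# base Spawner interface (start/poll/stop) is real JupyterHub API.
from traitlets import Dict
from jupyterhub.spawner import Spawner


class MultiClusterSpawner(Spawner):
    # Hypothetical config: map of cluster name -> remote "listener" endpoint
    clusters = Dict(
        help="Mapping of cluster names to remote launch endpoints",
    ).tag(config=True)

    async def start(self):
        # Use the cluster the user picked (e.g. via an options form),
        # falling back to the first configured cluster.
        name = self.user_options.get("cluster", next(iter(self.clusters)))
        endpoint = self.clusters[name]
        # Ask the agent on the remote cluster to create the user's pod
        # and report back an address the hub's proxy can route to.
        ip, port = await self.launch_on_cluster(endpoint)
        return ip, port

    async def launch_on_cluster(self, endpoint):
        # The remote-launch protocol sketched above would live here.
        raise NotImplementedError

    async def poll(self):
        # Would query the remote cluster; return None while the pod runs,
        # or an exit status once it has stopped.
        return None

    async def stop(self):
        # Would ask the remote agent to delete the user's pod.
        pass
```

The interesting design question is what `launch_on_cluster` talks to: the "listener" process on each remote cluster would need to create the pod and return an address the hub's proxy can actually route to across networks.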

Considerations

Driving test cases

@rabernat needs a few hubs that are similar flavors of a Pangeo hub, each attached to a different pot of money. Rather than providing one hub per test case, we could use this as an opportunity to prototype the multi-cluster launcher described here.

Updates and ongoing work

damianavila commented 2 years ago

The filesystem issue is key and probably not easy to solve. I wonder if there is some existing abstraction that could interact with the underlying NFS layers of the different cloud providers... In that scenario, we would have a multispawner to select the node where you want to spawn and a multistorage to select where to persist the stuff you are working on. Alternatively, we could push on @rabernat's previously discussed idea of riding without a "filesystem" and change people's filesystem-based mindset along the way (which would be the most difficult part, IMHO).
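For the "multispawner" selection part, the user-facing choice could look like KubeSpawner's existing `profile_list` option (a real feature), which a multi-cluster spawner could extend to pick a target cluster rather than just pod settings. The profiles below are illustrative only; stock KubeSpawner can only target the cluster it runs in.

```python
# jupyterhub_config.py -- illustrative only. profile_list is a real
# KubeSpawner option, but stock KubeSpawner cannot cross clusters; a
# multi-cluster spawner would attach a target cluster to each profile.
c.KubeSpawner.profile_list = [
    {
        "display_name": "GCP us-central1 (persistent home directory)",
        "default": True,
        "kubespawner_override": {"image": "pangeo/pangeo-notebook:latest"},
    },
    {
        "display_name": "AWS us-west-2 (scratch storage only)",
        "kubespawner_override": {"image": "pangeo/pangeo-notebook:latest"},
    },
]
```

A "multistorage" choice could be surfaced the same way, with each profile wired to a different persistence backend.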

yuvipanda commented 2 years ago

Unfortunately, cross-DC NFS is not really viable for reliability, performance, and security reasons :(

I think step 1 would likely just involve a per-cluster home directory. We could augment it with a shared directory that is synced across all the clouds, via either FUSE or something like https://rclone.org/.
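As a rough sketch of what that sync could look like with rclone (`rclone sync` is a real command; the remote names and schedule here are hypothetical and assume both remotes were already set up with `rclone config`):

```python
# A rough sketch: periodically mirror a shared directory from the primary
# cluster's storage to a secondary cloud with rclone. The remote names
# ("gcs-hub", "aws-hub") are hypothetical; `rclone sync` is real.
import subprocess
import time

SRC = "gcs-hub:shared"  # hypothetical rclone remote on the primary cloud
DST = "aws-hub:shared"  # hypothetical rclone remote on the secondary cloud

while True:
    # One-way sync: make DST match SRC (deletions propagate to DST).
    subprocess.run(["rclone", "sync", SRC, DST, "--verbose"], check=False)
    time.sleep(300)  # re-sync every five minutes
```

One-way sync sidesteps conflict resolution; letting users write to the shared directory from every cluster would need something closer to bidirectional sync, which is much harder to get right.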

I've made a release of the spawner already at https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner, and am waiting for cloud credits to land before I can do a deployment.

consideRatio commented 2 years ago

2i2c team sprint meeting notes:

choldgraf commented 2 years ago

Update: pinning this one for a bit

@yuvipanda and I just had a conversation about this work, and we agreed that it'd be best to prioritize some other development efforts before we complete this one, especially since the LEAP hub needed to be deployed quickly enough that we just did it "the old-fashioned way".

We're going to focus on these two pieces

And we'll revisit this one at a later date.