
Move each hub to its own nodegroup on the openscapes cluster #4482

Closed yuvipanda closed 4 months ago

yuvipanda commented 4 months ago

After the outcome of the spike in https://github.com/2i2c-org/infrastructure/issues/4465, we are going to give each hub its own nodepool that is properly tagged to track cost on a per-hub basis.
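As a rough sketch, a per-hub nodegroup could look like this in eksctl-style config (the nodegroup name, region, instance type, sizes, and tag/label keys here are illustrative assumptions, not the final values):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: openscapes
  region: us-west-2
nodeGroups:
  # One nodegroup per hub; repeat for each hub on the cluster.
  - name: nb-staging
    instanceType: r5.xlarge
    minSize: 0
    maxSize: 100
    labels:
      2i2c/hub-name: staging     # kubernetes label: usable as a pod nodeSelector
    tags:
      "2i2c:hub-name": staging   # cloud tag: what AWS cost reports see
```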

Definition of done

Trade-offs

Since our health check triggers a user spawn, deploying to all of the hubs will now spawn 3 separate nodes (one per hub) instead of 1. This is fine - the autoscaler reclaims them after ~10 minutes, and even with the largest nodes that doesn't cost enough to be a problem.
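(For reference, that ~10 minute window matches cluster-autoscaler's default scale-down timers; a sketch assuming it is deployed via its helm chart's `extraArgs`, with the documented defaults shown:)

```yaml
extraArgs:
  scale-down-unneeded-time: 10m     # node must be unneeded this long before removal
  scale-down-delay-after-add: 10m   # cool-down after a scale-up before scale-down resumes
```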

Out of scope

dask-gateway is out of scope here, and handled by https://github.com/2i2c-org/infrastructure/issues/4485

sgibson91 commented 4 months ago

I think there's a language problem here (and in #4486) of tags vs. labels, both of which exist. As I understand it, tags operate at the cloud vendor level, but labels can be used as selectors at the kubernetes level. If we want pods to be spun up in specific node pools, we definitely want to be using labels. But I don't know if the cost-tracking system we are going to use will be looking at cloud tags or kubernetes labels.

In the end, neither is costly to apply, so I will probably just do both.
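To make the distinction concrete: only the kubernetes label is visible to the scheduler, so steering a pod onto a hub's nodegroup would look like this (a sketch, assuming the `2i2c/hub-name` label key):

```yaml
spec:
  nodeSelector:
    2i2c/hub-name: staging   # matches the node's kubernetes label; cloud tags play no part here
```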

sgibson91 commented 4 months ago

I'm attempting this, but I have no idea where to put the node_selector value `2i2c/hub-name: <hub-name>`. I've had to copy the whole profile list of image options out of the common values file, because kubespawner_override.node_selector is a true override and doesn't merge with singleuser.nodeSelector. Also, helm overwriting lists means I can't merge config that way either. #4499 represents what I've tried for staging, but it doesn't work.
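A hypothetical reconstruction of the shape of that attempt (the real config is in #4499; the profile and image names here are made up): putting node_selector inside each kubespawner_override means the shared profileList can no longer stay in the common values file:

```yaml
singleuser:
  profileList:
    # copied wholesale out of the common values file just to add node_selector
    - display_name: Python
      kubespawner_override:
        image: example/python-image:latest   # hypothetical image
        node_selector:
          2i2c/hub-name: staging
```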

yuvipanda commented 4 months ago

> kubespawner_override.node_selector is a true override and doesn't merge with singleuser.nodeSelector

If they are dictionaries (rather than lists), they should merge (since https://github.com/jupyterhub/kubespawner/pull/650). So your instinct to put it in singleuser.nodeSelector is correct. You can also try hub.config.KubeSpawner.node_selector, although it should behave the same as singleuser.nodeSelector.
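In values-file terms, that suggestion is just (a sketch, assuming the `2i2c/hub-name` label key from above):

```yaml
singleuser:
  nodeSelector:
    2i2c/hub-name: staging
# Per-profile kubespawner_override.node_selector dicts are merged into this
# (since jupyterhub/kubespawner#650) rather than replacing it.
```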

> it doesn't work.

Can you provide more detail?

sgibson91 commented 4 months ago

[screenshot attached]

This is using my first instinct to add singleuser.nodeSelector. We're basically not triggering the new node pool(s) at all. Currently deployed config is in #4499
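Some hypothetical checks that might narrow this down (assuming a standard z2jh deployment where user pods carry the `component=singleuser-server` label, and a `staging` namespace):

```sh
# Did the selector actually reach the spawned pod?
kubectl get pod -n staging -l component=singleuser-server \
  -o jsonpath='{.items[*].spec.nodeSelector}'

# Does any node advertise the matching label?
kubectl get nodes -L 2i2c/hub-name
```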