Closed: yuvipanda closed this issue 4 months ago.
I think there's a language problem here (and in #4486) of tags vs. labels, both of which exist. As I understand it, tags operate at the cloud vendor level, but labels can be used as selectors at the kubernetes level. If we want pods to be spun up in specific node pools, we definitely want to be using labels. But I don't know if the cost-tracking system we are going to use will be looking at cloud tags or kubernetes labels.
In the end, neither is costly to apply, so I will probably just do both.
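For concreteness, here is a minimal sketch of what doing both could look like on a single node group, assuming eksctl-style configuration (the group name, instance type, and example hub name are made up for illustration):

```yaml
# Illustrative eksctl-style node group; not the actual 2i2c config.
managedNodeGroups:
  - name: nb-example          # hypothetical node group name
    instanceType: r5.xlarge   # illustrative instance type
    # Cloud-level tags: visible to AWS billing / cost-allocation tooling,
    # but not to the Kubernetes scheduler.
    tags:
      "2i2c:hub-name": example-hub
    # Kubernetes-level labels: what a pod's nodeSelector / nodeAffinity
    # matches against, so this is what pins user pods to the pool.
    labels:
      2i2c/hub-name: example-hub
```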
I'm attempting this, but I have no idea where to put the node_selector.2i2c/hub-name: <hub-name> value. I've had to copy the whole profile list of image options out of the common values file, because kubespawner_override.node_selector is a true override and doesn't merge with singleuser.nodeSelector. Also, helm overwriting lists means I can't merge config that way either. #4499 represents what I've tried for staging, but it doesn't work.
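For reference, here is a rough sketch of the per-profile approach described above, using the z2jh singleuser.profileList / kubespawner_override paths (the display name and hub value are illustrative):

```yaml
# Sketch of the per-profile override (illustrative values).
# Because kubespawner_override.node_selector was assumed to fully replace
# singleuser.nodeSelector (see the follow-up below about dict merging), the
# shared profileList was copied into each hub's values file just to vary
# this one selector per hub.
singleuser:
  profileList:
    - display_name: "Standard environment"   # hypothetical profile
      kubespawner_override:
        node_selector:
          2i2c/hub-name: staging              # per-hub value
```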
> kubespawner_override.node_selector is a true override and doesn't merge with singleuser.nodeSelector
If they are dictionaries (rather than lists), they should merge (since https://github.com/jupyterhub/kubespawner/pull/650). So your instinct to put it in singleuser.nodeSelector is correct. You can also try hub.config.KubeSpawner.node_selector, although it should be the same as singleuser.nodeSelector.
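A minimal sketch of the two suggested placements, assuming a z2jh-style values file (the hub name is illustrative; the two options should end up setting the same KubeSpawner.node_selector):

```yaml
# Option 1: chart-level default applied to every user pod.
singleuser:
  nodeSelector:
    2i2c/hub-name: staging
---
# Option 2: set the KubeSpawner traitlet directly; should be equivalent.
hub:
  config:
    KubeSpawner:
      node_selector:
        2i2c/hub-name: staging
```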
> it doesn't work.
Can you provide more detail?
This is using my first instinct to add singleuser.nodeSelector. We're basically not triggering the new node pool(s) at all. The currently deployed config is in #4499.
After the outcome of the spike in https://github.com/2i2c-org/infrastructure/issues/4465, we are going to give each hub its own nodepool that is properly tagged to track cost on a per-hub basis.
This covers the staging, prod and workshop hubs, with the following tags on their nodepools:
- 2i2c:hub-name, set to match the name of the hub
- 2i2c:node-purpose, set to user
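Illustratively, the per-hub tagging described above could look roughly like the following eksctl-style sketch (node group names are made up; only the tags from this issue are shown):

```yaml
# One node group per hub, each tagged for per-hub cost attribution (sketch).
managedNodeGroups:
  - name: nb-staging
    tags:
      "2i2c:hub-name": staging
      "2i2c:node-purpose": user
  - name: nb-prod
    tags:
      "2i2c:hub-name": prod
      "2i2c:node-purpose": user
  - name: nb-workshop
    tags:
      "2i2c:hub-name": workshop
      "2i2c:node-purpose": user
```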
Definition of done
The tags 2i2c:hub-name and 2i2c:node-purpose are present on all the nodes spawned when users log on to the hub. You can verify this by looking at the EC2 instances in the AWS console.
Trade-offs
Since our health check triggers a user spawn, this means that instead of spawning 1 node when we trigger deploys on all of the hubs, we will trigger 3 separate nodes. This is fine - the autoscaler reclaims them after ~10min, and even with the largest nodes that doesn't cost enough to be a problem.
Out of scope
dask-gateway is out of scope here, and handled by https://github.com/2i2c-org/infrastructure/issues/4485