2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org

Answer questions about what our needs for kubernetes on Jetstream are #4452

Closed: yuvipanda closed this issue 4 months ago

yuvipanda commented 4 months ago

As part of our Project Pythia grant (https://github.com/2i2c-org/meta/issues/769 has more information), we keep an eye on how we can better support running infrastructure on Jetstream2.

As part of that, we have been asked to provide information on what our needs are with respect to managed Kubernetes. This issue tracks these questions and helps us provide answers in a central location.

| Area | Need | Explanation | Hard Requirement? |
| --- | --- | --- | --- |
| Autoscaler | Cluster autoscaler, with scale to 0 | Nodes must only be spun up when users need them, so scale to 0 is required. The cluster autoscaler should also be able to discover all the nodepools in use and their sizes, without us needing to keep that information in sync. | Yes |
| Nodepools | Multiple nodepools, with different machine profiles, labels and taints | Needed so we can better match user memory / GPU requirements to node sizes. | Yes |
| StorageClass dynamic provisioner | Required, with volume expansion support | We need PVCs to be bound to durable disks for storage of hub databases, Prometheus, etc. | Yes |
| k8s API HA | Standard 3-replica HA master | | No |
| k8s master upgrades | Master upgrades must be managed by the infrastructure provider | Ideally we would be able to initiate them so we can schedule them. | Yes |
| k8s node upgrades | Node upgrades must be managed by the infrastructure provider | Ideally we would be able to initiate them so we can schedule them. | Yes |
| k8s API network restrictions | k8s API must be available from the internet | It should be protected with strong authentication so this can be secure. Needed as we deploy from GitHub Actions and our local machines; no bastions. | |
| LoadBalancer support | | We set up our own nginx-ingress inside the cluster, and it needs a single IP / CNAME we can point DNS records to. | Yes |
| Network policy enforcement | Helpful if available | An additional layer of security, rather than the primary layer; JupyterHub user pods are protected via OAuth2 by default. | No |
| k8s API auth methods | Needs auth for human users as well as a GitHub Actions based bot user | | Yes |
| Unattended updates for base OS | Preferred, especially if managed via a container-optimized OS rather than classic Ubuntu / Debian | | Yes |
| GPU support | Automated driver installation would be ideal, as drivers and hardware are often tightly coupled | We could manually manage it if necessary. | No |

While not absolutely complete, this is a start!
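
To illustrate the StorageClass row above, here is a minimal sketch of the kind of expandable, dynamically provisioned class we bind hub database and Prometheus PVCs to. The class name is made up, and the provisioner is an assumption (Cinder CSI seems the likely choice on an OpenStack cloud like Jetstream2), not a statement about what Jetstream2 actually ships:

```yaml
# Hedged sketch, not an actual Jetstream2 configuration: a dynamically
# provisioned StorageClass with volume expansion enabled, the kind we
# bind hub database and Prometheus PVCs to.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-expandable          # hypothetical name
provisioner: cinder.csi.openstack.org  # assumed: Cinder CSI on OpenStack
allowVolumeExpansion: true           # lets us grow PVCs (e.g. Prometheus) in place
reclaimPolicy: Retain                # keep the underlying volume if the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```

`allowVolumeExpansion: true` is the part that lets us grow an existing PVC (for example, when Prometheus outgrows its disk) without recreating it.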

yuvipanda commented 4 months ago

This was requested by @jmunroe and @julianpistorius.

julianpistorius commented 4 months ago

Thank you for this @yuvipanda! I'm sure we'll fill any remaining gaps soon. I think this is enough for us to go on for now.

yuvipanda commented 4 months ago

yw, @julianpistorius :)

julianpistorius commented 2 months ago

@yuvipanda How negotiable is the 'scale to 0' requirement? Does it spring from a need to save communities money when they use commercial cloud, or is there some other critical reason for requiring it?

Background: the cluster autoscaler for Jetstream2's managed Kubernetes service can't scale to 0 (yet). However, Jetstream2 resources are provided without charge to qualifying US-based researchers, so hopefully that makes this less of an issue.

yuvipanda commented 2 months ago

@julianpistorius it's primarily because we try to offer multiple machine size options via different node pools, and if they don't scale to 0 we don't have just one node running empty but 3-4. So we'd have to shift how we offer options for folks to spawn on. There are also energy usage concerns with leaving unused machines on, which is particularly significant for GPU instances.

So I'd say that:

  1. Ideally, we'd want scale to zero.
  2. But it doesn't block initial deployment; we would have to make adjustments to the profile options we offer, and there are some energy consumption concerns to keep in mind.
  3. It is definitely a blocker for any GPU deployments.
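
To make point 2 a bit more concrete, here is a rough sketch (names, labels, and sizes are illustrative, not our actual configuration) of how profile options in a zero-to-jupyterhub `singleuser.profileList` are typically pinned to separate node pools; without scale to 0, each pool listed this way keeps at least one node running even when nobody has picked that option:

```yaml
# Rough sketch, not our actual configuration: each profile option pins user
# pods to its own node pool via a node selector. Label values and sizes are
# illustrative.
singleuser:
  profileList:
    - display_name: "Small: up to 4 GB RAM"
      kubespawner_override:
        mem_guarantee: 3.5G
        node_selector:
          node-pool: small    # hypothetical label applied to the pool's nodes
    - display_name: "Large: up to 16 GB RAM"
      kubespawner_override:
        mem_guarantee: 14G
        node_selector:
          node-pool: large    # hypothetical label applied to the pool's nodes
```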

I hope that helps answer the question!

julianpistorius commented 2 months ago

Thank you for explaining the rationale @yuvipanda! That helps answer my question, and makes a lot of sense.

Even though the OpenStack Cluster API provider doesn't explicitly support autoscaling to zero (https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/1328), according to https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling.html#scale-from-zero-support it might still be possible:

> If your Cluster API provider does not have support for scaling from zero, you may still use this feature through the capacity annotations. You may add these annotations to your MachineDeployments, or MachineSets if you are not using MachineDeployments (it is not needed on both), to instruct the cluster autoscaler about the sizing of the nodes in the node group. At the minimum, you must specify the CPU and memory annotations, these annotations should match the expected capacity of the nodes created from the infrastructure.
>
> For example, if my MachineDeployment will create nodes that have “16000m” CPU, “128G” memory, “100Gi” ephemeral disk storage, 2 NVidia GPUs, and can support 200 max pods, the following annotations will instruct the autoscaler how to expand the node group from zero replicas:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    capacity.cluster-autoscaler.kubernetes.io/memory: "128G"
    capacity.cluster-autoscaler.kubernetes.io/cpu: "16"
    capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "100Gi"
    capacity.cluster-autoscaler.kubernetes.io/gpu-type: "nvidia.com/gpu"
    capacity.cluster-autoscaler.kubernetes.io/gpu-count: "2"
    capacity.cluster-autoscaler.kubernetes.io/maxPods: "200"
```

I'll work with @sd109 and @mkjpryor from @StackHPC to see what's possible on Jetstream2.