2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org

Answer questions about what our needs for kubernetes on Jetstream are #4452

Closed: yuvipanda closed this issue 4 months ago

yuvipanda commented 4 months ago

As part of our Project Pythia grant (https://github.com/2i2c-org/meta/issues/769 has more information), we keep an eye on how we can better support running infrastructure on Jetstream2.

As part of that, we have been asked to provide information on what our needs are with respect to managed Kubernetes. This issue tracks these questions and helps us provide answers in a central location.

| Area | Need | Explanation | Hard Requirement? |
| --- | --- | --- | --- |
| Autoscaler | Cluster autoscaler, with scale to 0 | Nodes must only be spun up when users need them, so scale to 0 is required. The cluster autoscaler should also be able to discover all the nodepools in use and their sizes, without us needing to keep that information in sync. | Yes |
| Nodepools | Multiple nodepools, with different machine profiles, labels and taints | Needed so we can better match user memory / GPU requirements to node sizes. | Yes |
| StorageClass dynamic provisioner | Required, with volume expansion support | We need PVCs to be bound to durable disks for storage of hub databases, Prometheus, etc. | Yes |
| k8s API HA | Standard 3-replica HA master | | No |
| k8s master upgrades | Master upgrades must be managed by the infrastructure provider | Ideally we would be able to initiate them so we can schedule them. | Yes |
| k8s node upgrades | Node upgrades must be managed by the infrastructure provider | Ideally we would be able to initiate them so we can schedule them. | Yes |
| k8s API network restrictions | k8s API must be available from the internet | It should be protected with strong authentication so this can be secure. Needed as we deploy from GitHub Actions and our local machines; no bastions. | |
| LoadBalancer support | | We set up our own nginx-ingress inside the cluster, and it needs a single IP / CNAME we can point DNS records to. | Yes |
| Network policy enforcement | Helpful if available | An additional layer of security, rather than the primary layer; JupyterHub user pods are protected via OAuth2 by default. | No |
| k8s API auth methods | Needs auth for human users as well as a GitHub Actions based bot user | | Yes |
| Unattended updates for base OS | Preferred, especially if managed via a container-optimized OS rather than classic Ubuntu / Debian | | Yes |
| GPU support | Automated driver installation would be ideal, as drivers and hardware are often tightly coupled | We could manually manage it if necessary. | No |

While not absolutely complete, this is a start!
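
To illustrate the StorageClass row above, here is a minimal sketch of the kind of expandable, dynamically provisioned class we bind hub database and Prometheus PVCs to. The class name is made up, and the provisioner is an assumption (Cinder CSI seems the likely choice on an OpenStack cloud like Jetstream2), not a statement about what Jetstream2 actually ships:

```yaml
# Hedged sketch, not an actual Jetstream2 configuration: a dynamically
# provisioned StorageClass with volume expansion enabled, the kind we
# bind hub database and Prometheus PVCs to.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-expandable          # hypothetical name
provisioner: cinder.csi.openstack.org  # assumed: Cinder CSI on OpenStack
allowVolumeExpansion: true           # lets us grow PVCs (e.g. Prometheus) in place
reclaimPolicy: Retain                # keep the underlying volume if the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```

`allowVolumeExpansion: true` is the part that lets us grow an existing PVC (for example, when Prometheus outgrows its disk) without recreating it.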

yuvipanda commented 4 months ago

This was requested by @jmunroe and @julianpistorius.

julianpistorius commented 4 months ago

Thank you for this @yuvipanda! I'm sure we'll fill any remaining gaps soon. I think this is enough for us to go on for now.

yuvipanda commented 4 months ago

yw, @julianpistorius :)

julianpistorius commented 2 months ago

@yuvipanda How negotiable is the 'scale to 0' requirement? Does it spring from a need to save communities money when they use commercial cloud, or is there some other critical reason for requiring it?

Background: the cluster autoscaler for Jetstream2's managed Kubernetes service can't scale to 0 (yet). However, Jetstream2 resources are provided without charge to qualifying US-based researchers, so hopefully that makes this less of an issue.

yuvipanda commented 2 months ago

@julianpistorius it's primarily because we try to offer multiple machine size options via different node pools, and if they don't scale to 0 we don't have just one node running empty but 3-4. So we'd have to shift how we offer options for folks to spawn on. There are also energy usage concerns with leaving unused machines on, which is particularly significant for GPU instances.

So I'd say that:

  1. Ideally, we'd want scale to zero.
  2. But it doesn't block initial deployment; we would have to make adjustments to the profile options we offer, and there are some energy consumption concerns to keep in mind.
  3. It is definitely a blocker for any GPU deployments.
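
To make point 2 a bit more concrete, here is a rough sketch (names, labels, and sizes are illustrative, not our actual configuration) of how profile options in a zero-to-jupyterhub `singleuser.profileList` are typically pinned to separate node pools; without scale to 0, each pool listed this way keeps at least one node running even when nobody has picked that option:

```yaml
# Rough sketch, not our actual configuration: each profile option pins user
# pods to its own node pool via a node selector. Label values and sizes are
# illustrative.
singleuser:
  profileList:
    - display_name: "Small: up to 4 GB RAM"
      kubespawner_override:
        mem_guarantee: 3.5G
        node_selector:
          node-pool: small    # hypothetical label applied to the pool's nodes
    - display_name: "Large: up to 16 GB RAM"
      kubespawner_override:
        mem_guarantee: 14G
        node_selector:
          node-pool: large    # hypothetical label applied to the pool's nodes
```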

I hope that helps answer the question!

julianpistorius commented 2 months ago

Thank you for explaining the rationale @yuvipanda! That helps answer my question, and makes a lot of sense.

Even though the OpenStack Cluster API provider doesn't explicitly support autoscaling to zero (https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/1328), according to https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling.html#scale-from-zero-support it might still be possible:

> If your Cluster API provider does not have support for scaling from zero, you may still use this feature through the capacity annotations. You may add these annotations to your MachineDeployments, or MachineSets if you are not using MachineDeployments (it is not needed on both), to instruct the cluster autoscaler about the sizing of the nodes in the node group. At the minimum, you must specify the CPU and memory annotations, these annotations should match the expected capacity of the nodes created from the infrastructure.
>
> For example, if my MachineDeployment will create nodes that have “16000m” CPU, “128G” memory, “100Gi” ephemeral disk storage, 2 NVidia GPUs, and can support 200 max pods, the following annotations will instruct the autoscaler how to expand the node group from zero replicas:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    capacity.cluster-autoscaler.kubernetes.io/memory: "128G"
    capacity.cluster-autoscaler.kubernetes.io/cpu: "16"
    capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "100Gi"
    capacity.cluster-autoscaler.kubernetes.io/gpu-type: "nvidia.com/gpu"
    capacity.cluster-autoscaler.kubernetes.io/gpu-count: "2"
    capacity.cluster-autoscaler.kubernetes.io/maxPods: "200"
```

I'll work with @sd109 and @mkjpryor from @StackHPC to see what's possible on Jetstream2.