2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

aws clusters: existing clusters to use eks managed addons and gp3 storage class with volume expansion by default #5129

Closed consideRatio closed 3 days ago

consideRatio commented 3 days ago

With https://github.com/2i2c-org/infrastructure/pull/4769 we use a gp3 storage class by default, with allowVolumeExpansion also enabled by default. This PR, together with manual updates in all EKS clusters, makes new dynamically provisioned volumes use that storage class going forward in existing clusters as well.


I used the following procedure when deploying this:

  1. Update the cluster's .jsonnet file: add tags to the cluster's metadata, update the addons list, and save the file
  2. Create and update addons, using --force to transition to EKS managed addons
    jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
    eksctl get addon --config-file=$CLUSTER_NAME.eksctl.yaml
    eksctl create addon --force --config-file=$CLUSTER_NAME.eksctl.yaml
    eksctl update addon --force --config-file=$CLUSTER_NAME.eksctl.yaml
    eksctl get addon --config-file=$CLUSTER_NAME.eksctl.yaml
  3. With a new gp3 based storage class with allowVolumeExpansion etc. available by default, disable the old gp2 storage class, since we'd otherwise have two defaults

    deployer use-cluster-credentials $CLUSTER_NAME
    kubectl get storageclass
    kubectl annotate storageclass gp2 storageclass.kubernetes.io/is-default-class-
    kubectl get storageclass
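For reference, the end state after step 3 is a single default StorageClass roughly like the following. This is a sketch only; the class name and exact parameter values depend on the EBS CSI driver addon's configuration and are assumptions, not copied from a real cluster:

```yaml
# Hypothetical sketch of the desired default StorageClass.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3  # assumed name
  annotations:
    # This annotation marks the class as the cluster default; the
    # "kubectl annotate ... is-default-class-" command above removes
    # it from the old gp2 class so only one default remains.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # assumed
```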
consideRatio commented 3 days ago

EKS addon migration, self-managed -> managed

All clusters' EKS addons were installed in a "self-managed" way, where eksctl installed them directly, instead of eksctl having EKS install them as managed addons. The eksctl self-managed way of installing was deprecated and not how new clusters were set up, so this change makes things consistent: all clusters now use only EKS managed addons.

In practice this change means that when looking at the AWS UI, we now see four addons listed, while previously we only saw one - the one declared in the addons list in our eksctl config.

This is how it looks now, but before we only saw Amazon EBS CSI Driver listed there:

(screenshot: the four EKS managed addons listed in the AWS console)

As part of this, I removed the now-irrelevant docs about upgrading the legacy self-managed EKS plugins, as all EKS managed addons are updated with a single command going forward.
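For context, the addons section in the generated $CLUSTER_NAME.eksctl.yaml for a fully EKS-managed setup might look roughly like this. It is a sketch matching the four addons visible in the AWS console; the exact addon names and fields follow eksctl's schema and are assumptions here:

```yaml
# Hypothetical addons list in an eksctl ClusterConfig.
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver
```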

Default storageclass with GP3 and allowVolumeExpansion

When I set up nmfs-openscapes, I learned that the gp2 storage class was no longer provided in the k8s cluster as a default storage class, which meant that storage provisioning failed - for example for the Prometheus disk - as no default storage class was available.

Because of that, I had already updated template.jsonnet to do what was also done for nmfs-openscapes when I set it up: I configured the Amazon EBS CSI Driver addon to create a k8s StorageClass resource for us and declare it as the default.

In this PR I made this systematic for existing clusters as well. Instead of some clusters having only a gp2 storage class, and only sometimes with allowVolumeExpansion set to true, we now consistently get a new StorageClass providing gp3 disks with allowVolumeExpansion set to true.
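As a sketch of how the addon can be asked to create that default StorageClass - the field names below are assumptions based on the Amazon EBS CSI Driver addon's configuration schema, not copied from our config:

```yaml
# Hypothetical eksctl config fragment.
addons:
  - name: aws-ebs-csi-driver
    # configurationValues is passed through to the EKS managed addon;
    # the defaultStorageClass option (assumed here) makes the addon
    # create a StorageClass and mark it as the cluster default.
    configurationValues: |
      {
        "defaultStorageClass": {
          "enabled": true
        }
      }
```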

Disabling of networkPolicy that failed to activate

In nmfs-openscapes I had trialed network policy enforcement with the vpc-cni EKS addon, which can be enabled for an EKS managed installation of it (but not for the previous, deprecated eksctl self-managed variant). However, activating it didn't seem to do the trick, so I have now disabled it as well. This applies to nmfs-openscapes and, because I set it in template.jsonnet, also to the recently created strudel cluster.
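For reference, the toggle lives in the vpc-cni addon's configuration; disabling it might look roughly like this. This is a sketch: enableNetworkPolicy is the VPC CNI's advanced-configuration flag as I understand it, so treat the exact key and value format as assumptions:

```yaml
# Hypothetical eksctl config fragment disabling network policy
# enforcement in the vpc-cni EKS managed addon.
addons:
  - name: vpc-cni
    configurationValues: |
      {
        "enableNetworkPolicy": "false"
      }
```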