GSA-TTS / datagov-brokerpak-eks

Broker AWS EKS instances using the OSBAPI (eg from cloud.gov)

Single AZ Support #93

Closed: nickumia-reisys closed this issue 2 years ago

nickumia-reisys commented 2 years ago

This allows users to specify that the entire workload should be run within a single AWS Availability Zone for latency or other operational requirements.

New Additions:

This is not an optimal solution (it is primarily a quick workaround). Long-term, we need to verify the auto-scaling configuration so that new nodes are only created within the same existing availability zone, and check whether there are other edge cases to consider. Forcing the managed node group to live within a single subnet pins its nodes to that subnet's availability zone, as sketched below.
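As a rough illustration of the workaround (not the brokerpak's actual code; the variable name single_az and the references to aws_eks_cluster.main, aws_iam_role.node, and a count-based aws_subnet.private are all assumptions for the sketch), the node group's subnet_ids can be narrowed to a single subnet when the option is set:

```hcl
# Hypothetical sketch: pin a managed node group to one subnet (one AZ)
# when var.single_az is true. Resource names are illustrative.
variable "single_az" {
  type    = bool
  default = false
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "workers"
  node_role_arn   = aws_iam_role.node.arn

  # Restricting the node group to one subnet pins every node, including
  # nodes added later by autoscaling, to that subnet's availability zone.
  subnet_ids = var.single_az ? [aws_subnet.private[0].id] : aws_subnet.private[*].id

  scaling_config {
    desired_size = 2
    max_size     = 4
    min_size     = 1
  }
}
```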

References for creating an optional nested block:
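For context, the pattern those references describe is Terraform's dynamic block driven by a zero- or one-element collection, which renders a nested block only when an optional value is supplied. A generic sketch (the variable and resource here are illustrative, not from this repo):

```hcl
# Hypothetical example of an optional nested block: the access_logs block
# is emitted only when var.access_logs_bucket is non-null.
variable "access_logs_bucket" {
  type    = string
  default = null
}

resource "aws_lb" "example" {
  name               = "example"
  load_balancer_type = "application"
  subnets            = aws_subnet.public[*].id

  dynamic "access_logs" {
    # A 0- or 1-element list makes the nested block optional.
    for_each = var.access_logs_bucket == null ? [] : [var.access_logs_bucket]
    content {
      bucket  = access_logs.value
      enabled = true
    }
  }
}
```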

Additional Background surrounding Storage Classes:

A cluster administrator can address this issue by specifying the WaitForFirstConsumer mode which will delay the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created. PersistentVolumes will be selected or provisioned conforming to the topology that is specified by the Pod's scheduling constraints. These include, but are not limited to, resource requirements, node selectors, pod affinity and anti-affinity, and taints and tolerations.

Consider a PV provisioned in AZ A and a pod that requires it: if no nodes are available in AZ A (for example, because other pods were higher in the scheduling queue), the pod will be stuck in a Pending state until a node becomes available in AZ A, where its already-provisioned volume exists. By setting WaitForFirstConsumer on the storage class, the PV is not provisioned until the pod is scheduled and its constraints are known, so a volume compatible with the pod's placement can be created. See the sketch below.
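Since this brokerpak provisions everything through Terraform, a minimal sketch of such a storage class using the Terraform kubernetes provider might look like the following (the class name and the EBS CSI provisioner are assumptions for the example):

```hcl
# Hypothetical StorageClass that defers volume binding until a consuming
# pod is scheduled, so the volume lands in the pod's availability zone.
resource "kubernetes_storage_class" "wait_for_consumer" {
  metadata {
    name = "ebs-wait-for-consumer"
  }
  storage_provisioner = "ebs.csi.aws.com"
  reclaim_policy      = "Delete"
  volume_binding_mode = "WaitForFirstConsumer"
}
```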

FuhuXia commented 2 years ago

Is there any way to verify each node's AZ before and after the change?

nickumia-reisys commented 2 years ago

@FuhuXia There is a way to tell which nodes are in which AZ, but there isn't really a notion of "before" and "after" this change. If you had an existing cluster deployed manually with terraform apply, you could re-apply with the new option single_az=true and then check either the AWS Console or kubectl describe nodes to see where the nodes are. If the cluster was deployed through a Broker, you could do the same thing, but it would be another layer of abstraction to work around: you could inspect the cluster, take advantage of https://github.com/GSA/data.gov/issues/3083 to upgrade the instance, and then inspect it again. There isn't an automated way to inventory this right now; the easiest check is a command like the following:

nickumia@DL62-2-2MDD043:~/eks-brokerpak/terraform/modules/provision-aws$ kubectl describe node | grep "\(Name:\|topology.kubernetes.io/zone\)"
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
mogul commented 2 years ago

Noting here since it will probably get the right eyes: Karpenter added pod affinity support.