Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] Set zoneBalance of virtualMachineScaleSets to true if multiple zones are selected #3279

Open rgarcia89 opened 1 year ago

rgarcia89 commented 1 year ago

Is your feature request related to a problem? Please describe. I am a bit frustrated with the AKS logic for creating a cluster that spreads over multiple availability zones. My problem is that I want a 3-node cluster spread over zones 1, 2, and 3, so that each zone hosts one node. However, things currently look different, because zoneBalance in the virtualMachineScaleSets is set to false. This leads to the following:

NAME                                  REGION               ZONE
aks-workerpool1-39166192-vmss000000   germanywestcentral   germanywestcentral-2
aks-workerpool1-39166192-vmss000001   germanywestcentral   germanywestcentral-3
aks-workerpool1-39166192-vmss000003   germanywestcentral   germanywestcentral-2

So I am not sure if I am missing something. We are deploying our clusters using Terraform, and unfortunately there is no way to define it there either. Do you have any idea?
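For reference, one quick way to check the zone spread of the nodes is via the standard topology labels (assuming kubectl access to the cluster; this is just one way to produce a listing like the one above):

# List nodes with their region and zone labels as extra columns
kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone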

Describe the solution you'd like I would love to see a strict one-node-per-zone deployment. From my understanding, zoneBalance needs to be set to true for that.

Describe alternatives you've considered I could of course create one pool per zone, with a single node in each. However, that would then cause problems with automatic updates again, because we currently rely on the automatic update of the nodes at the Kubernetes patch level.
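A rough sketch of that per-zone alternative with the az CLI (resource group, cluster name, pool names, and VM size below are placeholders; the setup in this thread actually uses Terraform):

# One single-node pool pinned to each availability zone (placeholder names and size)
az aks nodepool add --resource-group myrg --cluster-name mycluster --name zonepool1 --zones 1 --node-count 1 --node-vm-size Standard_D4s_v5
az aks nodepool add --resource-group myrg --cluster-name mycluster --name zonepool2 --zones 2 --node-count 1 --node-vm-size Standard_D4s_v5
az aks nodepool add --resource-group myrg --cluster-name mycluster --name zonepool3 --zones 3 --node-count 1 --node-vm-size Standard_D4s_v5

As noted above, the downside is that upgrades and scaling then have to be reasoned about per pool instead of per nodepool spanning all zones.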

roy-work commented 1 year ago

My org has been struggling with this since about June, when we first really tried to tackle this problem in a disciplined manner. I too am more than a bit frustrated by this.

Azure: the entire point of being able to request a nodepool over 3 zones is to make a service fault-tolerant to failures within a single zone. Getting a balanced nodepool is absolutely key to surviving an AZ outage. The only people setting AZ settings on a nodepool are those who are interested in surviving an AZ outage, and the fact that this is a silent failure (the nodepool allocation succeeds, and unless you double-check that Azure did it correctly, you'll never notice that it hasn't) makes it even worse.

And I see the same as the above poster: I request all 3 zones, and I get…

» k --context devportal-admin describe no aks-nodepool2-16907530-vmss000000 aks-nodepool2-16907530-vmss000001 aks-nodepool2-16907530-vmss000003 | grep 'topology.k.*zone='
                    topology.kubernetes.io/zone=eastus-1
                    topology.kubernetes.io/zone=eastus-2
                    topology.kubernetes.io/zone=eastus-1

Not what I asked for.

I could of course create one pool per zone, with a single node in each.

I actually don't think this works. The VMSS documentation states,

With best-effort zone balance, the scale set attempts to scale in and out while maintaining balance. However, if for some reason this is not possible (for example, if one zone goes down, the scale set cannot create a new VM in that zone), the scale set allows temporary imbalance to successfully scale in or out.

The implication is that imbalanced zones, like the ones seen here, should only arise under failure conditions. (The problem, as we'll see, is that Azure is in a state of perpetual failure.)

So, let's try pinning a pool to a single zone, and find out:

» az aks nodepool add --name nodepool4 --cluster-name example --resource-group example --node-vm-size Standard_D8as_v5 --zones 3 --node-count 1 --enable-ultra-ssd --mode System --max-pods 250
(ReconcileVMSSAgentPoolFailed) We are unable to serve this request due to an internal error, Correlation ID: 5b6a1975-6832-4d6a-a565-7e246f4de70d, Operation ID: 73316f60-bc86-4332-b2ff-a503f2361640, Timestamp: 2023-02-14T15:05:01Z.
Code: ReconcileVMSSAgentPoolFailed
Message: We are unable to serve this request due to an internal error, Correlation ID: 5b6a1975-6832-4d6a-a565-7e246f4de70d, Operation ID: 73316f60-bc86-4332-b2ff-a503f2361640, Timestamp: 2023-02-14T15:05:01Z.

Internal error. 🤦

In this case, the VMSS does end up getting created, but its provisioning state is "Failed", as we've forced its hand at provisioning in the zone we want. If you click into the failed provisioning banner, you can get at the real reason:

Allocation failed. We do not have sufficient capacity for the requested VM size in this zone.
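For what it's worth, per-zone restrictions on a VM size can be checked up front with az vm list-skus (the region and size below are the ones from this thread; keep in mind this surfaces subscription and zone restrictions, so a transient capacity shortage like the one above may not show up here):

# Show the SKU in the target region, including zone availability and any restrictions
az vm list-skus --location eastus --size Standard_D8as_v5 --all --output table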

rgarcia89 commented 1 year ago

Could someone from the AKS team please confirm that they have at least taken notice of this feature request?

buehlmann commented 1 year ago

Dear AKS team: any comments/updates on this issue?

microsoft-github-policy-service[bot] commented 6 months ago

Action required from @Azure/aks-pm

microsoft-github-policy-service[bot] commented 5 months ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 5 months ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 4 months ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 4 months ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 3 months ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 3 months ago

Issue needing attention of @Azure/aks-leads

meappy commented 2 months ago

This may not be ideal, but I can technically set the zoneBalance key to true on the AKS-managed VMSS with the az vmss command, like so:

Check before

❯ az vmss show -g <aks vmss resource group> -n <aks vmss name> -o json | jq '.zoneBalance'
false

Update the zoneBalance key

❯ az vmss update -g <aks vmss resource group> -n <aks vmss name> --set zoneBalance=true

Check after

❯ az vmss show -g <aks vmss resource group> -n <aks vmss name> -o json | jq '.zoneBalance'
true

It looks like the AKS API does not allow setting the zoneBalance property, but manipulating it directly through the VMSS API does work.

Again, this is not ideal, and I can see problems with this when the infrastructure is managed by external IaC tools such as Terraform. Also, I note that this is not what the OP intended to ask, which was the ability to set zoneBalance during AKS cluster creation.

But my point is: if the VMSS API is able to do this, surely the AKS API should allow it as well? Can someone comment?
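For anyone wanting to apply the same workaround across all pools, a rough sketch (resource group and cluster name are placeholders; note that AKS treats the node resource group as managed, so an out-of-band VMSS change like this may be reverted by a later reconcile, upgrade, or scale operation):

# Find the managed node resource group of the cluster
NODE_RG=$(az aks show --resource-group myrg --name mycluster --query nodeResourceGroup -o tsv)

# Flip zoneBalance on every scale set AKS created for the cluster
for VMSS in $(az vmss list --resource-group "$NODE_RG" --query '[].name' -o tsv); do
  az vmss update --resource-group "$NODE_RG" --name "$VMSS" --set zoneBalance=true
done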

microsoft-github-policy-service[bot] commented 2 months ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 1 month ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 1 month ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 3 weeks ago

Issue needing attention of @Azure/aks-leads

microsoft-github-policy-service[bot] commented 6 days ago

Issue needing attention of @Azure/aks-leads