Azure / AKS

Azure Kubernetes Service

[BUG] user nodes are not available when upgrade to 1.29.2 #4211

Open hugo-zhang-uipath opened 1 month ago

hugo-zhang-uipath commented 1 month ago

Describe the bug

When upgrading AKS to 1.29.2, the system nodes upgrade to 1.29.2 successfully. However, the user node pools are stuck at 0 ready nodes. I checked the VMSS instances in the MC resource group: new instances were created, but the node pool does not recognize them.

To Reproduce

Steps to reproduce the behavior:

  1. az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version 1.29.2
  2. See a warning on the node pools in the Azure portal saying that the node cannot be registered to the node pool (timeout after 15m).

Expected behavior

New instances in the VMSS should be recognized by the node pool.

firefixmaarten commented 3 weeks ago

I'm seeing similar issues, with these errors in the cluster autoscaler logs:

I0425 09:54:52.314127 1 azure_template.go:114] Fetching instance information for SKU: Standard_B2als_v2 from SKU API
I0425 09:54:52.314494 1 azure_template.go:125] Falling back to static SKU list for SKU: Standard_B2als_v2
I0425 09:54:52.314592 1 azure_template.go:134] Instance type "Standard_B2als_v2" not supported, err: instance type "Standard_B2als_v2" not supported
E0425 09:54:52.314608 1 mixed_nodeinfos_processor.go:160] Unable to build proper template node for aks-pipelines1-33165521-vmss: instance type "Standard_B2als_v2" not supported
E0425 09:54:52.314622 1 static_autoscaler.go:384] Failed to get node infos for groups: instance type "Standard_B2als_v2" not supported

So somehow nodes that worked in 1.28 fail to work in 1.29.
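
For anyone triaging similar output, here is a minimal sketch that pulls the VM sizes reported as unsupported out of cluster autoscaler logs. It is not part of the autoscaler; the function name unsupported_skus is made up for illustration, and it assumes the message wording matches the lines quoted above, which may differ in other autoscaler versions.

```python
import re

# Matches the autoscaler message quoted above. The quotes around the SKU may
# be plain (") or escaped (\") depending on how the log was copied.
UNSUPPORTED = re.compile(r'[Ii]nstance type \\?"([^"\\]+)\\?" not supported')


def unsupported_skus(log_text: str) -> set:
    """Return the distinct VM sizes the log reports as unsupported."""
    return set(UNSUPPORTED.findall(log_text))


# One of the log lines from this thread, used as sample input.
sample = (
    'I0425 09:54:52.314592 1 azure_template.go:134] Instance type '
    '"Standard_B2als_v2" not supported, err: instance type '
    '"Standard_B2als_v2" not supported'
)

print(unsupported_skus(sample))  # prints {'Standard_B2als_v2'}
```

The resulting set can then be compared against the VM sizes your node pools use to see which pools the autoscaler cannot build a template for.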

I noticed that this VM size is not officially recommended, but the documentation does not say that it doesn't work: https://learn.microsoft.com/en-us/azure/aks/quotas-skus-regions

At first glance I cannot find anything about it in the recent release notes: https://github.com/Azure/AKS/releases?page=1

Update: my problem turned out to be unrelated. It was caused by capacity issues in West Europe for that specific VM type. You might still be hitting the same issue under the hood, though ;).