slzmruepp opened this issue 1 year ago
@kaarthis @chandraneel please take a look
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
I think, @slzmruepp, you should be able to work around this issue with az aks nodepool upgrade; see https://learn.microsoft.com/en-us/azure/aks/use-multiple-node-pools#upgrade-a-node-pool
This did not help. The only way I could mitigate it was to update the node pool version through the REST API, by PUT-ing the proper version from Postman:
Endpoint:
https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourcegroup}}/providers/Microsoft.ContainerService/managedClusters/{{aksname}}/agentPools/{{nodepoolname}}?api-version=2023-01-01
Body raw:
{
"properties": {
"orchestratorVersion": "1.24.9"
}
}
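If you prefer the CLI over Postman, the same PUT can presumably be sent with az rest (an unverified sketch; replace the {{...}} placeholders and the version as appropriate):

$ az rest --method put \
    --url "https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourcegroup}}/providers/Microsoft.ContainerService/managedClusters/{{aksname}}/agentPools/{{nodepoolname}}?api-version=2023-01-01" \
    --body '{"properties": {"orchestratorVersion": "1.24.9"}}'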
@slzmruepp I see. I found this issue while investigating an unexpected partial Kubernetes upgrade of my AKS cluster, which I performed using Terraform. It's only the second time I have ever run such an upgrade on this cluster, but the first time it has "succeeded" with a partial result, so I thought my issue might be related to your issue with the auto-upgrade.
I ran the upgrade from 1.25.5 to 1.26.3, and here is what I mean by a partial upgrade result:
$ az aks show --resource-group ${AKS_CLUSTER_GROUP} --name ${AKS_CLUSTER} --output table
Name Location ResourceGroup KubernetesVersion CurrentKubernetesVersion ProvisioningState
------------------- -------- --------------- ----------------- ------------------------ -----------------
aks-xxx-uks-stg-aks uksouth rg-aks-xxx-stg 1.26.3 1.26.3 Succeeded
but for some reason the system node pool was not upgraded:
$ az aks nodepool list --resource-group ${AKS_CLUSTER_GROUP} --cluster-name ${AKS_CLUSTER} --output table
Name OsType KubernetesVersion VmSize Count MaxPods ProvisioningState Mode
------- -------- ------------------- ---------------- ------- --------- ------------------- ------
default Linux 1.25.5 Standard_D2_v3 1 30 Succeeded System
w1abc Windows 1.26.3 Standard_E2as_v5 0 30 Succeeded User
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-default-36914368-vmss000000 Ready agent 8d v1.25.5
aksw1abc000001 Ready agent 3d v1.26.3
After I found your issue, I also confirmed that the orchestrator version of the system node pool was left at 1.25.5.
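For reference, those fields can be checked directly with a --query (a sketch, using the same placeholders as above):

$ az aks nodepool show --resource-group ${AKS_CLUSTER_GROUP} --cluster-name ${AKS_CLUSTER} --name default \
    --query "{orchestratorVersion: orchestratorVersion, currentOrchestratorVersion: currentOrchestratorVersion}" \
    --output table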
I decided to try az aks nodepool upgrade, and it did the trick for me.
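The command I ran was roughly along these lines (a sketch; the pool name comes from the listing above and the target version matches the control plane):

$ az aks nodepool upgrade \
    --resource-group ${AKS_CLUSTER_GROUP} \
    --cluster-name ${AKS_CLUSTER} \
    --name default \
    --kubernetes-version 1.26.3

The node pool list afterwards: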
Name OsType KubernetesVersion VmSize Count MaxPods ProvisioningState Mode
------- -------- ----------------- ---------------- ------- ------- ----------------- ------
default Linux 1.26.3 Standard_D2_v3 1 30 Succeeded System
w1abc Windows 1.26.3 Standard_E2as_v5 1 30 Succeeded User
That's why I shared it above.
Also facing this issue.
My setup: automatic channel upgrade set to patch.
Updating kubernetes_version from 1.24 -> 1.25 does upgrade AKS to 1.25 but leaves the node pools on version 1.24.
Initially I thought I was misunderstanding how Automatic Channel Upgrades work, so I tried switching from patch to stable, but the node pools are still left untouched during upgrades. The documentation states that orchestrator_version should be left empty if node pools are to always follow the AKS version. Is it possible that Terraform is setting the orchestrator version when it is not explicitly set to null, and that is causing the node pools to be stuck on the older version?
Did some digging, and I don't think this is a bug or an issue in AKS/Azure's API. The AzureRM Terraform provider will always set orchestratorVersion to currentOrchestratorVersion if the Terraform variable orchestrator_version is unset. See the PR where the change was introduced: https://github.com/hashicorp/terraform-provider-azurerm/pull/18130
Also see where orchestrator_version is set: https://github.com/hashicorp/terraform-provider-azurerm/blob/5dca5a760e56c7b59f3507631021310c1972874b/internal/services/containers/kubernetes_cluster_node_pool_resource.go#L911 ("confirm if we can roll the default node pool if the value is unset in the config")
Basically, what I think is happening in my case: I have an existing cluster running on version 1.24.10. The cluster was deployed using Terraform's azurerm_kubernetes_cluster resource with kubernetes_version = "1.24", default_node_pool.orchestrator_version = null and automatic_channel_upgrade = "patch".
Now I want to upgrade to 1.25. The AzureRM TF provider performs the upgrade on AKS (the control plane), but because orchestrator_version is unset it uses the currentOrchestratorVersion returned by Azure's API, which is (correctly) 1.24.10. So from Azure's point of view I am specifically asking to upgrade AKS to 1.25 but keep the node pools on 1.24.10.
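One quick way to confirm this state from the CLI before applying anything (a sketch; resource group, cluster, and pool names are placeholders):

# control plane versions as reported by the managed cluster
$ az aks show --resource-group {{resourcegroup}} --name {{aksname}} \
    --query "{kubernetesVersion: kubernetesVersion, currentKubernetesVersion: currentKubernetesVersion}" --output table

# node pool versions: with the provider behaviour above, orchestratorVersion stays pinned at the old release
$ az aks nodepool show --resource-group {{resourcegroup}} --cluster-name {{aksname}} --name {{nodepoolname}} \
    --query "{orchestratorVersion: orchestratorVersion, currentOrchestratorVersion: currentOrchestratorVersion}" --output table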
Hope that makes sense.
I got the error even from the portal when selecting "only upgrade control plane"
@kaarthis, @sdesai345 would you be able to assist?
Describe the bug
The stable auto-upgrade channel fails to update orchestratorVersion on the node pool.
To Reproduce
Steps to reproduce the behavior (see the CLI sketch below):
1. Create an AKS cluster with a system and a user node pool on Kubernetes version N-2.
2. Enable the auto-upgrade channel "stable".
3. Wait for the cluster to upgrade.
4. Look at the node pools in the portal, via the az CLI, or with kubectl get nodes: the user pool reports the wrong patch version.
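A rough CLI sketch of the repro (names and versions are illustrative only; the channel can also be enabled on an existing cluster):

# create a cluster whose control plane and system pool start on an older (N-2) patch release
$ az aks create \
    --resource-group rg-project-stg \
    --name aks-project-stg \
    --kubernetes-version 1.24.6 \
    --node-count 1 \
    --auto-upgrade-channel stable

# add a user node pool on the same version
$ az aks nodepool add \
    --resource-group rg-project-stg \
    --cluster-name aks-project-stg \
    --name user \
    --kubernetes-version 1.24.6 \
    --node-count 1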
Expected behavior
orchestratorVersion and currentOrchestratorVersion should report the correct deployed version.
Environment:
$ az aks show --resource-group rg-project-stg --name aks-project-stg | grep kubernetesVersion
"kubernetesVersion": "1.24.9",
$ az aks show --resource-group rg-project-stg --name aks-project-stg | grep currentKubernetesVersion
"currentKubernetesVersion": "1.24.9",
$ az aks nodepool show --resource-group rg-project-stg --cluster-name aks-project-stg --name user | grep orchestratorVersion
"orchestratorVersion": "1.24.6",
$ az aks nodepool show --resource-group rg-project-stg --cluster-name aks-project-stg --name user | grep currentOrchestratorVersion
"currentOrchestratorVersion": "1.24.9",
│ Error: updating Managed Cluster (Subscription: "XXX"
│ Resource Group Name: "rg-project-master-int"
│ Managed Cluster Name: "aks-project-master-int"): managedclusters.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged" Message="Using managed cluster api, all Agent pools' OrchestratorVersion must be all specified or all unspecified. If all specified, they must be stay unchanged or the same with control plane. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api"
│
│   with module.aks_cluster.azurerm_kubernetes_cluster.aks_cluster,
│   on modules/aks/main.tf line 32, in resource "azurerm_kubernetes_cluster" "aks_cluster":
│   32: resource "azurerm_kubernetes_cluster" "aks_cluster" {