Azure / azure-cli

Azure Command-Line Interface
MIT License

NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged during `az aks update --enable-managed-identity` #22807

Closed mdhomer closed 2 years ago

mdhomer commented 2 years ago

Related command az aks update --resource-group <> --name <> --enable-managed-identity --assign-identity <> --assign-kubelet-identity <> OR the simplified: az aks update --resource-group <> --name <> --enable-managed-identity

(both give same error, which I believe is a validation error before proceeding)

Describe the bug
We recently upgraded our Azure managed AKS clusters and their control plane from kubernetesVersion 1.21.2 -> 1.22.6 (and orchestratorVersion 1.21.2 -> 1.22.6).

As a follow-up to these upgrades we also want to enable managed identities for the same cluster. However, the API call made during the above az aks update --enable-managed-identity command returns some unexpected output, which I believe may be a bug between the HTTP API payload versions:

urllib3.connectionpool: https://management.azure.com:443 "GET /subscriptions/.../resourceGroups/.../providers/Microsoft.ContainerService/managedClusters/...?api-version=2022-04-01 HTTP/1.1" 200 None
...
cli.azure.cli.core.sdk.policies: Response content:
cli.azure.cli.core.sdk.policies: {
  "id": "/subscriptions/.../resourcegroups/.../providers/Microsoft.ContainerService/managedClusters/...",
  ...
  "kubernetesVersion": "1.22.6",
  "currentKubernetesVersion": "1.22.6",
  ...
  "orchestratorVersion": "1.21.2",
  "currentOrchestratorVersion": "1.22.6",
  ...

I believe the stale orchestratorVersion returned from the GET call above is then passed on to the subsequent AKS update (PUT) call:

cli.azure.cli.core.sdk.policies: Request URL: 'https://management.azure.com/subscriptions/.../resourceGroups/.../providers/Microsoft.ContainerService/managedClusters/...?api-version=2022-04-01'
...
cli.azure.cli.core.sdk.policies: Request body:
cli.azure.cli.core.sdk.policies: {
...
"mode": "System", "orchestratorVersion": "1.21.2", "upgradeSettings": ...
}
urllib3.connectionpool: https://management.azure.com:443 "PUT /subscriptions/cbe888f6-e994-4f24-aabc-1834bf620d36/resourceGroups/staginguk/providers/Microsoft.ContainerService/managedClusters/staginguk-primary-aks?api-version=2022-04-01 HTTP/1.1" 400 394
cli.azure.cli.core.sdk.policies: Response status: 400

Which means we hit this exception response:

cli.azure.cli.core.sdk.policies: {
  "code": "NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged",
  "message": "Using managed cluster api, all Agent pools' OrchestratorVersion must be all specified or all unspecified. If all specified, they must be stay unchanged or the same with control plane. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api",
  "subcode": ""
 }
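For anyone debugging this, one way to compare each agent pool's recorded version against the version its nodes actually run is the query below. This is an illustrative sketch, not from the original report; <rg> and <cluster> are placeholders for your own resource names.

```shell
# Hypothetical inspection command: list each agent pool's stored
# orchestratorVersion next to the version the nodes actually run.
# A mismatch between the two columns matches the symptom in this issue.
az aks show \
  --resource-group <rg> \
  --name <cluster> \
  --query "agentPoolProfiles[].{pool:name, orchestratorVersion:orchestratorVersion, current:currentOrchestratorVersion}" \
  --output table
```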

To Reproduce

  1. Upgrade an Azure AKS cluster & its orchestratorVersion.
  2. Attempt to enable managed identities with: az aks update --resource-group <> --name <> --enable-managed-identity

Expected behaviour
The cluster should utilise the new managed identities, and they should be present in the JSON representation of the AKS cluster via the identityProfile & kubeletIdentityProfile blocks.
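Once the update succeeds, the expected state above could be confirmed with a query like the following (an illustrative sketch; <rg> and <cluster> are placeholders):

```shell
# Hypothetical verification: after a successful --enable-managed-identity
# update, the identity blocks should be populated in the cluster JSON.
az aks show \
  --resource-group <rg> \
  --name <cluster> \
  --query "{identity: identity, identityProfile: identityProfile}"
```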

Environment summary

az --version
azure-cli                         2.37.0

core                              2.37.0
telemetry                          1.0.6

Extensions:
datafactory                        0.5.0

Dependencies:
msal                            1.18.0b1
azure-mgmt-resource             21.1.0b1

Python location '/home/mitchell/test-venv/bin/python3'
Extensions directory '/home/mitchell/.azure/cliextensions'

Python (Linux) 3.8.10 (default, Mar 15 2022, 12:22:08) 
[GCC 9.4.0]

Additional context n/a

yonzhan commented 2 years ago

route to CXP team

navba-MSFT commented 2 years ago

@mdhomer Apologies for the late reply. Thanks for reaching out to us and sharing this feedback. Could you please retry the same operation and confirm whether you are still facing the same issue? Awaiting your reply.

navba-MSFT commented 2 years ago

@mdhomer I reached out to the Product Owners in the background and they have confirmed that this is a bug; they are working on a fix. I will keep you posted on its progress.

mdhomer commented 2 years ago

@navba-MSFT Thanks a bunch for the follow-up, I didn't get a chance to try performing the operation again. Should I hold off until the bug is addressed?

Also, will this fix be included with a new version of the CLI (requiring an upgrade), or is it for the underlying HTTP APIs?

navba-MSFT commented 2 years ago

@mdhomer Thanks for getting back. You need not repro the issue again. I will keep you posted on the fix and I will get back with an answer to your above questions.

navba-MSFT commented 2 years ago

@mdhomer I have heard back from the Product Owners. The fix is done at the Resource Provider level and it will be rolled out next Monday. Hope this answers.

navba-MSFT commented 2 years ago

@mdhomer I wanted to share a quick update on this. The fix rollout started yesterday and will take a couple of days to reach all regions. Until the fix is deployed everywhere, you may be able to avoid using the failing version for now as a workaround. Hope this helps.

mdhomer commented 2 years ago

Thanks for the update @navba-MSFT. I think this --enable-managed-identity feature is only available in CLI v2.37.0 based on the docs I was following, so I guess I'll just have to wait a little longer?

The AKS cluster I'm targeting is in UK South. Is this fix being rolled out on the underlying HTTP API layer or within the CLI itself? Wondering if the docs should be updated if it's the latter.

navba-MSFT commented 2 years ago

@mdhomer As mentioned in the above comment, the fix is not at the CLI version / HTTP API level. The fix is at the AKS Resource Provider level.

navba-MSFT commented 2 years ago

@mdhomer I am yet to get confirmation that the fix has been deployed to all regions. In the meantime, could you please try running the same command again and check whether you are still facing the issue?

navba-MSFT commented 2 years ago

@mdhomer The action is currently pending on you to test the same command again. If you still face the same issue, please feel free to reopen this thread. We would be happy to help.

mdhomer commented 2 years ago

@navba-MSFT unfortunately this still seems to be a problem. Sorry I didn't get to re-try it sooner.

I attempted with "azure-cli": "2.37.0"; error log:

(NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged) Using managed cluster api, all Agent pools' OrchestratorVersion must be all specified or all unspecified. If all specified, they must be stay unchanged or the same with control plane. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api
Code: NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged
Message: Using managed cluster api, all Agent pools' OrchestratorVersion must be all specified or all unspecified. If all specified, they must be stay unchanged or the same with control plane. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api

And with the newer release, "azure-cli": "2.38.0":

(NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged) Using managed cluster api, all Agent pools' OrchestratorVersion must be all specified or all unspecified. If all specified, they must be stay unchanged or the same with control plane. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api
Code: NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged
Message: Using managed cluster api, all Agent pools' OrchestratorVersion must be all specified or all unspecified. If all specified, they must be stay unchanged or the same with control plane. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api

Let me know if you require debug output and I can provide it here as a comment.

navba-MSFT commented 2 years ago

@mdhomer Please share the below details over email. My email navba [ @ ] microsoft . com

Awaiting your reply.

mdhomer commented 2 years ago

@navba-MSFT I've sent over the requested information, please let me know if it hasn't reached you.

navba-MSFT commented 2 years ago

@mdhomer Thanks for emailing me the requested details. I reached out to the Product Owners and they have informed me that the fix is yet to be rolled out in all regions. We will keep you posted once it is done. We appreciate your patience on this.

schafei commented 2 years ago

@navba-MSFT Is this fix still not rolled out to all regions? We just got the same error when trying to update an AKS cluster with 2 node pools from v1.23.5 to v1.23.8 in the westeurope region. Our clusters with only the default node pool could be updated successfully, but at least one cluster with an additional node pool fails to update with the following error:

Error: updating Kubernetes Version for Cluster: (Managed Cluster Name "..." / Resource Group "..."): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged" Message="Using managed cluster api, all Agent pools' OrchestratorVersion must be all specified or all unspecified. If all specified, they must be stay unchanged or the same with control plane. For agent pool specific change, please use per agent pool operations: https://aka.ms/agent-pool-rest-api"

  with module.aks.azurerm_kubernetes_cluster.aks,
  on vendor/modules/crt-aks-module/aks.tf line 7, in resource "azurerm_kubernetes_cluster" "aks":
   7: resource "azurerm_kubernetes_cluster" "aks" {

mkemmerz commented 2 years ago

Same issue as @schafei. We wanted to upgrade a cluster which is located in West Europe and it also failed with NotAllAgentPoolOrchestratorVersionSpecifiedAndUnchanged. The upgrade was from 1.22.x to 1.23.x.

verysonglaa commented 2 years ago

@navba-MSFT Could you please provide an update? We still experience the same issue in westeurope.

navba-MSFT commented 2 years ago

@verysonglaa Based on my discussion with the Product Owners, the above fix has been deployed to all regions. Note that the agent pools have been fixed: customers should see consistent orchestratorVersion and currentOrchestratorVersion values now. If you are still facing the same issue, please open a support ticket. Our Support Professional will get in touch with you and troubleshoot this further.

wasker commented 2 years ago

FWIW, if you are stuck in this state, it seems possible to get out of it by going into the AKS Node Pools blade in the Portal and manually changing the scale parameters of the node pool. This seems to reset the recorded k8s version in the node pool.
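As the error message itself suggests ("please use per agent pool operations"), the per-agent-pool API may be another way to clear the stale version without going through the Portal. A hedged sketch using the CLI equivalent, where <rg>, <cluster>, <nodepool>, and the version are placeholders:

```shell
# Hypothetical workaround sketch: upgrade the node pool in place via the
# per-agent-pool operation, which rewrites its orchestratorVersion to
# match the version the nodes are already running.
az aks nodepool upgrade \
  --resource-group <rg> \
  --cluster-name <cluster> \
  --name <nodepool> \
  --kubernetes-version 1.22.6
```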

tomasz-k-kaminski commented 2 years ago

In case you use ARM templates, please make sure to specify per node pool:

"orchestratorVersion": "[parameters('kubernetesVersion')]",
"currentOrchestratorVersion": "[parameters('kubernetesVersion')]"

rehan2908 commented 2 years ago

I have the same issue using an ARM template.

pettermoe95 commented 11 months ago

I'm still getting this error today. Running az aks update --resource-group <> --name <> --enable-managed-identity --assign-identity <> on my AKS cluster causes the same error.