Azure / azure-cli

Azure Command-Line Interface
MIT License

`az aks enable-addons`: fails due to region name to id mapping issue #27473

Open NedAnd1 opened 1 year ago

NedAnd1 commented 1 year ago

Describe the bug

After creating a cluster with the Azure CLI (az aks create) in regions like East US and West Europe, enabling Container Insights for AKS monitoring fails with a KeyError for the 'Israel Central' region, which looks like a region name to ID mapping issue. This has occurred multiple times (not consistently, but frequently) within our testing infrastructure, using both v2.52.0 of the core az cli for az aks commands and the aks-preview extension (on separate VMs).

Related command

az aks enable-addons -a monitoring -n $clusterName -g $resourceGroup --workspace-resource-id $logAnalyticsResourceId --enable-msi-auth-for-monitoring --enable-syslog

Errors

WARNING: Argument '--enable-msi-auth-for-monitoring' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
WARNING: Argument '--enable-syslog' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
WARNING: The behavior of this command has been altered by the following extension: aks-preview
ERROR: The command failed with an unexpected error. Here is the traceback:
ERROR: 'Israel Central'
Traceback (most recent call last):
  File "/opt/az/lib/python3.10/site-packages/knack/cli.py", line 233, in invoke
    cmd_result = self.invocation.execute(args)
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 663, in execute
    raise ex
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 726, in _run_jobs_serially
    results.append(self._run_job(expanded_arg, cmd_copy))
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 697, in _run_job
    result = cmd_copy(params)
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 333, in __call__
    return self.handler(*args, **kwargs)
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
    return op(**command_args)
  File "/root/.azure/cliextensions/aks-preview/azext_aks_preview/custom.py", line 1636, in aks_enable_addons
    ensure_container_insights_for_monitoring(
  File "/opt/az/lib/python3.10/site-packages/azure/cli/command_modules/acs/addonconfiguration.py", line 404, in ensure_container_insights_for_monitoring
    if location not in region_ids:
  File "/opt/az/lib/python3.10/site-packages/azure/cli/command_modules/acs/addonconfiguration.py", line 403, in <lambda>
    lambda x: region_names_to_id[x], resource["locations"])
KeyError: 'Israel Central'
To check existing issues, please visit: https://github.com/Azure/azure-cli/issues

Issue script & Debug output

Debug-AKSMonitoring.log

Expected behavior

Enabling Container Insights for AKS monitoring to consistently succeed (or provide a better error message, if there is a legitimate cluster-related issue).
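To illustrate the failure mode visible in the traceback, here is a minimal sketch. The dictionary contents and the helper names (`strict_region_ids`, `tolerant_region_ids`, `is_location_supported`) are hypothetical and are not the CLI's actual data or code. The only parts taken from the traceback are the names `region_names_to_id` and `resource["locations"]` and the pattern of a lambda indexing the mapping. The sketch assumes the mapping is missing an entry for a display name that appears in the locations list, and shows how a tolerant lookup would avoid the bare KeyError.

```python
# Hypothetical data: the real tables used by the CLI are not shown in this issue.
region_names_to_id = {"East US": "eastus", "West Europe": "westeurope"}
resource = {"locations": ["East US", "West Europe", "Israel Central"]}


def strict_region_ids(names, mapping):
    """Same shape as the lambda in the traceback: an unknown name raises KeyError."""
    return list(map(lambda x: mapping[x], names))


def tolerant_region_ids(names, mapping):
    """Skip display names that have no entry in the mapping instead of raising."""
    return [mapping[name] for name in names if name in mapping]


def is_location_supported(location, names, mapping):
    """Check a region ID against only the names that could be mapped."""
    return location in tolerant_region_ids(names, mapping)
```

With this data, `strict_region_ids(resource["locations"], region_names_to_id)` raises `KeyError: 'Israel Central'` even when the cluster is in `eastus`, while `is_location_supported("eastus", resource["locations"], region_names_to_id)` still succeeds. Whether skipping unknown regions is the right fix for the CLI is a judgment call for the maintainers; a clearer error message would be another option.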

Environment Summary

azure-cli: 2.52.0
azure-cli-core: 2.52.0
azure-cli-telemetry: 1.1.0
extensions: {}

Additional context

No response

azure-client-tools-bot-prd[bot] commented 1 year ago

Hi @NedAnd1,

2.52.0 is not the latest Azure CLI (2.53.0).

If you haven't already attempted to do so, please upgrade to the latest Azure CLI version by following https://learn.microsoft.com/en-us/cli/azure/update-azure-cli.

yonzhan commented 1 year ago

Thank you for opening this issue, we will look into it.

egondalia commented 1 year ago

This is a "major issue" with Azure DevOps agents. Each time we try to deploy AKS using the command below in our Azure DevOps YAML, we get the error ERROR: 'Israel Central'. It seems linked to the Log Analytics workspace, but ours is in eastus, so the error doesn't make sense.

az aks enable-addons --addons monitoring --name $(MyClusterName) --resource-group $(MyResGrp) --workspace-resource-id $myworkspaceid

We have the latest CLI (2.53) locally, but the DevOps agents are running 2.52.

The key issue is ERROR: 'Israel Central' even though we are using eastus. This was working earlier, but now we keep getting this error.

-------> Enabling monitoring Add-on
ERROR: The command failed with an unexpected error. Here is the traceback:
ERROR: 'Israel Central'
Traceback (most recent call last):
  File "/opt/az/lib/python3.10/site-packages/knack/cli.py", line 233, in invoke
    cmd_result = self.invocation.execute(args)
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 663, in execute
    raise ex
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 726, in _run_jobs_serially
    results.append(self._run_job(expanded_arg, cmd_copy))
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 697, in _run_job
    result = cmd_copy(params)
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 333, in __call__
    return self.handler(*args, **kwargs)
  File "/opt/az/lib/python3.10/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
    return op(**command_args)
  File "/opt/az/lib/python3.10/site-packages/azure/cli/command_modules/acs/custom.py", line 1029, in aks_enable_addons
    ensure_container_insights_for_monitoring(
  File "/opt/az/lib/python3.10/site-packages/azure/cli/command_modules/acs/addonconfiguration.py", line 404, in ensure_container_insights_for_monitoring
    if location not in region_ids:
  File "/opt/az/lib/python3.10/site-packages/azure/cli/command_modules/acs/addonconfiguration.py", line 403, in <lambda>
    lambda x: region_names_to_id[x], resource["locations"])
KeyError: 'Israel Central'
To check existing issues, please visit: https://github.com/Azure/azure-cli/issues

rmagalla commented 1 year ago

Hello, I have the same problem

az aks create \
    --resource-group ${AZ_RESOURCE_GROUP} \
    --name ${AZ_AKS_CLUSTER_NAME} \
    --generate-ssh-keys \
    --vm-set-type VirtualMachineScaleSets \
    --node-vm-size $AZ_VM_SKU \
    --load-balancer-sku standard \
    --enable-managed-identity \
    --network-plugin "azure" \
    --vnet-subnet-id $AKS_SUBNET_ID \
    --dns-service-ip 10.2.0.10 \
    --service-cidr 10.2.0.0/24 \
    --node-count 1 \
    --zones $AZ_AKS_ZONES \
    --max-pods 30 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 5 \
    --kubernetes-version $AZ_AKS_VERSION \
    --enable-addons monitoring \
    --enable-msi-auth-for-monitoring \
    --workspace-resource-id $AZ_WS_ID

The command failed with an unexpected error. Here is the traceback:
'Israel Central'
Traceback (most recent call last):
  File "/usr/lib64/az/lib/python3.9/site-packages/knack/cli.py", line 233, in invoke
    cmd_result = self.invocation.execute(args)
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/core/commands/__init__.py", line 663, in execute
    raise ex
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/core/commands/__init__.py", line 726, in _run_jobs_serially
    results.append(self._run_job(expanded_arg, cmd_copy))
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/core/commands/__init__.py", line 697, in _run_job
    result = cmd_copy(params)
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/core/commands/__init__.py", line 333, in __call__
    return self.handler(*args, **kwargs)
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
    return op(**command_args)
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/command_modules/acs/custom.py", line 640, in aks_create
    mc = aks_create_decorator.construct_mc_profile_default()
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/command_modules/acs/managed_cluster_decorator.py", line 6033, in construct_mc_profile_default
    mc = self.set_up_addon_profiles(mc)
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/command_modules/acs/managed_cluster_decorator.py", line 5699, in set_up_addon_profiles
    ] = self.build_monitoring_addon_profile()
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/command_modules/acs/managed_cluster_decorator.py", line 5494, in build_monitoring_addon_profile
    self.context.external_functions.ensure_container_insights_for_monitoring(
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/command_modules/acs/addonconfiguration.py", line 404, in ensure_container_insights_for_monitoring
    if location not in region_ids:
  File "/usr/lib64/az/lib/python3.9/site-packages/azure/cli/command_modules/acs/addonconfiguration.py", line 403, in <lambda>
    lambda x: region_names_to_id[x], resource["locations"])
KeyError: 'Israel Central'

NedAnd1 commented 1 year ago

There is a mitigation: retry for as much as 15 minutes. However, until Sep 25 (~17:30 UTC) the command reliably succeeded on the first attempt, and the error message thrown here is very perplexing.
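For anyone scripting the retry mitigation, a minimal sketch could look like the following. The helper names `retry_until` and `enable_monitoring_addon`, the 30-second interval, and the choice to retry on any exception are assumptions made for illustration; only the 15-minute window and the `az aks enable-addons` arguments come from this thread.

```python
import subprocess
import time


def retry_until(action, timeout_s=15 * 60, interval_s=30,
                sleep=time.sleep, clock=time.monotonic):
    """Call action() until it succeeds or the time budget would be exceeded.

    Any exception counts as a failed attempt (an assumption of this sketch);
    the last exception is re-raised when no time is left for another try.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return action()
        except Exception:
            # Stop if waiting one more interval would pass the deadline.
            if clock() + interval_s > deadline:
                raise
            sleep(interval_s)


def enable_monitoring_addon(cluster_name, resource_group, workspace_id):
    """Run the command from this issue; check=True raises on a non-zero exit."""
    return subprocess.run(
        ["az", "aks", "enable-addons", "-a", "monitoring",
         "-n", cluster_name, "-g", resource_group,
         "--workspace-resource-id", workspace_id],
        check=True,
    )
```

A call such as `retry_until(lambda: enable_monitoring_addon(name, group, workspace_id))` would then keep retrying for up to 15 minutes. This only works around the symptom and does not address the underlying mapping error.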

microsoft-github-policy-service[bot] commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @dyu1208, @FumingZhang, @andyliuliming.