Azure / azure-cli

Azure Command-Line Interface
MIT License

az aks command invoke on private cluster errors with Failed to run command due to cluster perf issue #24192

Open kbrkelsi opened 2 years ago

kbrkelsi commented 2 years ago

az feedback auto-generates most of the information requested below, as of CLI version 2.0.62

Related command

The az aks command invoke command, whether run in Cloud Shell or remotely with the latest CLI version (2.40.0), produces the following error (sanitized):

(KubernetesOperationError) Failed to run command due to cluster perf issue, container command-################# in aks-command namespace did not start within 30s on your cluster, retry may helps. If issue persist, you may need to tune your cluster with better performance (larger node/paid tier).
Code: KubernetesOperationError
Message: Failed to run command due to cluster perf issue, container command-################# in aks-command namespace did not start within 30s on your cluster, retry may helps. If issue persist, you may need to tune your cluster with better performance (larger node/paid tier).
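Since the error text itself suggests retrying, here is a minimal sketch of a retry wrapper around the command. It is an illustration only: the helper names (`invoke_with_retry`, `az_invoke`), the attempt count, and the delay are assumptions, and the check for `KubernetesOperationError` in the output is based solely on the error shown above. A retry will not help if the pod can never be scheduled.

```python
import subprocess
import time


def invoke_with_retry(run, attempts=3, delay_s=10.0, sleep=time.sleep):
    """Call run() until it succeeds or fails with a non-retryable error.

    run is any zero-argument callable returning (exit_code, output).
    Only failures whose output mentions KubernetesOperationError are retried.
    """
    code, output = 1, ""
    for attempt in range(1, attempts + 1):
        code, output = run()
        if code == 0 or "KubernetesOperationError" not in output:
            return code, output
        if attempt < attempts:
            sleep(delay_s)  # give the cluster time before the next attempt
    return code, output


def az_invoke(resource_group, cluster, command):
    """Run `az aks command invoke` once and return (exit_code, output)."""
    proc = subprocess.run(
        [
            "az", "aks", "command", "invoke",
            "--resource-group", resource_group,
            "--name", cluster,
            "--command", command,
        ],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


# Example usage (resource group and cluster names are placeholders):
# code, out = invoke_with_retry(
#     lambda: az_invoke("my-resource-group", "my-cluster", "kubectl get pods -A")
# )
```

Passing the runner as a callable keeps the retry logic independent of the CLI, so it can be exercised without a cluster.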

To Reproduce

Run the command after authenticating to the private cluster.

Expected behavior

Expected to run kubectl and helm operations on private cluster

Environment summary

az cli 2.40; private AKS cluster; Kubernetes RBAC enabled; AKS-managed AAD enabled

Additional context

yonzhan commented 2 years ago

route to CXP team

PramodValavala-MSFT commented 2 years ago

@kbrkelsi The error message suggests that the pod created in your cluster to run the command wasn't able to start, presumably because of insufficient resources (CPU/memory). It is likely that your current workload is taking up all the resources and/or you don't have autoscaling set up to create new nodes when needed.

You can check current resource usage with the kubectl top node command. Also, check whether any taints on your nodes are preventing the container from being scheduled on a less busy node.
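As a rough sketch of the taint check, the snippet below scans the parsed output of `kubectl get nodes -o json` and lists, per node, the taints whose effect is `NoSchedule` or `NoExecute`. The function name `blocking_taints` and the sample data are hypothetical; whether a given taint actually blocks the command pod depends on that pod's tolerations, which are not described here.

```python
import json


def blocking_taints(nodes_json):
    """Map each node name to its NoSchedule/NoExecute taints.

    nodes_json is the parsed output of `kubectl get nodes -o json`.
    Taints are rendered in kubectl's key=value:effect form.
    """
    result = {}
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        taints = node.get("spec", {}).get("taints") or []
        result[name] = [
            f'{t["key"]}={t.get("value", "")}:{t["effect"]}'
            for t in taints
            if t.get("effect") in ("NoSchedule", "NoExecute")
        ]
    return result


# Hypothetical sample data, not taken from the reporter's cluster.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "node-a"},
     "spec": {"taints": [
       {"key": "CriticalAddonsOnly", "value": "true", "effect": "NoSchedule"}
     ]}},
    {"metadata": {"name": "node-b"}, "spec": {}}
  ]
}
""")

if __name__ == "__main__":
    for name, taints in blocking_taints(SAMPLE).items():
        print(name, taints or "no blocking taints")
```

In practice the input could come from `json.loads` applied to the output of `kubectl get nodes -o json`. If every node reports a blocking taint, a pod without a matching toleration has nowhere to schedule, regardless of how idle the cluster is.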

ghost commented 1 year ago

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

kbrkelsi commented 1 year ago

@PramodValavala-MSFT This happens on newly created clusters with no workloads, just trying to deploy helm charts, so performance issues have been ruled out. Please let me know if you need additional information.

bramdehart commented 6 months ago

Any updates on this issue?