Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Question] AKS node image updated partially and cluster in inconsistent image version. #4659

Open sreeniHari opened 1 week ago

sreeniHari commented 1 week ago

Describe scenario

On our production AKS cluster, an operation was triggered to upgrade the node image version, as recorded in the activity logs:

    •   Operation Name: Upgrade agent pool node image version
    •   Timestamp: Fri Nov 15, 2024, 21:22:39 GMT+0800 (Singapore Standard Time)
    •   Event Initiated By: Microsoft.ContainerService

Checking the activity log, we noticed that only 2 node pools in the production cluster were successfully updated to version AKSUbuntu-2204gen2containerd-202410.27.0, while one node pool remains stuck at version AKSUbuntu-2204gen2containerd-202410.15.0. Additionally, our staging cluster, which is in the same region and has similar settings, still has all nodes on version AKSUbuntu-2204gen2containerd-202410.15.0.

Planned Updates is set to: Node Image. The Scheduler is set to: No Schedule.

We looked at the kubelet logs and couldn't find anything meaningful.
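
The per-pool version mismatch described above can be checked with a short script. This is a minimal sketch, not an official tool: it assumes the Azure CLI is installed and logged in, and that each entry returned by `az aks nodepool list` carries `name` and `nodeImageVersion` fields. The function names and any resource group or cluster names passed in are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def group_pools_by_image(pools):
    """Map each node image version to the sorted names of the pools on it."""
    groups = defaultdict(list)
    for pool in pools:
        # Assumption: each pool dict has "name" and "nodeImageVersion" keys.
        groups[pool.get("nodeImageVersion")].append(pool["name"])
    return {version: sorted(names) for version, names in groups.items()}


def list_node_pools(resource_group, cluster_name):
    """Fetch the node pools of one cluster as parsed JSON via the Azure CLI."""
    result = subprocess.run(
        [
            "az", "aks", "nodepool", "list",
            "--resource-group", resource_group,
            "--cluster-name", cluster_name,
            "--output", "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def report(resource_group, cluster_name):
    """Print one line per image version; more than one line means a mismatch."""
    pools = list_node_pools(resource_group, cluster_name)
    for version, names in group_pools_by_image(pools).items():
        print(f"{version}: {', '.join(names)}")
```

Calling `report("my-resource-group", "my-prod-cluster")` (placeholder names) for the production and staging clusters in turn would show which pools are still on the older image.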

Question

  1. Why is one node pool in the production cluster stuck at the old version?
  2. Why haven't the staging cluster nodes been updated automatically?
  3. What steps can we take to ensure all node pools are upgraded consistently across both clusters?

Please advise.