Azure / AgentBaker

AgentBaker aims to provide a centralized, portable k8s agent node provisioning library, as well as rich support for different OS images with optimized k8s binaries.
MIT License

Azure CLI missing from Node Image gen2/2204containerd/202406.19.0 #4568

Open aolear-ss opened 2 weeks ago

aolear-ss commented 2 weeks ago

What happened:

After adding a new node pool to an AKS cluster in a non-production environment, workloads using the azure-blob-fuse storage class get stuck in "ContainerCreating" with the following error message:

MountVolume.MountDevice failed for volume "***" :   rpc error: code = Internal desc = Mount failed with error: rpc error: code =   Unknown desc = exit status 1 Error: failed to initialize new pipeline [Azure CLI not found on path]

After logging into the node, I confirmed that the Azure CLI is not installed. Once I manually installed the Azure CLI on the node, the error went away.

It seems that the utility is missing from the node image version.

What you expected to happen:

The azure-blob-fuse storage class to work and for the pod to be started successfully.

How to reproduce it:

Deploy a pod that mounts a volume based on the "azureblob-fuse-premium" storage class.
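A minimal sketch of that reproduction step, written as a Python script that emits the manifests as JSON (which `kubectl apply` accepts). Only the storage class name comes from the report; the resource names, the `busybox` image, the mount path, the access mode, and the 5Gi size are hypothetical placeholders.

```python
import json

# Storage class name is taken from the report; everything else is illustrative.
STORAGE_CLASS = "azureblob-fuse-premium"


def build_repro_manifests(name="blobfuse-repro", size="5Gi"):
    """Return a Kubernetes List holding a PVC and a pod that mounts it."""
    claim_name = f"{name}-pvc"
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": STORAGE_CLASS,
            "resources": {"requests": {"storage": size}},
        },
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": "busybox",  # placeholder workload
                    "command": ["sleep", "3600"],
                    "volumeMounts": [{"name": "blob", "mountPath": "/mnt/blob"}],
                }
            ],
            "volumes": [
                {"name": "blob", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}


if __name__ == "__main__":
    print(json.dumps(build_repro_manifests(), indent=2))
```

Assuming the script is saved as `repro.py`, piping its output to `kubectl apply -f -` should create both objects. On an affected node the pod would be expected to stay in "ContainerCreating" with the mount error quoted above.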

Anything else we need to know?:

I'm not sure if this is the right place. The node image version showing for the node pool is: (screenshot of the node image version)

That version does not appear on the https://github.com/Azure/AKS/releases page. I do see it in this repository, which I found by following the links here for 202406.19.0, so I'm assuming that new versions are pushed from here and that page has not been updated yet. If I'm in the wrong spot, I'd welcome any pointers in the right direction so we can get this resolved before it happens to a production cluster.

Environment:

N/A?

cameronmeissner commented 2 weeks ago

This is actually an upstream issue caused by this change: https://github.com/Azure/azure-storage-fuse/pull/1384, which relies on the Azure CLI being installed.

We've never previously installed the Azure CLI on any official AKS node images.

UtheMan commented 2 weeks ago

@aolear-ss For now, the workaround is to deploy a DaemonSet that installs azure-cli on your nodes (probably the easiest way to unblock yourself). We are in touch with the Azure Storage team to get a permanent solution.

edit: I see that you already did that (based on the issue in the main AKS repo), so for now that's our recommended short-term fix.
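For readers landing here, one possible shape of such a DaemonSet is sketched below as a Python script that emits the manifest as JSON. This is not the manifest used in this thread, which is not shown. It assumes an Ubuntu-based node with `curl` and `bash` available, a privileged container that uses `nsenter` to run on the host, and Microsoft's Debian/Ubuntu install script for the Azure CLI. The DaemonSet name, namespace, and container image are hypothetical.

```python
import json

# Assumption: Microsoft's install script for Debian/Ubuntu hosts.
# The install is skipped when `az` is already on the host's PATH.
INSTALL_CMD = (
    "command -v az >/dev/null || "
    "curl -sL https://aka.ms/InstallAzureCLIDeb | bash"
)


def build_installer_daemonset(name="azure-cli-installer", namespace="kube-system"):
    """Return a DaemonSet that runs the install command in the host namespaces."""
    # nsenter targets PID 1 of the host, which requires hostPID and privileged mode.
    host_cmd = (
        "nsenter --target 1 --mount --uts --ipc --net --pid -- "
        f"bash -c '{INSTALL_CMD}'"
    )
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "hostPID": True,
                    "containers": [
                        {
                            "name": "installer",
                            "image": "ubuntu:22.04",  # provides nsenter (util-linux)
                            "securityContext": {"privileged": True},
                            # Sleep after a successful install so the pod stays Running.
                            "command": ["/bin/sh", "-c", f"{host_cmd} && sleep infinity"],
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_installer_daemonset(), indent=2))
```

The output can be piped to `kubectl apply -f -`. Running a privileged pod with host PID access is a significant security trade-off, so this is only reasonable as a stopgap until the permanent fix lands.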

aolear-ss commented 2 weeks ago

Great - thank you both for the response. I did end up creating a daemonset to do just that, and that does work for now. Not ideal - but should hold us over until the permanent fix is available.

Thanks again!