Closed Laffs2k5 closed 2 weeks ago
pls email me your cluster info, we could upgrade your blob CSI driver in the backend to v1.24.1 which has the fix, thanks
This is also happening with Azure Files.
Answer: Pod failing to schedule due to the error:

```
file.csi.azure.com_csi-azurefile-controller-7786ddb966-bz9hn_4902be80-5e92-4246-82d3-94709310755e failed to provision volume with StorageClass 'trun-azurefile': rpc error: code = Internal desc = failed to ensure storage account: create private DNS zone(privatelink.file.core.windows.net) in resourceGroup(Test3SpInDevEnvRG): authenticated requests are not permitted for non TLS protected (https) endpoints
```
pls upgrade to AKS 1.29 which already has the fix. If you don't want to upgrade the cluster version, pls email me your cluster info and I will help upgrade your Azure File driver version in the backend, thanks.
Ok thanks. I'll upgrade our clusters to 1.29
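For reference, an upgrade to 1.29 can be driven from the Azure CLI along these lines. The resource group and cluster name are placeholders, and the patch version shown is only an example; pick a concrete 1.29 patch release from the `get-upgrades` output.

```shell
# Placeholders: substitute your own resource group and cluster name.
RESOURCE_GROUP="my-rg"
CLUSTER_NAME="my-aks-cluster"

# Show which Kubernetes versions this cluster can upgrade to.
az aks get-upgrades --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" --output table

# Upgrade the control plane and node pools to the chosen 1.29 patch release.
az aks upgrade --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" --kubernetes-version 1.29.2
```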
👍
Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days.
Describe the bug
We have been utilizing dynamic volume provisioning for our deployments in AKS for a while. This is realized using the Azure Blob Storage CSI driver for Kubernetes with a custom sc (StorageClass) and pvc (PersistentVolumeClaim) in accordance with the documentation found here: Create a persistent volume with Azure Blob storage in Azure Kubernetes Service (AKS).
This week it stopped working, with the following error during provisioning:

```
failed to provision volume with StorageClass "sc-blob-storage-test-application-pr-203": rpc error: code = Internal desc = ensure storage account failed with create private DNS zone(privatelink.blob.core.windows.net) in resourceGroup(rg6-ss2-cm-net-dev): authenticated requests are not permitted for non TLS protected (https) endpoints
```
The issue occurs for all deployments with new PVCs. Deployments with already existing PVCs continue to run fine. The result is that we are prevented from deploying any new apps with storage backed by dynamic volume provisioning.
The first part of the message, `ensure storage account failed with`, is logged from blob-csi-driver/pkg/blob/controllerserver.go L386. The next part, `create private DNS zone(privatelink.blob.core.windows.net) in resourceGroup(rg6-ss2-cm-net-dev):`, is logged from cloud-provider-azure/pkg/provider/azure_storageaccount.go L247; this code is the out-of-tree cloud provider for Azure.

NOTE: code execution should never have reached line 247 in our case, as the private link DNS zone already exists and thus should have been retrieved on line 243 by `az.privatednsclient.Get()`. To understand the failure better I would very much like to know what is logged by line 244, `klog.V(2).Infof("get private dns zone %s returned with %v", privateDNSZoneName, err.Error())`, but I don't know where or how to enable this logging.

I've looked through the kube-system logs and identified that the `csi-blob-node` daemonset was updated to use the image `mcr.microsoft.com/oss/kubernetes-csi/blob-csi:v1.23.4` in place of `mcr.microsoft.com/oss/kubernetes-csi/blob-csi:v1.23.3` at 6 AM April 23rd. We have logs indicating that dynamic provisioning worked fine on April 22nd, and developers reported the first issue with dynamic provisioning on April 24th.

NOTE: we have not touched or modified the `csi-blob-node` daemonset ourselves; it is fully managed by AKS.

The cluster in question is on AKS `1.28.5`, and the update to version `1.23.4` of the Blob Storage CSI driver corresponds well with the AKS Release 2024-04-11, which lists this update under Component Updates. If you are able to reproduce on your side, I would suggest downgrading the Blob Storage CSI driver to version `1.23.3` until the cause of the failure is understood and mitigated.

To Reproduce
- `privatelink.blob.core.windows.net` already existing in the resource group of the AKS vnet.
- `Contributor` role in the resource group hosting the AKS vnet.
- `az aks update --enable-blob-driver`
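Before reproducing, the driver version the cluster is actually running can be confirmed by inspecting the managed daemonset (name as observed above in kube-system):

```shell
# Print the image used by the managed blob CSI node daemonset;
# v1.23.4 is the version that exhibits the failure for us.
kubectl -n kube-system get daemonset csi-blob-node \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```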
Expected behavior
Dynamic provisioning of new PersistentVolumeClaims succeeds, as it did before the driver update.
Screenshots
Not applicable.
Environment
- Kubernetes version: `1.28.5` (cluster `kubernetesVersion` and node pool `orchestratorVersion`)
- Node image: `AKSUbuntu-2204gen2containerd-202402.07.0`
- Blob Storage CSI driver: `1.23.4`
Additional context
custom StorageClass
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
allowVolumeExpansion: true
metadata:
  labels: {}
  name: sc-blob-storage-test-application-pr-203
mountOptions:
  - '-o allow_other'
  - '--file-cache-timeout-in-seconds=120'
  - '--use-attr-cache=true'
  - '--cancel-list-on-mount-seconds=10'
  - '-o attr_timeout=120'
  - '-o entry_timeout=120'
  - '-o negative_timeout=120'
  - '--log-level=LOG_WARNING'
  - '--cache-size-mb=1000'
parameters:
  allowBlobPublicAccess: 'false'
  containerName: test-application-pr-203
  matchTags: 'true'
  networkEndpointType: privateEndpoint
  protocol: fuse2
  skuName: Standard_LRS
  tags: >-
    CreatedBy=Azure Kubernetes Service,ApplicationName=AKS application
    storage,Description=Storage account for ephemeral environments in AKS
provisioner: blob.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
PersistentVolumeClaim
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-class: sc-blob-storage-test-application-pr-203
  name: pvc-blob-storage-test-application-pr-203
  namespace: test-application-pr-203
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: sc-blob-storage-test-application-pr-203
```
Deployment (not a complete example, just the important parts)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-application-pr-203-deployment
  namespace: test-application-pr-203
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-application-pr-203-app
  template:
    metadata:
      labels:
        app: test-application-pr-203-app
        app.kubernetes.io/name: test-application-backend
        app.kubernetes.io/part-of: test-application
        app.kubernetes.io/version: pr-203-2024.04.25.53165
        azure.workload.identity/use: 'false'
    spec:
      containers:
        - env:
            - name: AZURE_PERSISTENT_STORAGE_MOUNT_PATH
              value: /mnt/azure-blob-storage
          image:
```
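On the open question of where the `klog.V(2)` message from line 244 ends up: it is written by the CSI controller, not the node daemonset, so the node logs above will not contain it. A sketch of how to look for it, assuming the managed controller runs as a `csi-blob-controller` deployment in kube-system with a container named `blob` (names taken from the upstream manifests; they may differ on AKS, and the message only appears if the controller's verbosity is at least 2):

```shell
# Search the blob CSI controller logs for the private DNS zone lookup result.
# Deployment and container names are assumptions from the upstream manifests.
kubectl -n kube-system logs deploy/csi-blob-controller -c blob --tail=2000 \
  | grep "get private dns zone"
```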