Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[BUG] Can't create PV and mount based XFS volumes on Azure Linux // `wrong fs type, bad option, bad superblock` / `Superblock has unknown incompatible features (0x20) enabled` #4643

Open jkroepke opened 2 days ago

jkroepke commented 2 days ago

Describe the bug

We are running AKS 1.30 with Azure Linux 2.0.20241006.

We are using a custom StorageClass with the following configuration:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    helm.sh/resource-policy: keep
    meta.helm.sh/release-name: opsstack
    meta.helm.sh/release-namespace: opsstack
parameters:
  cachingMode: ReadOnly
  fsType: xfs
  networkAccessPolicy: DenyAll
  perfProfile: Basic
  publicNetworkAccess: Disabled
  skuname: Premium_ZRS
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer

After a pod is started with a volume, the following error is visible via kubectl get events:

Events:
  Type     Reason       Age                 From     Message
  ----     ------       ----                ----     -------
  Warning  FailedMount  81s (x50 over 87m)  kubelet  MountVolume.MountDevice failed for volume "pvc-bef9b07d-7183-4d1e-b0e3-242f03cca9be" : rpc error: code = Internal desc = could not format /dev/disk/azure/scsi1/lun0(lun: 0), and mount it at /var/lib/kubelet/plugins/kubernetes.io/csi/disk.csi.azure.com/8810556f8ee1e7432240c93de60a4ae1a2819a71a1f48fe6c1c8fc42e4d0112e/globalmount, failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o nouuid,defaults /dev/disk/azure/scsi1/lun0 /var/lib/kubelet/plugins/kubernetes.io/csi/disk.csi.azure.com/8810556f8ee1e7432240c93de60a4ae1a2819a71a1f48fe6c1c8fc42e4d0112e/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/disk.csi.azure.com/8810556f8ee1e7432240c93de60a4ae1a2819a71a1f48fe6c1c8fc42e4d0112e/globalmount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.

and in the dmesg output:

[66939.597693] XFS (sdb): Superblock has unknown incompatible features (0x20) enabled.
[66939.597697] XFS (sdb): Filesystem cannot be safely mounted by this kernel.
[66939.597710] XFS (sdb): SB validate failed with error -22.
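
The `0x20` in the first dmesg line is a bit mask of on-disk "incompatible" feature flags that the running kernel does not recognize. A minimal sketch for decoding it, assuming the flag values from the upstream kernel's XFS on-disk format header (`XFS_SB_FEAT_INCOMPAT_*`); the table is written from memory and the helper names `decode_incompat` and `features_from_dmesg` are illustrative, not part of any tool:

```python
import re

# Incompat feature bits as defined upstream in fs/xfs/libxfs/xfs_format.h.
# Assumption: listed from memory; verify against the kernel tree you care about.
XFS_INCOMPAT_FEATURES = {
    0x01: "FTYPE",
    0x02: "SPINODES",
    0x04: "META_UUID",
    0x08: "BIGTIME",
    0x10: "NEEDSREPAIR",
    0x20: "NREXT64",  # large extent counters
}


def decode_incompat(mask):
    """Return the names of the feature bits set in `mask`."""
    names = [name for bit, name in sorted(XFS_INCOMPAT_FEATURES.items()) if mask & bit]
    # Report any bits this table does not know about instead of dropping them.
    unknown = mask & ~sum(XFS_INCOMPAT_FEATURES)
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


def features_from_dmesg(line):
    """Extract and decode the mask from an XFS 'unknown incompatible features' line."""
    match = re.search(r"unknown incompatible features \((0x[0-9a-fA-F]+)\)", line)
    return decode_incompat(int(match.group(1), 16)) if match else []


if __name__ == "__main__":
    line = "[66939.597693] XFS (sdb): Superblock has unknown incompatible features (0x20) enabled."
    print(features_from_dmesg(line))
```

Under that assumption, the mask in this report corresponds to the `NREXT64` feature, which matches the `nrext64` option mentioned in the Redpanda guide.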

Additional info

This error appears to be well known; see https://docs.redpanda.com/current/deploy/deployment-option/self-hosted/kubernetes/aks-guide/. It looks like the xfsprogs userspace tools (mkfs.xfs) are newer than the XFS kernel module. From that guide:

Version 0.6.0 is required to avoid volume-mounting issues caused by recent mkfs.xfs updates. Newer versions enable the -i nrext64=1 option, triggering the following error on default AKS kernels:

XFS (dm-0): Superblock has unknown incompatible features (0x20) enabled.
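
The quoted note implies that whether `nrext64` is safe depends on the kernel that will mount the filesystem. A rough sketch of that decision, assuming (this is not confirmed anywhere in this thread) that kernel support for the feature arrived in Linux 5.19 and that the `mkfs.xfs` in use accepts `-i nrext64=0`; `parse_kernel_release` and `mkfs_xfs_args` are hypothetical helpers, and the code only builds an argument list without running anything:

```python
import re

# Assumption: first kernel series believed to understand the nrext64 feature.
NREXT64_MIN_KERNEL = (5, 19)


def parse_kernel_release(release):
    """Turn a `uname -r` style string such as '5.15.167.1-1.cm2' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def mkfs_xfs_args(device, kernel_release):
    """Build a mkfs.xfs command line that avoids nrext64 on older kernels."""
    args = ["mkfs.xfs"]
    if parse_kernel_release(kernel_release) < NREXT64_MIN_KERNEL:
        # Explicitly disable large extent counters so the old kernel can mount it.
        args += ["-i", "nrext64=0"]
    args.append(device)
    return args


if __name__ == "__main__":
    import platform

    # Placeholder device path; nothing is executed here.
    print(mkfs_xfs_args("/dev/sdX", platform.release()))
```

An older `mkfs.xfs` that predates the option would reject `-i nrext64=0`, but such a version would not enable the feature in the first place, so the flag only matters when the formatting tool is newer than the kernel.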

jkroepke commented 2 days ago

Workaround:

Install xfsprogs on the node:

dnf install xfsprogs

Then run mkfs.xfs -f /dev/disk/azure/scsi1/lun0 (Warning: this is a destructive command; it wipes all data on that disk.)

andyzhangx commented 1 day ago

This is related to https://github.com/kubernetes-sigs/azuredisk-csi-driver/issues/2588. Please send me the cluster info by email so that I can help you downgrade the CSI driver version to fix this issue. Thanks.

andyzhangx commented 1 day ago

By the way, the XFS disk format works on Azure Linux 3.0, since 3.0 has a 6.2 Linux kernel: https://learn.microsoft.com/en-us/azure/azure-linux/how-to-enable-azure-linux-3

jkroepke commented 19 hours ago

> just provide me the cluster info (by email) thus I could help you downgrade csi driver version to fix this issue, thanks.

That doesn't really help us, since the issue appears on fresh AKS clusters. Once the PVCs are created, new PVCs are very rare.

Does it help to use an older patch version of 1.30 at creation time? Is the issue fixed in AKS 1.31?