ceph / ceph-csi

CSI driver for Ceph

Unable to mount or create PVC: invalid value specified for ceph.dir.subvolume #1939

Closed TheDJVG closed 3 years ago

TheDJVG commented 3 years ago

Describe the bug

When I try to mount or create a PVC it fails with:

reason: 'ProvisioningFailed' failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"

It's unclear to me how I got into this situation, as the cluster was working fine with the existing claims. When I had to move some pods around, I noticed the volumes would not mount on the new hosts, and creating a new PVC fails as well.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: Create a PVC with the default rook-cephfs storage class with this spec (a complete manifest is sketched just after these steps):
     spec:
       accessModes:
         - ReadWriteOnce
       resources:
         requests:
           storage: 1Gi
       storageClassName: rook-cephfs
       volumeMode: Filesystem
  2. Deployment to trigger the issue '....'
  3. See error
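A minimal way to apply an equivalent claim and watch the result is sketched below; the claim name test-pvc and the default namespace are illustrative, everything else mirrors the spec above.

# Create a 1Gi CephFS-backed PVC against the rook-cephfs StorageClass
# (the claim name "test-pvc" is hypothetical).
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
  volumeMode: Filesystem
EOF

# Watch the provisioning events; with the bug present, ProvisioningFailed
# events citing ceph.dir.subvolume appear here.
kubectl describe pvc test-pvc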

Actual results

PVC is unable to be created:

  Type     Reason                Age                   From                                                                                                              Message
  ----     ------                ----                  ----                                                                                                              -------
  Normal   ExternalProvisioning  3m6s (x26 over 9m8s)  persistentvolume-controller                                                                                       waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          29s (x11 over 9m8s)   rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7d8c44596f-q9l87_d46979da-3c8f-4ca5-8add-d38626ba9ec3  External provisioner is provisioning volume for claim "home-automation/deconz-config"
  Warning  ProvisioningFailed    29s (x11 over 9m7s)   rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7d8c44596f-q9l87_d46979da-3c8f-4ca5-8add-d38626ba9ec3  failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"

or mounting existing PVC:

  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               27s               default-scheduler        Successfully assigned entertainment/suite-764d74559b-nbtrc to k8s-node1
  Normal   SuccessfulAttachVolume  27s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-53a3888f-91d9-447b-b749-9e6e16e240d6"
  Normal   SuccessfulAttachVolume  27s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-d50b0bb2-c61a-4391-8d18-adce2a95fba8"
  Normal   SuccessfulAttachVolume  27s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-92bb575c-5d3c-4c8f-86c3-b450e9861e55"
  Warning  FailedMount             3s (x5 over 12s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-92bb575c-5d3c-4c8f-86c3-b450e9861e55" : rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
  Warning  FailedMount             2s (x5 over 12s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-53a3888f-91d9-447b-b749-9e6e16e240d6" : rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
  Warning  FailedMount             2s (x5 over 11s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-d50b0bb2-c61a-4391-8d18-adce2a95fba8" : rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"

Expected behavior

The PVC would be created or mounted in the pod.

Logs

If the issue is in PVC creation, deletion, or cloning, please attach complete logs of the containers below.

$ kubectl -n rook-ceph logs -l app=csi-cephfsplugin-provisioner -c csi-provisioner
I0329 10:51:11.568887       1 controller.go:1317] provision "home-automation/deconz-config" class "rook-cephfs": started
I0329 10:51:11.569517       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"home-automation", Name:"deconz-config", UID:"b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa", APIVersion:"v1", ResourceVersion:"2991376", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "home-automation/deconz-config"
W0329 10:51:11.694324       1 controller.go:943] Retrying syncing claim "b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa", failure 9
E0329 10:51:11.694356       1 controller.go:966] error syncing claim "b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
I0329 10:51:11.694446       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"home-automation", Name:"deconz-config", UID:"b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa", APIVersion:"v1", ResourceVersion:"2991376", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
I0329 10:55:27.694639       1 controller.go:1317] provision "home-automation/deconz-config" class "rook-cephfs": started
I0329 10:55:27.694914       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"home-automation", Name:"deconz-config", UID:"b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa", APIVersion:"v1", ResourceVersion:"2991376", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "home-automation/deconz-config"
W0329 10:55:27.937903       1 controller.go:943] Retrying syncing claim "b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa", failure 10
E0329 10:55:27.937951       1 controller.go:966] error syncing claim "b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa": failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
I0329 10:55:27.938170       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"home-automation", Name:"deconz-config", UID:"b4f9363e-acfa-43d6-95fe-e9bdc01ad6aa", APIVersion:"v1", ResourceVersion:"2991376", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
I0329 10:38:38.959840       1 csi-provisioner.go:121] Version: v2.0.0
I0329 10:38:38.959992       1 csi-provisioner.go:135] Building kube configs for running in cluster...
I0329 10:38:38.974627       1 connection.go:153] Connecting to unix:///csi/csi-provisioner.sock
I0329 10:38:39.976819       1 common.go:111] Probing CSI driver for readiness
W0329 10:38:39.980215       1 metrics.go:142] metrics endpoint will not be started because `metrics-address` was not specified.
I0329 10:38:39.985113       1 leaderelection.go:243] attempting to acquire leader lease  rook-ceph/rook-ceph-cephfs-csi-ceph-com...
E0329 10:38:59.638011       1 leaderelection.go:357] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "rook-ceph-cephfs-csi-ceph-com": the object has been modified; please apply your changes to the latest version and try again
$ kubectl -n rook-ceph logs -l app=csi-cephfsplugin -c csi-cephfsplugin
E0329 10:58:00.604953       1 volume.go:87] ID: 644 Req-ID: 0001-0009-rook-ceph-0000000000000002-635a7f08-8fa6-11eb-8284-321fabd1400c failed to get subvolume info for the vol csi-vol-635a7f08-8fa6-11eb-8284-321fabd1400c: rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:58:00.605065       1 utils.go:136] ID: 644 Req-ID: 0001-0009-rook-ceph-0000000000000002-635a7f08-8fa6-11eb-8284-321fabd1400c GRPC error: rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:58:01.094945       1 volume.go:87] ID: 646 Req-ID: 0001-0009-rook-ceph-0000000000000002-636093a6-8fa6-11eb-8284-321fabd1400c failed to get subvolume info for the vol csi-vol-636093a6-8fa6-11eb-8284-321fabd1400c: rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:58:01.095020       1 utils.go:136] ID: 646 Req-ID: 0001-0009-rook-ceph-0000000000000002-636093a6-8fa6-11eb-8284-321fabd1400c GRPC error: rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:59:04.074531       1 volume.go:87] ID: 649 Req-ID: 0001-0009-rook-ceph-0000000000000002-635bff22-8fa6-11eb-8284-321fabd1400c failed to get subvolume info for the vol csi-vol-635bff22-8fa6-11eb-8284-321fabd1400c: rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:59:04.074594       1 utils.go:136] ID: 649 Req-ID: 0001-0009-rook-ceph-0000000000000002-635bff22-8fa6-11eb-8284-321fabd1400c GRPC error: rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:59:04.681709       1 volume.go:87] ID: 651 Req-ID: 0001-0009-rook-ceph-0000000000000002-635a7f08-8fa6-11eb-8284-321fabd1400c failed to get subvolume info for the vol csi-vol-635a7f08-8fa6-11eb-8284-321fabd1400c: rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:59:04.681808       1 utils.go:136] ID: 651 Req-ID: 0001-0009-rook-ceph-0000000000000002-635a7f08-8fa6-11eb-8284-321fabd1400c GRPC error: rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:59:05.181935       1 volume.go:87] ID: 653 Req-ID: 0001-0009-rook-ceph-0000000000000002-636093a6-8fa6-11eb-8284-321fabd1400c failed to get subvolume info for the vol csi-vol-636093a6-8fa6-11eb-8284-321fabd1400c: rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
E0329 10:59:05.182006       1 utils.go:136] ID: 653 Req-ID: 0001-0009-rook-ceph-0000000000000002-636093a6-8fa6-11eb-8284-321fabd1400c GRPC error: rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"

Additional context

This is a 4-OSD cluster on two nodes with 3 mons. The Ceph cluster is healthy and also has NFS enabled on this CephFS filesystem.

TheDJVG commented 3 years ago

Looks like I get the same error when I manually try to create a subvolume:

[root@rook-ceph-tools-6f58686b5d-8rnnf /]# ceph fs subvolume create mainfs testdir
Error EINVAL: invalid value specified for ceph.dir.subvolume

kotreshhr commented 3 years ago

@TheDJVG Could you upload ceph logs (mgr, monitor, mds logs)?

TheDJVG commented 3 years ago

@kotreshhr Sure thing. I think there's a problem in Ceph itself. My account on the Ceph tracker is still pending approval, so I cannot open an issue there.

Logs attached: mds_debug.log, mgr.log, mon.log

TheDJVG commented 3 years ago

I think I have found why it was failing; it's working now:

[root@rook-ceph-tools-6f58686b5d-lrq8x /]# ceph fs subvolume create mainfs testing 
[root@rook-ceph-tools-6f58686b5d-lrq8x /]# ceph fs subvolume ls mainfs             
[
    {
        "name": "testing"
    }
]

It started working after I applied setfattr -n ceph.dir.subvolume -v 0 . on the filesystem root; for some reason ceph.dir.subvolume was set on /. It's unclear to me why that happened, as I've only used ceph-csi and never mounted the directories manually.
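For anyone who hits the same thing, a rough check-and-clear sketch follows. The mount source, credentials, and mount point are illustrative placeholders, whether the vxattr can be read back with getfattr depends on the client version, and the filesystem name mainfs matches the one used earlier in this thread.

# Mount the CephFS root (monitor address and admin key are placeholders).
mkdir -p /mnt/mainfs
mount -t ceph <mon-host>:/ /mnt/mainfs -o name=admin,secret=<admin-key>
cd /mnt/mainfs

# Optionally inspect the flag first; a non-zero value on an ancestor directory
# is what makes subvolume creation beneath it fail with EINVAL.
getfattr -n ceph.dir.subvolume .

# Clear the flag on the filesystem root, as described above.
setfattr -n ceph.dir.subvolume -v 0 .

# Verify that subvolume creation works again (run from somewhere with the
# ceph CLI, e.g. the rook-ceph-tools pod).
ceph fs subvolume create mainfs testing
ceph fs subvolume ls mainfs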

Madhu-1 commented 3 years ago

@TheDJVG Thanks. I'm closing this one as it's not an issue on the cephcsi side.

Expro commented 5 months ago

Today I hit the same issue after upgrading Ceph from 17.x to 18.x. The solution was the same as above, but I have a theory about how it happened: I had an empty subvolumeGroup value provided to the CephFS CSI driver helm chart. It seems that was an acceptable value for Ceph 17.x, but it is no longer valid for Ceph 18.x.
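If you end up here via the helm-chart route, a hedged sanity check is to make sure the subvolumeGroup the driver is configured with is non-empty and actually exists on the filesystem. The filesystem name mainfs is taken from this thread and csi is the group name ceph-csi typically defaults to; adjust both to your setup.

# List the subvolume groups that exist on the filesystem.
ceph fs subvolumegroup ls mainfs

# Create the group the driver's subvolumeGroup setting points at, if it is missing.
ceph fs subvolumegroup create mainfs csi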