ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

cephfs: ceph-fuse: impossible to unmount a volume that has been deleted #4249

Closed: gman0 closed this issue 10 months ago

gman0 commented 11 months ago

Describe the bug

Deleting a CephFS volume while it is still mounted with ceph-fuse makes it impossible to delete the Pods that use it; they remain stuck in the Terminating state.

Ceph tracker: https://tracker.ceph.com/issues/63471

Environment details

Steps to reproduce

  1. Create a CephFS PVC with `mounter` set to `fuse`
  2. Create a Pod that mounts the PVC
  3. Delete the backing CephFS subvolume
  4. Delete the Pod created in step 2

Actual results

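Pods whose backing CephFS volumes have been deleted cannot be deleted if those volumes are mounted with ceph-fuse. In that case, stat on the target and staging paths fails with ENOENT ("no such file or directory"), so NodeUnpublishVolume and NodeUnstageVolume conclude that the paths have already been deleted and return without unmounting them. The kernel client is not affected: it returns EACCES/ESTALE instead of ENOENT, which is correctly recognized as a "corrupted mount", and the unmount is performed.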

Expected behavior

It should be possible to delete Pods whose volumes went missing.

Logs

I1108 10:59:02.171220 2528431 cephcsi.go:199] Driver version: v3.9.0 and Git version: c6db73f0daef5570756b4257043c00bf58b5fd3e
I1108 10:59:02.171633 2528431 cephcsi.go:276] Initial PID limit is set to -1
I1108 10:59:02.171713 2528431 cephcsi.go:282] Reconfigured PID limit to -1 (max)
I1108 10:59:02.171786 2528431 cephcsi.go:231] Starting driver type: cephfs with name: cephfs.csi.ceph.com
I1108 10:59:02.183844 2528431 volumemounter.go:79] loaded mounter: kernel
I1108 10:59:02.195554 2528431 volumemounter.go:90] loaded mounter: fuse
I1108 10:59:02.197931 2528431 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
I1108 10:59:02.198322 2528431 server.go:126] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I1108 10:59:03.049856 2528431 utils.go:195] ID: 1 GRPC call: /csi.v1.Identity/GetPluginInfo
I1108 10:59:03.050920 2528431 utils.go:206] ID: 1 GRPC request: {}
I1108 10:59:03.050938 2528431 identityserver-default.go:39] ID: 1 Using default GetPluginInfo
I1108 10:59:03.051092 2528431 utils.go:212] ID: 1 GRPC response: {"name":"cephfs.csi.ceph.com","vendor_version":"v3.9.0"}
I1108 10:59:03.190397 2528431 utils.go:195] ID: 2 GRPC call: /csi.v1.Node/NodeGetInfo
I1108 10:59:03.190533 2528431 utils.go:206] ID: 2 GRPC request: {}
I1108 10:59:03.190564 2528431 nodeserver-default.go:51] ID: 2 Using default NodeGetInfo
I1108 10:59:03.190706 2528431 utils.go:212] ID: 2 GRPC response: {"accessible_topology":{},"node_id":"rvasek-1-27-6-2-qqbsjsnaopix-node-0"}

... creating a consumer Pod now ...

I1108 11:00:08.624520 2528431 utils.go:195] ID: 3 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodeStageVolume
I1108 11:00:08.625407 2528431 utils.go:206] ID: 3 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"monitors":"188.185.66.208:6790,188.184.94.56:6790,188.184.86.25:6790","mounter":"fuse","provisionVolume":"false","rootPath":"/volumes/_nogroup/8462984d-1b9a-4138-959c-a054cac3f574/abe8b3ed-c133-4811-bd09-1592fa880aff"},"volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
I1108 11:00:08.625922 2528431 nodeserver.go:293] ID: 3 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 cephfs: mounting volume 4e3a7638-a627-488c-9b53-e396b1a9fdb7 with Ceph FUSE driver
I1108 11:00:08.684975 2528431 cephcmds.go:105] ID: 3 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 command succeeded: ceph-fuse [/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount -m 188.185.66.208:6790,188.184.94.56:6790,188.184.86.25:6790 -c /etc/ceph/ceph.conf -n client.pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda --keyfile=***stripped*** -r /volumes/_nogroup/8462984d-1b9a-4138-959c-a054cac3f574/abe8b3ed-c133-4811-bd09-1592fa880aff -o nonempty]
I1108 11:00:08.685227 2528431 nodeserver.go:248] ID: 3 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 cephfs: successfully mounted volume 4e3a7638-a627-488c-9b53-e396b1a9fdb7 to /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount
I1108 11:00:08.685497 2528431 utils.go:212] ID: 3 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}
I1108 11:00:08.705041 2528431 utils.go:195] ID: 4 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodePublishVolume
I1108 11:00:08.705860 2528431 utils.go:206] ID: 4 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount","target_path":"/var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"monitors":"188.185.66.208:6790,188.184.94.56:6790,188.184.86.25:6790","mounter":"fuse","provisionVolume":"false","rootPath":"/volumes/_nogroup/8462984d-1b9a-4138-959c-a054cac3f574/abe8b3ed-c133-4811-bd09-1592fa880aff"},"volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
I1108 11:00:08.716707 2528431 cephcmds.go:105] ID: 4 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 command succeeded: mount [-o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount]
I1108 11:00:08.717337 2528431 nodeserver.go:530] ID: 4 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 cephfs: successfully bind-mounted volume 4e3a7638-a627-488c-9b53-e396b1a9fdb7 to /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount
I1108 11:00:08.717942 2528431 utils.go:212] ID: 4 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}
I1108 11:00:44.879709 2528431 utils.go:195] ID: 5 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I1108 11:00:44.879775 2528431 utils.go:206] ID: 5 GRPC request: {"volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7","volume_path":"/var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount"}
I1108 11:00:44.884865 2528431 utils.go:212] ID: 5 GRPC response: {"usage":[{"available":1073741824,"total":1073741824,"unit":1}]}

--- deleting the subvol now ---

I1108 11:02:16.305622 2528431 utils.go:195] ID: 6 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I1108 11:02:16.305702 2528431 utils.go:206] ID: 6 GRPC request: {"volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7","volume_path":"/var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount"}
E1108 11:02:16.511753 2528431 utils.go:210] ID: 6 GRPC error: rpc error: code = InvalidArgument desc = failed to get stat for targetpath "/var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount": stat /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount: no such file or directory

--- deleting the consumer Pod now ---

I1108 11:02:22.761006 2528431 utils.go:195] ID: 7 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1108 11:02:22.761090 2528431 utils.go:206] ID: 7 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"target_path":"/var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount","volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
E1108 11:02:22.763044 2528431 nodeserver.go:550] ID: 7 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 stat failed: stat /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount: no such file or directory
I1108 11:02:22.763057 2528431 nodeserver.go:554] ID: 7 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 targetPath: /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount has already been deleted
I1108 11:02:22.763091 2528431 utils.go:212] ID: 7 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}
I1108 11:02:23.364253 2528431 utils.go:195] ID: 8 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1108 11:02:23.364296 2528431 utils.go:206] ID: 8 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"target_path":"/var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount","volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
E1108 11:02:23.365797 2528431 nodeserver.go:550] ID: 8 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 stat failed: stat /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount: no such file or directory
I1108 11:02:23.365812 2528431 nodeserver.go:554] ID: 8 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 targetPath: /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount has already been deleted
I1108 11:02:23.365844 2528431 utils.go:212] ID: 8 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}

... retries of NodeUnpublishVolume never succeed in removing the mount ...

--- running `umount /var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount` on the node ---

I1108 11:03:26.593905 2528431 utils.go:195] ID: 14 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1108 11:03:26.594122 2528431 utils.go:206] ID: 14 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"target_path":"/var/lib/kubelet/pods/a05a510a-de69-4793-b5a4-d36ab83df792/volumes/kubernetes.io~csi/pvc-26c27800-98e5-4ae8-8430-7c1e1c70deda/mount","volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
I1108 11:03:26.594802 2528431 utils.go:212] ID: 14 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}
I1108 11:03:26.694972 2528431 utils.go:195] ID: 15 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1108 11:03:26.695226 2528431 utils.go:206] ID: 15 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount","volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
E1108 11:03:26.696968 2528431 nodeserver.go:619] ID: 15 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 stat failed: stat /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount: no such file or directory
I1108 11:03:26.697055 2528431 nodeserver.go:623] ID: 15 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 targetPath: /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount has already been deleted
I1108 11:03:26.697154 2528431 utils.go:212] ID: 15 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}
I1108 11:03:27.301824 2528431 utils.go:195] ID: 16 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1108 11:03:27.302034 2528431 utils.go:206] ID: 16 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount","volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
E1108 11:03:27.303093 2528431 nodeserver.go:619] ID: 16 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 stat failed: stat /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount: no such file or directory
I1108 11:03:27.303108 2528431 nodeserver.go:623] ID: 16 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 targetPath: /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount has already been deleted
I1108 11:03:27.303378 2528431 utils.go:212] ID: 16 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}

... retries of NodeUnstageVolume never succeed in removing the mount ...

--- running `umount /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount` on the node ---

I1108 11:07:36.590901 2528431 utils.go:195] ID: 24 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1108 11:07:36.590937 2528431 utils.go:206] ID: 24 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.manila.csi.openstack.org/b6110aa34e2b984d9aebeb152de08974617a411c3d956c4b90c640a7c320886c/globalmount","volume_id":"4e3a7638-a627-488c-9b53-e396b1a9fdb7"}
I1108 11:07:36.590972 2528431 utils.go:212] ID: 24 Req-ID: 4e3a7638-a627-488c-9b53-e396b1a9fdb7 GRPC response: {}

As the log annotations show, Pod deletion can proceed once both the publish (target) path and the staging path are unmounted manually on the node.

I have prepared a patch for this issue and will send it shortly.

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 10 months ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.