ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0
1.27k stars 539 forks source link

CephFS multiple attachments to same host results in "an operation with the given Volume ID already exists" #3511

Closed JohnKiller closed 1 year ago

JohnKiller commented 1 year ago

Describe the bug

When the container orchestrator launches multiple instances on the same host that all needs the same CephFS volume, if they are started too close toghether, the node plugin errors with an operation with the given Volume ID already exists. When the orchestrator tries again after some time, it then works.

Since cephfs does support multi-write, it shouldn't matter that the node is still mounting the volume multiple times concurrently.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

Launch multiple instances that needs to mount the same CephFS volume on the same host. If they start too close together, some will fail to mount

Actual results

Some allocations won't mount the volume.

Expected behavior

All allocations should mount the volume.

Logs

I1109 15:45:31.395759       1 utils.go:195] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:31.395947       1 utils.go:206] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:31.396104       1 nodeserver.go:487] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:31.396144       1 nodeserver.go:491] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/87054419-bbad-a1f1-3b80-fef216d6ab63/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:31.396205       1 utils.go:212] ID: 8 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:31.396939       1 utils.go:195] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:31.397069       1 utils.go:206] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:31.397244       1 utils.go:212] ID: 9 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.762839       1 utils.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763187       1 utils.go:206] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.769488       1 utils.go:195] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:37.769605       1 utils.go:206] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.769653       1 nodeserver.go:487] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:37.769691       1 nodeserver.go:491] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:37.769711       1 utils.go:212] ID: 12 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.770267       1 utils.go:195] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.772198       1 utils.go:206] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.777967       1 utils.go:195] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:37.778070       1 utils.go:206] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.778149       1 nodeserver.go:487] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:37.778194       1 nodeserver.go:491] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:37.778225       1 utils.go:212] ID: 14 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.778899       1 utils.go:195] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.778985       1 utils.go:206] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
I1109 15:45:37.802858       1 omap.go:88] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 got omap values: (pool="cephfs_metadata", namespace="csi", name="csi.volume.80def62f-30ec-11ed-901a-0242ac110003"): map[csi.imagename:csi-vol-80def62f-30ec-11ed-901a-0242ac110003 csi.volname:lagrondaia]
I1109 15:45:37.822515       1 nodeserver.go:247] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: mounting volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 with Ceph FUSE driver
I1109 15:45:37.883819       1 cephcmds.go:105] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: ceph-fuse [/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer -m 192.168.110.101,192.168.110.102,192.168.110.103 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024 -o nonempty --client_mds_namespace=cephfs]
I1109 15:45:37.883934       1 nodeserver.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.884122       1 utils.go:212] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:37.885413       1 utils.go:195] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodePublishVolume
I1109 15:45:37.885743       1 utils.go:206] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","target_path":"/local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.889954       1 cephcmds.go:105] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: mount [-o bind,_netdev /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer /local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer]
I1109 15:45:37.890020       1 nodeserver.go:467] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully bind-mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/per-alloc/1f178f99-0da5-3516-6670-d3796a85e77d/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.890080       1 utils.go:212] ID: 16 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:45:39.780737       1 utils.go:195] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I1109 15:45:39.780908       1 utils.go:206] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"target_path":"/local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:39.780944       1 nodeserver.go:487] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 stat failed: stat /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer: no such file or directory
I1109 15:45:39.780957       1 nodeserver.go:491] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 targetPath: /local/csi/per-alloc/1ef16f39-49cb-bd7f-2c50-c720fc66f914/lagrondaia/rw-file-system-multi-node-multi-writer has already been deleted
I1109 15:45:39.780975       1 utils.go:212] ID: 17 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
I1109 15:46:00.241259       1 utils.go:195] ID: 18 GRPC call: /csi.v1.Identity/Probe
I1109 15:46:00.241374       1 utils.go:206] ID: 18 GRPC request: {}
I1109 15:46:00.241408       1 utils.go:212] ID: 18 GRPC response: {}
I1109 15:46:00.244573       1 utils.go:195] ID: 19 GRPC call: /csi.v1.Identity/Probe
I1109 15:46:00.244633       1 utils.go:206] ID: 19 GRPC request: {}
I1109 15:46:00.244657       1 utils.go:212] ID: 19 GRPC response: {}
I1109 15:46:00.245332       1 utils.go:195] ID: 20 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1109 15:46:00.245434       1 utils.go:206] ID: 20 GRPC request: {}
I1109 15:46:00.245561       1 utils.go:212] ID: 20 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1109 15:46:07.810929       1 utils.go:195] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodePublishVolume
I1109 15:46:07.811167       1 utils.go:206] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","target_path":"/local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:46:07.816906       1 cephcmds.go:105] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: mount [-o bind,_netdev /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer /local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer]
I1109 15:46:07.816970       1 nodeserver.go:467] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully bind-mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/per-alloc/366ef5a4-05e9-6c1a-6bfa-eb7f97569eb0/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:46:07.817006       1 utils.go:212] ID: 21 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
Madhu-1 commented 1 year ago
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "NodeStageVolume"
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 10"
I1109 15:45:37.762508       1 utils.go:195] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.762839       1 utils.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
I1109 15:45:37.802858       1 omap.go:88] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 got omap values: (pool="cephfs_metadata", namespace="csi", name="csi.volume.80def62f-30ec-11ed-901a-0242ac110003"): map[csi.imagename:csi-vol-80def62f-30ec-11ed-901a-0242ac110003 csi.volname:lagrondaia]
I1109 15:45:37.822515       1 nodeserver.go:247] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: mounting volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 with Ceph FUSE driver
I1109 15:45:37.883819       1 cephcmds.go:105] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 command succeeded: ceph-fuse [/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer -m 192.168.110.101,192.168.110.102,192.168.110.103 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024 -o nonempty --client_mds_namespace=cephfs]
I1109 15:45:37.883934       1 nodeserver.go:206] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 cephfs: successfully mounted volume 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 to /local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer
I1109 15:45:37.884122       1 utils.go:212] ID: 10 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC response: {}
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 11"
I1109 15:45:37.763027       1 utils.go:195] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeStageVolume
I1109 15:45:37.763187       1 utils.go:206] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"7955282e-39e6-4db4-87d0-153b0ce2f37c","fsName":"cephfs","mounter":"fuse","subvolumeName":"csi-vol-80def62f-30ec-11ed-901a-0242ac110003","subvolumePath":"/volumes/csi/csi-vol-80def62f-30ec-11ed-901a-0242ac110003/1b116438-9f4d-4d34-a0e8-cc280c29a024"},"volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists

You can see the CO is sending two NodeStageRequest for the same volume at the same time (only difference is millisecond times). As per the CSI specification NodeStage should make sure the volume is mounted on the given node only once if already mounted we should return succecss https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume. To avoid consequences of mounting same volume twice we have a lock per volume at the csi driver. you will see operation already exist error until ongoing first request completes.

[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "operation"
E1109 15:45:37.763238       1 nodeserver.go:136] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.763271       1 utils.go:210] ID: 11 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 13"
I1109 15:45:37.770267       1 utils.go:195] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.772198       1 utils.go:206] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.772242       1 nodeserver.go:540] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.772266       1 utils.go:210] ID: 13 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
[🎩︎]mrajanna@fedora kubernetes-csi-addons $]cat log | grep -i "ID: 15"
I1109 15:45:37.778899       1 utils.go:195] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC call: /csi.v1.Node/NodeUnstageVolume
I1109 15:45:37.778985       1 utils.go:206] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC request: {"staging_target_path":"/local/csi/staging/lagrondaia/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003"}
E1109 15:45:37.779016       1 nodeserver.go:540] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists
E1109 15:45:37.779035       1 utils.go:210] ID: 15 Req-ID: 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-7955282e-39e6-4db4-87d0-153b0ce2f37c-0000000000000001-80def62f-30ec-11ed-901a-0242ac110003 already exists

Same explanation as above, but here it's NodeUnstage Request.

JohnKiller commented 1 year ago

Thanks for your response. So it's a problem with Nomad that should have waited for the first allocation to complete before trying to mount it again? I'll open an issue there

Madhu-1 commented 1 year ago

Thanks for your response. So it's a problem with Nomad that should have waited for the first allocation to complete before trying to mount it again? I'll open an issue there

IMO It's more of an enhancement, not a problem.

JohnKiller commented 1 year ago

Eh it causes downtime during node draining where multiple tasks are being rescheduled at once