ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

API call not implemented server-side: No handler found for 'fs subvolume metadata set' in ceph 15.2.17 #3347

Closed: wanghongzhou closed this issue 1 year ago

wanghongzhou commented 2 years ago

Describe the bug

failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = failed to set metadata key "csi.storage.k8s.io/pvc/name", value "cephfs-pvc" on subvolume &{0xc000948160 a6893219-4fa6-437e-9de1-79c77c835fdb true 0xc0002f6648}: API call not implemented server-side: No handler found for 'fs subvolume metadata set'

Environment details

Additional context

```
Aug 27 22:30:44 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:30:44.308+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 22:35:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:35:00.376+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:35:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:35:00.376+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:35:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:35:00.376+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:35:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:35:00.376+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:35:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:35:00.376+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:35:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:35:00.497+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 22:40:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:40:00.553+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:40:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:40:00.553+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:40:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:40:00.553+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:40:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:40:00.553+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:40:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:40:00.553+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:40:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:40:00.646+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 22:41:02 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:41:02.648+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 22:45:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:45:00.701+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:45:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:45:00.701+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:45:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:45:00.701+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:45:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:45:00.701+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:45:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:45:00.701+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:45:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:45:00.843+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 22:47:08 ceph-node-1 systemd[1]: [/usr/lib/systemd/system/ceph-mgr@.service:15] Unknown lvalue 'LockPersonality' in section 'Service'
Aug 27 22:47:08 ceph-node-1 systemd[1]: [/usr/lib/systemd/system/ceph-mgr@.service:18] Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
Aug 27 22:47:08 ceph-node-1 systemd[1]: [/usr/lib/systemd/system/ceph-mgr@.service:21] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Aug 27 22:47:08 ceph-node-1 systemd[1]: [/usr/lib/systemd/system/ceph-mgr@.service:23] Unknown lvalue 'ProtectKernelModules' in section 'Service'
Aug 27 22:47:08 ceph-node-1 systemd[1]: [/usr/lib/systemd/system/ceph-mgr@.service:24] Unknown lvalue 'ProtectKernelTunables' in section 'Service'
Aug 27 22:50:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:50:00.924+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:50:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:50:00.924+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:50:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:50:00.924+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:50:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:50:00.924+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:50:00 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:50:00.924+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:50:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:50:01.017+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 22:55:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:55:01.081+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:55:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:55:01.081+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:55:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:55:01.081+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:55:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:55:01.082+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:55:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:55:01.082+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 22:55:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:55:01.162+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 22:56:02 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T22:56:02.694+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
Aug 27 23:00:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T23:00:01.227+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 23:00:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T23:00:01.227+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 23:00:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T23:00:01.227+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 23:00:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T23:00:01.227+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 23:00:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T23:00:01.227+0800 7f6979269700 -1 client.0 error registering admin socket command: (17) File exists
Aug 27 23:00:01 ceph-node-1 ceph-mgr[1974609]: 2022-08-27T23:00:01.526+0800 7f6978a68700 -1 mgr.server reply reply (22) Invalid argument No handler found for 'fs subvolume metadata set'
```

wanghongzhou commented 2 years ago

I have found the problem; I needed to set --extra-create-metadata=false.
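
For anyone hitting the same error, here is a minimal sketch of applying that workaround on a manifest-based install. The namespace, deployment and container names below are assumptions for a default ceph-csi deployment; adjust them to your setup.

```
# Edit the CephFS provisioner deployment and flip the csi-provisioner sidecar flag.
kubectl -n ceph-csi-cephfs edit deployment csi-cephfsplugin-provisioner
# In the csi-provisioner container's args, change
#   - "--extra-create-metadata=true"
# to
#   - "--extra-create-metadata=false"
# and wait for the provisioner pods to roll out again.
```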

Bengrunt commented 2 years ago

Hello, I would suggest reopening this issue: although using the --extra-create-metadata=false parameter is a good workaround, it does not solve the underlying issue. FYI, we're using a Ceph 16.2.9 cluster (Pacific) with CSI 3.7.1.

From what we understood by looking at this Ceph PR and the Ceph source code, the fs subvolume metadata set command is not available before Ceph 17.2.x (which also contradicts the official Ceph documentation, where the command is mentioned in the Pacific release documentation, for instance).
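
As a quick way to confirm this on a given cluster, here is a hedged sketch that checks whether the mgr actually exposes the subvolume metadata handlers; the filesystem, subvolume and group names are placeholders.

```
# The subvolume metadata commands only exist in the Quincy (17.2.x) and later mgr.
ceph versions
# Probe for the handler; on Octopus/Pacific this fails with
# "Error EINVAL: ... No handler found for 'fs subvolume metadata ls'".
ceph fs subvolume metadata ls myfs csi-vol-00000000-0000-0000-0000-000000000000 --group_name csi
```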

Anyway, I feel like upgrading from CSI 3.6.0 to 3.7.x should not result in an error message like the one below when attempting to provision CephFS volumes:

failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = failed to set metadata key "csi.storage.k8s.io/pv/name", value "pvc-88c7287c-f95c-4acf-bfb0-9f99197ef5af" on subvolume &{0xc000e14420 XXX_cephfs_cluster3 true 0xc000136840}: API call not implemented server-side: No handler found for 'fs subvolume metadata set'

It looks like this PR would have helped avoid this situation, but it seems it did not work (at least for our versions)? And maybe https://github.com/ceph/ceph-csi/issues/3390 is related to this as well, though I'm not so sure.

Lastly, the workaround parameter is not customizable in the CSI chart (here and here), so we have to manually override it after each release. I'm not sure whether it is supposed to be added as a value in the chart, but if so I'd gladly open a PR to support this.

Thanks ! :)

Madhu-1 commented 2 years ago

Hello, I would suggest reopening this issue: although using the --extra-create-metadata=false parameter is a good workaround, it does not solve the underlying issue. FYI, we're using a Ceph 16.2.9 cluster (Pacific) with CSI 3.7.1.

From what we understood by looking at this Ceph PR and the Ceph source code, the fs subvolume metadata set command is not available before Ceph 17.2.x (which also contradicts the official Ceph documentation, where the command is mentioned in the Pacific release documentation, for instance).

Anyway, I feel like upgrading from CSI 3.6.0 to 3.7.x should not result in an error message like the one below when attempting to provision CephFS volumes:

failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = failed to set metadata key "csi.storage.k8s.io/pv/name", value "pvc-88c7287c-f95c-4acf-bfb0-9f99197ef5af" on subvolume &{0xc000e14420 XXX_cephfs_cluster3 true 0xc000136840}: API call not implemented server-side: No handler found for 'fs subvolume metadata set'

It looks like this PR would have helped avoid this situation, but it seems it did not work (at least for our versions)? And maybe #3390 is related to this as well, though I'm not so sure.

You mean you get error with 3.7.1 release also?

Lastly, the workaround parameter is not customizable in the CSI chart (here and here), so we have to manually override it after each release. I'm not sure whether it is supposed to be added as a value in the chart, but if so I'd gladly open a PR to support this.

Thanks ! :)

This is configurable in the helm charts https://github.com/ceph/ceph-csi/blob/devel/charts/ceph-csi-cephfs/templates/provisioner-deployment.yaml#L131

Madhu-1 commented 2 years ago

Reopening for now; I will try to reproduce it with the specified Ceph version.

Madhu-1 commented 2 years ago

https://github.com/ceph/ceph-csi/pull/3423 should fix the problem.

Bengrunt commented 2 years ago

You mean you get error with 3.7.1 release also?

Yes, that's the release we experienced this issue with.

This is configurable in the helm charts https://github.com/ceph/ceph-csi/blob/devel/charts/ceph-csi-cephfs/templates/provisioner-deployment.yaml#L131

It does not seem to be the parameter we need to set to work around the issue. We had to set --extra-create-metadata=false in order for PVC binding to succeed.

Madhu-1 commented 2 years ago

You mean you get error with 3.7.1 release also?

Yes, that's the release we experienced this issue with.

Should be fixed in next release 3.7.2 (maybe next week we will release it)

This is configurable in the helm charts https://github.com/ceph/ceph-csi/blob/devel/charts/ceph-csi-cephfs/templates/provisioner-deployment.yaml#L131

It does not seem to be the parameter we need to set to work around the issue. We had to set --extra-create-metadata=false in order for PVC binding to succeed.

It is the right parameter to set to disable metadata operations in cephcsi.
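
Since the two flags are easy to confuse, here is a small sketch that lists which of them the provisioner pod is actually running with. It assumes a default Helm install; the namespace and deployment name may differ in your cluster.

```
# Print each container's args and keep only the metadata-related flags.
kubectl -n ceph-csi-cephfs get deployment csi-cephfsplugin-provisioner \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.args}{"\n"}{end}' \
  | grep -E 'setmetadata|extra-create-metadata'
# --setmetadata           : cephcsi's own flag, toggled by the chart value discussed here
# --extra-create-metadata : csi-provisioner sidecar flag, the one set to false as a workaround
```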

Bengrunt commented 1 year ago

You mean you get error with 3.7.1 release also?

Yes, that's the release we experienced this issue with.

Should be fixed in next release 3.7.2 (maybe next week we will release it)

This is configurable in the helm charts https://github.com/ceph/ceph-csi/blob/devel/charts/ceph-csi-cephfs/templates/provisioner-deployment.yaml#L131

It does not seem to be the parameter we need to set to work around the issue. We had to set --extra-create-metadata=false in order for PVC binding to succeed.

It is the right parameter to set to disable metadata operations in cephcsi.

I'm afraid to say that release 3.7.2 and #3423 did not solve the issue for us. 😢
Unless we set provisioner.setmetadata: false in the cephfs chart values, we cannot provision CephFS volumes on clusters that do not support setting CephFS volume metadata (i.e. Ceph releases prior to v17.2.x).
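
For reference, a hedged sketch of setting that value through Helm; the release name, namespace and repo alias are placeholders.

```
# Disable the cephcsi metadata feature via the chart value mentioned above.
helm upgrade ceph-csi-cephfs ceph-csi/ceph-csi-cephfs \
  --namespace ceph-csi-cephfs \
  --reuse-values \
  --set provisioner.setmetadata=false
```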

Madhu-1 commented 1 year ago

You mean you get error with 3.7.1 release also?

Yes, that's the release we experienced this issue with.

Should be fixed in next release 3.7.2 (maybe next week we will release it)

This is configurable in the helm charts https://github.com/ceph/ceph-csi/blob/devel/charts/ceph-csi-cephfs/templates/provisioner-deployment.yaml#L131

It does not seem to be the parameter we need to set to work around the issue. We had to set --extra-create-metadata=false in order for PVC binding to succeed.

It is the right parameter to set to disable metadata operations in cephcsi.

I'm afraid to say that release 3.7.2 and #3423 did not solve the issue for us. 😢 Unless we set provisioner.setmetadata: false in the cephfs chart values, we cannot provision CephFS volumes on clusters that do not support setting CephFS volume metadata (i.e. Ceph releases prior to v17.2.x).

@Bengrunt sorry to hear that, but I have tested it manually; see the results here: https://github.com/ceph/ceph-csi/pull/3423#issue-1404509117. Can you please check that you have the v3.7.2 image? Can you try to re-pull it or test it on some test cluster? Please provide logs and the Ceph version, and I can retry it again with the same version.
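
A quick, hedged way to double-check which cephcsi image the provisioner pods actually run; the namespace and labels are assumptions for a default Helm install.

```
# List provisioner pods with the cephcsi image tag they were started from.
kubectl -n ceph-csi-cephfs get pods -l app=ceph-csi-cephfs,component=provisioner \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="csi-cephfsplugin")].image}{"\n"}{end}'
# Expect something like quay.io/cephcsi/cephcsi:v3.7.2; if an older tag shows up,
# the deployment has not picked up the new release yet.
```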

Bengrunt commented 1 year ago

I'm so sorry, the Ansible helm module is so unreliable... So yeah, I checked, and we were still using the 3.7.1 images... It works now.

Sorry for the noise. I'll be more careful next time.