ceph / ceph-csi

CSI driver for Ceph

When calling NodeStageVolume, a modprobe error occurs #4610

Open gotoweb opened 1 month ago

gotoweb commented 1 month ago

Describe the bug

In response to a gRPC /csi.v1.Node/NodeStageVolume request, the CSI node plugin fails to mount the volume with the following error:

failed to mount volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f: an error (exit status 1) occurred while running modprobe args: [ceph] Check dmesg logs if required.
GRPC error: rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]

When I checked dmesg, I found the log Invalid ELF header magic: != \x7fELF. I hope this isn't a bug, but it seems to be out of my control.
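For reference, a minimal way to inspect the module the node plugin tries to load is from a root shell on the affected node; the path under /lib/modules is the usual location and is an assumption here (the module may also be compressed or built into the kernel):

# kernel release the node is running
uname -r
# is there a ceph module file for this kernel, and what format is it?
ls -l /lib/modules/$(uname -r)/kernel/fs/ceph/
file /lib/modules/$(uname -r)/kernel/fs/ceph/ceph.ko*
# is the module already loaded?
lsmod | grep -w ceph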

Environment details

Steps to reproduce

  1. I deployed the CSI plugin and CephFS driver using this manual as a reference.
  2. I created and deployed a storageclass that uses the cephfs.csi.ceph.com provisioner.
  3. I created a PVC that uses that storageclass.
  4. The provisioner works fine. All PVCs are bound (a quick way to verify this is sketched below).
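For completeness, the checks behind steps 2-4 can be repeated with kubectl; the namespace, PVC and pod names below are placeholders:

# the StorageClass should list cephfs.csi.ceph.com as its provisioner
kubectl get storageclass
# the PVC should be Bound
kubectl -n my-namespace get pvc my-cephfs-pvc
# the mount failure shows up in the events of the consuming pod
kubectl -n my-namespace describe pod my-pod | grep -A 2 FailedMount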

Actual results

Pods attempting to mount the volume received the following error message from the kubelet:

Warning  FailedMount  6m9s (x18 over 26m)  kubelet            MountVolume.MountDevice failed for volume "pvc-2c77ab9f-d45b-4dde-a548-b9db686aaf7a" : rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]

I found the following messages in the logs of the cephfs plugin pod:

I0509 14:27:18.135224    2769 utils.go:198] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f GRPC call: /csi.v1.Node/NodeStageVolume
I0509 14:27:18.135573    2769 utils.go:199] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/7a95dbced311aedf38c5f71b8028c278f7a6a48f70c6c2b3814b0465126aad76/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"1d88c854-9fa3-4806-b80f-5bbd29e03756","fsName":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1715255922910-1802-cephfs.csi.ceph.com","subvolumeName":"csi-vol-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f","subvolumePath":"/volumes/k8svolgroup/csi-vol-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f/b5aa4c29-2c53-44e1-a630-7638b5ab8a6b"},"volume_id":"0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f"}
I0509 14:27:18.140009    2769 omap.go:89] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f got omap values: (pool="cephfs.kubernetes.meta", namespace="csi", name="csi.volume.5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f"): map[csi.imagename:csi-vol-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f csi.volname:pvc-f8743f50-09f3-4837-a52a-beec09bd58a2 csi.volume.owner:clickhouse]
I0509 14:27:18.467484    2769 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0509 14:27:18.468296    2769 nodeserver.go:313] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f cephfs: mounting volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f with Ceph kernel client
I0509 14:27:18.471796    2769 cephcmds.go:98] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f an error (exit status 1) occurred while running modprobe args: [ceph]
E0509 14:27:18.471865    2769 nodeserver.go:323] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f failed to mount volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f: an error (exit status 1) occurred while running modprobe args: [ceph] Check dmesg logs if required.
E0509 14:27:18.472265    2769 utils.go:203] ID: 231 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-5b9d7afa-c0c9-46a0-8ac4-2f60aea1dd9f GRPC error: rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]

The dmesg log looks like this:

Invalid ELF header magic: != \x7fELF

Expected behavior

Interestingly, the same volume mounts successfully on other K8s clusters that use the same Ceph cluster. The StorageClass and the ConfigMap (config.json) of the cluster where the error occurs and of the cluster that works correctly match completely (a quick way to diff them is sketched after the logs below).

These are the logs from the cephfs plugin on a working cluster:

I0509 14:34:08.422119    3880 utils.go:164] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 GRPC call: /csi.v1.Node/NodeStageVolume
I0509 14:34:08.422185    3880 utils.go:165] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/275cb089e63eed1215b93768fe531957cc5ee0434b1473a781144f8ceaa8671c/globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"1d88c854-9fa3-4806-b80f-5bbd29e03756","fsName":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1715156287441-4003-cephfs.csi.ceph.com","subvolumeName":"csi-vol-578e1493-2c49-4534-bef0-efebdd508943","subvolumePath":"/volumes/k8svolgroup/csi-vol-578e1493-2c49-4534-bef0-efebdd508943/290fab63-facd-41fa-8eb5-c05173c0cae4"},"volume_id":"0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943"}
I0509 14:34:08.435295    3880 omap.go:88] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 got omap values: (pool="cephfs.kubernetes.meta", namespace="csi", name="csi.volume.578e1493-2c49-4534-bef0-efebdd508943"): map[csi.imagename:csi-vol-578e1493-2c49-4534-bef0-efebdd508943 csi.volname:pvc-d9aebf6f-bebb-463e-8919-4efc15f5ac6d csi.volume.owner:kafka]
I0509 14:34:08.438241    3880 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0509 14:34:08.438335    3880 nodeserver.go:312] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 cephfs: mounting volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 with Ceph kernel client
I0509 14:34:28.594111    3880 cephcmds.go:105] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 command succeeded: mount [-t ceph 192.168.123.63:6789,192.168.123.3:6789,192.168.123.101:6789:/volumes/k8svolgroup/csi-vol-578e1493-2c49-4534-bef0-efebdd508943/290fab63-facd-41fa-8eb5-c05173c0cae4 /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/275cb089e63eed1215b93768fe531957cc5ee0434b1473a781144f8ceaa8671c/globalmount -o name=admin,secretfile=/tmp/csi/keys/keyfile-1337678247,mds_namespace=kubernetes,_netdev]
I0509 14:34:28.594173    3880 nodeserver.go:252] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 cephfs: successfully mounted volume 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 to /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/275cb089e63eed1215b93768fe531957cc5ee0434b1473a781144f8ceaa8671c/globalmount
I0509 14:34:28.594227    3880 utils.go:171] ID: 234363 Req-ID: 0001-0024-1d88c854-9fa3-4806-b80f-5bbd29e03756-0000000000000002-578e1493-2c49-4534-bef0-efebdd508943 GRPC response: {}
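To confirm that the two clusters really use identical settings, the objects can be dumped and diffed; the kubectl context names, the StorageClass name, and the ConfigMap namespace/name below are assumptions based on the default ceph-csi manifests:

kubectl --context failing-cluster get storageclass csi-cephfs-sc -o yaml > sc-failing.yaml
kubectl --context working-cluster get storageclass csi-cephfs-sc -o yaml > sc-working.yaml
diff sc-failing.yaml sc-working.yaml
kubectl --context failing-cluster -n ceph-csi-cephfs get configmap ceph-csi-config -o yaml > cm-failing.yaml
kubectl --context working-cluster -n ceph-csi-cephfs get configmap ceph-csi-config -o yaml > cm-working.yaml
# differences in metadata (uid, resourceVersion, timestamps) are expected; focus on spec/parameters and config.json
diff cm-failing.yaml cm-working.yaml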
Madhu-1 commented 1 month ago

@gotoweb could this be caused by https://github.com/ceph/ceph-csi/issues/4138?

Madhu-1 commented 1 month ago

Are you able to load the ceph module manually from the node?
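For example, something along these lines, run as root directly on the node (not inside the plugin container), exercises the same code path; check dmesg right afterwards:

modprobe ceph; echo "modprobe exit status: $?"
lsmod | grep -w ceph    # listed only if the load succeeded (built-in modules do not appear here)
dmesg | tail -n 20      # should show the "Invalid ELF header magic" line if the failure reproduces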

gotoweb commented 1 month ago

@Madhu-1 I don't think so. The volume mounts fine on another k8s cluster running the same kubelet version. I haven't tried to load the ceph module manually; I just use the built-in ceph module on Proxmox. I'll try changing the version of the driver/CSI images...
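As a side note, whether the ceph filesystem is really built into the running Proxmox kernel can be checked with something like the following; the paths are the usual Debian/Proxmox locations and are an assumption here:

grep CEPH_FS /boot/config-$(uname -r)                         # CONFIG_CEPH_FS=y means built-in, =m means loadable module
grep -i 'fs/ceph' /lib/modules/$(uname -r)/modules.builtin    # lists the ceph module only if it is built in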

nixpanic commented 1 month ago

@gotoweb, the Ceph-CSI driver loads the kernel module that is provided by a host-path volume. If the module is already loaded (or built-in), it should not try to load it again.

Commit ab87045afb0c15ca4d30ae01003ed0f331843181 checks for support of the cephfs filesystem; it is included in Ceph-CSI 3.11 and was backported to 3.10 with #4381.
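Loosely speaking (implementation details aside), the condition that check looks for can be verified from the node with something like:

grep -w ceph /proc/filesystems   # a "ceph" entry means the filesystem is already registered, so no modprobe should be needed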

github-actions[bot] commented 3 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.