ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Nomad ceph-csi plugin error #3880

Closed: clumbo closed this issue 1 year ago

clumbo commented 1 year ago

Hi

On the Nomad client (the host system) I can successfully load the controller and node plugins and create volumes.

Steps to reproduce the behavior:

1. Set up a Proxmox LXC container in privileged mode.
2. Install Ceph Pacific.
3. Run the controller and node plugins; both start successfully.
4. Create a CephFS volume via the command line (see the sketch after this list):

   nomad volume create ....

5. Try to attach the volume to a task/container and observe the following error:
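For reference, a volume spec along these lines should reproduce the setup. This is only a minimal sketch of a Nomad CSI volume file wrapped in a shell heredoc; the plugin_id, the secret names, and the key values are assumptions (some taken from the log below) and must be adapted to the actual cluster:

# Hypothetical volume spec; identifiers and credentials are illustrative only.
cat > traefik_data.hcl <<'EOF'
id        = "traefik_data"
name      = "traefik_data"
type      = "csi"
plugin_id = "cephfs"   # assumed plugin id registered with Nomad

capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "file-system"
}

secrets {
  adminID  = "admin"              # assumed ceph-csi credentials
  adminKey = "<ceph admin key>"
}

parameters {
  clusterID = "20ea907e-f6a8-49fe-b528-95cae7eaf802"
  fsName    = "nomad"
  pool      = "nomad-data"
}
EOF

nomad volume create traefik_data.hcl

The access_mode/attachment_mode pair matches the rw-file-system-multi-node-multi-writer capability visible in the staging path of the log below.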

I0602 11:59:02.652239       1 utils.go:195] ID: 62 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC call: /csi.v1.Node/NodeStageVolume
I0602 11:59:02.652333       1 utils.go:206] ID: 62 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/traefik_data/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"clusterID":"20ea907e-f6a8-49fe-b528-95cae7eaf802","fsName":"nomad","imageFeatures":"layering","pool":"nomad-data","subvolumeName":"csi-vol-bff50437-b90c-433d-9b65-b5cb07b4106e","subvolumePath":"/volumes/csi/csi-vol-bff50437-b90c-433d-9b65-b5cb07b4106e/98d2bc6a-2499-4875-abac-5d02980060d1"},"volume_id":"0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e"}
I0602 11:59:02.655455       1 omap.go:88] ID: 62 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e got omap values: (pool="nomad-metadata", namespace="csi", name="csi.volume.bff50437-b90c-433d-9b65-b5cb07b4106e"): map[csi.imagename:csi-vol-bff50437-b90c-433d-9b65-b5cb07b4106e csi.volname:traefik_data]
I0602 11:59:02.728700       1 volumemounter.go:126] requested mounter: , chosen mounter: kernel
I0602 11:59:02.728745       1 nodeserver.go:293] ID: 62 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e cephfs: mounting volume 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e with Ceph kernel client
I0602 11:59:02.729912       1 cephcmds.go:98] ID: 62 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e an error (exit status 1) occurred while running modprobe args: [ceph]
E0602 11:59:02.729925       1 nodeserver.go:319] ID: 62 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e failed to mount volume 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e: an error (exit status 1) occurred while running modprobe args: [ceph] Check dmesg logs if required.
E0602 11:59:02.729946       1 utils.go:210] ID: 62 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC error: rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]
I0602 11:59:03.141220       1 utils.go:195] ID: 63 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0602 11:59:03.141264       1 utils.go:206] ID: 63 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC request: {"target_path":"/local/csi/per-alloc/7d54d131-2ac4-d031-47fc-14d1ad4e2326/traefik_data/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e"}
E0602 11:59:03.141284       1 nodeserver.go:543] ID: 63 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e stat failed: stat /local/csi/per-alloc/7d54d131-2ac4-d031-47fc-14d1ad4e2326/traefik_data/rw-file-system-multi-node-multi-writer: no such file or directory
I0602 11:59:03.141289       1 nodeserver.go:547] ID: 63 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e targetPath: /local/csi/per-alloc/7d54d131-2ac4-d031-47fc-14d1ad4e2326/traefik_data/rw-file-system-multi-node-multi-writer has already been deleted
I0602 11:59:03.141299       1 utils.go:212] ID: 63 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC response: {}
I0602 11:59:03.141800       1 utils.go:195] ID: 64 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC call: /csi.v1.Node/NodeUnstageVolume
I0602 11:59:03.141832       1 utils.go:206] ID: 64 Req-ID: 0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e GRPC request: {"staging_target_path":"/local/csi/staging/traefik_data/rw-file-system-multi-node-multi-writer","volume_id":"0001-0024-20ea907e-f6a8-49fe-b528-95cae7eaf802-0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e"}

A very vague internal error occurs:

0000000000000003-bff50437-b90c-433d-9b65-b5cb07b4106e an error (exit status 1) occurred while running modprobe args: [ceph]

Are there any debugging strategies to get more information?

Thanks

nixpanic commented 1 year ago
an error (exit status 1) occurred while running modprobe args: [ceph]

This suggests that loading the ceph kernel module failed. You will need to check if /lib/modules/... is available in the running container, and if the container has sufficient permissions to load it.
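A minimal sketch of those checks, run inside the plugin container (the module name and command come from the log above; the exact paths are assumptions about the container image):

# Is the module tree for the running kernel visible inside the container?
ls /lib/modules/"$(uname -r)" || echo "no module tree for $(uname -r)"

# Is the ceph module perhaps already loaded by the host kernel?
grep -w '^ceph' /proc/modules

# Run the same command the plugin runs and inspect the kernel log.
modprobe ceph; echo "modprobe exit status: $?"
dmesg | tail -n 20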

clumbo commented 1 year ago

The problem seems to be that I am running in an LXC container, and the plugin always runs modprobe without checking whether the module is already loaded.

It is already loaded on the host machine.
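A possible workaround sketch for this particular Proxmox/LXC setup (not a ceph-csi change, just an assumption about containers sharing the host kernel): load the module on the host and expose the host's module tree to the container so that the modprobe the plugin runs can succeed.

# On the Proxmox host: load the module; the privileged LXC container shares this kernel.
modprobe ceph
lsmod | grep -w ceph

# Inside the container the loaded module is then visible via /proc/modules,
# but modprobe also needs the host's module tree, e.g. via an assumed
# bind mount in the LXC container config:
#   lxc.mount.entry: /lib/modules lib/modules none bind,ro 0 0
grep -w '^ceph' /proc/modules
modprobe ceph && echo "modprobe now succeeds"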

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.