ceph / ceph-csi

CSI driver for Ceph

CEPH rbd plugin does not communicate with CEPH cluster #2301

Closed · ikons closed this issue 3 years ago

ikons commented 3 years ago

Describe the bug

csi-rbdplugin-provisioner cannot create a persistent volume on a local Ceph cluster. I followed the setup guide at https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/ and everything was configured without issue. When I create a persistent volume claim, either for a block device or for a filesystem, `kubectl describe pvc` shows the following events:

```
Warning  ProvisioningFailed    56m (x3 over 159m)      rbd.csi.ceph.com_csi-rbdplugin-provisioner-69f64bccbc-7tg6x_b7eca0ae-37b6-4697-a727-8f3b9d8b9af3  failed to provision volume with StorageClass "csi-rbd-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal   Provisioning          6m14s (x49 over 162m)   rbd.csi.ceph.com_csi-rbdplugin-provisioner-69f64bccbc-7tg6x_b7eca0ae-37b6-4697-a727-8f3b9d8b9af3  External provisioner is provisioning volume for claim "default/rbd-pvc"
Normal   ExternalProvisioning  2m30s (x642 over 162m)  persistentvolume-controller                                                                       waiting for a volume to be created, either by external provisioner "rbd.csi.ceph.com" or manually created by system administrator
```

and the provisioner does not appear to attempt any connection to the Ceph cluster at all. Is there anything wrong with the rbdplugin or provisioner pods?
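
A quick way to confirm what the provisioner is doing is to inspect the pod itself. A minimal sketch, assuming the default labels, container names, and namespace from the standard ceph-csi manifests (adjust to your deployment):

```bash
# List the provisioner pods and check that they are Running
kubectl get pods -l app=csi-rbdplugin-provisioner

# Tail the rbdplugin container, which issues the calls that time out
kubectl logs deploy/csi-rbdplugin-provisioner -c csi-rbdplugin --tail=100

# The external-provisioner sidecar logs the DeadlineExceeded errors too
kubectl logs deploy/csi-rbdplugin-provisioner -c csi-provisioner --tail=100
```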

Steps to reproduce

Steps to reproduce the behavior: follow the how-to at https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/

I installed the rbd plugin, created the Ceph pool, obtained the security credentials, and created the storage class. I then created two persistent volume claims and launched two containers that use those claims.
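
For reference, the Ceph-side steps from that guide look roughly like this (a sketch; the pool and client names follow the guide's examples):

```bash
# Create and initialize an RBD pool for Kubernetes volumes
ceph osd pool create kubernetes
rbd pool init kubernetes

# Create a restricted Ceph user; its key goes into the Kubernetes
# secret that the StorageClass references
ceph auth get-or-create client.kubernetes \
  mon 'profile rbd' \
  osd 'profile rbd pool=kubernetes' \
  mgr 'profile rbd pool=kubernetes'

# The fsid and mon addresses reported here go into csi-config-map.yaml
ceph mon dump
```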

Actual results

Both containers and both claims remain in the Pending state.

Expected behavior

Expected the two persistent volumes to be created and attached to the two containers.

Rakshith-R commented 3 years ago

Hey @ikons, can you please attach the csi-rbdplugin-provisioner pod / csi-rbdplugin container logs, along with the storageclass.yaml, csi-config-map.yaml, and secrets.yaml you created? Refer to the steps at https://github.com/ceph/ceph-csi/blob/devel/docs/deploy-rbd.md#deployment-with-kubernetes and check whether you've missed anything, and go through https://github.com/ceph/ceph-csi/issues/2104 to see if any of those steps help isolate the problem.
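
For comparison, the config map that ceph-csi reads its monitor list from has this general shape; a minimal sketch with placeholder values, not the reporter's actual file:

```bash
# clusterID is the Ceph fsid; monitors must be addresses that the
# provisioner pods can actually reach
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config
data:
  config.json: |-
    [
      {
        "clusterID": "<fsid-from-ceph-mon-dump>",
        "monitors": ["<mon-ip>:6789"]
      }
    ]
EOF
```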

ikons commented 3 years ago

Hi, thanks for the prompt reply. I have attached the requested files. It seems the provisioner does not attempt to contact the Ceph cluster at all, although everything is configured correctly and the mon IPs and ports are reachable. I see other users facing similar issues.

requested_files.zip

Rakshith-R commented 3 years ago

> Hi, thanks for the prompt reply. I have attached the requested files. It seems the provisioner does not attempt to contact the Ceph cluster at all, although everything is configured correctly and the mon IPs and ports are reachable. I see other users facing similar issues.
>
> requested_files.zip

@ikons, I don't see anything wrong in the logs or the other files.

Can you exec into the csi-rbdplugin container and manually run Ceph commands, similar to https://github.com/ceph/ceph-csi/issues/2104#issuecomment-851968816? Also check the Ceph cluster health.
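
That check looks roughly like the following; a sketch using the provisioner pod name from the events above, with placeholder monitor address and key (take the real values from your csi-config-map.yaml and secret):

```bash
# Open a shell in the rbdplugin container of the provisioner pod
kubectl exec -it csi-rbdplugin-provisioner-69f64bccbc-7tg6x -c csi-rbdplugin -- sh

# There is no ceph.conf inside the container, so pass the monitor and
# credentials explicitly; if these commands hang, the pod cannot reach the mon
ceph osd lspools -m <mon-ip>:6789 --id kubernetes --key <key-from-secret>
ceph -s -m <mon-ip>:6789 --id kubernetes --key <key-from-secret>
```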

ikons commented 3 years ago

Hi @Rakshith-R, thanks for the reply. Indeed, I ran the `ceph osd lspools` command in the plugin container and it hung; it could not communicate with the Ceph mon. The mon has a private and a public IP, and it could only be contacted via the public IP (although pinging the private IP worked). Once I changed the configuration to use the public IP, everything worked.
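
For anyone hitting the same symptom: the fix amounts to pointing the monitor list in the ceph-csi config map at addresses that are reachable from the pods, then restarting the provisioner. A sketch, assuming the config map and deployment names from the standard manifests:

```bash
# Switch the monitor list to the mon's public (reachable) IP
kubectl edit configmap ceph-csi-config

# Restart the provisioner so it picks up the new monitor address
kubectl rollout restart deployment csi-rbdplugin-provisioner

# The PVC should leave Pending once the mon is reachable
kubectl get pvc rbd-pvc -w
```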