ceph / ceph-csi

CSI driver for Ceph

CEPH rbd plugin does not communicate with CEPH cluster #2301

Closed · ikons closed this issue 3 years ago

ikons commented 3 years ago

Describe the bug

csi-rbdplugin-provisioner cannot create a persistent volume on a local Ceph cluster. I followed the setup guide at https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/ and everything was configured without issue. When I create a persistent volume claim, either for a block device or for a filesystem, `kubectl describe pvc` shows the following events:

```
Warning  ProvisioningFailed    56m (x3 over 159m)      rbd.csi.ceph.com_csi-rbdplugin-provisioner-69f64bccbc-7tg6x_b7eca0ae-37b6-4697-a727-8f3b9d8b9af3  failed to provision volume with StorageClass "csi-rbd-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal   Provisioning          6m14s (x49 over 162m)   rbd.csi.ceph.com_csi-rbdplugin-provisioner-69f64bccbc-7tg6x_b7eca0ae-37b6-4697-a727-8f3b9d8b9af3  External provisioner is provisioning volume for claim "default/rbd-pvc"
Normal   ExternalProvisioning  2m30s (x642 over 162m)  persistentvolume-controller                                                                       waiting for a volume to be created, either by external provisioner "rbd.csi.ceph.com" or manually created by system administrator
```

and the provisioner does not appear to attempt any connection to the Ceph cluster at all. Is there anything wrong with the rbdplugin or provisioner pods?
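
A quick way to confirm what the provisioner is doing is to inspect the pod itself. A minimal sketch, assuming the default labels, container names, and namespace from the standard ceph-csi manifests (adjust to your deployment):

```bash
# List the provisioner pods and check that they are Running
kubectl get pods -l app=csi-rbdplugin-provisioner

# Tail the rbdplugin container, which issues the calls that time out
kubectl logs deploy/csi-rbdplugin-provisioner -c csi-rbdplugin --tail=100

# The external-provisioner sidecar logs the DeadlineExceeded errors too
kubectl logs deploy/csi-rbdplugin-provisioner -c csi-provisioner --tail=100
```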

Steps to reproduce

Steps to reproduce the behavior: follow the how-to at https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/

I installed the rbd plugin, created the Ceph pool, obtained the security credentials, and created the storage class. I then created two persistent volume claims and launched two containers that use those claims.
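
For reference, the Ceph-side steps from that guide look roughly like this (a sketch; the pool and client names follow the guide's examples):

```bash
# Create and initialize an RBD pool for Kubernetes volumes
ceph osd pool create kubernetes
rbd pool init kubernetes

# Create a restricted Ceph user; its key goes into the Kubernetes
# secret that the StorageClass references
ceph auth get-or-create client.kubernetes \
  mon 'profile rbd' \
  osd 'profile rbd pool=kubernetes' \
  mgr 'profile rbd pool=kubernetes'

# The fsid and mon addresses reported here go into csi-config-map.yaml
ceph mon dump
```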

Actual results

Both containers and both claims remain in the Pending state.

Expected behavior

Expected the two persistent volumes to be created and attached to the two containers.

Rakshith-R commented 3 years ago

Hey @ikons, can you please attach the csi-rbdplugin-provisioner pod / csi-rbdplugin container logs, along with the storageclass.yaml, csi-config-map.yaml, and secrets.yaml you created? Refer to the steps at https://github.com/ceph/ceph-csi/blob/devel/docs/deploy-rbd.md#deployment-with-kubernetes and check whether you've missed anything, and go through https://github.com/ceph/ceph-csi/issues/2104 to see if any of those steps help isolate the problem.
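
For comparison, the config map that ceph-csi reads its monitor list from has this general shape; a minimal sketch with placeholder values, not the reporter's actual file:

```bash
# clusterID is the Ceph fsid; monitors must be addresses that the
# provisioner pods can actually reach
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config
data:
  config.json: |-
    [
      {
        "clusterID": "<fsid-from-ceph-mon-dump>",
        "monitors": ["<mon-ip>:6789"]
      }
    ]
EOF
```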

ikons commented 3 years ago

Hi, thanks for the prompt reply. I have attached the requested files. It seems the provisioner does not attempt to contact the Ceph cluster at all, although everything is configured correctly and the mon IPs and ports are reachable. I see other users facing similar issues.

requested_files.zip

Rakshith-R commented 3 years ago

> Hi, thanks for the prompt reply. I have attached the requested files. It seems the provisioner does not attempt to contact the Ceph cluster at all, although everything is configured correctly and the mon IPs and ports are reachable. I see other users facing similar issues.
>
> requested_files.zip

@ikons, I don't see anything wrong in the logs or the other files.

Can you exec into the csi-rbdplugin container and manually run Ceph commands, similar to https://github.com/ceph/ceph-csi/issues/2104#issuecomment-851968816? Also check the Ceph cluster health.
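
That check looks roughly like the following; a sketch using the provisioner pod name from the events above, with placeholder monitor address and key (take the real values from your csi-config-map.yaml and secret):

```bash
# Open a shell in the rbdplugin container of the provisioner pod
kubectl exec -it csi-rbdplugin-provisioner-69f64bccbc-7tg6x -c csi-rbdplugin -- sh

# There is no ceph.conf inside the container, so pass the monitor and
# credentials explicitly; if these commands hang, the pod cannot reach the mon
ceph osd lspools -m <mon-ip>:6789 --id kubernetes --key <key-from-secret>
ceph -s -m <mon-ip>:6789 --id kubernetes --key <key-from-secret>
```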

ikons commented 3 years ago

Hi @Rakshith-R, thanks for the reply. Indeed, I ran the `ceph osd lspools` command in the plugin container and it hung; it could not communicate with the Ceph mon. The mon has a private and a public IP, and it could only be contacted via the public IP (although pinging the private IP worked). Once I changed the configuration to use the public IP, everything worked.
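
For anyone hitting the same symptom: the fix amounts to pointing the monitor list in the ceph-csi config map at addresses that are reachable from the pods, then restarting the provisioner. A sketch, assuming the config map and deployment names from the standard manifests:

```bash
# Switch the monitor list to the mon's public (reachable) IP
kubectl edit configmap ceph-csi-config

# Restart the provisioner so it picks up the new monitor address
kubectl rollout restart deployment csi-rbdplugin-provisioner

# The PVC should leave Pending once the mon is reachable
kubectl get pvc rbd-pvc -w
```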