ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Cluster-mapping during failover not applied #4493

Closed kmadac closed 4 months ago

kmadac commented 4 months ago

Describe the bug

I have an issue with cluster-mapping and with using mirrored RBD volumes in ceph-csi in case of a disaster failover.

In a test environment I'm trying to use mirrored RBD volumes on k8s with ceph-csi. I have a cluster-mapping.json in place where the primary pool and cluster ID are mapped to the secondary Ceph cluster, and a config.json with the list of mons for both clusters. The issue is that during failover to the secondary site, when I manually create the PV/PVC the same way as on the primary side, the cluster mapping is not applied during NodeStageVolume (at least from what I can see in the code), and ceph-csi still tries to reach the unreachable primary cluster. This fails and leaves the application pods stuck indefinitely in ContainerCreating. When I manually create the PV on the secondary site with the corrected volumeHandle, it works (see the sketch below). Why is cluster-mapping.json needed if the volumeHandle still has to be changed manually in case of failover? Shouldn't the mapping also be applied during the NodeStageVolume call?
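For illustration, this is roughly the hand-crafted PV I create on the secondary site as a workaround (a sketch only; the size, fsType, imageName and secret name/namespace are placeholders copied or guessed from the original PV). The volumeHandle is rebuilt by hand from the secondary cluster ID and pool ID 5, keeping the original image UUID:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-9978d8bc-9053-4cde-bf11-baba5f2df774
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 8Gi                    # illustrative size
  storageClassName: csi-rbd-sc
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: rbd.csi.ceph.com
    fsType: ext4
    # secondary clusterID and pool ID 5 instead of the primary clusterID and pool ID 3;
    # the trailing image UUID stays the same
    volumeHandle: 0001-0024-9fa7df9e-dd71-11ee-93b5-52540070c99e-0000000000000005-f1d89947-0fff-447b-a190-6fe68539253a
    volumeAttributes:
      clusterID: 9fa7df9e-dd71-11ee-93b5-52540070c99e
      pool: kubernetes
      imageName: csi-vol-f1d89947-0fff-447b-a190-6fe68539253a   # assumed image name
    nodeStageSecretRef:
      name: csi-rbd-secret          # assumed secret name
      namespace: ceph-csi           # assumed namespace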

Environment details

Steps to reproduce

  1. Setup details: deploy two Ceph clusters, primary and secondary
  2. Deploy a k8s cluster which has connectivity to both Ceph clusters
  3. Create an RBD pool 'kubernetes' on both clusters
  4. Set up rbd-mirror between both clusters for the 'kubernetes' pool (see the command sketch after this list)
  5. Deploy ceph-csi on the k8s cluster and integrate it with the primary Ceph cluster
  6. Deploy app 'helm install dokuwiki oci://registry-1.docker.io/bitnamicharts/dokuwiki -n dokuwiki --set global.storageClass=csi-rbd-sc,service.type=NodePort'
  7. Wait till rbd image is synced
  8. Stop k8s cluster
  9. Demote the 'kubernetes' pool on the primary cluster, promote it on the secondary Ceph cluster
  10. Start k8s cluster
  11. Change ceph id in storageclass
  12. Delete ceph-csi pods to initiate restart of csi
  13. Delete application pod
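For steps 4 and 9 I use the standard rbd-mirror workflow, roughly like this (a sketch; site names are illustrative and the rbd-mirror daemon is assumed to be running on both clusters):

# step 4: enable pool-level mirroring and exchange a peer bootstrap token
rbd mirror pool enable kubernetes pool                                          # on both clusters
rbd mirror pool peer bootstrap create --site-name site-a kubernetes > token     # on the primary
rbd mirror pool peer bootstrap import --site-name site-b kubernetes token       # on the secondary

# step 9: failover
rbd mirror pool demote kubernetes              # on the primary cluster
rbd mirror pool promote --force kubernetes     # on the secondary cluster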

csi-config-map

---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "2459c716-dd81-11ee-a184-525400150bec",
        "monitors": [
          "192.168.121.11:6789",
          "192.168.121.122:6789",
          "192.168.121.97:6789"
        ]
      },
      {
        "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      }
    ]
  cluster-mapping.json: |-
    [
      {
        "clusterIDMapping": {
          "2459c716-dd81-11ee-a184-525400150bec": "9fa7df9e-dd71-11ee-93b5-52540070c99e"
        },
        "RBDPoolIDMapping": [{
          "3": "5"
        }]
      }
    ]
metadata:
  name: ceph-csi-config

Actual results

The PV and PVC are in Bound state, but the application Pod is stuck in ContainerCreating and the csi pods show the following errors:

Warning  FailedMount  4m29s (x39 over 3h55m)  kubelet  MountVolume.MountDevice failed for volume "pvc-9978d8bc-9053-4cde-bf11-baba5f2df774" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-2459c716-dd81-11ee-a184-525400150bec-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a already exists

And I can also see an error log in the csi-rbdplugin pod showing that it tries to connect to the primary Ceph cluster, which is down:

error generating volume 0001-0024-2459c716-dd81-11ee-a184-525400150bec-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a: failed to establish the connection: failed to get connection: connecting failed: rados: ret=-110, Connection timed out

Expected behavior

The volume is remounted successfully from the secondary cluster and the application starts.

Madhu-1 commented 4 months ago

error generating volume 0001-0024-2459c716-dd81-11ee-a184-525400150bec-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a: failed to establish the connection: failed to get connection: connecting failed: rados: ret=-110, Connection timed out

@kmadac does 2459c716-dd81-11ee-a184-525400150bec in the configmap point to the monitor details of the cluster you are failing over to? If not, you need to do that as well.

Madhu-1 commented 4 months ago

If you want the mapping to handle it, you can also remove the 2459c716-dd81-11ee-a184-525400150bec entry from config.json and see if that works as well (a rough sketch is below).
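Something like this for config.json (just a sketch; the monitor addresses are the secondary-cluster ones from your configmap), keeping cluster-mapping.json as it is:

config.json: |-
  [
    {
      "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
      "monitors": [
        "192.168.121.98:6789",
        "192.168.121.8:6789",
        "192.168.121.136:6789"
      ]
    }
  ]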

kmadac commented 4 months ago

I can confirm that putting the secondary mon IP addresses under the primary Ceph cluster ID worked.

Here is the final csi config map:

---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "2459c716-dd81-11ee-a184-525400150bec",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      },
      {
        "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      }
    ]
  cluster-mapping.json: |-
    [
      {
        "clusterIDMapping": {
          "2459c716-dd81-11ee-a184-525400150bec": "9fa7df9e-dd71-11ee-93b5-52540070c99e"
        },
        "RBDPoolIDMapping": [{
          "3": "5"
        }]
      }
    ]
metadata:
  name: ceph-csi-config

Thank you very much, I'm closing the issue. Just one question: is this documented somewhere? I read the documentation, but maybe I missed it.

Madhu-1 commented 4 months ago

We might have missed adding it to the documentation, please feel free to open a PR to add the missing details. Thank you :)