Closed kmadac closed 4 months ago
error generating volume 0001-0024-2459c716-dd81-11ee-a184-525400150bec-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a: failed to establish the connection: failed to get connection: connecting failed: rados: ret=-110, Connection timed out
@kmadac is the clusterID 2459c716-dd81-11ee-a184-525400150bec in the configmap pointing to the new monitor details of the cluster you are failing over to? If not, you need to update that as well.
If you are looking for the mapping to handle it, you can remove the 2459c716-dd81-11ee-a184-525400150bec entry from config.json and see if that works as well.
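For illustration, a config.json with that entry removed would contain only the secondary cluster (this fragment reuses the secondary clusterID and monitor addresses that appear later in this thread; it is a sketch of the suggestion, not a verified working config):

```json
[
  {
    "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
    "monitors": [
      "192.168.121.98:6789",
      "192.168.121.8:6789",
      "192.168.121.136:6789"
    ]
  }
]
```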
I can confirm that putting the secondary cluster's monitor IP addresses under the primary ceph clusterID worked.
Here is the final CSI config map:
```yaml
---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "2459c716-dd81-11ee-a184-525400150bec",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      },
      {
        "clusterID": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
        "monitors": [
          "192.168.121.98:6789",
          "192.168.121.8:6789",
          "192.168.121.136:6789"
        ]
      }
    ]
  cluster-mapping.json: |-
    [
      {
        "clusterIDMapping": {
          "2459c716-dd81-11ee-a184-525400150bec": "9fa7df9e-dd71-11ee-93b5-52540070c99e"
        },
        "RBDPoolIDMapping": [{
          "3": "5"
        }]
      }
    ]
metadata:
  name: ceph-csi-config
```
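For illustration, here is a small Python sketch (not ceph-csi's actual code) of how the cluster-mapping entries above translate a volume handle from the primary cluster to the secondary one. The handle layout (version, ID-length field, 36-character clusterID, 16-hex-digit pool ID, image ID) is assumed from the error message at the top of this issue:

```python
# Hypothetical sketch: apply clusterIDMapping and RBDPoolIDMapping from
# cluster-mapping.json to a CSI volume handle of the form
#   0001-0024-<36-char clusterID>-<16-hex-digit poolID>-<imageID>
# This assumes the clusterID is always a 36-character UUID.

CLUSTER_ID_MAP = {
    "2459c716-dd81-11ee-a184-525400150bec": "9fa7df9e-dd71-11ee-93b5-52540070c99e",
}
POOL_ID_MAP = {"3": "5"}

def remap_volume_handle(handle: str) -> str:
    prefix = handle[:10]          # "0001-0024-"
    cluster_id = handle[10:46]    # 36-char clusterID
    pool_hex = handle[47:63]      # 16 hex digits encoding the pool ID
    image_id = handle[64:]        # RBD image identifier
    pool_id = str(int(pool_hex, 16))
    new_cluster = CLUSTER_ID_MAP.get(cluster_id, cluster_id)
    new_pool = int(POOL_ID_MAP.get(pool_id, pool_id))
    return f"{prefix}{new_cluster}-{new_pool:016x}-{image_id}"

old = ("0001-0024-2459c716-dd81-11ee-a184-525400150bec"
       "-0000000000000003-f1d89947-0fff-447b-a190-6fe68539253a")
print(remap_volume_handle(old))
# clusterID 2459c716-... becomes 9fa7df9e-..., pool 3 becomes 5
```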
Thank you very much, I'm closing the issue. Just one question: is this documented anywhere? I read the documentation, but maybe I missed it.
We might have missed adding it to the documentation; please feel free to open a PR to add the missing details. Thank you :)
Describe the bug
I have an issue with cluster-mapping and with using mirrored RBD volumes via ceph-csi in case of disaster.
In a test environment I'm trying to use mirrored RBD volumes on Kubernetes with ceph-csi. I have a cluster-mapping.json in place where the primary pool and ceph cluster ID are mapped to the secondary ceph cluster, and a config.json with the list of mons for both cephs. The issue is that during failover to the secondary site, when I manually create the PV/PVC the same way as on the primary side, the cluster mapping is not applied during NodeStageVolume (at least as far as I can see in the code), so ceph-csi still tries to access the unreachable primary cluster. This fails and leaves the application pods stuck indefinitely in the ContainerCreating phase. When I manually create the PV on the secondary site with the corrected volumeHandle, it works. Why is cluster-mapping.json needed, then, if the volumeHandle still has to be changed manually in case of failover? Shouldn't it also be applied during the NodeStageVolume call?
Environment details
Mounter used for mounting in case of cephfs (fuse or kernel; for rbd it's krbd or rbd-nbd): rbd-nbd
Steps to reproduce
csi-config-map
Actual results
PV and PVC are in the Bound state, but the application Pod is stuck in ContainerCreating and the CSI pods show the following errors:
I can also see an error log in the csi-rbdplugin pod showing that it tries to connect to the primary ceph cluster, which is down:
Expected behavior
The remount is done successfully from the secondary cluster and the application starts.