Closed FSangouard closed 3 months ago
It is not required to connect to the secondary nodes in any particular order. A primary node can connect to any node in the secondary volume and sync. The Primary and Secondary volume types need not be the same: you can create the primary volume as Replica 3 and the secondary volume as Arbiter, or as Distributed Replicate with a larger number of smaller bricks.
Change detection happens on the Primary nodes, and sync always happens through the Gluster mount, so which node is connected doesn't matter.
The Geo-rep monitor process checks for failed connections; if a connection fails, it tries to connect to another available secondary node to continue syncing.
Just to be sure I understand correctly, if for example node1A is connected to node1B, and then node1B fails, node1A will try to connect to node2B or node3B ?
I thought each node remained connected to a single node, but only one worker was active at a time, and if the active worker couldn't sync, another worker would become active in its place.
Yes. Only one worker among the replica bricks will be Active and the other two will be Passive, since all the Primary bricks hold the same data. If the Active worker goes down, one of the Passive workers will become Active.
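The failover behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the actual gluster monitor code; the worker dictionaries, field names, and the `promote` helper are all made up for the example (the real monitor also reports a "Faulty" status, which is reused here for dead workers).

```python
# Hypothetical sketch of Active/Passive promotion among replica workers.
# One worker per replica set is Active; if it dies, a Passive one takes over.
def promote(workers):
    # Mark dead workers as Faulty first.
    for w in workers:
        if not w["alive"]:
            w["state"] = "Faulty"
    # If no healthy Active worker remains, promote the first healthy Passive.
    if not any(w["state"] == "Active" for w in workers):
        for w in workers:
            if w["alive"]:
                w["state"] = "Active"
                break
    return workers

workers = [
    {"brick": "node1A:/b1", "state": "Active", "alive": False},
    {"brick": "node2A:/b1", "state": "Passive", "alive": True},
    {"brick": "node3A:/b1", "state": "Passive", "alive": True},
]
promote(workers)
# node1A is now Faulty and node2A has been promoted to Active.
```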
If a worker is Active but failing to sync, check that worker's log file for errors.
OK, thank you for the clarification!
Description of problem:
When using geo-replication between two clusters, one would expect that the mapping between nodes of the primary and the secondary could be deduced from the list of bricks for the replicated volume on each cluster. For example, I have a 3-node cluster A and a 3-node cluster B, and a replicated volume with one brick per node in each cluster. If the list of bricks for the volume in cluster A goes like this:
node1A
node2A
node3A
and the list in cluster B goes like this:
node1B
node2B
node3B
I would expect the georeplication session to open connections between the nodes like this:
node1A > node1B
node2A > node2B
node3A > node3B
However, that is not guaranteed, because in monitor.py a set is created from the list of secondary bricks, which may change the order of the items. According to my tests, the order is not random: it changed only when I recreated the secondary cluster and remained the same across restarts of the geo-replication session. So I think it is based on some hash of the values, which is hard to predict and, above all, not controllable by the user, since the values contain uuids generated during volume creation.
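The behaviour described above is easy to demonstrate in isolation. The brick strings below are made-up placeholders (real entries embed volume uuids and paths, which is what drives the hashing):

```python
# Illustration of why building a set from the brick list can reorder it.
bricks = ["node1B:/data/brick", "node2B:/data/brick", "node3B:/data/brick"]

as_list = list(bricks)       # preserves insertion order, always
as_set = list(set(bricks))   # order follows the hash-table layout, not input

assert as_list == bricks                 # guaranteed by the language
assert sorted(as_set) == sorted(bricks)  # same members either way...
# ...but as_set == bricks is NOT guaranteed: set iteration order depends on
# the hashes of the stored strings, so it cannot be controlled by the user.
```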
The exact command to reproduce the issue: No single command in particular; just create a geo-replication session as per the documentation and you should observe this. In case the mapping matches, try recreating the volume to reroll new hashes until you see it.
The full output of the command that failed: N/A
Expected results: The mapping between nodes matches what you get when putting the lists of bricks on both clusters side by side, as returned by the volume info command.
Mandatory info:
- The output of the gluster volume info command:
- The output of the gluster volume status command:
- The output of the gluster volume heal info command:
- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/
  I added some debugging statements in monitor.py to diagnose the problem, here's an excerpt showing what happens:
- Is there any crash? Provide the backtrace and coredump: N/A
Additional info:
I tested replacing the set constructor with the list constructor in monitor.py and I got the expected results, so I think the fix could be quite simple, but maybe there are side effects I do not know about.

The operating system / glusterfs version:
GlusterFS 9.4 (but the affected code is still there in devel)
CentOS 7.4
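If the only thing monitor.py needs from the set is de-duplication, a minimal sketch of an order-preserving alternative could look like this. The `unique_in_order` helper and the brick strings are hypothetical, and this assumes de-duplication really is the sole reason a set is used there:

```python
# Order-preserving de-duplication: dict keys keep insertion order in
# Python 3.7+, so this drops duplicates without reshuffling the bricks.
def unique_in_order(bricks):
    return list(dict.fromkeys(bricks))

secondary = ["node1B:/b", "node2B:/b", "node1B:/b", "node3B:/b"]
assert unique_in_order(secondary) == ["node1B:/b", "node2B:/b", "node3B:/b"]
```

Unlike `set()`, this keeps the order reported by `gluster volume info`, so the primary-to-secondary pairing would stay predictable.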