yongman opened this issue 3 years ago (status: Open)
The newer write (redis-cli -h 192.168.2.234 -p 20000 set b b3) gets overwritten by an older write replayed during resync (redis-cli -h 192.168.2.234 -p 21000 set b b1).
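To spell out the scenario, a minimal sketch of the sequence (only the two set commands above come from the report; the partition step and the get checks are my assumptions, since the actual simulate script was attached separately):

```sh
# Two KeyDB instances on 192.168.2.234 (ports 20000 and 21000), configured
# as active replicas of each other. While they are partitioned (how the
# partition was induced is not shown here), write the same key on each:
redis-cli -h 192.168.2.234 -p 21000 set b b1   # older write, instance 2
redis-cli -h 192.168.2.234 -p 20000 set b b3   # newer write, instance 1

# After the partition heals and partial resync completes, both instances
# should converge on the newer value:
redis-cli -h 192.168.2.234 -p 20000 get b      # expected "b3"
redis-cli -h 192.168.2.234 -p 21000 get b      # expected "b3"
# Reported behavior: the older "b1" wins on at least one instance.
```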
I had the same issue on data consistency after finishing partial resynchronization. I attached the logs I got during network failure and restored.
Node A (172.19.222.6)
// network failure
5909:5919:S 03 Aug 2022 07:09:28.393 # MASTER timeout: no data nor PING received...
5909:5919:S 03 Aug 2022 07:09:28.394 # Connection with master lost.
5909:5919:S 03 Aug 2022 07:09:28.394 * Caching the disconnected master state.
5909:5919:S 03 Aug 2022 07:09:28.394 * Connecting to MASTER 172.19.222.1:6379
5909:5919:S 03 Aug 2022 07:09:28.394 * MASTER <-> REPLICA sync started
5909:5919:S 03 Aug 2022 07:09:30.404 # Disconnecting timedout replica (streaming sync): 172.19.222.1:6379
5909:5919:S 03 Aug 2022 07:09:30.404 # Connection with replica 172.19.222.1:6379 lost.
// network restored
5909:5919:S 03 Aug 2022 07:10:29.680 # Timeout connecting to the MASTER...
5909:5919:S 03 Aug 2022 07:10:30.322 * Replica 172.19.222.1:6379 asks for synchronization
5909:5919:S 03 Aug 2022 07:10:30.323 * Partial resynchronization request from 172.19.222.1:6379 accepted. Sending 822 bytes of backlog starting from offset 3892.
5909:5919:S 03 Aug 2022 07:10:30.686 * Connecting to MASTER 172.19.222.1:6379
5909:5919:S 03 Aug 2022 07:10:30.686 * MASTER <-> REPLICA sync started
5909:5919:S 03 Aug 2022 07:10:30.687 * Non blocking connect for SYNC fired the event.
5909:5919:S 03 Aug 2022 07:10:30.687 * Master replied to PING, replication can continue...
5909:5919:S 03 Aug 2022 07:10:30.688 * Trying a partial resynchronization (request 90601618c9ce7846ccdaf1b3f4e4f709a8910715:3791).
5909:5919:S 03 Aug 2022 07:10:30.689 * Successful partial resynchronization with master.
5909:5919:S 03 Aug 2022 07:10:30.689 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
Node B (172.19.222.1)
// network failure
5862:5872:S 03 Aug 2022 07:09:28.070 # MASTER timeout: no data nor PING received...
5862:5872:S 03 Aug 2022 07:09:28.070 # Connection with master lost.
5862:5872:S 03 Aug 2022 07:09:28.070 * Caching the disconnected master state.
5862:5872:S 03 Aug 2022 07:09:28.070 * Connecting to MASTER 172.19.222.6:6379
5862:5872:S 03 Aug 2022 07:09:28.070 * MASTER <-> REPLICA sync started
5862:5872:S 03 Aug 2022 07:09:31.081 # Disconnecting timedout replica (streaming sync): 172.19.222.6:6379
5862:5872:S 03 Aug 2022 07:09:31.081 # Connection with replica 172.19.222.6:6379 lost.
// network restored
5862:5872:S 03 Aug 2022 07:10:29.316 # Timeout connecting to the MASTER...
5862:5872:S 03 Aug 2022 07:10:30.320 * Connecting to MASTER 172.19.222.6:6379
5862:5872:S 03 Aug 2022 07:10:30.320 * MASTER <-> REPLICA sync started
5862:5872:S 03 Aug 2022 07:10:30.321 * Non blocking connect for SYNC fired the event.
5862:5872:S 03 Aug 2022 07:10:30.322 * Master replied to PING, replication can continue...
5862:5872:S 03 Aug 2022 07:10:30.323 * Trying a partial resynchronization (request 59c44e7640ddd739ac878c711b3e18523458cc61:3892).
5862:5872:S 03 Aug 2022 07:10:30.323 * Successful partial resynchronization with master.
5862:5872:S 03 Aug 2022 07:10:30.323 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
5862:5872:S 03 Aug 2022 07:10:30.689 * Replica 172.19.222.6:6379 asks for synchronization
5862:5872:S 03 Aug 2022 07:10:30.689 * Partial resynchronization request from 172.19.222.6:6379 accepted. Sending 2351 bytes of backlog starting from offset 3791.
And according to the comment in the config file (keydb.conf), what does "incorrect ordering" mean? If the sequence I ran results in incorrect ordering, it is possible to lose data because the first master will win. Is that a possible scenario?
# Uncomment the option below to enable Active Active support. Note that
# replicas will still sync in the normal way and incorrect ordering when
# bringing up replicas can result in data loss (the first master will win).
active-replica yes
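My reading of that comment is that "ordering" refers to the order in which the two nodes are pointed at each other when first brought up, roughly like the sketch below (hosts are taken from the logs above; the exact procedure is an assumption on my part, not something from the docs):

```sh
# Both nodes have "active-replica yes" in keydb.conf.
# Bring-up: make each node a replica of the other, one at a time.
redis-cli -h 172.19.222.6 -p 6379 replicaof 172.19.222.1 6379   # node A -> B
redis-cli -h 172.19.222.1 -p 6379 replicaof 172.19.222.6 6379   # node B -> A
```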
@hellojaewon Conflicts are resolved per key, not per server. What happens when they resync is that the timestamp of the last write to each key is compared, and the most recent one "wins". But it's not true that a specific server will win.
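To make the per-key last-write-wins semantics concrete, here is what that resolution should look like (a sketch; the key and values are illustrative, hosts are from the logs above):

```sh
# During a partition, the same key is written on both nodes:
redis-cli -h 172.19.222.6 -p 6379 set k v1   # written first  (older timestamp)
redis-cli -h 172.19.222.1 -p 6379 set k v2   # written second (newer timestamp)

# After resync, per-key last-write-wins should leave both nodes holding
# the value with the most recent timestamp, whichever server wrote it:
redis-cli -h 172.19.222.6 -p 6379 get k      # expected "v2"
redis-cli -h 172.19.222.1 -p 6379 get k      # expected "v2"
```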
@JohnSully I agree with the conflict resolution. Sorry for the confusion. My question and situation are the same as @yongman described. So please let me know if you make any progress.
@JohnSully Is there any progress on this data consistency issue?
Is anyone still experiencing this issue? It would be helpful to understand how to prioritize this.
I've been evaluating KeyDB active-active replication for a project where I need a two-node cluster, and I think I'm seeing the same issue.
I'm using two docker containers (A and B) and creating a split brain scenario by connecting/disconnecting containers from the docker network. With the nodes in contact everything works as expected.
I'm creating the split brain, then setting the same pair of keys on both instances in a specific sequence before re-attaching the container.
The key name signifies the order in which the set commands were issued to the containers (ab means set on A then B).
Sequence of commands:
set ab a   // on instance A
set ab b   // on instance B
set ba b   // on instance B
set ba a   // on instance A
The keys are read back and with the split brain still in effect I see each instance only has the local update.
On A: ab = a, ba = a
On B: ab = b, ba = b
The container is then re-connected and the split brain resolves. Both sides do a partial sync. I'm expecting the most recent write of each key to win out over the older write on the other node.
Expected - both instances have the most recent write: ab = b, ba = a
Observed results - the values differ, and the synced data from the remote node appears to overwrite more recent data on the local node in each instance:
On A: ab = b, ba = b
On B: ab = a, ba = a
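For reference, a sketch of this reproduction (the container and network names "keydb-a", "keydb-b", "keydb-net" are placeholders, not my actual setup; both containers run with active-replica yes and replicaof pointing at each other):

```sh
docker network disconnect keydb-net keydb-b    # create the split brain

redis-cli -h keydb-a set ab a                  # ab: set on A then B
redis-cli -h keydb-b set ab b
redis-cli -h keydb-b set ba b                  # ba: set on B then A
redis-cli -h keydb-a set ba a

docker network connect keydb-net keydb-b       # heal; partial resync runs
sleep 2
redis-cli -h keydb-a mget ab ba                # expected "b" "a"; observed "b" "b"
redis-cli -h keydb-b mget ab ba                # expected "b" "a"; observed "a" "a"
```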
I can confirm that what @rjbsw wrote is still an issue in 6.3.4.
Describe the bug
Active replication cannot guarantee data consistency in a simple scenario.
To reproduce
See the simulate shell script under "Additional information" below.
Expected behavior
The same value for the same key on the different KeyDB instances.
Additional information
Simulate shell script and its output were attached (not reproduced here).
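Lacking the attached script, a minimal check for the symptom might look like the following (an assumption; only the hosts and ports come from the commands at the top of the issue):

```sh
# Hypothetical consistency check: read key b from both instances and compare.
v1=$(redis-cli -h 192.168.2.234 -p 20000 get b)
v2=$(redis-cli -h 192.168.2.234 -p 21000 get b)
if [ "$v1" = "$v2" ]; then
  echo "consistent: b=$v1"
else
  echo "MISMATCH: instance 1 has b=$v1, instance 2 has b=$v2"
fi
```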