Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

[BUG] Active replication can not guarantee data consistency #366

Open yongman opened 3 years ago

yongman commented 3 years ago

Describe the bug

Active replication cannot guarantee data consistency in a simple scenario.

To reproduce


  1. Set up two active-replica KeyDB instances on localhost, on ports 20000 and 21000 (see the setup sketch after this list).
  2. Simulate a split brain with iptables.
  3. Send a different value for the same key to each instance.
  4. Restore the network (remove the split-brain rules).
  5. Get the value of the key from each instance; the values differ.
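
For reference, a minimal sketch of the two-instance setup assumed in step 1 (the flags mirror the keydb.conf directives; the daemonize/logfile options and log paths are assumptions, not from the original report):

#!/bin/bash
# Start two KeyDB instances on localhost that replicate each other.
# active-replica yes lets each instance accept writes and replicate them back.
keydb-server --port 20000 --active-replica yes --replicaof 127.0.0.1 21000 \
             --daemonize yes --logfile /tmp/keydb-20000.log
keydb-server --port 21000 --active-replica yes --replicaof 127.0.0.1 20000 \
             --daemonize yes --logfile /tmp/keydb-21000.log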

Expected behavior

The same value for the same key on both KeyDB instances.

Additional information

Simulation shell script:

#!/bin/bash
echo "begin brain split"

# Block loopback traffic to both ports so the instances (replicating over
# 127.0.0.1) lose contact with each other.
iptables -t filter -I INPUT 1 -s 127.0.0.1/32 -p tcp --dport 20000 -j DROP
iptables -t filter -I INPUT 1 -s 127.0.0.1/32 -p tcp --dport 21000 -j DROP

sleep 1

echo "set different value to key b"
# use the LAN IP address instead of localhost so the writes bypass the DROP rules
redis-cli -h 192.168.2.234 -p 20000 set b b2
sleep 1
redis-cli -h 192.168.2.234 -p 21000 set b b1
sleep 1
redis-cli -h 192.168.2.234 -p 20000 set b b3
sleep 1

echo "restore network"
# remove both DROP rules so the instances can resync
iptables -t filter -D INPUT 1
iptables -t filter -D INPUT 1

sleep 10
echo "get key b from keydb 20000"
redis-cli -p 20000 get b

echo "get key b from keydb 21000"
redis-cli -p 21000 get b

The output is as follows:

begin brain split
set different value to key b
OK
OK
OK
restore network
get key b from keydb 20000
b1
get key b from keydb 21000
b3


yongman commented 3 years ago

The newer write (redis-cli -h 192.168.2.234 -p 20000 set b b3) is overwritten during the resync by the older write (redis-cli -h 192.168.2.234 -p 21000 set b b1).

hellojaewon commented 2 years ago

I had the same data consistency issue after a partial resynchronization finished. I have attached the logs from when the network failed and from when it was restored.

Node A (172.19.222.6)

// network failure

5909:5919:S 03 Aug 2022 07:09:28.393 # MASTER timeout: no data nor PING received...
5909:5919:S 03 Aug 2022 07:09:28.394 # Connection with master lost.
5909:5919:S 03 Aug 2022 07:09:28.394 * Caching the disconnected master state.
5909:5919:S 03 Aug 2022 07:09:28.394 * Connecting to MASTER 172.19.222.1:6379
5909:5919:S 03 Aug 2022 07:09:28.394 * MASTER <-> REPLICA sync started
5909:5919:S 03 Aug 2022 07:09:30.404 # Disconnecting timedout replica (streaming sync): 172.19.222.1:6379
5909:5919:S 03 Aug 2022 07:09:30.404 # Connection with replica 172.19.222.1:6379 lost.

// network restored

5909:5919:S 03 Aug 2022 07:10:29.680 # Timeout connecting to the MASTER...
5909:5919:S 03 Aug 2022 07:10:30.322 * Replica 172.19.222.1:6379 asks for synchronization
5909:5919:S 03 Aug 2022 07:10:30.323 * Partial resynchronization request from 172.19.222.1:6379 accepted. Sending 822 bytes of backlog starting from offset 3892.
5909:5919:S 03 Aug 2022 07:10:30.686 * Connecting to MASTER 172.19.222.1:6379
5909:5919:S 03 Aug 2022 07:10:30.686 * MASTER <-> REPLICA sync started
5909:5919:S 03 Aug 2022 07:10:30.687 * Non blocking connect for SYNC fired the event.
5909:5919:S 03 Aug 2022 07:10:30.687 * Master replied to PING, replication can continue...
5909:5919:S 03 Aug 2022 07:10:30.688 * Trying a partial resynchronization (request 90601618c9ce7846ccdaf1b3f4e4f709a8910715:3791).
5909:5919:S 03 Aug 2022 07:10:30.689 * Successful partial resynchronization with master.
5909:5919:S 03 Aug 2022 07:10:30.689 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

Node B (172.19.222.1)

// network failure

5862:5872:S 03 Aug 2022 07:09:28.070 # MASTER timeout: no data nor PING received...
5862:5872:S 03 Aug 2022 07:09:28.070 # Connection with master lost.
5862:5872:S 03 Aug 2022 07:09:28.070 * Caching the disconnected master state.
5862:5872:S 03 Aug 2022 07:09:28.070 * Connecting to MASTER 172.19.222.6:6379
5862:5872:S 03 Aug 2022 07:09:28.070 * MASTER <-> REPLICA sync started
5862:5872:S 03 Aug 2022 07:09:31.081 # Disconnecting timedout replica (streaming sync): 172.19.222.6:6379
5862:5872:S 03 Aug 2022 07:09:31.081 # Connection with replica 172.19.222.6:6379 lost.

// network restored

5862:5872:S 03 Aug 2022 07:10:29.316 # Timeout connecting to the MASTER...
5862:5872:S 03 Aug 2022 07:10:30.320 * Connecting to MASTER 172.19.222.6:6379
5862:5872:S 03 Aug 2022 07:10:30.320 * MASTER <-> REPLICA sync started
5862:5872:S 03 Aug 2022 07:10:30.321 * Non blocking connect for SYNC fired the event.
5862:5872:S 03 Aug 2022 07:10:30.322 * Master replied to PING, replication can continue...
5862:5872:S 03 Aug 2022 07:10:30.323 * Trying a partial resynchronization (request 59c44e7640ddd739ac878c711b3e18523458cc61:3892).
5862:5872:S 03 Aug 2022 07:10:30.323 * Successful partial resynchronization with master.
5862:5872:S 03 Aug 2022 07:10:30.323 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
5862:5872:S 03 Aug 2022 07:10:30.689 * Replica 172.19.222.6:6379 asks for synchronization
5862:5872:S 03 Aug 2022 07:10:30.689 * Partial resynchronization request from 172.19.222.6:6379 accepted. Sending 2351 bytes of backlog starting from offset 3791.

Also, regarding the comment in the config file (keydb.conf), what does incorrect ordering mean? If the sequence I performed results in incorrect ordering, is it possible to lose data because the first master will win? Is that a possible scenario?

# Uncomment the option below to enable Active Active support.  Note that
# replicas will still sync in the normal way and incorrect ordering when
# bringing up replicas can result in data loss (the first master will win).
active-replica yes

JohnSully commented 2 years ago

@hellojaewon conflicts are resolved per key, not per server. What happens when they resync is that the timestamp of the last write to the key is compared and the most recent one will “win”. But it’s not true that a specific server will win.
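
To make the described behaviour concrete, here is a minimal shell sketch (reusing the two-instance setup from the report above; the key names and values are placeholders) of what per-key resolution should look like once the partition heals:

# During the split, write two keys in opposite orders.
redis-cli -h 192.168.2.234 -p 20000 set k1 from-20000    # older write for k1
sleep 1
redis-cli -h 192.168.2.234 -p 21000 set k1 from-21000    # newer write for k1
redis-cli -h 192.168.2.234 -p 21000 set k2 from-21000    # older write for k2
sleep 1
redis-cli -h 192.168.2.234 -p 20000 set k2 from-20000    # newer write for k2

# After the network is restored, per-key last-write-wins means both instances
# should converge to k1=from-21000 and k2=from-20000, regardless of which
# server each value came from.
redis-cli -p 20000 mget k1 k2
redis-cli -p 21000 mget k1 k2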

hellojaewon commented 2 years ago

@JohnSully I agree with the conflict resolution; sorry for the confusion. My question and situation are the same as @yongman described. So please let me know if you make any progress on it.

hellojaewon commented 2 years ago

@JohnSully Is there any progress on the issue about data consistency?

msotheeswaran-sc commented 1 year ago

Is anyone still experiencing this issue? It would be helpful to know, to understand how to prioritize this.

rjbsw commented 1 year ago

I've been evaluating keydb active-active replication for a project where I need a 2 node cluster, and I think I'm seeing the same issue.

I'm using two docker containers (A and B) and creating a split brain scenario by connecting/disconnecting containers from the docker network. With the nodes in contact everything works as expected.

I'm creating the split brain, then setting the same pair of keys on both instances in a specific sequence before re-attaching the container.

The key name signifies the order in which the set commands were issued to the containers (ab means set on A, then B).

Sequence of commands:

set ab a    // on instance A
set ab b    // on instance B
set ba b    // on instance B
set ba a    // on instance A

The keys are read back, and with the split brain still in effect each instance only has its own local update.

On A:
ab = a
ba = a

On B:
ab = b
ba = b

The container is then re-connected and the split brain resolves. Both sides do a partial sync. I'm expecting the most recent write of each key to win out over the older write on the other node.

Expected - both instances have the most recent write:
ab = b
ba = a

Observed results - values differ and the sync'd data from the remote node appears to overwrite more recent data on the local node in each instance:

On A:
ab = b
ba = b

On B:
ab = a
ba = a
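
For reproducibility, a minimal sketch of the Docker sequence above (the network name keydb-net and container names keydb-a / keydb-b are placeholders):

# Create the split brain by detaching container B from the shared network.
docker network disconnect keydb-net keydb-b

# Write the keys in the order encoded in their names; the sleeps keep the
# write timestamps clearly ordered.
docker exec keydb-a keydb-cli set ab a
sleep 1
docker exec keydb-b keydb-cli set ab b
docker exec keydb-b keydb-cli set ba b
sleep 1
docker exec keydb-a keydb-cli set ba a

# Heal the partition and give the partial resync time to complete.
docker network connect keydb-net keydb-b
sleep 10

# Expected (most recent write per key wins): ab=b, ba=a on both nodes.
docker exec keydb-a keydb-cli mget ab ba
docker exec keydb-b keydb-cli mget ab ba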

kristopher-h commented 11 months ago

I can confirm that what rjbsw wrote is still an issue in 6.3.4.