Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

[BUG] active-active bugs with 3 servers, server connecting to 127.0.0.1 #786

Closed: BHare1985 closed this issue 3 months ago

BHare1985 commented 4 months ago

Describe the bug

I have one server in particular (server 2 of 3) that keeps trying to connect to itself as a master and then logs "Reading from master: Connection timed out". This is a huge issue for me because KeyDB responds very slowly while it is happening, with latencies of 1000-2000 ms. Restarting KeyDB fixes it, and it then works for another ~24 hours. Here is the setup; each server is on a different VPS:

server-1 (1.1.1.1)

bind 0.0.0.0
port 27081
protected-mode no
requirepass somepassword
masterauth somepassword
multi-master yes
active-replica yes
replicaof 2.2.2.2 27081
replicaof 3.3.3.3 27081

server-2 (2.2.2.2)

bind 0.0.0.0
port 27081
protected-mode no
requirepass somepassword
masterauth somepassword
multi-master yes
active-replica yes
replicaof 1.1.1.1 27081
replicaof 3.3.3.3 27081

server-3 (3.3.3.3)

bind 0.0.0.0
port 27081
protected-mode no
requirepass somepassword
masterauth somepassword
multi-master yes
active-replica yes
replicaof 1.1.1.1 27081
replicaof 2.2.2.2 27081

I noticed that after some time, normally every day, server-2 started to connect to master 127.0.0.1, but only after it had logged "CONFIG REWRITE failed: Permission denied". I read that on Debian you sometimes have to chown /etc/keydb to keydb:keydb.

Once I did this, the issue didn't go away, but instead of connecting to 127.0.0.1 it started trying to connect to 2.2.2.2 (the public IP of server-2 itself). I looked at the config, and this is what the server-2 config now contained:

requirepass "somepassword"
masterauth "somepassword"
multi-master yes
active-replica yes

# Generated by CONFIG REWRITE
user default on #eb7c1e1e710f242b32ee1d89ddf139ed9a22b480eb0ed10db99244f19ab3dac6 ~* &* +@all

replicaof 2.2.2.2 27081

To reproduce

It seems to be intermittent and only happens on server-2.

Expected behavior

The server only replicates from the servers listed in its "replicaof" directives and does not try to set itself as its own master.

Additional information

Attached is a log file of what happened, with the IP addresses changed to match the explanation: log.txt

BHare1985 commented 4 months ago

After double-checking the configurations, I realized that server-1 was set up incorrectly with

replicaof 1.1.1.1 27081
replicaof 3.3.3.3 27081

I am not sure if this contributed to the issue I've been seeing. I am surprised that server-1 hasn't complained about it in its log file and that server-2 is the one that had issues. I will continue to monitor and see whether the server-2 issues come back now that server-1 is fixed.
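A mistake like the one above (a server listing its own address in "replicaof") is easy to miss by eye. Here is a minimal Python sketch that flags it, assuming you can read each server's config text and know each server's own address. The helper names `replica_targets` and `self_replicating` are hypothetical and not part of KeyDB, and the check compares hosts only, not ports.

```python
# Hypothetical helpers for illustration; not part of KeyDB.
LOOPBACK = {"127.0.0.1", "localhost", "::1"}


def replica_targets(config_text):
    """Return (host, port) pairs from every 'replicaof' line in a config."""
    targets = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) == 3 and parts[0].lower() == "replicaof":
            targets.append((parts[1], int(parts[2])))
    return targets


def self_replicating(configs):
    """Map each server address to the replicaof targets that point back at it.

    `configs` maps a server's own address to its config text. A target
    counts as self-referential if it equals that address or is a
    loopback name.
    """
    bad = {}
    for own_ip, text in configs.items():
        hits = [t for t in replica_targets(text)
                if t[0] == own_ip or t[0] in LOOPBACK]
        if hits:
            bad[own_ip] = hits
    return bad
```

Run against the three configs in this issue, the broken server-1 config would be reported under the key "1.1.1.1", and the rewritten server-2 config (replicaof 2.2.2.2 27081) under "2.2.2.2".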

BHare1985 commented 3 months ago

Turns out this was caused by having redis-sentinel still installed and running. I initially installed redis and sentinel before switching to keydb, and must have forgotten to uninstall sentinel.
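For anyone hitting the same symptoms, here is a rough Python sketch for spotting a leftover sentinel process. It assumes a Linux host with a procps-style `ps`, and that the sentinel's command line contains the word "sentinel" (for example `redis-sentinel`), which may not hold for every install. The function names are hypothetical.

```python
import subprocess


def sentinel_processes(ps_lines):
    """Return (pid, command) for each line whose command mentions 'sentinel'.

    Expects lines in the format produced by `ps -eo pid,args`.
    """
    found = []
    for line in ps_lines:
        fields = line.strip().split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skips the header row and malformed lines
        pid, command = int(fields[0]), fields[1]
        if "sentinel" in command.lower():
            found.append((pid, command))
    return found


def running_sentinels():
    """List sentinel-like processes on this host (assumes `ps` is available)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sentinel_processes(out.splitlines())
```

Calling `running_sentinels()` on each VPS and getting a non-empty list would be a hint that something other than KeyDB may be changing the replication settings.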