Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

Hanging up on production env #787

Open aminiun opened 4 months ago

aminiun commented 4 months ago

Hi! I am trying to migrate from Redis to KeyDB. I have been running KeyDB in our staging env for a while, and everything looks fine there. But when I switch from Redis to KeyDB in our production env, it works fine for a few minutes and then hangs. After that, no clients can connect to KeyDB and all connection requests time out. I am not even able to restart the keydb service through systemctl.

I have no KeyDB logs to provide here; all I have is my settings, which I think might be the cause:

bind 127.0.0.1 ip
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised systemd
pidfile /var/run/keydb/keydb-server.pid
loglevel verbose
logfile /var/log/keydb/keydb-server.log
databases 16
always-show-logo yes
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /var/lib/keydb
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
requirepass test
maxmemory 6gb
maxmemory-policy allkeys-lru
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes
server-threads 2
replica-weighting-factor 2

Also note that my server has an 8-core CPU and 8 GB of RAM.
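For readers comparing these settings against the machine size, here is a rough sketch of a headroom check. It is not a diagnosis of this issue. It parses a config excerpt and reports how much RAM is left above `maxmemory`, which matters because `maxmemory` limits the dataset rather than the whole process, and a forked snapshot child (triggered by the `save` lines) can need extra memory through copy-on-write while writes continue. The helper names (`parse_size`, `parse_conf`, `memory_headroom`) are hypothetical, and reading "8g ram" as 8 GiB is an assumption.

```python
import re

# Size suffixes as described in the stock redis.conf header
# (1k = 1000 bytes, 1kb = 1024 bytes, and so on); assumed to apply to KeyDB too.
UNITS = {"": 1, "k": 10**3, "kb": 2**10, "m": 10**6, "mb": 2**20, "g": 10**9, "gb": 2**30}


def parse_size(text):
    """Convert a config size such as '6gb' or '64mb' to bytes."""
    match = re.fullmatch(r"(\d+)([a-z]*)", text.strip().lower())
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_conf(text):
    """Map each directive to the list of values it was given (directives may repeat)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        conf.setdefault(key.lower(), []).append(value.strip())
    return conf


def memory_headroom(conf_text, total_ram_bytes):
    """Report how much RAM remains above maxmemory and which save points are set."""
    conf = parse_conf(conf_text)
    maxmemory = parse_size(conf.get("maxmemory", ["0"])[-1])
    return {
        "maxmemory_bytes": maxmemory,
        "headroom_bytes": total_ram_bytes - maxmemory,
        "ratio": maxmemory / total_ram_bytes,
        # Only explicit, non-empty `save` lines are listed; `save ""` disables snapshots.
        "save_points": [v for v in conf.get("save", []) if v.strip('"')],
    }


# Excerpt of the settings posted above.
conf_excerpt = """
maxmemory 6gb
save 900 1
save 300 10
save 60 10000
appendonly no
"""

total_ram = 8 * 2**30  # assumption: "8g ram" means 8 GiB
report = memory_headroom(conf_excerpt, total_ram)
print(report)
```

With these inputs the dataset limit is 75% of RAM, leaving 2 GiB for the OS, client buffers, and any snapshot child. Whether that is enough depends on the write rate during a save, which this sketch does not measure.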

s4m4n commented 4 months ago
  1. Share a stack trace.
  2. What's your operating system?
  3. Which KeyDB version are you running?
  4. Is there anything interesting in the KeyDB logs? /var/log/keydb/keydb-server.log
  5. Set up monitoring on your server and make sure there are enough resources for KeyDB to operate. You can also monitor KeyDB for reads/writes per minute or total DB size using third-party tools.
  6. If you are running the same KeyDB and OS versions in production and staging, the only suspect left is your workload.

P.S. I have had a KeyDB cluster running in production for 3+ years. We had only a few crashes, which we fixed by tuning the config file. I have seen crashes happen while a large DB was being saved to disk (data persistence mode/save).
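As a minimal sketch of the monitoring suggested above, the snippet below summarizes the memory and snapshot fields from the text that the `INFO` command returns (for example, captured with `keydb-cli INFO`). It assumes KeyDB reports the same field names as Redis (`used_memory`, `maxmemory`, `rdb_bgsave_in_progress`, `rdb_last_bgsave_status`, `latest_fork_usec`, `connected_clients`). The function names are hypothetical and the sample values are invented for illustration, not taken from the reporter's server.

```python
def parse_info(text):
    """Parse INFO output ('key:value' lines, '# Section' headers) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def summarize(info_text):
    """Pull out the fields most relevant to memory pressure and snapshot health."""
    f = parse_info(info_text)
    used = int(f.get("used_memory", 0))
    limit = int(f.get("maxmemory", 0))
    return {
        # maxmemory of 0 means "no limit", so the fraction is undefined.
        "used_fraction": used / limit if limit else None,
        "bgsave_in_progress": f.get("rdb_bgsave_in_progress") == "1",
        "last_bgsave_ok": f.get("rdb_last_bgsave_status") == "ok",
        "latest_fork_ms": int(f.get("latest_fork_usec", 0)) / 1000,
        "connected_clients": int(f.get("connected_clients", 0)),
    }


# Invented sample values, for illustration only.
sample_info = """
# Clients
connected_clients:120
# Memory
used_memory:3221225472
maxmemory:6442450944
# Persistence
rdb_bgsave_in_progress:1
rdb_last_bgsave_status:ok
latest_fork_usec:250000
"""

summary = summarize(sample_info)
print(summary)
```

Logging this summary once a minute from production would show whether a hang lines up with a background save, a long fork, or memory approaching the limit.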