Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

[BUG] keydb 6.3.4 - it was just hang #794

Open rnz opened 3 months ago

rnz commented 3 months ago

Describe the bug

2024-02-22: keydb-server v6.3.4 started as a replica of redis.
2024-02-26: promoted to master by hand (changed the domain name, set replicaof no one).
2024-03-08: keydb-server simply hung. Last messages in the log:

...
3897121:501:C 08 Mar 2024 11:55:19.604 * DB saved on disk
3897121:501:C 08 Mar 2024 11:55:21.681 * RDB: 1554 MB of memory used by copy-on-write
478:501:M 08 Mar 2024 11:55:22.983 * Background saving terminated with success
478:501:M 08 Mar 2024 11:56:23.005 * 10000 changes in 60 seconds. Saving...
478:501:M 08 Mar 2024 11:56:24.160 * Background saving started by pid 3898059
478:501:M 08 Mar 2024 11:56:24.160 * Background saving started
3898059:501:C 08 Mar 2024 11:59:03.932 * DB saved on disk
3898059:501:C 08 Mar 2024 11:59:06.049 * RDB: 1583 MB of memory used by copy-on-write
478:501:M 08 Mar 2024 11:59:07.184 * Background saving terminated with success
478:501:M 08 Mar 2024 12:00:08.081 * 10000 changes in 60 seconds. Saving...
478:signal-handler (1709901262) Received SIGTERM scheduling shutdown...
478:signal-handler (1709901262) Received SIGTERM scheduling shutdown...
3900116:3900115:C 08 Mar 2024 12:35:00.092 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
3900116:3900115:C 08 Mar 2024 12:35:00.092 # KeyDB version=6.3.4, bits=64, commit=00000000, modified=0, pid=3900116, just started
3900116:3900115:C 08 Mar 2024 12:35:00.092 # Configuration loaded
3900116:3900115:M 08 Mar 2024 12:35:00.103 * monotonic clock: POSIX clock_gettime
... #motd skipped
3900116:3900115:M 08 Mar 2024 12:35:00.344 # Server initialized
3900116:3900115:M 08 Mar 2024 12:35:00.346 * Loading RDB produced by version 6.3.4
3900116:3900115:M 08 Mar 2024 12:35:00.347 * RDB age 2316 seconds
3900116:3900115:M 08 Mar 2024 12:35:00.347 * RDB memory usage when created 12861.42 Mb
3900116:3900115:M 08 Mar 2024 12:36:42.987 # Done loading RDB, keys loaded: 0, keys expired: 0.
3900116:3900115:M 08 Mar 2024 12:36:42.987 * DB loaded from disk: 102.641 seconds
3900116:3900115:M 08 Mar 2024 12:36:43.067 # Warning: server-threads is set to 14.  This is above the maximum recommend value of 4, please ensure you've verified this is actually faster on your machine.
3900116:3900508:M 08 Mar 2024 12:36:43.068 * Thread 0 alive.
3900116:3900509:M 08 Mar 2024 12:36:43.068 * Thread 1 alive.
3900116:3900510:M 08 Mar 2024 12:36:43.068 * Thread 2 alive.
3900116:3900511:M 08 Mar 2024 12:36:43.068 * Thread 3 alive.
3900116:3900512:M 08 Mar 2024 12:36:43.068 * Thread 4 alive.
3900116:3900513:M 08 Mar 2024 12:36:43.069 * Thread 5 alive.
3900116:3900514:M 08 Mar 2024 12:36:43.069 * Thread 6 alive.
3900116:3900515:M 08 Mar 2024 12:36:43.069 * Thread 7 alive.
3900116:3900517:M 08 Mar 2024 12:36:43.069 * Thread 9 alive.
3900116:3900516:M 08 Mar 2024 12:36:43.069 * Thread 8 alive.
3900116:3900519:M 08 Mar 2024 12:36:43.070 * Thread 11 alive.
3900116:3900518:M 08 Mar 2024 12:36:43.070 * Thread 10 alive.
3900116:3900521:M 08 Mar 2024 12:36:43.070 * Thread 13 alive.
3900116:3900520:M 08 Mar 2024 12:36:43.070 * Thread 12 alive.
3900116:3900508:M 08 Mar 2024 12:36:44.424 * 10000 changes in 60 seconds. Saving...
3900116:3900508:M 08 Mar 2024 12:36:44.840 * Background saving started by pid 3900532
3900116:3900508:M 08 Mar 2024 12:36:44.840 * Background saving started
3900532:3900508:C 08 Mar 2024 12:38:59.659 * DB saved on disk
...
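
The number in parentheses on the two signal-handler lines appears to be a Unix timestamp, so it can be decoded to see when SIGTERM arrived relative to the other log lines. A minimal sketch, assuming the log clock is UTC (the helper name `signal_time` is made up for illustration):

```python
from datetime import datetime, timezone

def signal_time(epoch_seconds: int) -> str:
    """Convert the epoch value from a signal-handler log line to a readable UTC time."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

# Value copied from the "Received SIGTERM scheduling shutdown..." lines above.
print(signal_time(1709901262))  # 2024-03-08 12:34:22
```

Under that assumption, SIGTERM arrived about 34 minutes after the last regular message at 12:00:08 and shortly before the restart at 12:35:00. In the excerpt above, the 12:00:08 "Saving..." line is also not followed by a "Background saving started" line.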

keydb-cli info hung as well. systemd tried to restart the service and sent SIGTERM; keydb logged that it had received SIGTERM but did not stop. I then sent SIGTERM by hand from the command line with the same result: a message in the keydb log, and the keydb-server process kept running. In the end keydb-server had to be killed with SIGKILL.
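
To catch this kind of hang without blocking the way keydb-cli info did, a probe with a hard timeout can be used. A minimal sketch, assuming the server listens on 127.0.0.1:6379 (both values are assumptions, as is the helper name `is_responsive`); it sends an inline PING and treats any reply at all as proof that the server is still processing commands:

```python
import socket

def is_responsive(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the server sends any reply to PING within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"PING\r\n")  # inline command form of the Redis protocol
            reply = sock.recv(64)
            # Any bytes (+PONG, or an error such as -NOAUTH) mean the server
            # is still answering; an empty read means the connection was closed.
            return len(reply) > 0
    except OSError:
        # Covers connection refused as well as connect and read timeouts.
        return False

if __name__ == "__main__":
    print("responsive" if is_responsive() else "not responding")
```

Run from cron or a systemd timer, a probe like this would record when the server stops answering, which could help narrow down what happens right before a hang.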

To reproduce

I don't know.

Expected behavior

keydb-server does not hang.

Additional information

host: AMD EPYC 7702 64-Core Processor (2 Sockets)
host kernel: Linux 5.15.111-1-pve #1 SMP PVE 5.15.111-1 (2023-08-18T08:57Z)
host os: debian 11.7
host proxmox: PVE Manager Version pve-manager/7.4-16/0f39f621
host numa domains: 
    NUMA node0 CPU(s):                  0-63,128-191
    NUMA node1 CPU(s):                  64-127,192-255
container: LXC unprivileged, nesting=1 
container os: Debian 11.9

rnz commented 3 months ago

Hi!

Today keydb hung for the third time since switching from redis to keydb 6.3.4:

2024-02-22: keydb-server set up and started, then synced with redis as a replica.
2024-02-26: keydb-server promoted to master and clients switched over (by disabling replication and changing the domain name).
2024-03-08: hang (connections do not work). The main process forked keydb-rdb-bgsave, which finished normally; the main process did not shut down on SIGTERM.
2024-03-16: hang (connections do not work). The main process forked keydb-rdb-bgsave, which finished normally; the main process did not shut down on SIGTERM.
2024-03-19: hang (connections do not work). The main process forked keydb-rdb-bgsave, which finished normally; the main process did not shut down on SIGTERM.