Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

[CRASH] Crash migrating from redis to keydb version 6.3.X #736

Open marcocapetta opened 8 months ago

Hi All,

We recently noticed a problem on Debian bookworm when migrating from Redis to KeyDB.

Our setup is a two-node system with Redis master-replica replication, in particular:

This is the config we have in Redis:

    replicaof 192.168.255.251 6379
    replica-serve-stale-data yes
    replica-read-only yes

The goal of migrating to KeyDB is to obtain master-master replication. The procedure we use for the migration is the following:

  1. stop Redis on node B
  2. on node B, copy the Redis AOF and RDB files to the KeyDB folders
  3. start KeyDB on node B as replica + master
  4. stop Redis on node A
  5. on node A, copy the Redis AOF and RDB files to the KeyDB folders
  6. start KeyDB on node A as replica + master
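
The steps above can be sketched as a dry-run plan that only prints the commands in order. This is a minimal illustration, not the exact commands from this report: the systemd unit names, the data directories, and the persistence file names are all assumptions and depend on the packaging and on the Redis version and configuration.

```python
import shlex

# Everything below is an assumption for illustration, not taken from the report:
# unit names, data directories, and persistence file names vary by package,
# Redis version, and configuration.
REDIS_UNIT = "redis-server"
KEYDB_UNIT = "keydb-server"
REDIS_DIR = "/var/lib/redis"
KEYDB_DIR = "/var/lib/keydb"
DATA_FILES = ("dump.rdb", "appendonly.aof")


def node_plan(node):
    """Return the ordered commands for migrating one node (nothing is executed)."""
    plan = [(node, ["systemctl", "stop", REDIS_UNIT])]
    for name in DATA_FILES:
        plan.append((node, ["cp", f"{REDIS_DIR}/{name}", f"{KEYDB_DIR}/{name}"]))
    plan.append((node, ["systemctl", "start", KEYDB_UNIT]))
    return plan


def migration_plan(nodes=("B", "A")):
    """Migrate the nodes one at a time, node B first, as in the steps above."""
    plan = []
    for node in nodes:
        plan.extend(node_plan(node))
    return plan


if __name__ == "__main__":
    for node, cmd in migration_plan():
        print(f"[node {node}] {shlex.join(cmd)}")
```

Between the last command for node B and the first command for node A, one node runs KeyDB while the other still runs Redis.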

This is the config we have in KeyDB:

    multi-master yes
    active-replica yes
    replicaof 192.168.255.251 6379
    replica-serve-stale-data yes

The procedure worked perfectly on Debian bullseye with:

but it fails on Debian bookworm with:

The problem appears after migrating the first node (B in our case), while one server is still running Redis and the other is already running KeyDB: at that point the KeyDB server crashes. If I remove the 'multi-master' and 'active-replica' lines from the KeyDB config, the issue disappears, but of course this is not what we need.

To rule out a different version of the operating system or of Redis as the cause, we built a KeyDB 6.2.2 package (the version we used on bullseye) for bookworm. With that package, the above procedure still works perfectly. So the problem appears to be related to KeyDB version 6.3.2 and above.

Crash report

Reading symbols from /usr/bin/keydb-server...
Reading symbols from /usr/lib/debug/.build-id/2b/7a4e39d8121079a680825a9b7e0d9e923e0f13.debug...
[New LWP 110254]
[New LWP 110248]
[New LWP 110251]
[New LWP 110249]
[New LWP 110253]
[New LWP 110250]
[New LWP 110256]
[New LWP 110252]
[New LWP 110244]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/keydb-server 127.0.0.1:6379       '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f71ea0a9d3c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f71e33fd6c0 (LWP 110254))]
(gdb) bt full
#0  0x00007f71ea0a9d3c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007f71ea05af32 in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007f71ea045472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#3  0x0000560e6d6bc7be in bugReportEnd (killViaSignal=0, sig=0) at /build/keydb-6.3.4/src/debug.cpp:2099
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {8665512232487505152, 0, 2, 140127120643776, 94619973765664, 94619972122287, 8665512232487505152, 
              140127120643776, 8665512232487505152, 94619973745760, 94619972131578, 94619973745760, 94621350756352, 94619972122287, 8665512232487505152, 94619972131578}}, sa_flags = 1836870842, 
          sa_restorer = 0x7f71e8ea52a0}
#4  0x0000560e6d770e21 in redisDbPersistentData::processChanges (this=0x7f71e8ea52a0, fSnapshot=<optimized out>) at /build/keydb-6.3.4/src/db.cpp:2935
No locals.
#5  0x0000560e6d733dc9 in beforeSleep (eventLoop=<optimized out>) at /build/keydb-6.3.4/src/server.cpp:2951
        idb = 0
        storage_process_latency = <optimized out>
        locker = {m_fArmed = true}
        iel = 0
        zmalloc_used = <optimized out>
        vecdb = std::vector of length 0, capacity 0
        aof_state = <optimized out>
        commit_latency = <optimized out>
        fSentReplies = <optimized out>
        ul = {_M_device = 0x7f71e8f73c40, _M_owns = false}
        fFirstRun = false
#6  0x0000560e6d729a78 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71e8ee7300, flags=flags@entry=27) at /build/keydb-6.3.4/src/ae.cpp:755
        ulock = {_M_device = 0x560e6deaf940 <g_lock>, _M_owns = false}
        j = <optimized out>
        tv = {tv_sec = 0, tv_usec = 24042}
        tvp = 0x7f71e33fada0
        usUntilTimer = <optimized out>
        processed = 0
        numevents = <optimized out>
#7  0x0000560e6d72a0ce in aeMain (eventLoop=0x7f71e8ee7300) at /build/keydb-6.3.4/src/ae.cpp:823
No locals.
#8  0x0000560e6d741855 in workerThreadMain (parg=<optimized out>) at /build/keydb-6.3.4/src/server.cpp:7386
        iel = 0
        el = 0x7f71e8ee7300
#9  0x00007f71ea0a8044 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#10 0x00007f71ea127880 in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

I can easily reproduce the issue, so I can provide any additional details if needed.

Thank you,
Marco