Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

[BUG] Slave crash on dictRehash #792

Open keithchew opened 7 months ago

keithchew commented 7 months ago

I encountered this crash on a slave node:

7:96:S 28 Feb 2024 21:24:38.284 # KeyDB 6.3.4 crashed by signal: 11, si_code: 1
7:96:S 28 Feb 2024 21:24:38.284 # Accessing address: 0xffffffffffffffff
7:96:S 28 Feb 2024 21:24:38.284 # Crashed running the instruction at: 0x555565361d84

------ STACK TRACE ------
EIP:
/opt/KeyDB/bin/keydb-server *:6379(dictSdsHash(void const*)+0x4) [0x555565361d84]

Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f30ef17e420]
/opt/KeyDB/bin/keydb-server *:6379(dictSdsHash(void const*)+0x4) [0x555565361d84]
/opt/KeyDB/bin/keydb-server *:6379(dictRehash+0x8f) [0x55556535e10f]
/opt/KeyDB/bin/keydb-server *:6379(+0x226632) [0x555565360632]
/opt/KeyDB/bin/keydb-server *:6379(redisDbPersistentData::incrementallyRehash()+0x49) [0x5555653606a9]
/opt/KeyDB/bin/keydb-server *:6379(databasesCron(bool)+0x2e3) [0x555565362ee3]
/opt/KeyDB/bin/keydb-server *:6379(serverCronLite(aeEventLoop*, long long, void*)+0x9b) [0x555565364cab]
/opt/KeyDB/bin/keydb-server *:6379(aeProcessEvents+0x235) [0x55556535ad85]
/opt/KeyDB/bin/keydb-server *:6379(aeMain+0x3e) [0x55556535b73e]
/opt/KeyDB/bin/keydb-server *:6379(workerThreadMain(void*)+0x12b) [0x55556537472b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f30ef172609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f30ef097133]

I have not traced the root cause of the crash yet, but thought I'd report the bug here first. Will post an update once I dig a bit deeper into this.
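
For context on the code path in the trace, here is a minimal sketch of an incremental rehash step, assuming a simplified chained hash table. All `toy_` names are hypothetical and this is not KeyDB's actual `dictRehash`; it only illustrates that every entry moved out of an old bucket has its key re-hashed, so a dangling or NULL key pointer would fault inside the hash function, as in the `dictSdsHash` frame above. The faulting address 0xffffffffffffffff would also be consistent with a NULL key, assuming the sds length lookup reads the byte just before the string pointer, but that is a guess, not a diagnosis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types for illustration; not KeyDB's dict structures. */
typedef struct toy_entry {
    const char *key;
    struct toy_entry *next;
} toy_entry;

typedef struct toy_dict {
    toy_entry **old_table; /* table being drained */
    size_t old_size;
    toy_entry **new_table; /* table being filled */
    size_t new_size;       /* must be a power of two */
    size_t rehashidx;      /* next old bucket to migrate */
} toy_dict;

/* FNV-1a as a stand-in hash; note that it dereferences the key. */
static uint64_t toy_hash(const char *key) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (; *key != '\0'; key++) {
        h ^= (unsigned char)*key;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Migrate up to n old buckets (empty ones count toward n in this sketch).
 * Returns 1 if buckets remain to be migrated, 0 when rehashing is done. */
static int toy_rehash(toy_dict *d, int n) {
    assert(d->new_size != 0 && (d->new_size & (d->new_size - 1)) == 0);
    while (n-- > 0 && d->rehashidx < d->old_size) {
        toy_entry *e = d->old_table[d->rehashidx];
        while (e != NULL) {
            toy_entry *next = e->next;
            /* The key is read here: a corrupted entry crashes in the hash. */
            size_t idx = (size_t)(toy_hash(e->key) & (d->new_size - 1));
            e->next = d->new_table[idx];
            d->new_table[idx] = e;
            e = next;
        }
        d->old_table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return d->rehashidx < d->old_size;
}

/* Count the entries chained in a table; handy for checking a migration. */
static size_t toy_count(toy_entry **table, size_t size) {
    size_t total = 0;
    for (size_t i = 0; i < size; i++)
        for (toy_entry *e = table[i]; e != NULL; e = e->next)
            total++;
    return total;
}
```

Calling `toy_rehash(&d, 1)` repeatedly from a periodic timer mimics the cron-driven path in the backtrace. In a multithreaded server, a step like this would be unsafe if another thread could free or modify an entry while it is being moved, which is one possible (unconfirmed) way to end up with a bad key pointer.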

keithchew commented 7 months ago

Another crash, this time from the main thread:

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
7:95:S 28 Feb 2024 21:45:56.344 # KeyDB 6.3.4 crashed by signal: 7, si_code: 128
7:95:S 28 Feb 2024 21:45:56.344 # Accessing address: (nil)
7:95:S 28 Feb 2024 21:45:56.344 # Crashed running the instruction at: 0x558be35a3106

------ STACK TRACE ------
EIP:
/opt/KeyDB/bin/keydb-server *:6379(dictRehash+0x86) [0x558be35a3106]

Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f5e8943b420]
/opt/KeyDB/bin/keydb-server *:6379(dictRehash+0x86) [0x558be35a3106]
/opt/KeyDB/bin/keydb-server *:6379(+0x226632) [0x558be35a5632]
/opt/KeyDB/bin/keydb-server *:6379(redisDbPersistentData::incrementallyRehash()+0x49) [0x558be35a56a9]
/opt/KeyDB/bin/keydb-server *:6379(databasesCron(bool)+0x2e3) [0x558be35a7ee3]
/opt/KeyDB/bin/keydb-server *:6379(serverCron(aeEventLoop*, long long, void*)+0x2e3) [0x558be35a8b23]
/opt/KeyDB/bin/keydb-server *:6379(aeProcessEvents+0x235) [0x558be359fd85]
/opt/KeyDB/bin/keydb-server *:6379(aeMain+0x3e) [0x558be35a073e]
/opt/KeyDB/bin/keydb-server *:6379(workerThreadMain(void*)+0x12b) [0x558be35b972b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f5e8942f609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f5e89354133]

keithchew commented 1 week ago

I tried setting this config option:

enable-async-rehash no

to disable async rehash, and it seems to be stable. Will keep testing and, if OK, will try to dig deeper to find the root cause of the crash.

keithchew commented 22 hours ago

Hmm, just got another crash. Seems like the only workaround at the moment is to use:

activerehashing no

Will update here if I find anything else useful.