Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.02k stars 564 forks source link

[BUG] crash on del with expiry #805

Open keithchew opened 3 months ago

keithchew commented 3 months ago

Hi

Testing on v6.3.4 and encountered this stack trace:

7:72:M 24 Mar 2024 23:00:10.861 # === ASSERTION FAILED ===
7:72:M 24 Mar 2024 23:00:10.861 # ==> db.cpp:1833 'm_numexpires > 0' is not true

------ STACK TRACE ------
Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(redisDbPersistentData::removeExpire(redisObject*, dict_iter)+0x9d) [0x55820527995d]
/opt/KeyDB/bin/keydb-server *:6379(redisDbPersistentData::syncDelete(redisObject*)+0x29e) [0x558205273ace]
/opt/KeyDB/bin/keydb-server *:6379(delGenericCommand(client*, int)+0x8c) [0x5582052740dc]
/opt/KeyDB/bin/keydb-server *:6379(call(client*, int)+0xa7) [0x558205242157]
/opt/KeyDB/bin/keydb-server *:6379(execCommand(client*)+0x19f) [0x5582052db8cf]
/opt/KeyDB/bin/keydb-server *:6379(call(client*, int)+0xa7) [0x558205242157]
/opt/KeyDB/bin/keydb-server *:6379(processCommand(client*, int)+0x7c7) [0x558205243fc7]
/opt/KeyDB/bin/keydb-server *:6379(processCommandAndResetClient(client*, int)+0x69) [0x558205262b79]
/opt/KeyDB/bin/keydb-server *:6379(processInputBuffer(client*, bool, int)+0x254) [0x558205268d44]
/opt/KeyDB/bin/keydb-server *:6379(processClients()+0xe6) [0x558205268ee6]
/opt/KeyDB/bin/keydb-server *:6379(beforeSleep(aeEventLoop*)+0x2db) [0x55820523fe9b]
/opt/KeyDB/bin/keydb-server *:6379(aeProcessEvents+0x400) [0x5582052315c0]
/opt/KeyDB/bin/keydb-server *:6379(aeMain+0x3e) [0x558205231dae]
/opt/KeyDB/bin/keydb-server *:6379(workerThreadMain(void*)+0x12b) [0x55820524b24b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f8b0fd70609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f8b0fc93353]

I have traced through the code in syncDelete, and it looks like this scenario happens when m_numexpires is 0 and ensure() in db.cpp cannot find the record from flash (ie where stat_storage_provider_read_misses is incremented). In this condition, ensure() will not increment m_numexpires from 0 and the crash happens.

Not sure the best way to work around this, perhaps syncDelete should not assert m_numexpires > 0 if there is a read miss from flash?

keithchew commented 3 months ago

Apologies, I was actually testing the async_flash branch, and it looks like it is missing a ++m_numexpires there. Have patched my local and error is no longer happening.