Open keithchew opened 6 months ago
I managed to trigger another deadlock, this time with logging and a stack trace:
------ STACK TRACE ------
Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(+0x36a7d9) [0x55fd894637d9]
/opt/KeyDB/bin/keydb-server *:6379(RocksDBStorageProvider::count() const+0x1c) [0x55fd8944ea0c]
/opt/KeyDB/bin/keydb-server *:6379(StorageCache::count() const+0x46) [0x55fd89454d16]
/opt/KeyDB/bin/keydb-server *:6379(dbTotalServerKeyCount()+0x53) [0x55fd89359733]
/opt/KeyDB/bin/keydb-server *:6379(redisFork(int)+0x88f) [0x55fd89331eff]
/opt/KeyDB/bin/keydb-server *:6379(RM_Fork(void (*)(int, int, void*), void*)+0x18) [0x55fd89414bd8]
/opt/redis-stack/lib/redisearch.so(+0x1953af) [0x7f0f74a423af]
/opt/redis-stack/lib/redisearch.so(+0x1984b3) [0x7f0f74a454b3]
/opt/redis-stack/lib/redisearch.so(+0x15ba27) [0x7f0f74a08a27]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f0fa3279609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f0fa319e133]
release exclusive [1] m_writeCount [0]
!!! ERROR: Deadlock detected !!!
52: (0x7f0fa1e86d70) RocksDBStorageProvider
70: (0x7f0fa1e3111c) StorageCache
printing backtrace for thread 52
!!! KeyDB Will Now Crash !!!
=== KEYDB BUG REPORT START: Cut & paste starting from here ===
7:52:M 31 Dec 2023 19:27:05.020 # ------------------------------------------------
7:52:M 31 Dec 2023 19:27:05.020 # !!! Software Failure. Press left mouse button to continue
7:52:M 31 Dec 2023 19:27:05.020 # Guru Meditation: Deadlock detected #fastlock.cpp:268
------ STACK TRACE ------
Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(fastlock_sleep+0x49f) [0x55fd8943fe6f]
/opt/KeyDB/bin/keydb-server *:6379(+0x36a7d9) [0x55fd894637d9]
/opt/KeyDB/bin/keydb-server *:6379(RocksDBStorageProvider::count() const+0x1c) [0x55fd8944ea0c]
/opt/KeyDB/bin/keydb-server *:6379(StorageCache::count() const+0x46) [0x55fd89454d16]
/opt/KeyDB/bin/keydb-server *:6379(dbTotalServerKeyCount()+0x53) [0x55fd89359733]
/opt/KeyDB/bin/keydb-server *:6379(redisFork(int)+0x88f) [0x55fd89331eff]
/opt/KeyDB/bin/keydb-server *:6379(RM_Fork(void (*)(int, int, void*), void*)+0x18) [0x55fd89414bd8]
/opt/redis-stack/lib/redisearch.so(+0x1953af) [0x7f0f74a423af]
/opt/redis-stack/lib/redisearch.so(+0x1984b3) [0x7f0f74a454b3]
/opt/redis-stack/lib/redisearch.so(+0x15ba27) [0x7f0f74a08a27]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f0fa3279609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f0fa319e133]
------ INFO OUTPUT ------
It looks like it is coming from this line when the fork happens:
g_pserver->stat_current_save_keys_total = dbTotalServerKeyCount();
Not sure how to work around this; if anyone has any input, it would be much appreciated.
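For anyone reading along, this is how I read the backtrace (my annotations, not comments from the KeyDB source):
/* called from redisFork() (see backtrace): */
g_pserver->stat_current_save_keys_total = dbTotalServerKeyCount();
/* dbTotalServerKeyCount()
 *   -> StorageCache::count()
 *   -> RocksDBStorageProvider::count()
 * and the lock acquisition inside that path is what trips the
 * "Deadlock detected" check at fastlock.cpp:268 above. */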
OK, I finally tracked it down.
My test module uses RedisModule_CreateTimer() for periodic callbacks, and when the main thread in KeyDB invokes the callback, the module uses another thread to call RedisModule_Fork(). To ensure there is no deadlock in KeyDB, I had to do 2 things:
- In the module, call RedisModule_ThreadSafeContextLock() and RedisModule_ThreadSafeContextUnlock() before/after RedisModule_Fork() (see the sketch after this list)
- In KeyDB, skip the aeAcquireForkLock() call
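Here is a minimal sketch of the pattern my test module uses, with the first fix applied. Names like forktest, TimerCb and fork_worker are placeholders rather than the real module's code, and the child's work is elided:

#include <pthread.h>
#include "redismodule.h"

static void ForkDone(int exitcode, int bysignal, void *user_data) {
    REDISMODULE_NOT_USED(exitcode);
    REDISMODULE_NOT_USED(bysignal);
    REDISMODULE_NOT_USED(user_data);
}

static void *fork_worker(void *arg) {
    REDISMODULE_NOT_USED(arg);
    RedisModuleCtx *ctx = RedisModule_GetThreadSafeContext(NULL);
    /* Holding the GIL across the fork is the module-side half of the workaround. */
    RedisModule_ThreadSafeContextLock(ctx);
    int pid = RedisModule_Fork(ForkDone, NULL);
    if (pid == 0) {
        /* child process: do the background work here, then exit */
        RedisModule_ExitFromChild(0);
    }
    RedisModule_ThreadSafeContextUnlock(ctx);
    RedisModule_FreeThreadSafeContext(ctx);
    return NULL;
}

static void TimerCb(RedisModuleCtx *ctx, void *data) {
    REDISMODULE_NOT_USED(data);
    /* The fork is issued from a separate thread, not the main thread running the timer. */
    pthread_t tid;
    pthread_create(&tid, NULL, fork_worker, NULL);
    pthread_detach(tid);
    /* re-arm, since module timers are one-shot */
    RedisModule_CreateTimer(ctx, 1000, TimerCb, NULL);
}

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "forktest", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    RedisModule_CreateTimer(ctx, 1000, TimerCb, NULL);
    return REDISMODULE_OK;
}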
I believe some of the standard Redis modules use the same pattern as my test module, so this can be a gotcha for anyone using Redis modules with KeyDB. If you experience a complete freeze of KeyDB, you have likely hit this bug. It took over 35,000 calls to aeAcquireForkLock (10-12 days) to trigger the deadlock for me, so it is not something that triggers quickly.
Will keep testing and post any updates here.
Working on v6.3.4, I have a test module which calls RedisModule_Fork() periodically. I encountered a condition where KeyDB completely freezes and logging stops. After adding some logging, I tracked it down to redisFork() in server.cpp:
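This is roughly the instrumentation I added around the lock acquisition in redisFork() (paraphrased from memory; only the serverLog() lines are mine, the aeAcquireForkLock() call is KeyDB's):

serverLog(LL_NOTICE, "aeAcquireForkLock start");
aeAcquireForkLock();
serverLog(LL_NOTICE, "aeAcquireForkLock finish");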
When it freezes, "aeAcquireForkLock start" is printed but the corresponding finish never appears. Looking into this method, I can see a comment in the code saying it should release the internal lock and try again, but it looks like it only retries without any release?
I am still tracking this down, but if anyone has experienced the same issue or has any advice, I am happy to take any suggestions to try out.
Note: I have save "" in the config, so there is no background saving happening. From the logs, I cannot see any other place that is trying to call the fork, so this deadlock scenario is a bit strange.
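For completeness, this is the directive in my config that disables the automatic RDB save points:

save ""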