Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.02k stars 564 forks source link

[CRASH] RedisModule_Scan crashes with GlobalLocksAcquired #747

Open keithchew opened 7 months ago

keithchew commented 7 months ago

Crash report

I wrote a test module to perform RedisModule_Scan, and if the data is large enough or if this is called enough times, I get a crash. Module snippet is:

...
  RedisModule_ThreadSafeContextLock(ctx);
  RedisModule_Log(ctx, "notice", "Test Module Scanning");

  while (RedisModule_Scan(ctx, cursor, (RedisModuleScanCB)My_ScanProc, scanner)) {
    RedisModule_ThreadSafeContextUnlock(ctx);
    sched_yield();
    RedisModule_ThreadSafeContextLock(ctx);
  }
  RedisModule_Log(ctx, "notice", "Scanning completed");

  RedisModule_ThreadSafeContextUnlock(ctx);
...

Crash log:

9:58:M 22 Nov 2023 06:12:48.921 # === ASSERTION FAILED ===
9:58:M 22 Nov 2023 06:12:48.921 # ==> tls.cpp:1295 '!GlobalLocksAcquired()' is not true

------ STACK TRACE ------

Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(tlsProcessPendingData()+0xbc) [0x560b6dc5732c]
/opt/KeyDB/bin/keydb-server *:6379(beforeSleep(aeEventLoop*)+0x6b) [0x560b6db357fb]
/opt/KeyDB/bin/keydb-server *:6379(aeProcessEvents+0x400) [0x560b6db2ba20]
/opt/KeyDB/bin/keydb-server *:6379(aeMain+0x3e) [0x560b6db2c20e]
/opt/KeyDB/bin/keydb-server *:6379(workerThreadMain(void*)+0x12b) [0x560b6db448bb]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f944e81f609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f944e744133]

Is module scanning not supported properly in KeyDB?

This crash feels similar to this one here:

https://github.com/Snapchat/KeyDB/issues/452

I am guessing RedisModule_Scan runs in the background causing some kind of contention with the main thread.

keithchew commented 7 months ago

Since I am not using TLS, I have commented out tlsProcessPendingData:

void beforeSleep(struct aeEventLoop *eventLoop) {
    ...
    // tlsProcessPendingData();

from server.cpp

So far, haven't experienced the crash. Is this a bug in tlsProcessPendingData(), ie it thinks it has the global lock but it doesn't?