Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.34k stars 572 forks source link

[CRASH] prepareClientToWrite #867

Open keithchew opened 1 week ago

keithchew commented 1 week ago

Testing on async_flash branch, but this bug has been around before that:

7:63:M 25 Sep 2024 22:19:36.640 # === ASSERTION FAILED ===
7:63:M 25 Sep 2024 22:19:36.640 # ==> networking.cpp:300 'c->conn == nullptr || c->lock.fOwnLock()' is not true

------ STACK TRACE ------

Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(prepareClientToWrite(client*)+0x139) [0x55bb6d4c2189]
/opt/KeyDB/bin/keydb-server *:6379(addReplyProto(client*, char const*, unsigned long)+0x17) [0x55bb6d4c2307]
/opt/KeyDB/bin/keydb-server *:6379(keysCommandCore(client*, redisDbPersistentDataSnapshot const*, char*, char*)+0x1a8) [0x55bb6d4db0d8]
/opt/KeyDB/bin/keydb-server *:6379(AsyncWorkQueue::WorkerThreadMain()+0x450) [0x55bb6d5d0110]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7fbb43866df4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fbb4360b609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbb4352e353]

Looking at the code, it looks like the assert check should be done after the fast exit checks? ie:

...
    if (!c->conn) return C_ERR; /* Fake client for AOF loading. */

    bool fAsync = !FCorrectThread(c);  <--- MOVE this from start of method above to here
...

This will not fix the crash if the fast exit checks pass, so will try it on my local first and see if the crash happens again.

keithchew commented 1 week ago

I can confirm that the above did not solve the root cause of the crash. I reviewed the code a bit more, and here is likely the correct fix in keysCommandCore() in db.cpp:

    aeAcquireLock();
    std::unique_lock<decltype(cIn->lock)> lock(cIn->lock); <---- ADD THIS
    addReplyProto(cIn, c->buf, c->bufpos);
...
    lock.unlock(); <---- ADD THIS
    aeReleaseLock();

The above will ensure the

c->lock.fOwnLock()

condition is satisfied. Will continue testing to make sure all is stable.

keithchew commented 1 day ago

After reviewing the code a bit more on locks, I have tidied up the above a little:

  AeLocker ae;
  std::unique_lock<decltype(cIn->lock)> lock(cIn->lock, std::defer_lock);
  bool fAsync = !FCorrectThread(cIn); 
  if (!fAsync) {
    lock.lock();
    ae.arm(c);
  } else {
    aeAcquireLock();
  }
...
  if (!fAsync) {
    ae.disarm();
    lock.unlock();
  } else {
    aeReleaseLock();
  }

Based on my testing, so far the keys command is solid and no crashes or deadlocks...