baidu / braft

An industrial-grade C++ implementation of RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.
Apache License 2.0

NodeImpl::_mutex deadlock #456

Open amoxic opened 1 month ago

amoxic commented 1 month ago

braft version

commit id: 3cae30fb67cb9e988650500522c6d64ae609f2aa

Symptom

All of braft's internal threads are blocked trying to lock the same NodeImpl::_mutex (address 0x38a4b48). Checking the lock holder with gdb shows that it is deadlocked.
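For readers who want to repeat the lock-holder check: the traces below show butil::Mutex::lock calling pthread_mutex_lock, and on Linux with glibc a default pthread mutex records the kernel thread id of its holder in an internal field, `__data.__owner`. That id is the same number gdb prints as "LWP" in each thread header (for example LWP 67). The following is only a minimal sketch under those assumptions. The helper names `current_tid`, `mutex_owner_tid` and `owner_matches_locker` are hypothetical, and `__data.__owner` is a glibc implementation detail rather than a public API, so it may be absent or unset on other platforms or when lock elision is enabled.

```cpp
// Sketch: read the owner TID that glibc records inside a pthread mutex.
// Assumes Linux + glibc (NPTL); __data.__owner is internal, not a public API.
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cassert>

// Kernel thread id of the caller; this is the "LWP" number shown by gdb.
long current_tid() {
    return syscall(SYS_gettid);
}

// Hypothetical helper: TID stored as the mutex owner, 0 when unlocked.
long mutex_owner_tid(const pthread_mutex_t* m) {
    return m->__data.__owner;
}

// Locks a default mutex and checks that the recorded owner is the caller,
// then checks that the owner field is cleared again after unlocking.
bool owner_matches_locker() {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    (void)rc;
    const bool held_by_me = (mutex_owner_tid(&m) == current_tid());
    pthread_mutex_unlock(&m);
    const bool released = (mutex_owner_tid(&m) == 0);
    pthread_mutex_destroy(&m);
    return held_by_me && released;
}
```

In a live process or core dump, the same field can usually be read by printing the underlying pthread_mutex_t in gdb and matching the owner value against the LWP numbers in the thread list.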

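Additional information:

  1. Application threads are also blocked in calls to Node::is_leader_lease_valid, Node::apply, and Node::get_status, which internally are waiting for the lock as well.
  2. Main stack traces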

Thread 54 (Thread 0x7f5145ffb640 (LWP 67) "worker-0"):
#0 0x00007f5204401560 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007f5204407c22 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x00000000011d3d50 in butil::Mutex::lock (this=0x38a4b48) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/butil/synchronization/lock.h:69
#3 std::unique_lock::lock (this=0x7f5145ff5560) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:139
#4 std::unique_lock::unique_lock (__m=..., this=0x7f5145ff5560) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:69
#5 braft::NodeImpl::apply (this=0x38a47f0, tasks=0x7f5145ff5610, size=1) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:1991
#6 0x00000000011d4416 in braft::NodeImpl::execute_applying_tasks (meta=0x38a47f0, iter=...) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:668
#7 0x0000000001260d9d in bthread::ExecutionQueueBase::_execute(bthread::TaskNode*, bool, int*) ()
#8 0x000000000126311a in bthread::ExecutionQueueBase::start_execute(bthread::TaskNode*) ()
#9 0x00000000011cfc81 in bthread::ExecutionQueue::execute (handle=0x0, options=0x230c082 , task=, this=) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/bthread/execution_queue_inl.h:338
#10 0x00000000011736a9 in braft::Node::apply (this=, task=...) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/raft.cpp:182

Thread 23 (Thread 0x7f51c77fe640 (LWP 36) "brpc_worker:11"):
#0 0x00007f5204401560 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007f5204407c22 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x00000000011ded08 in butil::Mutex::lock (this=0x38a4b48) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/butil/synchronization/lock.h:69
#3 std::unique_lock::lock (this=0x7f5052aeac70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:139
#4 std::unique_lock::unique_lock (__m=..., this=0x7f5052aeac70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:69
#5 braft::NodeImpl::handle_append_entries_request (this=0x38a47f0, cntl=cntl@entry=0x7f517c51f780, request=request@entry=0x7f516c068080, response=response@entry=0x7f516c133bb0, done=done@entry=0x7f50700c5740, from_append_entries_cache=from_append_entries_cache@entry=false) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:2357
#6 0x00000000011efaa3 in braft::RaftServiceImpl::append_entries (this=, cntl_base=0x7f517c51f780, request=0x7f516c068080, response=0x7f516c133bb0, done=0x7f50700c5740) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/brpc/closure_guard.h:55
#7 0x00000000012bb155 in brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*) ()
#8 0x00000000012b0557 in brpc::ProcessInputMessage(void*) ()
#9 0x00000000012755af in bthread::TaskGroup::task_runner(long) ()
#10 0x0000000001406e31 in bthread_make_fcontext ()
#11 0x0000000000000000 in ?? ()

Thread 22 (Thread 0x7f51c7fff640 (LWP 35) "brpc_worker:10"):
#0 0x00007f5204401560 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007f5204407c22 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x00000000011ded08 in butil::Mutex::lock (this=0x38a4b48) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/butil/synchronization/lock.h:69
#3 std::unique_lock::lock (this=0x7f50502dec70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:139
#4 std::unique_lock::unique_lock (__m=..., this=0x7f50502dec70) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_lock.h:69
#5 braft::NodeImpl::handle_append_entries_request (this=0x38a47f0, cntl=cntl@entry=0x7f51b81da110, request=request@entry=0x7f51ec067c80, response=response@entry=0x7f51b81464c0, done=done@entry=0x7f50641b5c10, from_append_entries_cache=from_append_entries_cache@entry=false) at /xenobi/xmake_globaldir/.xmake/cache/packages/2406/b/braft/1.1.3/source/braft/src/braft/node.cpp:2357
#6 0x00000000011efaa3 in braft::RaftServiceImpl::append_entries (this=, cntl_base=0x7f51b81da110, request=0x7f51ec067c80, response=0x7f51b81464c0, done=0x7f50641b5c10) at /xenobi/xmake_globaldir/.xmake/packages/b/brpc/1.7.0/764e837c169a438686b3af9e2050506c/include/brpc/closure_guard.h:55
#7 0x00000000012bb155 in brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*) ()
#8 0x00000000012b0557 in brpc::ProcessInputMessage(void*) ()
#9 0x00000000012755af in bthread::TaskGroup::task_runner(long) ()
#10 0x0000000001406e31 in bthread_make_fcontext ()
#11 0x0000000000000000 in ?? ()

ergesun commented 3 weeks ago

@amoxic

May I ask where the 1.1.3 version comes from? Is it the master code?

amoxic commented 3 weeks ago

> @amoxic May I ask where the 1.1.3 version comes from? Is it the master code?

@ergesun Sorry for the confusion, 1.1.3 is a tag we created ourselves. The commit id is 3cae30fb67cb9e988650500522c6d64ae609f2aa.