baidu / braft

An industrial-grade C++ implementation of RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.
Apache License 2.0

Possible deadlock when a node calls step_down #323

Open lintanghui opened 3 years ago

lintanghui commented 3 years ago

Stack traces at the time of the deadlock:

Thread 26 (Thread 0x7f20e3744700 (LWP 7233)):
#0  0x00007f20f015728d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f20f0180dc4 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
#2  0x0000558cca6d5378 in bthread::TaskGroup::push_rq(unsigned long) ()
#3  0x0000558cca6d37bc in bthread::TaskGroup::ready_to_run(unsigned long, bool) ()
#4  0x0000558cca6d5ac8 in int bthread::TaskGroup::start_background<false>(unsigned long*, bthread_attr_t const*, void* (*)(void*), void*) ()
#5  0x0000558cca6c97db in bthread_start_background ()
#6  0x0000558cca4d5f4c in braft::run_closure_in_bthread_nosig(google::protobuf::Closure*, bool) ()
#7  0x0000558cca4df204 in braft::ClosureQueue::clear() ()
#8  0x0000558cca4e57a3 in braft::BallotBox::clear_pending_tasks() ()
#9  0x0000558cca46bf01 in braft::NodeImpl::step_down(long, bool, butil::Status const&) ()
#10 0x0000558cca4657d0 in braft::NodeImpl::check_dead_nodes(braft::Configuration const&, long) ()
#11 0x0000558cca465a85 in braft::NodeImpl::handle_stepdown_timeout() ()
#12 0x0000558cca476b94 in braft::StepdownTimer::run() ()
#13 0x0000558cca482833 in braft::RepeatedTimerTask::on_timedout() ()
#14 0x0000558cca482adc in braft::RepeatedTimerTask::run_on_timedout_in_new_thread(void*) ()
#15 0x0000558cca6d2700 in bthread::TaskGroup::task_runner(long) ()
#16 0x0000558cca6f8861 in bthread_make_fcontext ()
#17 0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7f20e3f45700 (LWP 7232)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f20f1378bb5 in __GI___pthread_mutex_lock (mutex=0x558ccea9bb58) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000558cca6e3d01 in pthread_mutex_lock ()
#3  0x0000558cc9e93ed2 in butil::Mutex::lock() ()
#4  0x0000558cc9e95990 in std::lock_guard<butil::Mutex>::lock_guard(butil::Mutex&) ()
#5  0x0000558cca490a7e in braft::NodeImpl::leader_id() ()
#6  0x0000558cca48f72b in braft::Node::leader_id() ()
#7  0x0000558cc9de18df in node::Replica::leader[abi:cxx11]() const ()
#8  0x0000558cc9eb2c72 in node::ApplyClosure::Run() ()
#9  0x0000558cca4d5dcd in braft::run_closure(void*) ()
#10 0x0000558cca6d2700 in bthread::TaskGroup::task_runner(long) ()
#11 0x0000558cca6f8861 in bthread_make_fcontext ()
#12 0x0000000000000000 in ?? ()

Thread 24 (Thread 0x7f20e4746700 (LWP 7231)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f20f1378bb5 in __GI___pthread_mutex_lock (mutex=0x558ccea9bb58) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000558cca6e3d01 in pthread_mutex_lock ()
#3  0x0000558cc9e93ed2 in butil::Mutex::lock() ()
#4  0x0000558cc9e95990 in std::lock_guard<butil::Mutex>::lock_guard(butil::Mutex&) ()
#5  0x0000558cca490a7e in braft::NodeImpl::leader_id() ()
#6  0x0000558cca48f72b in braft::Node::leader_id() ()
#7  0x0000558cc9de18df in node::Replica::leader[abi:cxx11]() const ()
#8  0x0000558cc9eb2c72 in node::ApplyClosure::Run() ()
#9  0x0000558cca4d5dcd in braft::run_closure(void*) ()
#10 0x0000558cca6d2700 in bthread::TaskGroup::task_runner(long) ()
#11 0x0000558cca6f8861 in bthread_make_fcontext ()
#12 0x0000000000000000 in ?? ()

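Inspecting mutex 0x558ccea9bb58 in gdb shows that its owner is LWP 7233. Under high concurrency, if other threads are calling leader_id() while the node runs step_down, a deadlock can occur.

The problematic code is shown below: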

void NodeImpl::handle_stepdown_timeout() {
    BAIDU_SCOPED_LOCK(_mutex);
  // ...
}
void ClosureQueue::clear() {
    std::deque<Closure*> saved_queue;
    {
        BAIDU_SCOPED_LOCK(_mutex);
        saved_queue.swap(_queue);
        _first_index = 0;
    }
    bool run_bthread = false;
    for (std::deque<Closure*>::iterator 
            it = saved_queue.begin(); it != saved_queue.end(); ++it) {
        if (*it) {
            (*it)->status().set_error(EPERM, "leader stepped down");
            run_closure_in_bthread_nosig(*it, _usercode_in_pthread);  // if _rq is full here, the thread is switched out and deadlocks
            run_bthread = true;
        }
    }
    // ...
}

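handle_stepdown_timeout first acquires _mutex. clear_pending_tasks then creates bthreads while that mutex is still held. If _rq is full at that moment, push_rq sleeps and retries. While it sleeps, the thread is switched out and cannot be switched back in, so the node deadlocks.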

Summary: creating a bthread while holding a mutex can, under high load, cause the thread to be switched out and result in a deadlock.
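
One general way to avoid this class of problem is to move any work that may block (task submission, user callbacks) out of the critical section: swap the pending work into a local container while holding the lock, release the lock, and only then dispatch it. The sketch below illustrates that pattern with the standard library only. DemoNode, add_pending and the inline dispatch are hypothetical stand-ins, not braft code and not the project's actual fix. In braft, step_down is entered with _mutex already held by its callers, so a real change would need more restructuring than shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a node that owns pending callbacks.
// Not braft's NodeImpl; names and behavior are assumptions for illustration.
class DemoNode {
public:
    using Task = std::function<void()>;

    // Queue a callback to be run when the node steps down.
    void add_pending(Task t) {
        std::lock_guard<std::mutex> guard(_mutex);
        _pending.push_back(std::move(t));
    }

    // Takes the same mutex that step_down() uses, like leader_id() in the traces.
    int leader_id() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _leader_id;
    }

    // Detach the pending work under the lock, dispatch it after unlocking.
    // Returns the number of callbacks that were run.
    std::size_t step_down() {
        std::deque<Task> saved;
        {
            std::lock_guard<std::mutex> guard(_mutex);
            _leader_id = -1;       // no leader known after stepping down
            saved.swap(_pending);  // cheap, non-blocking work only
        }  // _mutex is released here
        for (Task& t : saved) {
            t();  // may block or call leader_id() without holding _mutex
        }
        return saved.size();
    }

private:
    mutable std::mutex _mutex;
    std::deque<Task> _pending;
    int _leader_id = 1;
};
```

Because the callbacks run only after the lock is released, a callback that calls leader_id() cannot block on a mutex held by the dispatching thread, and a slow or blocking dispatch step no longer stalls every other thread that needs that mutex.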