chenshuo / muduo

Event-driven network library for multi-threaded Linux server in C++11
https://github.com/chenshuo/muduo

Core dump when TcpClient is destructed #488

Open mlj1991 opened 3 years ago

mlj1991 commented 3 years ago

Linux distro and version? x86 or ARM? 32-bit or 64-bit?

x86 64-bit

Branch (cpp98/cpp11/cpp17) and version of muduo?

v1.1.0

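We hit a core dump while using muduo. After investigating, we traced it to line 91 of TcpClient.cc, as shown in this screenshot: [image] There is a FIXME there that I do not understand, so I would like to know what this FIXME means and whether it can be fixed.

The crash itself happens at line 71 of Channel.cc, as shown in this screenshot: [image] The core dump occurs at tie_.lock(), so the tie_ member is presumably no longer safe to access at that point.

The full stack trace is as follows: [image]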

chenshuo commented 3 years ago

Please provide complete code that reproduces the core dump.

Also, you can print the this pointer in Channel's destructor and print it again in handleEvent, to see whether the Channel is destroyed before handleEvent() runs.

x724172556 commented 3 years ago

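I have some thoughts on this. This library, like many others, runs into a common problem when using callbacks: the lifetime of the object bound to the callback. These cases need very careful handling. For example, after tcpclient->stop() is called, stopInLoop runs in the next iteration of the event loop (and sometimes the existing interfaces offer no way to cancel such a callback). If the TcpClient is destroyed in the meantime, the pointer bound to stopInLoop is dangling, which ends in segmentation faults or other errors. In other words, the dependency is inverted: the deferred call expects its parent object to still exist, and it cannot be cleanly cancelled when that object is destroyed, so it still runs later.

I see two possible solutions. One is to use smart pointers throughout: every object bound to a callback should be held through a smart pointer, either a shared_ptr that keeps it alive or a weak_ptr that lets the callback check whether the object has already been destroyed. That would eliminate many of these problems.

The other is to introduce a signal/slot mechanism with lifetime tracking, as in Qt or sigc. These track the lifetime of the bound object and disconnect the callback once that object is destroyed.

I run into similar problems often in practice. Fundamentally, when an object is destroyed it does not fully clean up everything associated with it, and that includes other objects' dependencies on it.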
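To make the weak_ptr option concrete, here is a minimal sketch. It does not use muduo: `ToyLoop`, `Client`, and the two demo functions are hypothetical names invented for illustration, and the toy queue only imitates "run this in the next loop iteration". The deferred task captures a weak_ptr and skips the call if the client is already gone.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop's pending-task queue (not muduo's EventLoop).
class ToyLoop {
 public:
  void queue(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  // Run everything queued so far, like one loop iteration.
  void runPending() {
    std::vector<std::function<void()>> tasks;
    tasks.swap(tasks_);
    for (auto& task : tasks) task();
  }

 private:
  std::vector<std::function<void()>> tasks_;
};

// Hypothetical client whose stop() is deferred to the next loop iteration.
class Client : public std::enable_shared_from_this<Client> {
 public:
  Client(ToyLoop* loop, int* stopCount) : loop_(loop), stopCount_(stopCount) {}

  void stop() {
    // Capture a weak_ptr instead of a raw this pointer.
    std::weak_ptr<Client> weak = shared_from_this();
    loop_->queue([weak] {
      // lock() yields an empty shared_ptr if the client was already destroyed.
      if (auto self = weak.lock()) self->stopInLoop();
    });
  }

 private:
  void stopInLoop() { ++*stopCount_; }

  ToyLoop* loop_;
  int* stopCount_;
};

// The client is destroyed before the deferred task runs: the task is skipped.
int stopsAfterEarlyDestroy() {
  ToyLoop loop;
  int count = 0;
  auto client = std::make_shared<Client>(&loop, &count);
  client->stop();
  client.reset();
  loop.runPending();
  return count;
}

// The client is still alive when the deferred task runs: stopInLoop executes.
int stopsWhileAlive() {
  ToyLoop loop;
  int count = 0;
  auto client = std::make_shared<Client>(&loop, &count);
  client->stop();
  loop.runPending();
  return count;
}
```

Capturing a shared_ptr instead would keep the client alive until the task has run, which is the other variant mentioned above. Which one fits depends on whether the deferred work must still happen after the owner has let go of the object.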

x724172556 commented 3 years ago

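Most frameworks with an event loop do not have this problem because they track object lifetimes. For example, objects inherit from a common Object base class so that the framework can monitor them, and objects are driven by Events; if the object an Event targets has already been destroyed, the event is not dispatched. Overall, I would consider this kind of problem a bug.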

wensheng-zhang commented 2 years ago

We hit a similar problem when deleting a TcpClient. The recent reproduction rate is 1/570:

#0  0x00007f72221f8387 in raise () from /lib64/libc.so.6
#1  0x00007f72221f9a78 in abort () from /lib64/libc.so.6
#2  0x00007f72221f11a6 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f72221f1252 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f7222da9b37 in muduo::net::TcpConnection::~TcpConnection (this=0x11dcdd0, __in_chrg=<optimized out>) at TcpConnection.cc:71
#5  0x00007f7222db9092 in std::_Sp_counted_ptr<muduo::net::TcpConnection*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x11dd320) at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:290
#6  0x0000000000434a24 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x11dd320) at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:144
#7  0x00000000004335d1 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x11d2428, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:546
#8  0x0000000000444d78 in std::__shared_ptr<muduo::net::TcpConnection, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x11d2420, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8.2/bits/shared_ptr_base.h:781
#9  0x0000000000444db8 in std::shared_ptr<muduo::net::TcpConnection>::~shared_ptr (this=0x11d2420, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/shared_ptr.h:93
#10 0x00007f7222dac2cc in std::_Head_base<0ul, std::shared_ptr<muduo::net::TcpConnection>, false>::~_Head_base (this=0x11d2420, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8.2/tuple:129
#11 0x00007f7222dac2e6 in std::_Tuple_impl<0ul, std::shared_ptr<muduo::net::TcpConnection> >::~_Tuple_impl (this=0x11d2420, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/tuple:231
#12 0x00007f7222dac300 in std::tuple<std::shared_ptr<muduo::net::TcpConnection> >::~tuple (this=0x11d2420, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/tuple:390
#13 0x00007f7222dac31e in std::_Bind<std::function<void (std::shared_ptr<muduo::net::TcpConnection> const&)> (std::shared_ptr<muduo::net::TcpConnection>)>::~_Bind() (this=0x11d2400, 
    __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/functional:1280
#14 0x00007f7222db0da7 in std::_Function_base::_Base_manager<std::_Bind<std::function<void (std::shared_ptr<muduo::net::TcpConnection> const&)> (std::shared_ptr<muduo::net::TcpConnection>)> >::_M_destroy(std::_Any_data&, std::integral_constant<bool, false>) (__victim=...) at /usr/include/c++/4.8.2/functional:1926
#15 0x00007f7222daf7a8 in std::_Function_base::_Base_manager<std::_Bind<std::function<void (std::shared_ptr<muduo::net::TcpConnection> const&)> (std::shared_ptr<muduo::net::TcpConnection>)> >::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) (__dest=..., __source=..., __op=std::__destroy_functor) at /usr/include/c++/4.8.2/functional:1950
#16 0x000000000044499f in std::_Function_base::~_Function_base (this=0x11d90b0, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/functional:2030
#17 0x0000000000452428 in std::function<void ()>::~function() (this=0x11d90b0, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/functional:2174
#18 0x00007f7222dc1e9c in std::_Destroy<std::function<void ()> >(std::function<void ()>*) (__pointer=0x11d90b0) at /usr/include/c++/4.8.2/bits/stl_construct.h:93
#19 0x00007f7222dc1bfc in std::_Destroy_aux<false>::__destroy<std::function<void ()>*>(std::function<void ()>*, std::function<void ()>*) (__first=0x11d90b0, __last=0x11d9130)
    at /usr/include/c++/4.8.2/bits/stl_construct.h:103
#20 0x00007f7222dc16ab in std::_Destroy<std::function<void ()>*>(std::function<void ()>*, std::function<void ()>*) (__first=0x11d9090, __last=0x11d9130)
    at /usr/include/c++/4.8.2/bits/stl_construct.h:126
#21 0x00007f7222dc0e45 in std::_Destroy<std::function<void ()>*, std::function<void ()> >(std::function<void ()>*, std::function<void ()>*, std::allocator<std::function<void ()> >&) (
    __first=0x11d9090, __last=0x11d9130) at /usr/include/c++/4.8.2/bits/stl_construct.h:151
#22 0x00007f7222dc0529 in std::vector<std::function<void ()>, std::allocator<std::function<void ()> > >::~vector() (this=0x7ffc73df8960, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8.2/bits/stl_vector.h:415
#23 0x00007f7222dbfd41 in muduo::net::EventLoop::doPendingFunctors (this=0x7ffc73dfaa00) at EventLoop.cc:268
#24 0x00007f7222dbf1fb in muduo::net::EventLoop::loop (this=0x7ffc73dfaa00) at EventLoop.cc:129
#25 0x000000000043f07a in main (argc=1, argv=0x7ffc73dfacc8) at main.cpp:56

hevake commented 1 year ago

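For this problem I use a somewhat unconventional approach in cpp-tbox. If an object needs to be released from within its own callback, it must not be deleted directly; instead, the deletion is delegated to the loop and carried out in the next event.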

auto tobe_delete = object_ptr_;
object_ptr_ = nullptr;
loop->runInLoop([tobe_delete] {delete tobe_delete;});

With a smart pointer:

auto tobe_delete = std::move(object_ptr_);
loop->runInLoop([tobe_delete] {});
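The snippets above can be sketched as a self-contained example. It uses neither cpp-tbox nor muduo: `ToyLoop`, `Conn`, `Owner`, and `deferredDeleteIsSafe` are hypothetical names, and the toy queue is assumed to run tasks only on a later iteration, never immediately. Ownership moves into a shared_ptr captured by an empty task, so the object is destroyed when the loop discards that task.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loop that runs queued tasks on its next iteration.
class ToyLoop {
 public:
  void queue(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void runPending() {
    std::vector<std::function<void()>> tasks;
    tasks.swap(tasks_);
    for (auto& task : tasks) task();
    // The tasks (and whatever they captured) are destroyed when this returns.
  }

 private:
  std::vector<std::function<void()>> tasks_;
};

// Hypothetical connection that still touches its own state after the callback returns.
struct Conn {
  explicit Conn(bool* destroyed) : destroyed_(destroyed) {}
  ~Conn() { *destroyed_ = true; }

  void handleClose() {
    if (onClose) onClose();
    afterCallback = true;  // would be a use-after-free if onClose had deleted us
  }

  std::function<void()> onClose;
  bool afterCallback = false;
  bool* destroyed_;
};

// Hypothetical owner that releases the connection from inside the connection's callback.
class Owner {
 public:
  Owner(ToyLoop* loop, bool* destroyed) : loop_(loop), conn_(new Conn(destroyed)) {
    conn_->onClose = [this] { releaseConn(); };
  }

  Conn* conn() { return conn_.get(); }

  void releaseConn() {
    // Hand ownership to the queued task instead of deleting right here.
    std::shared_ptr<Conn> doomed(std::move(conn_));
    loop_->queue([doomed] {});
  }

 private:
  ToyLoop* loop_;
  std::unique_ptr<Conn> conn_;
};

// True if the connection survived its own callback and was destroyed by the loop afterwards.
bool deferredDeleteIsSafe() {
  ToyLoop loop;
  bool destroyed = false;
  Owner owner(&loop, &destroyed);
  owner.conn()->handleClose();
  bool aliveAfterCallback = !destroyed;
  loop.runPending();
  return aliveAfterCallback && destroyed;
}
```

The sketch relies on the queued task never running inside the current callback. If the loop's scheduling function may execute the task immediately when called from the loop thread, the deferral is lost, so it is worth checking that behaviour for whichever loop API is used.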

This pattern is used extensively in my open-source project cpp-tbox, for example at L203 of https://github.com/hevake/cpp-tbox/blob/master/modules/network/tcp_server.cpp:

void TcpServer::onTcpDisconnected(const ConnToken &client)
{
    ++d_->cb_level;
    if (d_->disconnected_cb)
        d_->disconnected_cb(client);
    --d_->cb_level;

    TcpConnection *conn = d_->conns.free(client);
    d_->wp_loop->runNext([conn] { CHECK_DELETE_OBJ(conn); });
    //! Why run the callback first and release the connection afterwards? So that the callback can still access the TcpConnection object.
}

hevake commented 1 year ago

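In addition, most of my event objects carry a cb_level_ counter, used to detect whether an object is destroyed from within its own callback. When invoking a callback: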

++cb_level_;
cb();
--cb_level_;

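In the destructor, check whether cb_level_ is 0. If it is not 0, the object is being destroyed inside its own callback, and an assertion fires. For example: https://github.com/hevake/cpp-tbox/blob/master/modules/event/timer_event_impl.cpp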
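A minimal sketch of this guard, assuming nothing about cpp-tbox's actual classes: `GuardedEvent`, `cbLevel`, and the two demo functions are hypothetical names used only for illustration. The counter is non-zero exactly while a callback is running, so the destructor can assert that it is zero.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical event object that tracks how deep it is inside its own callbacks.
class GuardedEvent {
 public:
  ~GuardedEvent() {
    // A non-zero level means we are being destroyed from within our own callback.
    assert(cb_level_ == 0 && "destroyed inside own callback");
  }

  void setCallback(std::function<void()> cb) { cb_ = std::move(cb); }

  void fire() {
    ++cb_level_;
    if (cb_) cb_();
    --cb_level_;
  }

  int cbLevel() const { return cb_level_; }

 private:
  std::function<void()> cb_;
  int cb_level_ = 0;
};

// Level observed from inside the callback while it is running.
int levelSeenInsideCallback() {
  GuardedEvent ev;
  int seen = -1;
  ev.setCallback([&] { seen = ev.cbLevel(); });
  ev.fire();
  return seen;
}

// Level after the callback has returned; the destructor's assertion then passes.
int levelAfterCallback() {
  GuardedEvent ev;
  ev.setCallback([] {});
  ev.fire();
  return ev.cbLevel();
}
```

Because the check is an assert, it only fires in builds without NDEBUG. It reports the misuse rather than preventing it, so it works best alongside deferred deletion.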