apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Bug] BE crash in brpc #3529

Open morningman opened 4 years ago

morningman commented 4 years ago

Describe the bug

(gdb) where
#0  bthread::id_create_impl (id=id@entry=0x7f724e708ac0, data=data@entry=0x8918c788, on_error=on_error@entry=0x0,
    on_error2=on_error2@entry=0x241de40 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
    at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:333
#1  0x00000000025916fd in bthread_id_create2 (id=id@entry=0x7f724e708ac0, data=data@entry=0x8918c788,
    on_error=on_error@entry=0x241de40 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
    at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:693
#2  0x000000000241880d in brpc::Controller::call_id (this=this@entry=0x8918c788) at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/controller.cpp:1213
#3  0x00000000024142ed in brpc::Channel::CallMethod (this=0x1233f400, method=0x11dee400, controller_base=0x8918c788, request=0x9ae40520, response=0x8918c9d8, done=0x8918c780) at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/channel.cpp:394
#4  0x00000000013303bf in palo::PInternalService_Stub::transmit_data (this=<optimized out>, controller=0x8918c788, request=0x9ae40520, response=0x8918c9d8, done=0x8918c780)
    at /root/incubator-doris-DORIS-0.11.42-release/gensrc/build/gen_cpp/palo_internal_service.pb.cc:319
#5  0x00000000015cefa1 in doris::DataStreamSender::Channel::send_batch (this=0x9ae40420, batch=0x42d3cc00, eos=eos@entry=false) at /root/incubator-doris-DORIS-0.11.42-release/be/src/runtime/data_stream_sender.cpp:232
#6  0x00000000015cf701 in doris::DataStreamSender::send (this=0x42d3cb60, state=0x83274a00, batch=0x1ac3d600) at /root/incubator-doris-DORIS-0.11.42-release/be/src/runtime/data_stream_sender.cpp:452

Version: 0.11.42

hxianshun commented 4 years ago

[Two images attached]

Is any of the information above useful? The same crash also occurred on version 0.11.43.

acelyc111 commented 4 years ago

I saw this stack again:

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/work/app/doris/c3prc-bigbi/be/package/be/lib/palo_be'.
Program terminated with signal 11, Segmentation fault.
#0  bthread::id_create_impl (id=id@entry=0x7f796e924290, data=data@entry=0x24697af08, on_error=on_error@entry=0x0,
    on_error2=on_error2@entry=0x1b9fea0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
    at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:333
333 /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  bthread::id_create_impl (id=id@entry=0x7f796e924290, data=data@entry=0x24697af08, on_error=on_error@entry=0x0,
    on_error2=on_error2@entry=0x1b9fea0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
    at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:333
#1  0x0000000001d1387d in bthread_id_create2 (id=id@entry=0x7f796e924290, data=data@entry=0x24697af08,
    on_error=on_error@entry=0x1b9fea0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
    at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:693
#2  0x0000000001b9a86d in brpc::Controller::call_id (this=this@entry=0x24697af08) at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/brpc/controller.cpp:1213
#3  0x0000000001b9634d in brpc::Channel::CallMethod (this=0x17cab800, method=0x19060800, controller_base=0x24697af08, request=0x9272a7e0, response=0x24697b158, done=0x24697af00)
    at /root/doris/doris-dev/thirdparty/src/incubator-brpc-0.9.5/src/brpc/channel.cpp:394
#4  0x00000000013659bf in palo::PInternalService_Stub::transmit_data (this=<optimized out>, controller=0x24697af08, request=0x9272a7e0, response=0x24697b158, done=0x24697af00) at /builds/olap/doris/gensrc/build/gen_cpp/palo_internal_service.pb.cc:319
#5  0x00000000015fb4a1 in doris::DataStreamSender::Channel::send_batch (this=this@entry=0x9272a6e0, batch=batch@entry=0x0, eos=eos@entry=true) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:232
#6  0x00000000015fc03a in doris::DataStreamSender::Channel::close_internal (this=0x9272a6e0) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:289
#7  0x00000000015fc215 in close (state=0x107ee300, this=<optimized out>) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:296
#8  doris::DataStreamSender::close (this=0xe713380, state=0x107ee300, exec_status=...) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:607
#9  0x00000000010208d3 in doris::PlanFragmentExecutor::open_internal (this=this@entry=0x13bfb05f0) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:326
#10 0x0000000001020acc in doris::PlanFragmentExecutor::open (this=this@entry=0x13bfb05f0) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:259
#11 0x0000000000fb1267 in doris::FragmentExecState::execute (this=0x13bfb0580) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:211
#12 0x0000000000fb2d16 in doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) (this=0x507fc00, exec_state=..., cb=...) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:394
#13 0x0000000000fb96b8 in __invoke_impl<void, void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__t=@0xa14c24f0: 0x507fc00, __f=
    @0xa14c24b0: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0xfb2cf0 <doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /usr/include/c++/7.3.0/bits/invoke.h:73
#14 __invoke<void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__fn=
    @0xa14c24b0: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0xfb2cf0 <doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /usr/include/c++/7.3.0/bits/invoke.h:95
#15 __call<void, 0, 1, 2> (__args=..., this=0xa14c24b0) at /usr/include/c++/7.3.0/functional:632
#16 operator()<> (this=0xa14c24b0) at /usr/include/c++/7.3.0/functional:718
#17 boost::detail::function::void_function_obj_invoker0<std::_Bind_result<void, void (doris::FragmentMgr::*(doris::FragmentMgr*, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>))(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>, void>::invoke(boost::detail::function::function_buffer&) (function_obj_ptr=...) at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:159
#18 0x0000000000fb24d4 in operator() (this=0x135b53560) at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:759
#19 doris::fragment_executor (param=0x135b53560) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:419
#20 0x00007f7ac21fbdc5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f7ac250773d in clone () from /lib64/libc.so.6
(gdb)

acelyc111 commented 4 years ago

A similar coredump:

(gdb) bt
#0  0x00007fafe031f1d7 in raise () from /lib64/libc.so.6
#1  0x00007fafe03208c8 in abort () from /lib64/libc.so.6
#2  0x000000000230f3b6 in google::DumpStackTraceAndExit () at src/utilities.cc:147
#3  0x00000000023066bd in google::LogMessage::Fail () at src/logging.cc:1599
#4  0x0000000002308544 in google::LogMessage::SendToLog (this=0x7faf5a0f28a0) at src/logging.cc:1553
#5  0x00000000023061e4 in google::LogMessage::Flush (this=0x7faf5a0f28a0) at src/logging.cc:1422
#6  0x0000000002308f79 in google::LogMessageFatal::~LogMessageFatal (this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:2125
#7  0x000000000259b0a0 in bthread::id_create_impl (id=id@entry=0x7faf5a0f2900, data=data@entry=0x83f09408, on_error=on_error@entry=0x0,
    on_error2=on_error2@entry=0x2427bb0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
    at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:331
#8  0x000000000259b5cd in bthread_id_create2 (id=id@entry=0x7faf5a0f2900, data=data@entry=0x83f09408,
    on_error=on_error@entry=0x2427bb0 <brpc::Controller::HandleSocketFailed(bthread_id_t, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)
    at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/bthread/id.cpp:693
#9  0x000000000242257d in brpc::Controller::call_id (this=this@entry=0x83f09408) at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/controller.cpp:1213
#10 0x000000000241e05d in brpc::Channel::CallMethod (this=0xcd14600, method=0x10bd2400, controller_base=0x83f09408, request=0x139d77180, response=0x83f09658, done=0x83f09400) at /var/local/incubator-doris/thirdparty/src/incubator-brpc-0.9.5/src/brpc/channel.cpp:394
#11 0x000000000134fbff in palo::PInternalService_Stub::transmit_data (this=<optimized out>, controller=0x83f09408, request=0x139d77180, response=0x83f09658, done=0x83f09400) at /builds/olap/doris/gensrc/build/gen_cpp/palo_internal_service.pb.cc:319
#12 0x00000000015d8a91 in doris::DataStreamSender::Channel::send_batch (this=this@entry=0x139d77080, batch=batch@entry=0x139d77138, eos=eos@entry=true) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:232
#13 0x00000000015d8d64 in doris::DataStreamSender::Channel::send_current_batch (this=this@entry=0x139d77080, eos=eos@entry=true) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:275
#14 0x00000000015d9661 in doris::DataStreamSender::Channel::close_internal (this=0x139d77080) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:287
#15 0x00000000015d9805 in close (state=0x1712ed800, this=<optimized out>) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:296
#16 doris::DataStreamSender::close (this=0x48cc6820, state=0x1712ed800, exec_status=...) at /builds/olap/doris/be/src/runtime/data_stream_sender.cpp:607
#17 0x0000000001054f13 in doris::PlanFragmentExecutor::open_internal (this=this@entry=0x863465f0) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:351
#18 0x0000000001055114 in doris::PlanFragmentExecutor::open (this=this@entry=0x863465f0) at /builds/olap/doris/be/src/runtime/plan_fragment_executor.cpp:284
#19 0x0000000000fdc7d7 in doris::FragmentExecState::execute (this=0x86346580) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:209
#20 0x0000000000fde5f6 in doris::FragmentMgr::exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) (this=0x6e9b180, exec_state=..., cb=...) at /builds/olap/doris/be/src/runtime/fragment_mgr.cpp:393
#21 0x0000000000fe4724 in operator() (a2=<error reading variable: access outside bounds of object referenced via synthetic pointer>, a1=..., p=<optimized out>, this=<optimized out>) at /var/local/thirdparty/installed/include/boost/bind/mem_fn_template.hpp:280
#22 operator()<boost::_mfi::mf2<void, doris::FragmentMgr, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)> >, boost::_bi::list0> (a=<synthetic pointer>, f=..., this=<optimized out>)
    at /var/local/thirdparty/installed/include/boost/bind/bind.hpp:398
#23 operator() (this=<optimized out>) at /var/local/thirdparty/installed/include/boost/bind/bind.hpp:1294
#24 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, doris::FragmentMgr, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)> >, boost::_bi::list3<boost::_bi::value<doris::FragmentMgr*>, boost::_bi::value<std::shared_ptr<doris::FragmentExecState> >, boost::_bi::value<std::function<void (doris::PlanFragmentExecutor*)> > > >, void>::invoke(boost::detail::function::function_buffer&) (function_obj_ptr=...)
    at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:159
#25 0x0000000000edc7e8 in operator() (this=0x7faf5a0f2fc0) at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:759
#26 doris::ThreadPool::work_thread (this=0x6e9b200, thread_id=<optimized out>) at /builds/olap/doris/be/src/util/thread_pool.hpp:120
#27 0x0000000001a20a1d in thread_proxy ()
#28 0x00007fafe00d5dc5 in start_thread () from /lib64/libpthread.so.0
#29 0x00007fafe03e173d in clone () from /lib64/libc.so.6