Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
1.99k stars 573 forks source link

Signal handlers should only use async-signal-safe functions #9676

Open julianbrost opened 1 year ago

julianbrost commented 1 year ago

I just observed the following deadlock while testing #9653. In the following, you can see a nice stack trace were a signal handler is executed on a thread that is currently inside operator new (presumably holding a lock at that point) and the signal handler also does operator new and is waiting for a lock:

Thread 10 (Thread 0x7f0cfa752700 (LWP 94) "icinga2"):
#0  __lll_lock_wait_private (futex=futex@entry=0x7f0c7c000020) at ./lowlevellock.c:35
#1  0x00007f0cfcb9332b in __GI___libc_malloc (bytes=50) at malloc.c:3064
#2  0x00007f0cfcee70b5 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f0cfcf747b2 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00005557b175b4aa in icinga::Log::~Log() [clone .part.0] ()
#5  0x00005557b168bfcc in icinga::Application::SigUsr1Handler(int) ()
#6  <signal handler called>
#7  _int_malloc (av=av@entry=0x7f0c7c000020, bytes=bytes@entry=128) at malloc.c:4024
#8  0x00007f0cfcb93299 in __GI___libc_malloc (bytes=128) at malloc.c:3066
#9  0x00007f0cfcee70b5 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00005557b1bc1852 in std::vector<std::vector<icinga::String, std::allocator<icinga::String> >, std::allocator<std::vector<icinga::String, std::allocator<icinga::String> > > >::vector(std::initializer_list<std::vector<icinga::String, std::allocator<icinga::String> > >, std::allocator<std::vector<icinga::String, std::allocator<icinga::String> > > const&) [clone .constprop.0] ()
#11 0x00005557b1b5eca8 in icinga::IcingaDB::UpdateState(boost::intrusive_ptr<icinga::Checkable> const&, icinga::IcingaDB::StateUpdate) ()
#12 0x00005557b1ba69a0 in boost::detail::function::void_function_obj_invoker3<icinga::IcingaDB::ConfigStaticInitialize()::{lambda(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&)#8}, void, boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&>::invoke(boost::detail::function::function_buffer&, boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&) [clone .lto_priv.0] ()
#13 0x00005557b19ad4d5 in boost::signals2::detail::signal_impl<void (boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&)>, boost::function<void (boost::signals2::connection const&, boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&)>, boost::signals2::mutex>::operator()(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&) ()
#14 0x00005557b18ef58a in icinga::Checkable::ProcessCheckResult(boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::MessageOrigin> const&) ()
#15 0x00005557b1a07cbf in icinga::DummyCheckTask::ScriptFunc(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool) ()
#16 0x00005557b1a10d58 in std::_Function_handler<icinga::Value (std::vector<icinga::Value, std::allocator<icinga::Value> > const&), icinga::WrapFunction<void (*)(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool)>(void (*)(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool))::{lambda(std::vector<icinga::Value, std::allocator<icinga::Value> > const&)#1}>::_M_invoke(std::_Any_data const&, std::vector<icinga::Value, std::allocator<icinga::Value> > const&) ()
#17 0x00005557b16c4b6b in icinga::Function::Invoke(std::vector<icinga::Value, std::allocator<icinga::Value> > const&) ()
#18 0x00005557b18f2ff3 in icinga::CheckCommand::Execute(boost::intrusive_ptr<icinga::Checkable> const&, boost::intrusive_ptr<icinga::CheckResult> const&, boost::intrusive_ptr<icinga::Dictionary> const&, bool) ()
#19 0x00005557b18f3b1f in icinga::Checkable::ExecuteCheck() ()
#20 0x00005557b1a187bf in icinga::CheckerComponent::ExecuteCheckHelper(boost::intrusive_ptr<icinga::Checkable> const&) ()
#21 0x00005557b171deca in icinga::ThreadPool::Post<std::function<void ()> >(std::function<void ()>, icinga::SchedulerPolicy)::{lambda()#1}::operator()() const ()
#22 0x00005557b17475d7 in boost::asio::detail::executor_op<icinga::ThreadPool::Post<std::function<void ()> >(std::function<void ()>, icinga::SchedulerPolicy)::{lambda()#1}, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
#23 0x00005557b1c2b1ef in boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] ()
#24 0x00005557b1756a78 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
#25 0x00005557b16815d0 in boost_asio_detail_posix_thread_function ()
#26 0x00007f0cfd015ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#27 0x00007f0cfcc08a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here, the offending code is this: https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/base/application.cpp#L714-L717

But other places are also affected: https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/base/application.cpp#L436-L438 which is reachable from https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/cli/daemoncommand.cpp#L404-L423

Similarly (here you may argue that we're crashing already, but this could potentially change a crash with an immediate restart by systemd for example to a hanging process): https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/base/application.cpp#L727-L778

In general, the signal handlers should be kept minimal and may only use async-signal-safe functions.

yhabteab commented 9 months ago

One of the netways customers occasionally suffers from this problem in production, preventing IDO from performing any database queries.

Thread 1 (Thread 0x7fbeac0e9900 (LWP 99277)):
#0  0x00007fbea8aad7fc in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fbea8a29ba2 in _L_lock_16654 () from /lib64/libc.so.6
#2  0x00007fbea8a267e3 in malloc () from /lib64/libc.so.6
#3  0x00007fbea92e618d in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4  0x00007fbea9344cd9 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#5  0x000000000096c7d7 in std::string::_S_construct<char const*> (__beg=__beg@entry=0x1052160 "Application", __end=__end@entry=0x105216b "", __a=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/stl_iterator_base_funcs.h:138
#6  0x00000000009952df in _S_construct<char const*> (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.tcc:595
#7  _S_construct_aux<char const*> (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:5180
#8  _S_construct<char const*> (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:5201
#9  basic_string<> (__a=..., __s=0x1052160 "Application", this=0x7ffe7b7301e8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3663
#10 String (data=0x1052160 "Application", this=0x7ffe7b7301e8) at /usr/src/debug/icinga2-2.14.0/lib/base/string.cpp:21
#11 icinga::Application::SigUsr1Handler(int) () at /usr/src/debug/icinga2-2.14.0/lib/base/application.cpp:717
#12 <signal handler called>
#13 0x00007fbea8a23276 in _int_malloc () from /lib64/libc.so.6
#14 0x00007fbea8a2678c in malloc () from /lib64/libc.so.6
#15 0x00007fbea92e618d in operator new(unsigned long) () from /lib64/libstdc++.so.6
#16 0x00007fbea92e6289 in operator new[](unsigned long) () from /lib64/libstdc++.so.6
Full output ```cxx Thread 1 (Thread 0x7fbeac0e9900 (LWP 99277)): #0 0x00007fbea8aad7fc in __lll_lock_wait_private () from /lib64/libc.so.6 #1 0x00007fbea8a29ba2 in _L_lock_16654 () from /lib64/libc.so.6 #2 0x00007fbea8a267e3 in malloc () from /lib64/libc.so.6 #3 0x00007fbea92e618d in operator new(unsigned long) () from /lib64/libstdc++.so.6 #4 0x00007fbea9344cd9 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) () from /lib64/libstdc++.so.6 #5 0x000000000096c7d7 in std::string::_S_construct (__beg=__beg@entry=0x1052160 "Application", __end=__end@entry=0x105216b "", __a=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/stl_iterator_base_funcs.h:138 #6 0x00000000009952df in _S_construct (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.tcc:595 #7 _S_construct_aux (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:5180 #8 _S_construct (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:5201 #9 basic_string<> (__a=..., __s=0x1052160 "Application", this=0x7ffe7b7301e8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3663 #10 String (data=0x1052160 "Application", this=0x7ffe7b7301e8) at /usr/src/debug/icinga2-2.14.0/lib/base/string.cpp:21 #11 icinga::Application::SigUsr1Handler(int) () at /usr/src/debug/icinga2-2.14.0/lib/base/application.cpp:717 #12 #13 0x00007fbea8a23276 in _int_malloc () from /lib64/libc.so.6 #14 0x00007fbea8a2678c in malloc () from /lib64/libc.so.6 #15 0x00007fbea92e618d in operator new(unsigned long) () from /lib64/libstdc++.so.6 #16 0x00007fbea92e6289 in operator new[](unsigned long) () from /lib64/libstdc++.so.6 #17 0x00007fbea933e1ac in std::basic_filebuf >::_M_allocate_internal_buffer() () from /lib64/libstdc++.so.6 #18 0x00007fbea933e7f2 in std::basic_filebuf >::open(char const*, std::_Ios_Openmode) () from /lib64/libstdc++.so.6 #19 0x00000000009ea609 in open (__mode=17, __s=, this=0x22b2ee40) at /opt/rh/devtoolset-11/root/usr/include/c++/11/fstream:926 No locals. #20 icinga::FileLogger::ReopenLogFile() () at /usr/src/debug/icinga2-2.14.0/lib/base/filelogger.cpp:49 stream = 0x22b2ee40 path = {static NPos = 18446744073709551615, m_Data = "/var/log/icinga2/icinga2.log"} __PRETTY_FUNCTION__ = "void icinga::FileLogger::ReopenLogFile()" #21 0x0000000000a5e6a0 in operator() (this=) at /usr/include/boost169/boost/function/function_template.hpp:677 #22 m_invoke > (func=..., this=) at /usr/include/boost169/boost/signals2/detail/variadic_slot_invoker.hpp:117 #23 operator(), 0> (args=empty std::tuple, func=..., this=) at /usr/include/boost169/boost/signals2/detail/variadic_slot_invoker.hpp:90 #24 operator() >, boost::signals2::slot >, boost::signals2::mutex> > > (connectionBody=..., this=0x7ffe7b7314c0) at /usr/include/boost169/boost/signals2/detail/variadic_slot_invoker.hpp:133 #25 dereference (this=0x7ffe7b7312d0) at /usr/include/boost169/boost/signals2/detail/slot_call_iterator.hpp:110 #26 dereference, std::_List_iterator >, boost::signals2::slot >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body >, boost::signals2::slot >, boost::signals2::mutex> > > (f=...) at /usr/include/boost169/boost/iterator/iterator_facade.hpp:550 #27 operator* (this=0x7ffe7b7312d0) at /usr/include/boost169/boost/iterator/iterator_facade.hpp:656 No locals. #28 operator(), std::_List_iterator >, boost::signals2::slot >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body >, boost::signals2::slot >, boost::signals2::mutex> > > (last=..., this=, last=..., first=...) at /usr/include/boost169/boost/signals2/optional_last_value.hpp:57 #29 operator(), boost::signals2::detail::slot_call_iterator_t, std::_List_iterator >, boost::signals2::slot >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body >, boost::signals2::slot >, boost::signals2::mutex> > > (last=..., first=..., combiner=..., this=) at /usr/include/boost169/boost/signals2/detail/result_type_wrapper.hpp:64 #30 boost::signals2::detail::signal_impl, int, std::less, boost::function, boost::function, boost::signals2::mutex>::operator()() (this=0x28b8220) at /usr/include/boost169/boost/signals2/detail/signal_template.hpp:243 #31 0x00000000009d3b9f in operator() (this=0x12b78f0 ) at /usr/include/boost169/boost/smart_ptr/shared_ptr.hpp:726 #32 icinga::Application::RunEventLoop() () at /usr/src/debug/icinga2-2.14.0/lib/base/application.cpp:334 #33 0x0000000000cde938 in icinga::IcingaApplication::Main() () at /usr/src/debug/icinga2-2.14.0/lib/icinga/icingaapplication.cpp:112 #34 0x0000000000974f7a in icinga::Application::Run (this=0x7fbe7800d980) at /usr/src/debug/icinga2-2.14.0/lib/base/application.cpp:1014 #35 0x0000000000bf02a2 in StartUnixWorker(std::vector > const&, bool, icinga::String const&) () at /usr/include/boost169/boost/smart_ptr/intrusive_ptr.hpp:197 #36 0x0000000000bf0f92 in icinga::DaemonCommand::Run(boost::program_options::variables_map const&, std::vector > const&) const () at /opt/rh/devtoolset-11/root/usr/include/c++/11/ext/new_allocator.h:89 #37 0x0000000000950928 in Main() () at /usr/include/boost169/boost/smart_ptr/intrusive_ptr.hpp:197 #38 0x000000000093e729 in main () at /usr/src/debug/icinga2-2.14.0/icinga-app/icinga.cpp:946 #39 0x00007fbea89c3555 in __libc_start_main () from /lib64/libc.so.6 #40 0x000000000094e9cb in _start () at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/alloc_traits.h:851 ```

ref/IP/48572

julianbrost commented 7 months ago

Random thought I just had: maybe using sigtimedwait() in the main loop could be an alternative to registering functions as signal handlers.