Open julianbrost opened 1 year ago
One of the NETWAYS customers occasionally suffers from this problem in production, preventing IDO from performing any database queries.
Thread 1 (Thread 0x7fbeac0e9900 (LWP 99277)):
#0 0x00007fbea8aad7fc in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fbea8a29ba2 in _L_lock_16654 () from /lib64/libc.so.6
#2 0x00007fbea8a267e3 in malloc () from /lib64/libc.so.6
#3 0x00007fbea92e618d in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4 0x00007fbea9344cd9 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#5 0x000000000096c7d7 in std::string::_S_construct<char const*> (__beg=__beg@entry=0x1052160 "Application", __end=__end@entry=0x105216b "", __a=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/stl_iterator_base_funcs.h:138
#6 0x00000000009952df in _S_construct<char const*> (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.tcc:595
#7 _S_construct_aux<char const*> (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:5180
#8 _S_construct<char const*> (__a=..., __end=0x105216b "", __beg=0x1052160 "Application") at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:5201
#9 basic_string<> (__a=..., __s=0x1052160 "Application", this=0x7ffe7b7301e8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3663
#10 String (data=0x1052160 "Application", this=0x7ffe7b7301e8) at /usr/src/debug/icinga2-2.14.0/lib/base/string.cpp:21
#11 icinga::Application::SigUsr1Handler(int) () at /usr/src/debug/icinga2-2.14.0/lib/base/application.cpp:717
#12 <signal handler called>
#13 0x00007fbea8a23276 in _int_malloc () from /lib64/libc.so.6
#14 0x00007fbea8a2678c in malloc () from /lib64/libc.so.6
#15 0x00007fbea92e618d in operator new(unsigned long) () from /lib64/libstdc++.so.6
#16 0x00007fbea92e6289 in operator new[](unsigned long) () from /lib64/libstdc++.so.6
ref/IP/48572
Random thought I just had: maybe using sigtimedwait() in the main loop could be an alternative to registering functions as signal handlers.
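To make the sigtimedwait() idea a bit more concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes Linux (sigtimedwait() is not available everywhere) and that the signals get blocked before any other thread is started, so that all threads inherit the mask. The helper names BlockHandledSignals and WaitForSignal are made up for illustration and do not exist in icinga2.

```cpp
#include <csignal>
#include <ctime>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: block the signals we care about so that they stay
// pending instead of interrupting arbitrary code (e.g. inside malloc()).
// Assumption: called early in main(), before other threads are created.
inline sigset_t BlockHandledSignals()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    return set;
}

// Hypothetical helper: wait up to timeoutMs milliseconds for one of the
// blocked signals. Returns the signal number, or 0 if none arrived
// (timeout or interruption).
inline int WaitForSignal(const sigset_t& set, long timeoutMs)
{
    timespec ts{};
    ts.tv_sec = timeoutMs / 1000;
    ts.tv_nsec = (timeoutMs % 1000) * 1000000L;

    int sig = sigtimedwait(&set, nullptr, &ts);
    return sig > 0 ? sig : 0;
}
```

The main loop would then call WaitForSignal() once per iteration and handle the returned signal in normal thread context, where allocating memory, logging and taking locks are all fine because no code was interrupted.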
I just observed a deadlock while testing #9653. The stack trace shows a signal handler being executed on a thread that is currently inside operator new (presumably holding a lock at that point); the signal handler also calls operator new and is waiting for a lock.
Here, the offending code is this: https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/base/application.cpp#L714-L717
But other places are also affected: https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/base/application.cpp#L436-L438 which is reachable from https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/cli/daemoncommand.cpp#L404-L423
Similarly (here you may argue that we're crashing already, but this could turn a crash that would, for example, be followed by an immediate restart by systemd into a hanging process): https://github.com/Icinga/icinga2/blob/c7301a06b633c08abe22c07cc5914c3a2a639fcd/lib/base/application.cpp#L727-L778
In general, the signal handlers should be kept minimal and may only use async-signal-safe functions.
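As a rough illustration of what "minimal" could look like, the sketch below has the handler only set a flag and leaves everything else to normal code. This is a generic pattern under assumed names (g_reopenLogsRequested, MinimalSigUsr1Handler, InstallMinimalHandler and ConsumeReopenRequest are all hypothetical), not what icinga2 currently does.

```cpp
#include <csignal>
#include <signal.h>

// Hypothetical flag: written by the handler, read by the main loop.
// volatile sig_atomic_t is the type the standard guarantees can be
// safely written from a signal handler.
static volatile std::sig_atomic_t g_reopenLogsRequested = 0;

// The handler does nothing but set the flag: no allocation, no logging,
// no locks, so it cannot deadlock against interrupted code.
extern "C" void MinimalSigUsr1Handler(int)
{
    g_reopenLogsRequested = 1;
}

// Hypothetical helper: register the handler for SIGUSR1.
inline void InstallMinimalHandler()
{
    struct sigaction sa{};
    sa.sa_handler = MinimalSigUsr1Handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, nullptr);
}

// Called from normal (non-signal) context, e.g. once per main loop
// iteration. Returns true if a request was pending and clears it;
// several signals arriving in between are coalesced into one request.
inline bool ConsumeReopenRequest()
{
    if (!g_reopenLogsRequested)
        return false;

    g_reopenLogsRequested = 0;
    return true;
}
```

The actual work (constructing strings, writing log messages and so on) then happens wherever ConsumeReopenRequest() returns true, outside of the signal handler.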