Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

Thread leak in API with unclosed sockets #5148

Closed julianbrost closed 6 years ago

julianbrost commented 7 years ago

When doing lots of requests to the API using the command check_http -H localhost -p 5665 -S -e '401 Unauthorized' -N, the icinga2 process leaks threads over time until it reaches the process limit for its user and then either segfaults or aborts (I have seen both so far).

Expected Behavior

Icinga should handle the requests an not leak threads.

Current Behavior

Icinga leaks threads over time. The leaked threads look like this in gdb (for Icinga 2.6.3):

Thread 68 (Thread 0x7f52e4fa8700 (LWP 18968)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f52ec29812b in boost::condition_variable::wait (this=this@entry=0x7f52dc065f68, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
#2  0x00007f52ec227f5b in icinga::WorkQueue::Join (this=this@entry=0x7f52dc065e78, stop=stop@entry=true) at /build/icinga2-2.6.3/lib/base/workqueue.cpp:112
#3  0x00007f52ec25c2d2 in icinga::WorkQueue::~WorkQueue (this=0x7f52dc065e78, __in_chrg=<optimized out>) at /build/icinga2-2.6.3/lib/base/workqueue.cpp:48
#4  0x00007f52eb980806 in ~HttpServerConnection (this=0x7f52dc065dd0, __in_chrg=<optimized out>) at /build/icinga2-2.6.3/lib/remote/httpserverconnection.hpp:37
#5  icinga::HttpServerConnection::~HttpServerConnection (this=0x7f52dc065dd0, __in_chrg=<optimized out>) at /build/icinga2-2.6.3/lib/remote/httpserverconnection.hpp:37
#6  0x00007f52eb980a17 in ~intrusive_ptr (this=0x7f52dc060af0, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#7  ~value (this=0x7f52dc060af0, __in_chrg=<optimized out>) at /usr/include/boost/bind/bind.hpp:112
#8  ~storage1 (this=0x7f52dc060af0, __in_chrg=<optimized out>) at /usr/include/boost/bind/storage.hpp:40
#9  boost::_bi::storage2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> >::~storage2 (this=this@entry=0x7f52dc060af0, __in_chrg=<optimized out>) at /usr/include/boost/bind/storage.hpp:76
#10 0x00007f52eb980ac1 in ~list2 (this=0x7f52dc060af0, __in_chrg=<optimized out>) at /usr/include/boost/bind/bind.hpp:275
#11 ~bind_t (this=0x7f52dc060ae0, __in_chrg=<optimized out>) at /usr/include/boost/bind/bind.hpp:864
#12 manager (op=<optimized out>, out_buffer=..., in_buffer=...) at /usr/include/boost/function/function_base.hpp:389
#13 manager (op=<optimized out>, out_buffer=..., in_buffer=...) at /usr/include/boost/function/function_base.hpp:412
#14 boost::detail::function::functor_manager<boost::_bi::bind_t<void, boost::_mfi::mf1<void, icinga::HttpServerConnection, icinga::HttpRequest&>, boost::_bi::list2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> > > >::manage (in_buffer=..., out_buffer=..., op=<optimized out>) at /usr/include/boost/function/function_base.hpp:440
#15 0x00007f52ec28baea in clear (this=<optimized out>, functor=...) at /usr/include/boost/function/function_template.hpp:504
#16 boost::function0<void>::clear (this=0x7f52dc065f94) at /usr/include/boost/function/function_template.hpp:877
#17 0x00007f52ec2389e6 in ~function0 (this=0x7f52e4fa7c70, __in_chrg=<optimized out>) at /usr/include/boost/function/function_template.hpp:759
#18 ~function (this=0x7f52e4fa7c70, __in_chrg=<optimized out>) at /usr/include/boost/function/function_template.hpp:1048
#19 operator= (f=<unknown type in /usr/lib/debug/.build-id/38/1781c10d7937a5e6d15b7b791c33e01f8e9cdb.debug, CU 0xaaed, DIE 0x2c41bd>, this=0x7f52e4fa7c90) at /usr/include/boost/function/function_template.hpp:1108
#20 operator= (this=0x7f52e4fa7c90) at /build/icinga2-2.6.3/lib/base/workqueue.hpp:43
#21 icinga::WorkQueue::WorkerThreadProc (this=0x7f52dc065e78) at /build/icinga2-2.6.3/lib/base/workqueue.cpp:248
#22 0x00007f52ecf1eaea in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:164
#23 0x00007f52ec56c064 in start_thread (arg=0x7f52e4fa8700) at pthread_create.c:309
#24 0x00007f52e9b3362d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

In most cases (4 out of 5 so far), this lead to a segfault at some point: icinga2[16628]: segfault at ffffffffffffffff ip ffffffffffffffff sp 00002acd7791a2a8 error 15. In one case, Icinga aborted with the following log messages:

icinga2.log in case of SIGABRT (Icinga 2.6.2) ``` [2017-04-06 02:40:50 +0200] warning/HttpServerConnection: Error while reading Http request: Error: boost::thread_resource_error: Resource temporarily unavailable (0) libbase.so.2.6.2: (+0xdb790) [0x7fb77da9d790] (1) libbase.so.2.6.2: boost::thread::start_thread() (+0x64) [0x7fb77da9da94] (2) libbase.so.2.6.2: icinga::WorkQueue::Enqueue(boost::function&&, icinga::WorkQueuePriority, bool) (+0x625) [0x7fb77da3df05] (3) libremote.so.2.6.2: icinga::HttpServerConnection::ProcessMessage() (+0x314) [0x7fb77d1675b4] (4) libremote.so.2.6.2: icinga::HttpServerConnection::DataAvailableHandler() (+0x98) [0x7fb77d167d78] (5) libremote.so.2.6.2: icinga::HttpServerConnection::Start() (+0x19c) [0x7fb77d16808c] (6) libremote.so.2.6.2: icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x4d9) [0x7fb77d178b09] (7) libremote.so.2.6.2: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x23) [0x7fb77d179f23] (8) libboost_thread.so.1.55.0: (+0xdaea) [0x7fb77e724aea] (9) libpthread.so.0: (+0x8064) [0x7fb77dd72064] (10) libc.so.6: clone (+0x6d) [0x7fb77b33a62d] (0) Handling new API client connection Context: (0) Handling new API client connection [2017-04-06 02:40:50 +0200] information/ApiListener: New client connection from [::1]:59250 (no client certificate) [2017-04-06 02:40:50 +0200] warning/HttpServerConnection: Error while reading Http request: Error: boost::thread_resource_error: Resource temporarily unavailable (0) libbase.so.2.6.2: (+0xdb790) [0x7fb77da9d790] (1) libbase.so.2.6.2: boost::thread::start_thread() (+0x64) [0x7fb77da9da94] (2) libbase.so.2.6.2: icinga::WorkQueue::Enqueue(boost::function&&, icinga::WorkQueuePriority, bool) (+0x625) [0x7fb77da3df05] (3) libremote.so.2.6.2: icinga::HttpServerConnection::ProcessMessage() (+0x314) [0x7fb77d1675b4] (4) libremote.so.2.6.2: icinga::HttpServerConnection::DataAvailableHandler() (+0x98) [0x7fb77d167d78] (5) libremote.so.2.6.2: icinga::HttpServerConnection::Start() (+0x19c) [0x7fb77d16808c] (6) libremote.so.2.6.2: icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x4d9) [0x7fb77d178b09] (7) libremote.so.2.6.2: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x23) [0x7fb77d179f23] (8) libboost_thread.so.1.55.0: (+0xdaea) [0x7fb77e724aea] (9) libpthread.so.0: (+0x8064) [0x7fb77dd72064] (10) libc.so.6: clone (+0x6d) [0x7fb77b33a62d] (0) Handling new API client connection Context: (0) Handling new API client connection [2017-04-06 02:40:50 +0200] information/ApiListener: New client connection from [::1]:59252 (no client certificate) [2017-04-06 02:40:50 +0200] warning/HttpServerConnection: Error while reading Http request: Error: boost::thread_resource_error: Resource temporarily unavailable (0) libbase.so.2.6.2: (+0xdb790) [0x7fb77da9d790] (1) libbase.so.2.6.2: boost::thread::start_thread() (+0x64) [0x7fb77da9da94] (2) libbase.so.2.6.2: icinga::WorkQueue::Enqueue(boost::function&&, icinga::WorkQueuePriority, bool) (+0x625) [0x7fb77da3df05] (3) libremote.so.2.6.2: icinga::HttpServerConnection::ProcessMessage() (+0x314) [0x7fb77d1675b4] (4) libremote.so.2.6.2: icinga::HttpServerConnection::DataAvailableHandler() (+0x98) [0x7fb77d167d78] (5) libremote.so.2.6.2: icinga::HttpServerConnection::Start() (+0x19c) [0x7fb77d16808c] (6) libremote.so.2.6.2: icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x4d9) [0x7fb77d178b09] (7) libremote.so.2.6.2: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x23) [0x7fb77d179f23] (8) libboost_thread.so.1.55.0: (+0xdaea) [0x7fb77e724aea] (9) libpthread.so.0: (+0x8064) [0x7fb77dd72064] (10) libc.so.6: clone (+0x6d) [0x7fb77b33a62d] (0) Handling new API client connection Context: (0) Handling new API client connection [2017-04-06 02:40:51 +0200] information/ApiListener: New client connection from [::1]:59256 (no client certificate) [2017-04-06 02:40:51 +0200] warning/HttpServerConnection: Error while reading Http request: Error: boost::thread_resource_error: Resource temporarily unavailable (0) libbase.so.2.6.2: (+0xdb790) [0x7fb77da9d790] (1) libbase.so.2.6.2: boost::thread::start_thread() (+0x64) [0x7fb77da9da94] (2) libbase.so.2.6.2: icinga::WorkQueue::Enqueue(boost::function&&, icinga::WorkQueuePriority, bool) (+0x625) [0x7fb77da3df05] (3) libremote.so.2.6.2: icinga::HttpServerConnection::ProcessMessage() (+0x314) [0x7fb77d1675b4] (4) libremote.so.2.6.2: icinga::HttpServerConnection::DataAvailableHandler() (+0x98) [0x7fb77d167d78] (5) libremote.so.2.6.2: icinga::HttpServerConnection::Start() (+0x19c) [0x7fb77d16808c] (6) libremote.so.2.6.2: icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x4d9) [0x7fb77d178b09] (7) libremote.so.2.6.2: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x23) [0x7fb77d179f23] (8) libboost_thread.so.1.55.0: (+0xdaea) [0x7fb77e724aea] (9) libpthread.so.0: (+0x8064) [0x7fb77dd72064] (10) libc.so.6: clone (+0x6d) [0x7fb77b33a62d] (0) Handling new API client connection Context: (0) Handling new API client connection [2017-04-06 02:40:51 +0200] information/ApiListener: New client connection from [::1]:59258 (no client certificate) [2017-04-06 02:40:51 +0200] warning/HttpServerConnection: Error while reading Http request: Error: boost::thread_resource_error: Resource temporarily unavailable (0) libbase.so.2.6.2: (+0xdb790) [0x7fb77da9d790] (1) libbase.so.2.6.2: boost::thread::start_thread() (+0x64) [0x7fb77da9da94] (2) libbase.so.2.6.2: icinga::WorkQueue::Enqueue(boost::function&&, icinga::WorkQueuePriority, bool) (+0x625) [0x7fb77da3df05] (3) libremote.so.2.6.2: icinga::HttpServerConnection::ProcessMessage() (+0x314) [0x7fb77d1675b4] (4) libremote.so.2.6.2: icinga::HttpServerConnection::DataAvailableHandler() (+0x98) [0x7fb77d167d78] (5) libremote.so.2.6.2: icinga::HttpServerConnection::Start() (+0x19c) [0x7fb77d16808c] (6) libremote.so.2.6.2: icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x4d9) [0x7fb77d178b09] (7) libremote.so.2.6.2: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr const&, icinga::String const&, icinga::ConnectionRole) (+0x23) [0x7fb77d179f23] (8) libboost_thread.so.1.55.0: (+0xdaea) [0x7fb77e724aea] (9) libpthread.so.0: (+0x8064) [0x7fb77dd72064] (10) libc.so.6: clone (+0x6d) [0x7fb77b33a62d] (0) Handling new API client connection Context: (0) Handling new API client connection [2017-04-06 02:40:51 +0200] critical/Application: Icinga 2 has terminated unexpectedly. Additional information can be found in '/var/log/icinga2/crash/report.1491439251.043945' [2017-04-06 02:40:51 +0200] critical/checker: Exception occured while checking 'somehost': Error: parse error: premature EOF (right here) ------^ (0) libbase.so.2.6.2: (+0xde377) [0x7fb77daa0377] (1) libbase.so.2.6.2: (+0xde437) [0x7fb77daa0437] (2) libbase.so.2.6.2: icinga::JsonDecode(icinga::String const&) (+0x3c6) [0x7fb77da287b6] (3) libbase.so.2.6.2: icinga::Process::Run(boost::function const&) (+0x43d) [0x7fb77da5eaed] (4) libicinga.so.2.6.2: icinga::PluginUtility::ExecuteCommand(boost::intrusive_ptr const&, boost::intrusive_ptr const&, boost::intrusive_ptr const&, std::vector >, std::allocator > > > const&, boost::intrusive_ptr const&, bool, boost::function const&) (+0x7f3) [0x7fb77842cd03] (5) libmethods.so.2.6.2: icinga::PluginCheckTask::ScriptFunc(boost::intrusive_ptr const&, boost::intrusive_ptr const&, boost::intrusive_ptr const&, bool) (+0x5e3) [0x7fb7780b4f83] (6) libmethods.so.2.6.2: (+0x1cd50) [0x7fb7780bcd50] (7) libmethods.so.2.6.2: (+0x1849d) [0x7fb7780b849d] (8) libbase.so.2.6.2: icinga::Function::Invoke(std::vector > const&) (+0x4a) [0x7fb77da3039a] (9) libicinga.so.2.6.2: icinga::CheckCommand::Execute(boost::intrusive_ptr const&, boost::intrusive_ptr const&, boost::intrusive_ptr const&, bool) (+0x153) [0x7fb7783dda23] (10) libicinga.so.2.6.2: icinga::Checkable::ExecuteCheck() (+0x58f) [0x7fb77845eedf] (11) libchecker.so.2.6.2: (+0x1638b) [0x7fb7639a638b] (12) libbase.so.2.6.2: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x369) [0x7fb77da50869] (13) libboost_thread.so.1.55.0: (+0xdaea) [0x7fb77e724aea] (14) libpthread.so.0: (+0x8064) [0x7fb77dd72064] (15) libc.so.6: clone (+0x6d) [0x7fb77b33a62d] (0) Executing check for object 'somehost' [2017-04-06 02:40:51 +0200] warning/Process: PID 9800 ('/usr/lib/nagios/plugins/check_ping' '-4' '-H' 'someaddr' '-c' '200,15%' '-w' '100,5%') died mysteriously: waitpid failed [2017-04-06 02:40:51 +0200] warning/PluginCheckTask: Check command for object 'somehost!ping4' (PID: 9800, arguments: '/usr/lib/nagios/plugins/check_ping' '-4' '-H' 'someaddr' '-c' '200,15%' '-w' '100,5%') terminated with exit code 128, output: PING OK - Packet loss = 0%, RTA = 0.86 ms|rta=0.857000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0 [2017-04-06 02:40:51 +0200] warning/Process: PID 9838 ('/usr/lib/nagios/plugins/check_ping' '-6' '-H' 'someaddr' '-c' '200,15%' '-w' '100,5%') died mysteriously: waitpid failed [2017-04-06 02:40:51 +0200] warning/PluginCheckTask: Check command for object 'somehost!ping6' (PID: 9838, arguments: '/usr/lib/nagios/plugins/check_ping' '-6' '-H' 'someaddr' '-c' '200,15%' '-w' '100,5%') terminated with exit code 128, output: PING OK - Packet loss = 0%, RTA = 1.21 ms|rta=1.205000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0 [2017-04-06 02:40:51 +0200] warning/Process: PID 9940 ('/usr/lib/nagios/plugins/check_ping' '-H' 'someaddr' '-c' '5000,100%' '-w' '3000,80%') died mysteriously: waitpid failed [2017-04-06 02:40:51 +0200] warning/PluginCheckTask: Check command for object 'somehost' (PID: 9940, arguments: '/usr/lib/nagios/plugins/check_ping' '-H' 'someaddr' '-c' '5000,100%' '-w' '3000,80%') terminated with exit code 128, output: PING OK - Packet loss = 0%, RTA = 1.36 ms|rta=1.362000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0 ```
report.1491439251.043945 ``` Application information: Application version: r2.6.2-1 Installation root: /usr Sysconf directory: /etc Run directory: /run Local state directory: /var Package data directory: /usr/share/icinga2 State path: /var/lib/icinga2/icinga2.state Modified attributes path: /var/lib/icinga2/modified-attributes.conf Objects path: /var/cache/icinga2/icinga2.debug Vars path: /var/cache/icinga2/icinga2.vars PID path: /run/icinga2/icinga2.pid System information: Platform: Debian GNU/Linux Platform version: 8 (jessie) Kernel: Linux Kernel version: 4.9.13 Architecture: x86_64 Build information: Compiler: GNU 4.9.2 Build host: smithers Stacktrace: (0) libpthread.so.0: (+0xf890) [0x7fb77dd79890] (1) libc.so.6: gsignal (+0x37) [0x7fb77b287067] (2) libc.so.6: abort (+0x148) [0x7fb77b288448] (3) libc.so.6: (+0x2e266) [0x7fb77b280266] (4) libc.so.6: (+0x2e312) [0x7fb77b280312] (5) libbase.so.2.6.2: (+0x4be30) [0x7fb77da0de30] (6) libbase.so.2.6.2: icinga::ObjectLock::LockMutex(icinga::Object const*) (+0x9c) [0x7fb77da9ddac] (7) libbase.so.2.6.2: icinga::Dictionary::Get(icinga::String const&) const (+0x38) [0x7fb77da1f078] (8) libremote.so.2.6.2: icinga::HttpRequest::Parse(icinga::StreamReadContext&, bool) (+0x433) [0x7fb77d1669a3] (9) libremote.so.2.6.2: icinga::HttpServerConnection::ProcessMessage() (+0x3c) [0x7fb77d1672dc] (10) libremote.so.2.6.2: icinga::HttpServerConnection::DataAvailableHandler() (+0x98) [0x7fb77d167d78] (11) libbase.so.2.6.2: (+0x11a3bc) [0x7fb77dadc3bc] (12) libbase.so.2.6.2: icinga::Stream::SignalDataAvailable() (+0x40) [0x7fb77da7f370] (13) libbase.so.2.6.2: icinga::TlsStream::OnEvent(int) (+0x298) [0x7fb77da7faf8] (14) libbase.so.2.6.2: icinga::SocketEventEngineEpoll::ThreadProc(int) (+0x42b) [0x7fb77da4f33b] (15) libboost_thread.so.1.55.0: (+0xdaea) [0x7fb77e724aea] (16) libpthread.so.0: (+0x8064) [0x7fb77dd72064] (17) libc.so.6: clone (+0x6d) [0x7fb77b33a62d] *** * This would indicate a runtime problem or configuration error. If you believe this is a bug in Icinga 2 * please submit a bug report at https://github.com/Icinga/icinga2 and include this stack trace as well as any other * information that might be useful in order to reproduce this problem. *** Failed to launch GDB: No such file or directory ```

I suspect both behaviors are only a symptom of using up the ulimit for allowed processes.

Steps to Reproduce (for bugs)

  1. apt install icinga2
  2. icinga2 api setup
  3. systemctl restart icinga2
  4. Check the number of icinga2 threads: for pid in $(pidof icinga2); do ps -T -p $pid | grep -F icinga2; done | wc -l (16 in my case)
  5. Send lots of requests to the API: while :; do /usr/lib/nagios/plugins/check_http -H localhost -p 5665 -S -e '401 Unauthorized' -N > /dev/null; done (let this run for some time, see below)
  6. Again, check the number of icinga2 threads: for pid in $(pidof icinga2); do ps -T -p $pid | grep -F icinga2; done | wc -l (now for example 70 in my case)

Some remarks: I'm not entirely sure what's the best way to reproduce the issue. For the numbers above, I used a fresh Icinga2 instance on a new VM with the Debian standard config (i.e. monitoring localhost with a few checks) and running the loop sending the requests 6 times in parallel for about an hour. There it does not leak threads very fast but it does so over time.

On the other hand, with an instance that handles quite some checks, including many passive ones, and regularly receiving real API requests, running one loop doing requests is enough to get a thread leaked every few seconds.

Context

I tried to to debug another issue and forgot that loop running which made Icinga segfault. That's not good ;)

Your Environment

Al2Klimov commented 7 years ago

Hello @julianbrost

does the error still occur? If yes and you're really bored, you could test whether https://github.com/Icinga/icinga2/commit/66c0746d5b0550cd185361e4b096092184bd8548 solves the problem.

Best AK

PS: "really bored" because that patch hasn't been approved by the core devs (yet).

gunnarbeutner commented 7 years ago

FWIW #5278 fixes I2_LEAK_DEBUG which might be useful for diagnosing this problem.

Stefar77 commented 7 years ago

@gunnarbeutner @Al2Klimov See #5408 It's a deadlock with the lock in ApiListener. I replaced all locks for the m_HttpClients vector and it's stable!

ObjectLock olock(this);

with

boost::mutex::scoped_lock(m_HttpLock);

And made the private in the hpp file.

boost::mutex m_HttpLock;
Stefar77 commented 7 years ago

Looking closer at the stacktrace I think #5419 would be your fix.

Stefar77 commented 7 years ago

Another way to try and fix is until the release is limit the amount of concurrency to less cores, I use machines that have 24 cores so for testing I sometimes limit an instance to 2 or 4 cores so I don't bother other processes as much when starting/reloading all the time. (that deletes and reinserts lots of stuff in the database). Reload without a override uses all cores for a few seconds, with concurrency at 2 a reload takes about as long but only peaks at 200% so no worries that's only 2 cores. :+1:

In init.conf add; const Concurrency = 2 or set it to 1 and make it use ~1 core.

check with

# icinga2 console -e Concurrency
2.0
dnsmichi commented 7 years ago

It is not reproducible on macOS and RHEL7, but Debian 9 hits it.

(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fe438e1c8bb in boost::condition_variable::wait (this=this@entry=0x5581fc7ec128, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:77
#2  0x00007fe438da12bb in icinga::WorkQueue::Join (this=this@entry=0x5581fc7ec020, stop=stop@entry=true) at ./lib/base/workqueue.cpp:115
#3  0x00007fe438db3439 in icinga::WorkQueue::~WorkQueue (this=0x5581fc7ec020, __in_chrg=<optimized out>) at ./lib/base/workqueue.cpp:51
#4  0x00007fe439451956 in icinga::HttpServerConnection::~HttpServerConnection (this=0x5581fc7ebf60, __in_chrg=<optimized out>)
    at ./lib/remote/httpserverconnection.hpp:37
#5  icinga::HttpServerConnection::~HttpServerConnection (this=0x5581fc7ebf60, __in_chrg=<optimized out>) at ./lib/remote/httpserverconnection.hpp:37
#6  0x00007fe439455777 in boost::intrusive_ptr<icinga::HttpServerConnection>::~intrusive_ptr (this=0x7fe3b402c700, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#7  boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >::~value (this=0x7fe3b402c700, __in_chrg=<optimized out>)
    at /usr/include/boost/bind/bind.hpp:118
#8  boost::_bi::storage1<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> > >::~storage1 (this=0x7fe3b402c700,
    __in_chrg=<optimized out>) at /usr/include/boost/bind/storage.hpp:40
#9  boost::_bi::storage2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> >::~storage2
    (this=0x7fe3b402c700, __in_chrg=<optimized out>) at /usr/include/boost/bind/storage.hpp:76
#10 boost::_bi::list2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> >::~list2 (
    this=0x7fe3b402c700, __in_chrg=<optimized out>) at /usr/include/boost/bind/bind.hpp:281
#11 boost::_bi::bind_t<void, boost::_mfi::mf1<void, icinga::HttpServerConnection, icinga::HttpRequest&>, boost::_bi::list2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> > >::~bind_t (this=0x7fe3b402c6f0, __in_chrg=<optimized out>)
    at /usr/include/boost/bind/bind.hpp:1275
#12 boost::detail::function::functor_manager<boost::_bi::bind_t<void, boost::_mfi::mf1<void, icinga::HttpServerConnection, icinga::HttpRequest&>, boost::_bi::list2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> > > >::manager (
    op=<optimized out>, out_buffer=..., in_buffer=...) at /usr/include/boost/function/function_base.hpp:358
#13 boost::detail::function::functor_manager<boost::_bi::bind_t<void, boost::_mfi::mf1<void, icinga::HttpServerConnection, icinga::HttpRequest&>, boost::_bi::list2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> > > >::manager (
    op=<optimized out>, out_buffer=..., in_buffer=...) at /usr/include/boost/function/function_base.hpp:379
#14 boost::detail::function::functor_manager<boost::_bi::bind_t<void, boost::_mfi::mf1<void, icinga::HttpServerConnection, icinga::HttpRequest&>, boost::_bi::list2<boost::_bi::value<boost::intrusive_ptr<icinga::HttpServerConnection> >, boost::_bi::value<icinga::HttpRequest> > > >::manage (
    in_buffer=..., out_buffer=..., op=<optimized out>) at /usr/include/boost/function/function_base.hpp:407
#15 0x00007fe438e15ecf in boost::detail::function::basic_vtable0<void>::clear (this=<optimized out>, functor=...)
    at /usr/include/boost/function/function_template.hpp:510
#16 boost::function0<void>::clear (this=0x7fe3a6355b70) at /usr/include/boost/function/function_template.hpp:879
#17 boost::function0<void>::~function0 (this=0x7fe3a6355b70, __in_chrg=<optimized out>) at /usr/include/boost/function/function_template.hpp:763
#18 boost::function<void ()>::~function() (this=0x7fe3a6355b70, __in_chrg=<optimized out>) at /usr/include/boost/function/function_template.hpp:1052
#19 boost::function<void ()>::operator=(boost::function<void ()>&&) (this=this@entry=0x7fe3a6355c90,
    f=f@entry=<unknown type in /usr/lib/debug/.build-id/21/53b090e5855a488d3dd7f43d0f0c6b71133971.debug, CU 0xdc2b, DIE 0x16c3d3>)
    at /usr/include/boost/function/function_template.hpp:1110
#20 0x00007fe438da8cf4 in icinga::Task::operator=(icinga::Task&&) (this=0x7fe3a6355c90) at ./lib/base/workqueue.hpp:44
#21 icinga::WorkQueue::WorkerThreadProc (this=0x5581fc7ec020) at ./lib/base/workqueue.cpp:278
#22 0x00007fe43a9c9116 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.62.0
#23 0x00007fe4399eb494 in start_thread (arg=0x7fe3a6356700) at pthread_create.c:333
#24 0x00007fe436c5faff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
dnsmichi commented 7 years ago

icinga2_debian_thread_leak

dnsmichi commented 7 years ago

Next steps: Figure out why it does not fail on RHEL but Debian.

linuxmail commented 6 years ago

hi, may it related to this issue: Because of a scan with Saint8, all Icinga2 where killed after phase2, which is a port scan.

# nmap -sT -n -PN -O --min_hostgroup 100 --min_parallelism 50 --max_scan_delay 10ms -p1-65535 -oX nmap.out.15979 -v icinga2.example.com

with this command, I can kill the instance and have a crash file (three files in here):

Application information:
  Application version: r2.8.0-1
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-4-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: fd5704a3fd2e
Stacktrace:

    (0) libpthread.so.0: <unknown function> (+0x110c0) [0x2b69be8590c0]
    (1) libc.so.6: gsignal (+0xcf) [0x2b69c154ffcf]
    (2) libc.so.6: abort (+0x16a) [0x2b69c15513fa]
    (3) libc.so.6: <unknown function> (+0x2be37) [0x2b69c1548e37]
    (4) libc.so.6: <unknown function> (+0x2bee2) [0x2b69c1548ee2]
    (5) libremote.so.2.8.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x1607) [0x2b69bedb10d7]
    (6) libbase.so.2.8.0: icinga::WorkQueue::WorkerThreadProc() (+0x592) [0x2b69bf3c1f82]
    (7) libboost_thread.so.1.62.0: <unknown function> (+0x12116) [0x2b69bd87c116]
    (8) libpthread.so.0: <unknown function> (+0x7494) [0x2b69be84f494]
    (9) libc.so.6: clone (+0x3f) [0x2b69c1605aff]

***
* This would indicate a runtime problem or configuration error. If you believe this is a bug in Icinga 2
* please submit a bug report at https://github.com/Icinga/icinga2 and include this stack trace as well as any other
* information that might be useful in order to reproduce this problem.
***

Application information:
  Application version: r2.8.0-1
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-4-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: fd5704a3fd2e
Stacktrace:

    (0) libpthread.so.0: <unknown function> (+0x110c0) [0x2b509b6bb0c0]
    (1) libc.so.6: gsignal (+0xcf) [0x2b509e3b1fcf]
    (2) libc.so.6: abort (+0x16a) [0x2b509e3b33fa]
    (3) libc.so.6: <unknown function> (+0x2be37) [0x2b509e3aae37]
    (4) libc.so.6: <unknown function> (+0x2bee2) [0x2b509e3aaee2]
    (5) libremote.so.2.8.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x1607) [0x2b509bc130d7]
    (6) libbase.so.2.8.0: icinga::WorkQueue::WorkerThreadProc() (+0x592) [0x2b509c223f82]
    (7) libboost_thread.so.1.62.0: <unknown function> (+0x12116) [0x2b509a6de116]
    (8) libpthread.so.0: <unknown function> (+0x7494) [0x2b509b6b1494]
    (9) libc.so.6: clone (+0x3f) [0x2b509e467aff]

***
* This would indicate a runtime problem or configuration error. If you believe this is a bug in Icinga 2
* please submit a bug report at https://github.com/Icinga/icinga2 and include this stack trace as well as any other
* information that might be useful in order to reproduce this problem.
***

Application information:
  Application version: r2.8.0-1
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-4-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: fd5704a3fd2e
Stacktrace:

    (0) libpthread.so.0: <unknown function> (+0x110c0) [0x7f07efce40c0]
    (1) libc.so.6: gsignal (+0xcf) [0x7f07eceaefcf]
    (2) libc.so.6: abort (+0x16a) [0x7f07eceb03fa]
    (3) libc.so.6: <unknown function> (+0x2be37) [0x7f07ecea7e37]
    (4) libc.so.6: <unknown function> (+0x2bee2) [0x7f07ecea7ee2]
    (5) libbase.so.2.8.0: <unknown function> (+0xe669a) [0x7f07ef11369a]
    (6) libbase.so.2.8.0: icinga::TlsStream::Handshake() (+0x102) [0x7f07ef09a8c2]
    (7) libremote.so.2.8.0: icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) (+0x1ac) [0x7f07ef7580dc]
    (8) libremote.so.2.8.0: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) (+0x25) [0x7f07ef7597e5]
    (9) libboost_thread.so.1.62.0: <unknown function> (+0x12116) [0x7f07f0cb8116]
    (10) libpthread.so.0: <unknown function> (+0x7494) [0x7f07efcda494]
    (11) libc.so.6: clone (+0x3f) [0x7f07ecf64aff]
linuxmail commented 6 years ago

hi,

today we scan our network with Saint again and all Icinga2 instances crashs:

Application information:
  Application version: r2.8.1-1
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-5-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: d3c3d2a588bd
Stacktrace:

        (0) libpthread.so.0: <unknown function> (+0x110c0) [0x2b56b48c90c0]
        (1) libc.so.6: gsignal (+0xcf) [0x2b56b75bffcf]
        (2) libc.so.6: abort (+0x16a) [0x2b56b75c13fa]
        (3) libc.so.6: <unknown function> (+0x2be37) [0x2b56b75b8e37]
        (4) libc.so.6: <unknown function> (+0x2bee2) [0x2b56b75b8ee2]
        (5) libremote.so.2.8.1: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x1607) [0x2b56b4e21307]
        (6) libbase.so.2.8.1: icinga::WorkQueue::WorkerThreadProc() (+0x592) [0x2b56b5431f82]
        (7) libboost_thread.so.1.62.0: <unknown function> (+0x12116) [0x2b56b38ec116]
        (8) libpthread.so.0: <unknown function> (+0x7494) [0x2b56b48bf494]
        (9) libc.so.6: clone (+0x3f) [0x2b56b7675aff]

so, also with 2.8.1-1 Stretch release .. no change.

nprashanth commented 6 years ago

We still see this thread leak issue with following stack trace: (gdb) thr 4 [Switching to thread 4 (Thread 0x7fbdcbb2c700 (LWP 25001))]

0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

185 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory. (gdb) bt

0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

1 0x00007fbdec0a0a3b in ?? () from /usr/lib/x86_64-linux-gnu/icinga2/libbase.so.2.8.2

2 0x00007fbdec02f77b in icinga::WorkQueue::Join(bool) () from /usr/lib/x86_64-linux-gnu/icinga2/libbase.so.2.8.2

3 0x00007fbdec061476 in icinga::WorkQueue::~WorkQueue() () from /usr/lib/x86_64-linux-gnu/icinga2/libbase.so.2.8.2

4 0x00007fbde96734e6 in icinga::HttpServerConnection::~HttpServerConnection() ()

from /usr/lib/x86_64-linux-gnu/icinga2/libremote.so.2.8.2

5 0x00007fbde967377e in ?? () from /usr/lib/x86_64-linux-gnu/icinga2/libremote.so.2.8.2

6 0x00007fbde9674261 in ?? () from /usr/lib/x86_64-linux-gnu/icinga2/libremote.so.2.8.2

7 0x00007fbdec098eca in ?? () from /usr/lib/x86_64-linux-gnu/icinga2/libbase.so.2.8.2

8 0x00007fbdec0380c8 in icinga::WorkQueue::WorkerThreadProc() () from /usr/lib/x86_64-linux-gnu/icinga2/libbase.so.2.8.2

9 0x00007fbdead8da4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0

10 0x00007fbdec383184 in start_thread (arg=0x7fbdcbb2c700) at pthread_create.c:312

11 0x00007fbdeb29a03d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Is there any fix or workaround for this?

dnsmichi commented 6 years ago

A possible test would be to get a different boost version (and compile it) and recompile Icinga 2 against. Just for dev purposes, to get a clear view where to look for this error. CentOS 7 for example uses 1.54 which doesn't have such. Could also be a Debian specific patch which causes trouble here.

Workarounds don't exist, that's an issue with shared pointer logic here.

nprashanth commented 6 years ago

I have 1.54 boost on my Ubuntu 14.04.5:

~# dpkg -l | grep boost ii libboost-iostreams1.54.0:amd64 1.54.0-4ubuntu3.1 amd64 Boost.Iostreams Library ii libboost-program-options1.54.0:amd64 1.54.0-4ubuntu3.1 amd64 program options library for C++ ii libboost-regex1.54.0:amd64 1.54.0-4ubuntu3 amd64 regular expression library for C++ ii libboost-system1.54.0:amd64 1.54.0-4ubuntu3 amd64 Operating system (e.g. diagnostics support) library ii libboost-thread1.54.0:amd64 1.54.0-4ubuntu3 amd64 portable C++ multi-threading

dnsmichi commented 6 years ago

As @N-o-X tested this last week, I'm also not able to reproduce this on Debian Stretch 9.4 using current master.

apt-get -y install gdb vim git cmake make ccache build-essential libssl-dev libboost-all-dev bison flex default-libmysqlclient-dev libpq-dev libyajl-dev libedit-dev monitoring-plugins

ln -s /usr/bin/ccache /usr/local/bin/g++

groupadd icinga
groupadd icingacmd
useradd -c "icinga" -s /sbin/nologin -G icingacmd -g icinga icinga

git clone https://github.com/icinga/icinga2.git && cd icinga2

mkdir debug release
cd debug
cmake .. -DCMAKE_BUILD_TYPE=Debug -DICINGA2_UNITY_BUILD=OFF -DCMAKE_INSTALL_PREFIX=/usr/local/icinga2 -DICINGA2_PLUGINDIR=/usr/local/sbin
cd ..
make -j2 install -C debug
chown -R icinga:icinga /usr/local/icinga2/var/

cd /usr/local/icinga2
./lib/icinga2/prepare-dirs etc/sysconfig/icinga2
icinga2 api setup
vim etc/icinga2/conf.d/api-users.conf

gdb --args ./lib/icinga2/sbin/icinga2 daemon
watch -n 1 'for pid in $(pidof icinga2); do ps -T -p $pid | grep -F icinga2; done | wc -l'
while true; do curl -k -s -u root:icinga 'https://192.168.33.20:5665/v1/objects/services' >/dev/null; done

 while true; do curl -k -s -u root:icinga -H 'Accept: application/json' -X POST 'https://192.168.33.20:5665/v1/actions/process-check-result' -d '{ "type": "Service", "filter": true, "plugin_output": "api stress", "exit_status": 1 }' >/dev/null && sleep 0.1; done

while true; do /usr/local/sbin/check_http -H 192.168.33.20 -p 5665 -S -e '401 Unauthorized' -N > /dev/null; done

There may be some changes with git master vs 2.8.x, as we've changed from boost::shared/intrusive_ptr to std::shared/intrusive_ptr.

I've checked out v2.8.2 and have built that one. Same behaviour, round about 24-26 threads.

samta93 commented 6 years ago

@dnsmichi do you see this issue when installing icinga on mac?

dnsmichi commented 6 years ago

macOS is not a supported platform, just a development environment. Look above, tests have been run on various Linux distributions and neither gives an indication tp reliably reproduce the error and actually fix it.

slalomsk8er commented 6 years ago

I upgrades from Debian 8 to 9 and since then I get load warnings, icingaweb2 feels sluggish. My suspicion is that the 2581 Icinga threads have some thing to do with it.

for pid in $(pidof icinga2); do ps -T -p $pid | grep -F icinga2; done | wc -l 2685

information/cli: Icinga application loader (version: r2.8.2-1) information/cli: Loading configuration file(s). information/ConfigItem: Committing config item(s). warning/ApiListener: Attribute 'key_path' for object 'api' of type 'ApiListener' is deprecated and should not be used. warning/ApiListener: Attribute 'ca_path' for object 'api' of type 'ApiListener' is deprecated and should not be used. warning/ApiListener: Attribute 'cert_path' for object 'api' of type 'ApiListener' is deprecated and should not be used. warning/ApiListener: Please read the upgrading documentation for v2.8: https://www.icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/ information/ApiListener: My API identity: icticingalp01.ms.uhbs.ch warning/ApplyRule: Apply rule 'child-health' (in /etc/icinga2/conf.d/services.conf: 215:1-215:28) for type 'Service' does not match anywhere! information/ConfigItem: Instantiated 1 ApiListener. information/ConfigItem: Instantiated 5 Zones. information/ConfigItem: Instantiated 4 Endpoints. information/ConfigItem: Instantiated 1 FileLogger. information/ConfigItem: Instantiated 3 ApiUsers. information/ConfigItem: Instantiated 574 Notifications. information/ConfigItem: Instantiated 2 NotificationCommands. information/ConfigItem: Instantiated 219 CheckCommands. information/ConfigItem: Instantiated 14 Downtimes. information/ConfigItem: Instantiated 27 HostGroups. information/ConfigItem: Instantiated 1 IcingaApplication. information/ConfigItem: Instantiated 4 EventCommands. information/ConfigItem: Instantiated 58 Hosts. information/ConfigItem: Instantiated 10 Comments. information/ConfigItem: Instantiated 13 UserGroups. information/ConfigItem: Instantiated 2 Dependencies. information/ConfigItem: Instantiated 22 Users. information/ConfigItem: Instantiated 4 TimePeriods. information/ConfigItem: Instantiated 509 Services. information/ConfigItem: Instantiated 38 ServiceGroups. information/ConfigItem: Instantiated 138 ScheduledDowntimes. information/ConfigItem: Instantiated 1 ExternalCommandListener. information/ConfigItem: Instantiated 1 CheckerComponent. information/ConfigItem: Instantiated 1 GraphiteWriter. information/ConfigItem: Instantiated 1 IdoMysqlConnection. information/ConfigItem: Instantiated 1 NotificationComponent. information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars' information/cli: Finished validating the configuration file(s).

dnsmichi commented 6 years ago

@slalomsk8er what exactly are you using to query the REST API? Anything special on your setup which helps to isolate the problem?

slalomsk8er commented 6 years ago

@dnsmichi There is icingaweb2 and a custom dashboard that uses a proxy script (attached).

It gets called with or without a filter argument.

http://******.******.**/dash/services.php?filter=%22*******%22%20in%20service.groups%20%7C%7C%20%22*******%22%20in%20service.groups

services.php.txt

all names/identifiers are replaced by *****

dnsmichi commented 6 years ago

The script condition $response === false doesn't call curl_close(). Is that intentional?

slalomsk8er commented 6 years ago

I guess it isn't intentional but from my understanding the GC will clean up after exit.

dnsmichi commented 6 years ago

I'm closing this in favour of #6361.

samta93 commented 6 years ago

Hello @dnsmichi

Is this issue fixed with latest Icinga build?

Could you please point me to the version which I can pick and test in my test environment.

Many Thanks, Samta R

dnsmichi commented 6 years ago

I've triggered snapshot builds a few minutes ago, you should see the current date in your package manager's update. It would help to know which distribution and version you're using.