Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

LiveStatusQuery spam in icinga2.log #5953

Closed kajla closed 6 years ago

kajla commented 6 years ago

Dear Developers,

I recently noticed that the Icinga2 LiveStatusQuery spam errors every minute to the icinga2.log file.

Current Behavior

...
[2018-01-08 13:13:02 +0100] critical/Socket: send() failed with error code 32, "Broken pipe"
[2018-01-08 13:13:02 +0100] critical/LivestatusQuery: Cannot write query response to socket.
[2018-01-08 13:13:02 +0100] critical/LivestatusQuery: Cannot write query response to socket.
[2018-01-08 13:13:02 +0100] critical/ThreadPool: Exception thrown in event handler:
Error: Tried to read from closed socket.

        (0) libbase.so.2.8.0: <unknown function> (+0xca801) [0x7f0ba1fca801]
        (1) libbase.so.2.8.0: icinga::NetworkStream::Read(void*, unsigned long, bool) (+0x75) [0x7f0ba1febbf5]
        (2) libbase.so.2.8.0: icinga::StreamReadContext::FillFromStream(boost::intrusive_ptr<icinga::Stream> const&, bool) (+0x58) [0x7f0ba1fb2268]
        (3) libbase.so.2.8.0: icinga::Stream::ReadLine(icinga::String*, icinga::StreamReadContext&, bool) (+0x74) [0x7f0ba1fb23a4]
        (4) liblivestatus.so.2.8.0: icinga::LivestatusListener::ClientHandler(boost::intrusive_ptr<icinga::Socket> const&) (+0x13a) [0x7f0b9ce274ea]
        (5) libbase.so.2.8.0: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x365) [0x7f0ba20082c5]
        (6) libboost_thread.so.1.54.0: <unknown function> (+0xcc0a) [0x7f0ba2c34c0a]
        (7) libpthread.so.0: <unknown function> (+0x8744) [0x7f0ba1543744]
        (8) libc.so.6: clone (+0x6d) [0x7f0ba2f29aad]

[2018-01-08 13:13:02 +0100] critical/Socket: send() failed with error code 32, "Broken pipe"
[2018-01-08 13:13:02 +0100] critical/LivestatusQuery: Cannot write query response to socket.
[2018-01-08 13:13:02 +0100] critical/LivestatusQuery: Cannot write query response to socket.
[2018-01-08 13:13:02 +0100] critical/ThreadPool: Exception thrown in event handler:
Error: Tried to read from closed socket.

        (0) libbase.so.2.8.0: <unknown function> (+0xca801) [0x7f0ba1fca801]
        (1) libbase.so.2.8.0: icinga::NetworkStream::Read(void*, unsigned long, bool) (+0x75) [0x7f0ba1febbf5]
        (2) libbase.so.2.8.0: icinga::StreamReadContext::FillFromStream(boost::intrusive_ptr<icinga::Stream> const&, bool) (+0x58) [0x7f0ba1fb2268]
        (3) libbase.so.2.8.0: icinga::Stream::ReadLine(icinga::String*, icinga::StreamReadContext&, bool) (+0x74) [0x7f0ba1fb23a4]
        (4) liblivestatus.so.2.8.0: icinga::LivestatusListener::ClientHandler(boost::intrusive_ptr<icinga::Socket> const&) (+0x13a) [0x7f0b9ce274ea]
        (5) libbase.so.2.8.0: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x365) [0x7f0ba20082c5]
        (6) libboost_thread.so.1.54.0: <unknown function> (+0xcc0a) [0x7f0ba2c34c0a]
        (7) libpthread.so.0: <unknown function> (+0x8744) [0x7f0ba1543744]
        (8) libc.so.6: clone (+0x6d) [0x7f0ba2f29aad]

[2018-01-08 13:13:02 +0100] critical/Socket: send() failed with error code 32, "Broken pipe"
[2018-01-08 13:13:02 +0100] critical/LivestatusQuery: Cannot write query response to socket.
[2018-01-08 13:13:02 +0100] critical/LivestatusQuery: Cannot write query response to socket.
[2018-01-08 13:13:02 +0100] critical/ThreadPool: Exception thrown in event handler:
Error: Tried to read from closed socket.

        (0) libbase.so.2.8.0: <unknown function> (+0xca801) [0x7f0ba1fca801]
        (1) libbase.so.2.8.0: icinga::NetworkStream::Read(void*, unsigned long, bool) (+0x75) [0x7f0ba1febbf5]
        (2) libbase.so.2.8.0: icinga::StreamReadContext::FillFromStream(boost::intrusive_ptr<icinga::Stream> const&, bool) (+0x58) [0x7f0ba1fb2268]
        (3) libbase.so.2.8.0: icinga::Stream::ReadLine(icinga::String*, icinga::StreamReadContext&, bool) (+0x74) [0x7f0ba1fb23a4]
        (4) liblivestatus.so.2.8.0: icinga::LivestatusListener::ClientHandler(boost::intrusive_ptr<icinga::Socket> const&) (+0x13a) [0x7f0b9ce274ea]
        (5) libbase.so.2.8.0: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x365) [0x7f0ba20082c5]
        (6) libboost_thread.so.1.54.0: <unknown function> (+0xcc0a) [0x7f0ba2c34c0a]
        (7) libpthread.so.0: <unknown function> (+0x8744) [0x7f0ba1543744]
        (8) libc.so.6: clone (+0x6d) [0x7f0ba2f29aad]
...

Context

Spam errors to the log file. Socket file: srw-rw---- 1 icinga icingacmd 0 Jan 8 13:07 /var/run/icinga2/cmd/livestatus=

# file /var/run/icinga2/cmd/livestatus
/var/run/icinga2/cmd/livestatus: socket

Your Environment

/etc/icinga2/features-enabled/livestatus.conf file contents:

/**
 *  The livestatus library implements the livestatus query protocol.
 */

library "livestatus"

object LivestatusListener "livestatus" {
  socket_type = "unix"
  socket_path = "/var/run/icinga2/cmd/livestatus"
}

Thank you in advance.

dnsmichi commented 6 years ago

If your client application does not close the socket, or wait for processing the response, such errors occur. This is a clear indication of something going wrong, nothing I would call "spam".

Kind regards, Michael

kajla commented 6 years ago

Oh, I see. I used Thruk to SLA reports. It needs a LiveStatus. Do you think it doesn't close this socket? What can I do with it?

Thank you in advance.

dnsmichi commented 6 years ago

I'd investigate on the time such queries take, extract what the application fires and debug that. You may enable Icinga 2's debug log to see more, but the rest really is network (sockets) and the remote application.

Cheers, Michael

Lali1006 commented 6 years ago

Hi,

I have same problem in Icinga 2.6.2, with Thruk 2.18.
My icinga2.log conatins a lot of Livestatus query errors: [2018-01-09 12:36:57 +0100] critical/LivestatusQuery: Cannot write query response to socket.

I haven't socket and thread pool error in the log, just this line. I have got same result in the debug.log.

My Thruk collected some information about the problem: [ERROR][Thruk] No Backend available [2018/01/08 10:10:16][ERROR][Thruk] on page: https://*********/thruk/cgi-bin/status.cgi?host=all&servicestatustypes=28&_=1515399710886 [2018/01/08 10:10:16][ERROR][Thruk] All: ERROR: failed to connect - Connection refused. (*****:6557) [2018/01/08 10:10:16][ERROR][Thruk] Error in: /thruk/cgi-bin/status.cgi [2018/01/08 10:10:16][ERROR][Thruk] failed to connect - Connection refused at /usr/share/thruk/lib/Monitoring/Livestatus/Class/Lite.pm line 380. at /usr/lib64/thruk/perl5/Plack/Util.pm line 142. eval {...} called at /usr/lib64/thruk/perl5/Plack/Util.pm line 142 Plack::Util::run_app('CODE(0x2da2040)', 'HASH(0x3bf7db0)') called at /usr/lib64/thruk/perl5/Plack/Handler/FCGI.pm line 143 Plack::Handler::FCGI::run('Plack::Handler::FCGI=HASH(0x10ecc48)', 'CODE(0x2da2040)') called at /usr/share/thruk/script/thruk_fastcgi.pl line 24

My httpd error.log content: broken pipe at /usr/share/thruk/lib/Thruk/Backend/Pool.pm line 30. Thruk::Backend::Pool::ANON('PIPE') called at /usr/lib64/thruk/perl5/Plack/Handler/FCGI.pm line 161 eval {...} called at /usr/lib64/thruk/perl5/Plack/Handler/FCGI.pm line 161 Plack::Handler::FCGI::run('Plack::Handler::FCGI=HASH(0x207dc48)', 'CODE(0x3d33050)') called at /usr/share/thruk/script/thruk_fastcgi.pl line 24 [Mon Jan 08 08:53:28.588033 2018] [fcgid:warn] [pid 76590] mod_fcgid: cleanup zombie process 112971 ERROR: got signal TERM while handling request, possible timeout in https://*******/thruk/cgi-bin/history.cgi?entries=100&start=2017-05-01+00:00:00&end=2018-01-09+00:00&archive=&host=****&service=Filesys$ at /usr/share/thruk/lib/Thruk.pm line 543.

kajla commented 6 years ago

My Thruk version is same; 2.18. We have different Icinga2 version with the same error. I think it's a Thruk issue. Am I correct, @dnsmichi ?

Thank you.

dnsmichi commented 6 years ago

I'd say so.