Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
GNU General Public License v2.0

warning/JsonRpcConnection: Error while sending JSON-RPC message for identity (satellite) [2.13.7] #9788

Open petr-fischer opened 1 year ago

petr-fischer commented 1 year ago

Describe the bug

Our master-satellite sync is breaking with this error/stacktrace (from the master icinga2.log):

[2023-06-12 16:58:32 +0200] information/ApiListener: Reconnecting to endpoint 'xxx-satellite.domain.cz' via host 'xxx.yyy.zzz' and port '5665'
[2023-06-12 16:58:32 +0200] information/ApiListener: New client connection for identity 'xxx-satellite.domain.cz' to [xxx.yyy.zzz]:5665
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished reconnecting to endpoint 'xxx-satellite.domain.cz' via host 'xxx.yyy.zzz' and port '5665'
[2023-06-12 16:58:32 +0200] information/ApiListener: Sending config updates for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Syncing configuration files for zone 'xxx' to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished sending config file updates for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Syncing runtime objects to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'xxx-satellite.domain.cz'
Error: Connection reset by peer

 0# __cxa_throw in /usr/lib64/icinga2/sbin/icinga2
 1# 0x000000000086B7BA in /usr/lib64/icinga2/sbin/icinga2
 2# icinga::JsonRpcConnection::WriteOutgoingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib64/icinga2/sbin/icinga2
 3# 0x0000000000AE4456 in /usr/lib64/icinga2/sbin/icinga2
 4# 0x0000000000AE5910 in /usr/lib64/icinga2/sbin/icinga2
 5# make_fcontext in /lib64/libboost_context.so.1.69.0
[2023-06-12 16:58:32 +0200] warning/JsonRpcConnection: API client disconnected for identity 'xxx-satellite.domain.cz'
[2023-06-12 16:58:32 +0200] warning/ApiListener: Removing API client for endpoint 'xxx-satellite.domain.cz'. 0 API clients left.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished syncing runtime objects to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished sending runtime config updates for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Sending replay log for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished sending replay log for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished syncing endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.


To Reproduce

I don't know how to reproduce it. If you need the exact configuration of our master, satellites and agents, we can anonymize it and send it, but it's probably not necessary.

icinga2 --version

icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.7-1)

Copyright (c) 2012-2023 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-1160.83.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 11.2.1
  Build host: runner-hh8q3bz2-project-575-concurrent-0
  OpenSSL version: OpenSSL 1.0.2k-fips  26 Jan 2017

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

cat /etc/os-release

NAME="CentOS Linux"
VERSION="7 (Core)"
ID_LIKE="rhel fedora"
PRETTY_NAME="CentOS Linux 7 (Core)"

icinga2 feature list

Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb influxdb2 livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-pgsql influxdb mainlog notification

Icinga Web 2 version and modules

Icinga Web 2 Version    2.11.4
Git commit  11453bfa92a70a44efbf7f966f5e7f27e9300a28
PHP Version     7.3.33
Git commit date     2023-01-26

Loaded Libraries
icinga/icinga-php-library   0.11.0
icinga/icinga-php-thirdparty    0.11.0

Loaded Modules
director        1.10.2  Configure
fileshipper         1.1.0   Configure
grafana         1.3.6   Configure
incubator       0.20.0  Configure
monitoring      2.11.4  Configure

icinga2 object list --type Endpoint

Output is over 6000 lines, too long.

icinga2 object list --type Zone

Output is over 6000 lines, too long.

yum list installed | grep boost (boost libs)

boost-regex.x86_64          1.53.0-28.el7     @CentOS7_os_x86_64     
boost-system.x86_64         1.53.0-28.el7     @CentOS7_os_x86_64     
boost-thread.x86_64         1.53.0-28.el7     @CentOS7_os_x86_64     
boost169-chrono.x86_64      1.69.0-2.el7      @epel                             
boost169-context.x86_64     1.69.0-2.el7      @epel                             
boost169-coroutine.x86_64   1.69.0-2.el7      @epel                             
boost169-date-time.x86_64   1.69.0-2.el7      @epel                             
boost169-filesystem.x86_64  1.69.0-2.el7      @epel                             
boost169-iostreams.x86_64   1.69.0-2.el7      @DataSpring_EPEL_7Server_x86_64   
boost169-regex.x86_64       1.69.0-2.el7      @epel                             
boost169-system.x86_64      1.69.0-2.el7      @epel                             
boost169-thread.x86_64      1.69.0-2.el7      @epel

Additional context

There are similar bugs here, like #9153, but the error is different (broken pipe).

julianbrost commented 1 year ago

What does xxx-satellite.domain.cz log when this happens?

With the stacktrace, the error probably looks more severe than it actually is. In itself, it just says that a connection got closed. The question is why that happened and why the connection doesn't get reestablished properly.

petr-fischer commented 1 year ago

We are planning to upgrade to 2.14 - if the error persists on 2.14, I will send the logs.

Tqnsls commented 11 months ago

Are there any updates on this issue? We also run into it from week to week, and after icinga2 logs the error, the number of overdue checks on the satellite grows immediately. Only a restart of the instance helps.

Skap81 commented 4 months ago

Since Friday, April 26, we have had the same problem. We have 2 masters and 16 checkers (8 checker zones + master). One master and the two checkers of one zone no longer synchronise with the config master. The other seven checker zones are working fine.

[2024-04-30 14:10:00 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
Error: Connection reset by peer
Stacktrace:
 0# __cxa_throw in /usr/lib64/icinga2/sbin/icinga2
 1# 0x00000000008C3B8C in /usr/lib64/icinga2/sbin/icinga2
 2# icinga::JsonRpcConnection::WriteOutgoingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib64/icinga2/sbin/icinga2
 3# 0x0000000000B3DC27 in /usr/lib64/icinga2/sbin/icinga2
 4# 0x0000000000B3EBCF in /usr/lib64/icinga2/sbin/icinga2
 5# make_fcontext in /usr/lib64/icinga-boost/libboost_context.so.1.69.0
[2024-04-30 14:10:01 +0200] warning/JsonRpcConnection: API client disconnected for identity 'checker1'
[2024-04-30 14:10:01 +0200] warning/ApiListener: Removing API client for endpoint 'checker1'. 0 API clients left.

[2024-04-30 12:11:59 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
Error: Broken pipe
Stacktrace:
 0# __cxa_throw in /usr/lib64/icinga2/sbin/icinga2
 1# 0x00000000008C3B8C in /usr/lib64/icinga2/sbin/icinga2
 2# icinga::JsonRpcConnection::WriteOutgoingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib64/icinga2/sbin/icinga2
 3# 0x0000000000B3DC27 in /usr/lib64/icinga2/sbin/icinga2
 4# 0x0000000000B3EBCF in /usr/lib64/icinga2/sbin/icinga2
 5# make_fcontext in /usr/lib64/icinga-boost/libboost_context.so.1.69.0
[2024-04-30 12:11:59 +0200] warning/JsonRpcConnection: API client disconnected for identity '2nd-master'
[2024-04-30 12:11:59 +0200] warning/ApiListener: Removing API client for endpoint '2nd-master'. 0 API clients left.

This happens every minute:

[2024-04-30 14:25:00 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:25:17 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker2'
[2024-04-30 14:25:18 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
[2024-04-30 14:25:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:25:56 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:26:02 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
[2024-04-30 14:26:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:26:48 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
[2024-04-30 14:26:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:27:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:27:33 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
[2024-04-30 14:27:47 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker2'
[2024-04-30 14:27:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:28:19 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
[2024-04-30 14:28:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:28:46 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker2'
[2024-04-30 14:28:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:29:03 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
[2024-04-30 14:29:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:29:48 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
[2024-04-30 14:29:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:30:28 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1'
[2024-04-30 14:30:34 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'

I updated both masters and the affected zone to version r2.14.2-1. The other zones are running r2.14.0-1, 2.13.2-1 and r2.10.5-1.
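
For anyone who wants to check how often each peer is affected, the warnings quoted in this comment can be tallied with a small script. This is only a sketch: it assumes the log line format shown above, and the function names `send_error_times` and `gaps_in_seconds` are made up for illustration. It scans the whole text with a regular expression, so it does not matter whether the entries are one per line or pasted as a single run.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the warning exactly as it appears in the logs quoted in this thread.
# The UTC offset is matched but ignored (assumption: all entries share it).
PATTERN = re.compile(
    r"\[(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2}) [+-]\d{4}\] "
    r"warning/JsonRpcConnection: Error while sending JSON-RPC message "
    r"for identity '([^']+)'"
)


def send_error_times(log_text):
    """Return {identity: [datetime, ...]} for every send-error warning."""
    times = defaultdict(list)
    for date, clock, identity in PATTERN.findall(log_text):
        stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
        times[identity].append(stamp)
    return dict(times)


def gaps_in_seconds(timestamps):
    """Return the seconds between consecutive warnings for one identity."""
    ordered = sorted(timestamps)
    return [int((b - a).total_seconds()) for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Three entries copied from the log above, pasted as one run of text.
    sample = (
        "[2024-04-30 14:25:00 +0200] warning/JsonRpcConnection: Error while "
        "sending JSON-RPC message for identity 'checker1' "
        "[2024-04-30 14:25:18 +0200] warning/JsonRpcConnection: Error while "
        "sending JSON-RPC message for identity '2nd-master' "
        "[2024-04-30 14:25:27 +0200] warning/JsonRpcConnection: Error while "
        "sending JSON-RPC message for identity 'checker1'"
    )
    for identity, stamps in send_error_times(sample).items():
        print(identity, len(stamps), gaps_in_seconds(stamps))
```

Running it over the full icinga2.log of the master would show, per endpoint, whether the disconnects follow a fixed interval (which would point at a reconnect timer) or occur irregularly.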

Skap81 commented 4 months ago

I found an error in the debug.log:

[2024-05-07 12:43:48 +0200] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from identity 'master'.
[2024-05-07 12:43:48 +0200] notice/JsonRpcConnection: Error while reading JSON-RPC message for identity 'master': Error: Length specifier must not exceed 9 characters
[2024-05-07 12:43:48 +0200] warning/JsonRpcConnection: API client disconnected for identity 'master'
[2024-05-07 12:43:48 +0200] warning/ApiListener: Removing API client for endpoint 'master'. 0 API clients left.

What does "Length specifier must not exceed 9 characters" mean?

RincewindsHat commented 3 months ago

@Skap81 AFAIK, the message stream between two Icinga 2 instances is netstring-encoded, and the error text suggests that a message is rejected if its length specifier exceeds 9 characters, i.e. if the announced length is 1000000000 bytes (about 1 GB) or more. I'm not sure how it gets that big, though.
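
To illustrate the netstring framing mentioned above, here is a minimal sketch of the general format (`<length>:<payload>,`) with a 9-digit cap on the length prefix. It is not Icinga 2's actual implementation: the function names and the constant `MAX_LENGTH_DIGITS` are invented for this example, and only the error text is taken from the debug log.

```python
from typing import Tuple

# Assumption: the cap of 9 digits is taken from the error text in the debug log.
MAX_LENGTH_DIGITS = 9


def encode_netstring(payload: bytes) -> bytes:
    """Frame a payload as b"<length>:<payload>,"."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes) -> Tuple[bytes, bytes]:
    """Decode one netstring from the front of data; return (payload, rest)."""
    # Only the first 10 bytes can legally contain the ':' separator.
    head = data[: MAX_LENGTH_DIGITS + 1]
    colon = head.find(b":")
    if colon == -1:
        if len(head) > MAX_LENGTH_DIGITS:
            raise ValueError("Length specifier must not exceed 9 characters")
        raise ValueError("incomplete netstring: no ':' received yet")

    digits = head[:colon]
    if not digits.isdigit():
        raise ValueError("invalid length specifier")

    start = colon + 1
    end = start + int(digits)
    if len(data) < end + 1:
        raise ValueError("incomplete netstring: payload truncated")
    if data[end:end + 1] != b",":
        raise ValueError("missing ',' terminator")
    return data[start:end], data[end + 1:]


if __name__ == "__main__":
    frame = encode_netstring(b'{"jsonrpc":"2.0"}')
    print(frame)
    print(decode_netstring(frame))
    try:
        # A 10-digit length prefix is rejected before any payload is read.
        decode_netstring(b"1000000000:x")
    except ValueError as err:
        print("rejected:", err)
```

Note that in this sketch the check fires on the prefix alone, so no gigabyte-sized message has to be transferred for the error to appear. Any ten leading bytes without a `:` trigger it as well. Whether Icinga 2 behaves the same way would need to be confirmed in its source.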