About a year ago we started having issues when a server that we monitor goes down uncleanly and reboots (e.g. during an out-of-memory event). Normally, after the reboot, Icinga reconnects properly and all is fine. Now, however, often but not always, the monitored server no longer connects to the Icinga master.
The checks which are executed on the monitored server then fail with "Remote Icinga instance '$server' is not connected to '$icinga'".
This problem does not resolve itself, and restarting Icinga on the monitored server does not help either.
The connection only recovers when we restart Icinga on the master server.
We've had this for a long time, at least about a year I'd guess. I had hoped it would be fixed in a newer version, but since it hasn't been, I'm reporting it now.
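To confirm from the master's side whether it still considers the agent disconnected, the Icinga 2 REST API can be queried for the `Endpoint` object. The following is only a minimal sketch: it assumes the `api` feature is reachable on port 5665 and that an ApiUser with permission to query endpoint objects exists. The function names and the `user`, `password` and `ca_file` parameters are hypothetical and not taken from this setup.

```python
import base64
import json
import ssl
import urllib.request


def summarize_endpoint(response):
    """Return (name, connected, last_message_received) from a decoded
    /v1/objects/endpoints/<name> response."""
    result = response["results"][0]
    attrs = result["attrs"]
    return (
        result["name"],
        bool(attrs.get("connected")),
        attrs.get("last_message_received"),  # UNIX timestamp, may be missing
    )


def fetch_endpoint(master, endpoint, user, password, port=5665, ca_file=None):
    """Fetch one Endpoint object from a master's REST API.

    All arguments are placeholders for your own environment.
    """
    url = f"https://{master}:{port}/v1/objects/endpoints/{endpoint}"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # ca_file should point at the Icinga CA certificate so TLS can be verified.
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context, timeout=10) as reply:
        return json.load(reply)
```

Calling `summarize_endpoint(fetch_endpoint("icinga01-ams", "myserver01", user, password, ca_file=...))` on each master while the problem is occurring would show whether either master reports the agent as connected.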
I don't see anything significant in the log. Here is the output from the Icinga master when it happens. The affected server, myserver01, reboots around 02:03, but it only reconnects to the masters once I restart Icinga on both of them around 02:17.
[2020-11-03 02:01:36 +0100] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 1, rate: 382.833/s (22970/min 118674/5min 357081/15min);
[2020-11-03 02:01:52 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:02:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:02:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:03:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:03:32 +0100] information/RemoteCheckQueue: items: 1, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:03:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:03:47 +0100] information/JsonRpcConnection: No messages for identity 'myserver01' have been received in the last 60 seconds.
[2020-11-03 02:03:47 +0100] warning/JsonRpcConnection: API client disconnected for identity 'myserver01'
[2020-11-03 02:03:47 +0100] warning/ApiListener: Removing API client for endpoint 'myserver01'. 0 API clients left.
[2020-11-03 02:03:57 +0100] information/ApiListener: Reconnecting to endpoint 'myserver01' via host '<redacted>' and port '5665'
[2020-11-03 02:04:02 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:04:11 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-11-03 02:04:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:04:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:04:37 +0100] information/Checkable: Checkable 'myserver01!http4_myserver01' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:04:55 +0100] information/Checkable: Checkable 'myserver01!ssh-ipv4' has 5 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:04:58 +0100] information/Checkable: Checkable 'myserver01!pop-ipv4' has 5 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:00 +0100] information/Checkable: Checkable 'myserver01!mailqueue' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:05 +0100] information/Checkable: Checkable 'myserver01!disk /' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:09 +0100] information/Checkable: Checkable 'myserver01!redis-instance' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:12 +0100] information/Checkable: Checkable 'myserver01!nginx-config' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:05:13 +0100] information/Checkable: Checkable 'myserver01!spamd' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:13 +0100] information/Checkable: Checkable 'myserver01!load' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:14 +0100] information/Checkable: Checkable 'myserver01!http4_directadmin' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:14 +0100] information/Checkable: Checkable 'myserver01!iptables_custom_chain' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:26 +0100] information/Checkable: Checkable 'myserver01!ftp-ipv4' has 5 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:26 +0100] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-11-03 02:05:26 +0100] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 392.533/s (23552/min 118490/5min 356845/15min);
[2020-11-03 02:05:26 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 0, rate: 198.033/s (11882/min 59449/5min 548068/15min);
[2020-11-03 02:05:29 +0100] information/Checkable: Checkable 'myserver01!filebeat' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:05:37 +0100] information/Checkable: Checkable 'myserver01!readonly' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:39 +0100] information/Checkable: Checkable 'myserver01!clamd' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:39 +0100] information/Checkable: Checkable 'myserver01!imap-ipv4' has 5 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:05:48 +0100] information/Checkable: Checkable 'myserver01!smtp-ipv4' has 5 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:51 +0100] information/Checkable: Checkable 'myserver01!fail2ban' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:52 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:05:53 +0100] information/Checkable: Checkable 'myserver01!DNS check - localhost' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:05:56 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 196.667/s (11800/min 59597/5min 548064/15min);
[2020-11-03 02:05:59 +0100] information/Checkable: Checkable 'myserver01!freshclam' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:06:00 +0100] information/Checkable: Checkable 'myserver01!rsyslogd' has 3 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:06:04 +0100] information/Checkable: Checkable 'myserver01!mysql-threads running' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:06:06 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 199.033/s (11942/min 59647/5min 548117/15min);
[2020-11-03 02:06:07 +0100] information/Checkable: Checkable 'myserver01!php_systemd' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:06:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:06:12 +0100] information/Checkable: Checkable 'myserver01!exim' has 5 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:06:16 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 204.567/s (12274/min 59857/5min 548461/15min);
[2020-11-03 02:06:18 +0100] information/Checkable: Checkable 'myserver01!cron' has 5 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:06:20 +0100] information/Checkable: Checkable 'myserver01!nginx' has 6 notification(s). Checking filters for type 'Problem', sends will be logged.
[2020-11-03 02:06:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:06:36 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 201.767/s (12106/min 59815/5min 483110/15min);
[2020-11-03 02:06:39 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!http4_myserver01!bunny-service-24x7' for user 'bunny'
[2020-11-03 02:06:39 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!http4_myserver01!bunny-service-24x7' for checkable 'myserver01!http4_myserver01' and user 'bunny' using command 'bunny-service'.
[2020-11-03 02:06:46 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 4595.17/s (275710/min 323553/5min 746667/15min);
[2020-11-03 02:06:56 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 4591.13/s (275468/min 323267/5min 746551/15min);
[2020-11-03 02:07:00 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!pop-ipv4!bunny-service-24x7' for user 'bunny'
[2020-11-03 02:07:00 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!pop-ipv4!bunny-service-24x7' for checkable 'myserver01!pop-ipv4' and user 'bunny' using command 'bunny-service'.
[2020-11-03 02:07:00 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!ssh-ipv4!mail-service-24x7' for user 'notice'
[2020-11-03 02:07:00 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!ssh-ipv4!mail-service-24x7' for checkable 'myserver01!ssh-ipv4' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:07:05 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!mailqueue!mail-service-24x7' for user 'notice'
[2020-11-03 02:07:05 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!mailqueue!mail-service-24x7' for checkable 'myserver01!mailqueue' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:07:06 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 12366.7/s (742004/min 789725/5min 1212924/15min);
[2020-11-03 02:07:10 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!disk /!mail-service-24x7' for user 'notice'
[2020-11-03 02:07:10 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!disk /!mail-service-24x7' for checkable 'myserver01!disk /' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:07:11 +0100] information/Checkable: Checkable 'myserver01!http4_directadmin' has 6 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:07:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:07:15 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!load!mail-service-24x7' for user 'notice'
[2020-11-03 02:07:15 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!load!mail-service-24x7' for checkable 'myserver01!load' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:07:15 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!iptables_custom_chain!mail-service-24x7' for user 'notice'
[2020-11-03 02:07:15 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!iptables_custom_chain!mail-service-24x7' for checkable 'myserver01!iptables_custom_chain' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:07:16 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 3, rate: 20365.2/s (1221914/min 1269697/5min 1693138/15min);
[2020-11-03 02:07:23 +0100] information/Checkable: Checkable 'myserver01!ftp-ipv4' has 5 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:07:26 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 22444.7/s (1346681/min 1394358/5min 1817753/15min);
[2020-11-03 02:07:31 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!filebeat!mail-service-24x7' for user 'notice'
[2020-11-03 02:07:31 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!filebeat!mail-service-24x7' for checkable 'myserver01!filebeat' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:07:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:07:36 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 23483.1/s (1408988/min 1456709/5min 1879908/15min);
[2020-11-03 02:07:39 +0100] information/Checkable: Checkable 'myserver01!imap-ipv4' has 5 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:07:41 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!readonly!mail-service-24x7' for user 'notice'
[2020-11-03 02:07:41 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!readonly!mail-service-24x7' for checkable 'myserver01!readonly' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:07:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:07:44 +0100] information/Checkable: Checkable 'myserver01!smtp-ipv4' has 5 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:07:46 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 21536.7/s (1292199/min 1603436/5min 2026883/15min);
[2020-11-03 02:07:48 +0100] information/Checkable: Checkable 'myserver01!DNS check - localhost' has 6 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:07:52 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:08:06 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 4, rate: 17985.8/s (1079147/min 1857166/5min 2280407/15min);
[2020-11-03 02:08:12 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!php_systemd!mail-service-24x7' for user 'notice'
[2020-11-03 02:08:12 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!php_systemd!mail-service-24x7' for checkable 'myserver01!php_systemd' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:08:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:08:16 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 8, rate: 15547.8/s (932870/min 2190994/5min 2614205/15min);
[2020-11-03 02:08:22 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!cron!mail-service-24x7' for user 'notice'
[2020-11-03 02:08:22 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!cron!mail-service-24x7' for checkable 'myserver01!cron' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:08:22 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!nginx!mail-service-24x7' for user 'notice'
[2020-11-03 02:08:22 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!nginx!mail-service-24x7' for checkable 'myserver01!nginx' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:08:26 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 16892.7/s (1013563/min 2396393/5min 2819540/15min);
[2020-11-03 02:08:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:08:46 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 6, rate: 13400.5/s (804031/min 2396263/5min 2819488/15min);
[2020-11-03 02:08:46 +0100] information/Checkable: Checkable 'myserver01!http4_myserver01' has 6 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:08:46 +0100] information/Notification: Sending 'Recovery' notification 'myserver01!http4_myserver01!bunny-service-24x7' for user 'bunny'
[2020-11-03 02:08:46 +0100] information/Notification: Completed sending 'Recovery' notification 'myserver01!http4_myserver01!bunny-service-24x7' for checkable 'myserver01!http4_myserver01' and user 'bunny' using command 'bunny-service'.
[2020-11-03 02:08:56 +0100] information/Checkable: Checkable 'myserver01!pop-ipv4' has 5 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:08:56 +0100] information/Notification: Sending 'Recovery' notification 'myserver01!pop-ipv4!bunny-service-24x7' for user 'bunny'
[2020-11-03 02:08:56 +0100] information/Notification: Completed sending 'Recovery' notification 'myserver01!pop-ipv4!bunny-service-24x7' for checkable 'myserver01!pop-ipv4' and user 'bunny' using command 'bunny-service'.
[2020-11-03 02:09:02 +0100] information/Checkable: Checkable 'myserver01!ssh-ipv4' has 5 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2020-11-03 02:09:02 +0100] information/Notification: Sending 'Recovery' notification 'myserver01!ssh-ipv4!mail-service-24x7' for user 'notice'
[2020-11-03 02:09:02 +0100] information/Notification: Completed sending 'Recovery' notification 'myserver01!ssh-ipv4!mail-service-24x7' for checkable 'myserver01!ssh-ipv4' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:09:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:09:14 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-11-03 02:09:36 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 6, rate: 9171.72/s (550303/min 2934512/5min 3357760/15min);
[2020-11-03 02:09:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:09:46 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 16, rate: 7954.35/s (477261/min 2934658/5min 3357806/15min);
[2020-11-03 02:09:52 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:09:56 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 4, rate: 5709.8/s (342588/min 2934620/5min 3174343/15min);
[2020-11-03 02:10:06 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 14, rate: 200.183/s (12011/min 2934906/5min 3174405/15min);
[2020-11-03 02:10:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:10:16 +0100] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 1, rate: 397.75/s (23865/min 119941/5min 357855/15min);
[2020-11-03 02:10:16 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 6, rate: 199.35/s (11961/min 2935166/5min 3174647/15min);
[2020-11-03 02:10:26 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 16, rate: 198.75/s (11925/min 2934840/5min 3174631/15min);
[2020-11-03 02:10:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:10:36 +0100] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-11-03 02:10:36 +0100] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 404.6/s (24276/min 119753/5min 357725/15min);
[2020-11-03 02:10:36 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 0, rate: 202.383/s (12143/min 2935036/5min 3174549/15min);
[2020-11-03 02:10:56 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 24, rate: 200.2/s (12012/min 2934814/5min 3174489/15min);
[2020-11-03 02:11:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:11:16 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 7, rate: 192.767/s (11566/min 2934383/5min 3174184/15min);
[2020-11-03 02:11:26 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 10, rate: 194.417/s (11665/min 2934610/5min 3174105/15min);
[2020-11-03 02:11:36 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 6, rate: 192.583/s (11555/min 2934514/5min 3174125/15min);
[2020-11-03 02:11:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:11:46 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 44, rate: 192.783/s (11567/min 2670724/5min 3173911/15min);
[2020-11-03 02:11:52 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:12:06 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 12, rate: 197.817/s (11869/min 2204582/5min 3174491/15min);
[2020-11-03 02:12:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:12:16 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 2, rate: 203.283/s (12197/min 1724715/5min 3174537/15min);
[2020-11-03 02:12:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:13:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:13:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:13:52 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:14:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:14:17 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2020-11-03 02:14:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:14:46 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 42, rate: 206.05/s (12363/min 59839/5min 3054321/15min);
[2020-11-03 02:15:08 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!disk /!mail-service-night-pd' for user 'pagerduty'
[2020-11-03 02:15:08 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!disk /!mail-service-night-pd' for checkable 'myserver01!disk /' and user 'pagerduty' using command 'notify-service-by-pagerduty'.
[2020-11-03 02:15:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:15:19 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver02!rbl-ipv4!mail-service-24x7' for user 'notice'
[2020-11-03 02:15:19 +0100] information/Notification: Completed sending 'Problem' notification 'myserver02!rbl-ipv4!mail-service-24x7' for checkable 'myserver02!rbl-ipv4' and user 'notice' using command 'mail-service-notification'.
[2020-11-03 02:15:24 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver03!disk /!bunny-service-24x7' for user 'bunny'
[2020-11-03 02:15:24 +0100] information/Notification: Completed sending 'Problem' notification 'myserver03!disk /!bunny-service-24x7' for checkable 'myserver03!disk /' and user 'bunny' using command 'bunny-service'.
[2020-11-03 02:15:39 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!readonly!mail-service-night-pd' for user 'pagerduty'
[2020-11-03 02:15:39 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!readonly!mail-service-night-pd' for checkable 'myserver01!readonly' and user 'pagerduty' using command 'notify-service-by-pagerduty'.
[2020-11-03 02:15:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:15:46 +0100] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 373.783/s (22427/min 118309/5min 356165/15min);
[2020-11-03 02:15:46 +0100] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2020-11-03 02:15:46 +0100] information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 0, rate: 7738.1/s (464286/min 512268/5min 3506591/15min);
[2020-11-03 02:15:52 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:16:10 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!php_systemd!mail-service-night-pd' for user 'pagerduty'
[2020-11-03 02:16:10 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!php_systemd!mail-service-night-pd' for checkable 'myserver01!php_systemd' and user 'pagerduty' using command 'notify-service-by-pagerduty'.
[2020-11-03 02:16:12 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min);
[2020-11-03 02:16:26 +0100] information/Notification: Sending reminder 'Problem' notification 'myserver01!nginx!mail-service-night-pd' for user 'pagerduty'
[2020-11-03 02:16:26 +0100] information/Notification: Completed sending 'Problem' notification 'myserver01!nginx!mail-service-night-pd' for checkable 'myserver01!nginx' and user 'pagerduty' using command 'notify-service-by-pagerduty'.
[2020-11-03 02:16:32 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
[2020-11-03 02:17:04 +0100] information/Application: Received request to shut down.
[2020-11-03 02:17:04 +0100] information/Application: Shutting down...
[2020-11-03 02:17:05 +0100] warning/JsonRpcConnection: API client disconnected for identity 'icinga01-ams'
[2020-11-03 02:17:05 +0100] warning/ApiListener: Removing API client for endpoint 'icinga01-ams'. 0 API clients left.
[2020-11-03 02:17:07 +0100] information/ApiListener: Reconnecting to endpoint 'icinga01-ams' via host 'icinga01-ams' and port '5665'
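Most of the log above is queue statistics and notifications, so the few connection-related lines for myserver01 are easy to miss. A minimal Python sketch for pulling them out is shown below. It assumes the main log format seen above, and treating only the `JsonRpcConnection` and `ApiListener` components as connection-related is my own simplification; the function name is hypothetical.

```python
import re
from datetime import datetime

# Matches main log lines such as:
# [2020-11-03 02:03:47 +0100] warning/ApiListener: Removing API client ...
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\] "
    r"(?P<severity>\w+)/(?P<component>\w+): (?P<message>.*)$"
)

# Assumption: these two components carry the cluster connection messages.
CONNECTION_COMPONENTS = {"JsonRpcConnection", "ApiListener"}


def connection_events(lines, endpoint):
    """Yield (timestamp, severity, component, message) for every
    connection-related log line that mentions the quoted endpoint name."""
    needle = f"'{endpoint}'"
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        if match["component"] not in CONNECTION_COMPONENTS:
            continue
        if needle not in match["message"]:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S %z")
        yield ts, match["severity"], match["component"], match["message"]


# A few lines copied from the log above.
SAMPLE = [
    "[2020-11-03 02:03:42 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);",
    "[2020-11-03 02:03:47 +0100] information/JsonRpcConnection: No messages for identity 'myserver01' have been received in the last 60 seconds.",
    "[2020-11-03 02:03:47 +0100] warning/JsonRpcConnection: API client disconnected for identity 'myserver01'",
    "[2020-11-03 02:03:47 +0100] warning/ApiListener: Removing API client for endpoint 'myserver01'. 0 API clients left.",
    "[2020-11-03 02:03:57 +0100] information/ApiListener: Reconnecting to endpoint 'myserver01' via host '<redacted>' and port '5665'",
]

if __name__ == "__main__":
    for ts, severity, component, message in connection_events(SAMPLE, "myserver01"):
        print(ts.isoformat(), severity, component, message)
```

Run against the full log, this should leave only the timeout, the disconnect, the client removal and the single reconnect attempt at 02:03:57, which makes it easier to see that no further connection message for myserver01 appears before the restart at 02:17.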
* If you run multiple Icinga 2 instances, the `zones.conf` file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes.
I assume config linked above is enough to cover this..?
Hoping to see your response, thank you!
Our setup consists of two Icinga master instances, known as icinga01-ams and icinga01-ede, set up as described here: https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#high-availability-master-with-agents
This is the Zone/Endpoint object on the Icinga Master:
And this is the zone/Endpoint we have on myserver01:
Your Environment
Include as many relevant details about the environment you experienced the problem in
`icinga2 --version` on the Icinga Master:

Copyright (c) 2012-2020 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-12-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: runner-ltrjqz9n-project-298-concurrent-0

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.1-1)

Copyright (c) 2012-2020 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.9.0-13-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: runner-wytxxqbb-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Disabled features: command compatlog debuglog elasticsearch gelf graphite influxdb livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql mainlog notification

Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb livestatus notification opentsdb perfdata statusdata syslog
Enabled features: api checker mainlog
[2020-11-04 12:33:50 +0100] information/cli: Icinga application loader (version: r2.11.4-1)
[2020-11-04 12:33:50 +0100] information/cli: Loading configuration file(s).
[2020-11-04 12:33:50 +0100] information/ConfigItem: Committing config item(s).
[2020-11-04 12:33:50 +0100] information/ApiListener: My API identity: icinga01-ede
[2020-11-04 12:33:56 +0100] warning/ApplyRule: Apply rule 'apt' (in /etc/icinga2/zones.d/global-templates/apt.conf: 3:1-3:19) for type 'Service' does not match anywhere!
[2020-11-04 12:33:56 +0100] warning/ApplyRule: Apply rule 'check-backup-status' (in /etc/icinga2/zones.d/global-templates/check-backup-status.conf: 3:1-3:35) for type 'Service' does not match anywhere!
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 8 NotificationCommands.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 47268 Notifications.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 397 Hosts.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 10 Comments.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 399 Zones.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 386 Endpoints.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 4 ApiUsers.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 268 CheckCommands.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 6 TimePeriods.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 5 Users.
[2020-11-04 12:33:56 +0100] information/ConfigItem: Instantiated 10832 Services.
[2020-11-04 12:33:56 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2020-11-04 12:33:56 +0100] information/cli: Finished validating the configuration file(s).