centreon / centreon-archived

Centreon is a network, system and application monitoring tool. Centreon is the only AIOps Platform Providing Holistic Visibility to Complex IT Workflows from Cloud to Edge.
https://www.centreon.com
GNU General Public License v2.0

retry_interval not respected for host checks #7068

Open sbraz opened 5 years ago

sbraz commented 5 years ago

BUG REPORT INFORMATION

Centreon Engine version: 1.8.1

Steps to reproduce the issue: Hi, I have the following configuration, which is supposed to check hosts every 20 minutes:

define host {
    host_name                      an_example_host
    alias                          xxxxxxx
    address                        x
    register                       1
    use                            TPLT_ROUTEUR_A
    parents                        xxxxx
    _HOST_ID                       9741
}

define host {
    name                           TPLT_ROUTEUR_A
    alias                          TPLT_ROUTEUR_A
    timezone                       :Europe/Paris
    check_command                  check_ping
    register                       0
    use                            gen_A_20min_notif_45min
    _SNMPCOMMUNITY                 xxxxxxx
    _SNMPVERSION                   2c
}

define host {
    name                           gen_A_20min_notif_45min
    alias                          gen_A_20min_notif_45min
    contacts                       xxxxxx
    check_period                   24x7
    notification_period            24x7
    max_check_attempts             3
    check_interval                 20
    retry_interval                 20
    notification_interval          0
    notification_options           d,r
    first_notification_delay       45
    recovery_notification_delay    0
    stalking_options               o,d,u
    register                       0
    active_checks_enabled          1
    passive_checks_enabled         1
    notifications_enabled          1
}

Here is my centengine.cfg.

Describe the results you expected: Checks for this host should be performed every 20 minutes, regardless of its state.
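
To make the expectation concrete, here is a minimal sketch that resolves the `use` chain from the definitions above and prints the values the host should end up with. It is not how Centreon Engine itself resolves templates: `parse_hosts` and `effective` are hypothetical helper names, the embedded config is trimmed to the relevant directives, and only single inheritance (one template per `use`) is handled.

```python
import re

# Trimmed copy of the three definitions above (only the relevant directives).
CONFIG = """
define host {
    host_name                      an_example_host
    use                            TPLT_ROUTEUR_A
}
define host {
    name                           TPLT_ROUTEUR_A
    check_command                  check_ping
    use                            gen_A_20min_notif_45min
}
define host {
    name                           gen_A_20min_notif_45min
    max_check_attempts             3
    check_interval                 20
    retry_interval                 20
}
"""


def parse_hosts(text):
    """Return a dict mapping each host or template name to its directives."""
    objects = {}
    for body in re.findall(r"define host \{(.*?)\}", text, flags=re.S):
        directives = {}
        for line in body.strip().splitlines():
            key, value = line.split(None, 1)
            directives[key] = value.strip()
        name = directives.get("host_name") or directives.get("name")
        objects[name] = directives
    return objects


def effective(objects, name, key):
    """Walk the `use` chain until a definition sets `key` (single inheritance only)."""
    seen = set()
    while name is not None and name not in seen:
        seen.add(name)
        obj = objects[name]
        if key in obj:
            return obj[key]
        name = obj.get("use")
    return None


hosts = parse_hosts(CONFIG)
for key in ("check_interval", "retry_interval", "max_check_attempts"):
    print(key, "=", effective(hosts, "an_example_host", key))
```

Both `check_interval` and `retry_interval` resolve to 20 through `gen_A_20min_notif_45min`, which is why a 20-minute gap is expected between retries as well as between normal checks.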

Describe the results you received: Only the normal check interval is respected; retries are done every minute. Here is an excerpt from the logs with stalking enabled:

2018-12-17T14:45:17+0100 [1545054317] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 45.32 ms
2018-12-17T15:05:22+0100 [1545055522] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 45.60 ms
2018-12-17T15:25:27+0100 [1545056727] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 70.90 ms
2018-12-17T15:45:37+0100 [1545057937] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 42.09 ms
2018-12-17T16:05:47+0100 [1545059147] [1975] HOST ALERT: an_example_host;DOWN;SOFT;1;PING CRITICAL - Packet loss = 0%, RTA = 440.77 ms
2018-12-17T16:06:12+0100 [1545059172] [1975] HOST ALERT: an_example_host;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 29.37 ms
2018-12-17T16:25:51+0100 [1545060351] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 35.64 ms
2018-12-17T16:45:56+0100 [1545061556] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 87.21 ms
2018-12-17T17:06:01+0100 [1545062761] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 29.64 ms
2018-12-17T17:26:06+0100 [1545063966] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 44.90 ms
2018-12-17T17:46:10+0100 [1545065170] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 23.36 ms
2018-12-17T18:06:15+0100 [1545066375] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 23.43 ms
2018-12-17T18:26:20+0100 [1545067580] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 36.63 ms
2018-12-17T18:46:25+0100 [1545068785] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 88.78 ms
2018-12-17T19:06:35+0100 [1545069995] [1975] HOST ALERT: an_example_host;DOWN;SOFT;1;PING CRITICAL - Packet loss = 0%, RTA = 866.64 ms
2018-12-17T19:07:25+0100 [1545070045] [1975] HOST ALERT: an_example_host;DOWN;SOFT;2;PING CRITICAL - Packet loss = 0%, RTA = 539.13 ms
2018-12-17T19:07:25+0100 [1545070045] [1975] HOST ALERT: an_example_host;DOWN;SOFT;2;PING CRITICAL - Packet loss = 0%, RTA = 539.13 ms
2018-12-17T19:08:25+0100 [1545070105] [1975] HOST ALERT: an_example_host;DOWN;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 412.98 ms
2018-12-17T19:08:25+0100 [1545070105] [1975] HOST ALERT: an_example_host;DOWN;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 412.98 ms
2018-12-17T19:10:15+0100 [1545070215] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING WARNING - Packet loss = 0%, RTA = 354.04 ms
2018-12-17T19:26:45+0100 [1545071205] [1975] HOST ALERT: an_example_host;DOWN;SOFT;1;PING CRITICAL - Packet loss = 0%, RTA = 784.59 ms
2018-12-17T19:27:20+0100 [1545071240] [1975] HOST ALERT: an_example_host;UP;SOFT;2;PING WARNING - Packet loss = 0%, RTA = 333.92 ms
2018-12-17T19:46:55+0100 [1545072415] [1975] HOST ALERT: an_example_host;DOWN;SOFT;1;PING CRITICAL - Packet loss = 16%, RTA = 1097.51 ms
2018-12-17T19:47:20+0100 [1545072440] [1975] HOST ALERT: an_example_host;DOWN;SOFT;2;PING CRITICAL - Packet loss = 0%, RTA = 754.73 ms
2018-12-17T19:47:20+0100 [1545072440] [1975] HOST ALERT: an_example_host;DOWN;SOFT;2;PING CRITICAL - Packet loss = 0%, RTA = 754.73 ms
2018-12-17T19:48:25+0100 [1545072505] [1975] HOST ALERT: an_example_host;DOWN;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 555.49 ms
2018-12-17T19:48:25+0100 [1545072505] [1975] HOST ALERT: an_example_host;DOWN;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 555.49 ms
2018-12-17T19:50:45+0100 [1545072645] [1975] HOST ALERT: an_example_host;DOWN;HARD;1;PING CRITICAL - Packet loss = 0%, RTA = 721.06 ms
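
The gaps between consecutive alerts can be computed from the epoch timestamps in the log. The sketch below does this for a few lines copied from the excerpt above. `parse` and `gaps` are hypothetical helper names, and the 20-minute threshold assumes `interval_length` is 60 seconds in centengine.cfg, which is not shown here.

```python
import re

# Six consecutive lines copied from the excerpt above (one is a duplicate).
LOG = """\
2018-12-17T18:46:25+0100 [1545068785] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 88.78 ms
2018-12-17T19:06:35+0100 [1545069995] [1975] HOST ALERT: an_example_host;DOWN;SOFT;1;PING CRITICAL - Packet loss = 0%, RTA = 866.64 ms
2018-12-17T19:07:25+0100 [1545070045] [1975] HOST ALERT: an_example_host;DOWN;SOFT;2;PING CRITICAL - Packet loss = 0%, RTA = 539.13 ms
2018-12-17T19:07:25+0100 [1545070045] [1975] HOST ALERT: an_example_host;DOWN;SOFT;2;PING CRITICAL - Packet loss = 0%, RTA = 539.13 ms
2018-12-17T19:08:25+0100 [1545070105] [1975] HOST ALERT: an_example_host;DOWN;HARD;3;PING CRITICAL - Packet loss = 0%, RTA = 412.98 ms
2018-12-17T19:10:15+0100 [1545070215] [1975] HOST ALERT: an_example_host;UP;HARD;1;PING WARNING - Packet loss = 0%, RTA = 354.04 ms
"""

LINE_RE = re.compile(
    r"\[(\d+)\] \[\d+\] HOST ALERT: ([^;]+);([^;]+);([^;]+);(\d+);"
)


def parse(log):
    """Return (epoch, state, state_type, attempt) tuples, skipping repeated lines."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        event = (int(m.group(1)), m.group(3), m.group(4), int(m.group(5)))
        if not events or events[-1] != event:
            events.append(event)
    return events


def gaps(events):
    """Return (seconds elapsed, previous event, current event) for each pair."""
    return [(cur[0] - prev[0], prev, cur) for prev, cur in zip(events, events[1:])]


# Assumption: interval_length = 60, so retry_interval 20 means 1200 seconds.
EXPECTED_SECONDS = 20 * 60

for delta, prev, cur in gaps(parse(LOG)):
    flag = "" if delta >= EXPECTED_SECONDS else "  <-- shorter than 20 min"
    print(f"{prev[1]}/{prev[2]} -> {cur[1]}/{cur[2]}: {delta} s{flag}")
```

On these lines, the only gap that reaches 20 minutes is the one between the last UP/HARD result and the first DOWN/SOFT result. The checks that follow it are 50, 60 and 110 seconds apart.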

sbraz commented 5 years ago

I've enabled stalking on the services, and I can confirm that it is not a failed service check that triggered the host check: [image] One host check happened at 11:40:55 and the next one at 11:42:10.