Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Multiple Notifications: State changed detected from DOWN to DOWN #10098

Open danielmoser96 opened 4 months ago

danielmoser96 commented 4 months ago

Describe the bug

Notifications are sent multiple times even though the notification is defined with interval = 0. This appears to happen because Icinga sporadically detects state changes from DOWN to DOWN or from CRITICAL to CRITICAL.

debug log:

[2024-07-11 11:10:10 +0200] notice/NotificationComponent: Reminder notification 'XXX!Prio_123_Mail_Host': Notification was sent out once and interval=0 disables reminder notifications.
[2024-07-11 11:10:42 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:10:42 +0200 (1.72069e+09) to next check time at 2024-07-11 11:10:47 +0200 (1.72069e+09).
[2024-07-11 11:10:42 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:11:21 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:11:21 +0200 (1.72069e+09) to next check time at 2024-07-11 11:11:25 +0200 (1.72069e+09).
[2024-07-11 11:11:21 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:11:59 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:11:59 +0200 (1.72069e+09) to next check time at 2024-07-11 11:12:03 +0200 (1.72069e+09).
[2024-07-11 11:11:59 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:12:37 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:12:37 +0200 (1.72069e+09) to next check time at 2024-07-11 11:12:42 +0200 (1.72069e+09).
[2024-07-11 11:12:37 +0200] notice/Checkable: State Change: Checkable 'XXX' soft state change from DOWN to DOWN detected.
[2024-07-11 11:13:16 +0200] debug/Checkable: Update checkable 'XXX' with check interval '30' from last check time at 2024-07-11 11:13:15 +0200 (1.72069e+09) to next check time at 2024-07-11 11:13:45 +0200 (1.72069e+09).
[2024-07-11 11:13:16 +0200] notice/Checkable: State Change: Checkable 'XXX' hard state change from DOWN to DOWN detected.
[2024-07-11 11:13:16 +0200] notice/NotificationComponent: Attempting to send reminder notification 'XXX!Prio_123_Mail_Host'.
[2024-07-11 11:13:16 +0200] notice/Notification: Attempting to send reminder notifications of type 'Problem' for notification object 'XXX!Prio_123_Mail_Host'.
[2024-07-11 11:13:16 +0200] debug/Notification: User 'Mail' notification 'XXX!Prio_123_Mail_Host', Type 'Problem', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=32, TypeFilter=32)
[2024-07-11 11:13:16 +0200] debug/Notification: User 'Mail' notification 'XXX!Prio_123_Mail_Host', State 'Down', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=32, StateFilter=-1)
[2024-07-11 11:13:16 +0200] information/Notification: Sending reminder 'Problem' notification 'XXX!Prio_123_Mail_Host' for user 'Mail'
[2024-07-11 11:13:16 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/xxx/action_notify_by_mail.py' ... : PID 460440
[2024-07-11 11:13:16 +0200] information/Notification: Completed sending 'Problem' notification 'XXX!Prio_123_Mail_Host' for checkable 'XXX' and user 'Mail' using command 'action_notify_by_mail_host'.
[2024-07-11 11:13:16 +0200] notice/Process: PID 460440 ('/usr/lib64/nagios/plugins/xxx/action_notify_by_mail.py' ... ) terminated with exit code 0
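To see how often this occurs, the debug log can be scanned for "State Change" lines in which the old and new state are identical. The following is a minimal sketch that assumes the log line format shown above; the function names are hypothetical and not part of Icinga.

```python
import re
from collections import Counter

# Matches "State Change" lines in the format shown in the debug log above.
STATE_CHANGE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] notice/Checkable: State Change: "
    r"Checkable '(?P<checkable>[^']+)' (?P<kind>soft|hard) state change "
    r"from (?P<old>\S+) to (?P<new>\S+) detected\.$"
)


def find_same_state_changes(lines):
    """Return (timestamp, checkable, kind, state) for changes where old == new."""
    hits = []
    for line in lines:
        m = STATE_CHANGE_RE.match(line.strip())
        if m and m.group("old") == m.group("new"):
            hits.append(
                (m.group("ts"), m.group("checkable"), m.group("kind"), m.group("old"))
            )
    return hits


def count_by_checkable(hits):
    """Count the suspicious state changes per checkable name."""
    return Counter(checkable for _, checkable, _, _ in hits)
```

For example, `find_same_state_changes(open(path, encoding="utf-8"))` could be run against the debug log file, where `path` is wherever the debug log is written on the affected node (an assumption; adjust to the local setup).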

To Reproduce

  1. Create a Host with an IP address that is not reachable.
  2. Enable this notification rule in zones.d/master/notification_apply.conf:

     apply Notification "Prio_123_Mail_Host" to Host {
       command = "action_notify_by_mail_host"
       interval = 0s
       period = "24x7"
       assign where host.name
       states = [ Down ]
       types = [ Problem ]
       users = [ "Mail" ]
     }

  3. Wait some time.
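While waiting, it may help to poll the host's runtime attributes through the Icinga 2 REST API and compare them with the state changes in the debug log. The sketch below builds a query against the `/v1/objects/hosts` endpoint. The base URL, port 5665, the API user credentials, the CA file path, and the function names are assumptions for illustration and must be adapted to the actual setup.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request

# Host attributes that are relevant when investigating state changes.
ATTRS = (
    "state",
    "state_type",
    "last_hard_state",
    "check_attempt",
    "max_check_attempts",
    "acknowledgement",
)


def build_host_query_url(api_base, host_name, attrs=ATTRS):
    """Build the object query URL for a single host, limited to `attrs`."""
    path = "/v1/objects/hosts/" + urllib.parse.quote(host_name, safe="")
    query = urllib.parse.urlencode([("attrs", a) for a in attrs])
    return api_base.rstrip("/") + path + "?" + query


def fetch_host_attrs(api_base, host_name, user, password, ca_file):
    """Query the API with HTTP basic auth and return the host's attributes."""
    req = urllib.request.Request(build_host_query_url(api_base, host_name))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Accept", "application/json")
    ctx = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)["results"][0]["attrs"]
```

Calling `fetch_host_attrs` every few seconds and logging the result would show whether `state_type` and `check_attempt` reset between checks while `state` stays at DOWN.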

Expected behavior

A host state change from DOWN to DOWN should not occur.

Your Environment


[2024-07-11 11:26:03 +0200] information/cli: Icinga application loader (version: r2.14.2-1)
[2024-07-11 11:26:03 +0200] information/cli: Loading configuration file(s).
[2024-07-11 11:26:03 +0200] information/ConfigItem: Committing config item(s).
[2024-07-11 11:26:03 +0200] warning/Zone: The Zone object 'master' has more than two endpoints. Due to a known issue this type of configuration is strongly discouraged and may cause Icinga to use excessive amounts of CPU time.
[2024-07-11 11:26:03 +0200] information/ApiListener: My API identity: srv123.xxx.com
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 LivestatusListener.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 GraphiteWriter.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1937 Downtimes.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 354 Dependencies.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 8 Users.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 12 TimePeriods.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 ServiceGroup.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 6609 Services.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1466 ScheduledDowntimes.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 4 Zones.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 11 NotificationCommands.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 19039 Notifications.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 2359 Hosts.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 81 HostGroups.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 3 Endpoints.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 38 Comments.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 13 ApiUsers.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 337 CheckCommands.
[2024-07-11 11:26:03 +0200] information/ConfigItem: Instantiated 1 IcingaDB.
[2024-07-11 11:26:03 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2024-07-11 11:26:03 +0200] information/cli: Finished validating the configuration file(s).

danielmoser96 commented 4 months ago

It gets even worse. We now receive DOWN notifications even though the host problem has been acknowledged.

image

danielmoser96 commented 2 months ago

Push