Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

(scheduled) Downtimes - Notifications are not suppressed #9654

Open mschroeder21 opened 1 year ago

mschroeder21 commented 1 year ago

Describe the bug

Notifications are not suppressed during (scheduled) Downtimes.

To Reproduce

  1. Create a scheduled Downtime
  2. Wait for a State change
  3. A notification is sent
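
The report does not say how the downtime in step 1 was created. As a minimal sketch, assuming it is scheduled through the Icinga 2 REST API `schedule-downtime` action, the request body could be built like this. The host name, service name, author, and comment are placeholders, and `build_downtime_request` is a hypothetical helper, not part of Icinga.

```python
import time


def build_downtime_request(host, service, duration_s, now=None):
    """Return (path, body) for a fixed downtime on a single service.

    All values passed in are illustrative placeholders; adjust them to
    the affected host and service.
    """
    start = int(now if now is not None else time.time())
    body = {
        "type": "Service",
        # filter_vars keeps the names out of the filter expression itself
        "filter": "host.name==host_name && service.name==service_name",
        "filter_vars": {"host_name": host, "service_name": service},
        "author": "icingaadmin",                  # placeholder author
        "comment": "Reproducing issue #9654",     # placeholder comment
        "start_time": start,
        "end_time": start + duration_s,
        "fixed": True,
    }
    return "/v1/actions/schedule-downtime", body
```

The body would be sent as JSON in a POST request with an `Accept: application/json` header to one of the masters.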

Expected behavior

Notification will be suppressed.

Screenshots

icinga_scheduled_downtime

Your Environment

Include as many relevant details about the environment you experienced the problem in

Additional context

Maybe helpful: max_check_attempts is set to 1 for this check.

julianbrost commented 1 year ago

If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.: HA Master + Agents (Details can not be shared publicly)

Can you please share at least some more details on the structure, in particular:

  1. Are the agents connected directly to the masters?
  2. Is the check executed using command_endpoint?

mschroeder21 commented 1 year ago

Yes, the agents are directly connected to the masters. We don't have any satellites in this environment. Yes, the check is executed using command_endpoint.

mschroeder21 commented 1 year ago

Any news or ideas what's happening here?

0xliam commented 1 year ago

This might be unrelated to this issue, but we are seeing issues with notifications being sent when a host should be in downtime after a config change is made. If a host or service problem is acknowledged, or put into scheduled downtime, and then a configuration change is made via Icinga Director, those acks and downtimes are purged - this occurs regardless of what zone the Director change is made in.

E.g. I ack a host problem for host-a in zone-a, create a scheduled downtime for host-b in zone-b, and then push a Director config change for host-c in zone-c. The ack for host-a and the downtime for host-b are then removed, but they still appear in the history.

Additionally, when the Icinga master reloads config after a Director deployment, we are seeing a race condition that causes hosts to send down notifications, and a few minutes later, enter downtime:

image

image

Icinga Web 2 Version 2.11.4
Icinga2 Version r2.13.7-1

mschroeder21 commented 1 year ago

Is there a possibility to get feedback on this topic?

julianbrost commented 1 year ago

My first guess would be that there could be some inconsistency between both masters. While inside the downtime, you can request https://localhost:5665/v1/objects/services/affected-host-name!affected-service-name from both masters and compare what you get. downtime_depth would be of particular interest as this shows if both masters agree on whether the service is in a downtime.
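
A minimal sketch of that comparison, assuming Python with only the standard library, HTTP basic auth for an API user, and the usual `results[0].attrs` layout of the objects API response. The master names, credentials, and helper names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request
from urllib.parse import quote


def service_url(master, host, service, port=5665):
    """Build the object query URL for one service on one master."""
    name = quote(f"{host}!{service}", safe="!")
    return f"https://{master}:{port}/v1/objects/services/{name}"


def downtime_depth(response):
    """Extract downtime_depth from a decoded objects API response."""
    results = response.get("results", [])
    if not results:
        return None
    return results[0]["attrs"].get("downtime_depth")


def masters_agree(depths):
    """depths maps master name -> downtime_depth; True if all values match."""
    return len(set(depths.values())) <= 1


def fetch(url, user, password, insecure=False):
    """GET the URL with basic auth and return the decoded JSON body."""
    ctx = ssl.create_default_context()
    if insecure:  # only for self-signed test setups
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={
        "Accept": "application/json",
        "Authorization": f"Basic {token}",
    })
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)
```

Calling `fetch(service_url(m, host, service), user, password)` for each master while the downtime is active and passing the extracted values to `masters_agree` shows whether the masters disagree.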

mschroeder21 commented 1 year ago

Thanks for your answer. Both masters are in sync (downtime_depth is 1 if a service is in downtime). I have also already cleaned /var/lib/icinga2/api/zones/ several times on the second master to get a fresh sync from the config master.

log1-c commented 2 months ago

Something like what @0xliam describes happens in our setup from time to time as well.

image

The config deployment by the Director was triggered at 18:00.
At 18:01:45 configs were synced between the masters (with the config master ignoring the updates from the secondary master) and into the zones. All of that was finished at 18:02:10.
Between 18:02:10 and 18:02:40 multiple downtimes were created, and those successfully suppressed notifications.
At 18:02:41 a whole lot of "Syncing configuration files for xyz to " messages (re)appear in the log, only for the masters, without a config deployment being triggered.
The downtime from the screenshot was entered at 18:02:43.

Log lines for the host that was notified during the supposed downtime:

### DOWNTIME ENTERED VIA API ### 
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!memory-toplist!13960866-f66f-4804-b527-625b31b85818' for checkable 'xyz-p1-ts2004!memory-toplist'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!memory-toplist!13960866-f66f-4804-b527-625b31b85818' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!memory_free_VD' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!memory_free_VD!bfa5736a-5b2d-4061-8c2b-6ab3290d508c' for checkable 'xyz-p1-ts2004!memory_free_VD'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!memory_free_VD!bfa5736a-5b2d-4061-8c2b-6ab3290d508c' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!pending_updates' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!pending_updates!849fa58a-fbf2-43cf-97c7-cd6cd5c83a5a' for checkable 'xyz-p1-ts2004!pending_updates'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!pending_updates!849fa58a-fbf2-43cf-97c7-cd6cd5c83a5a' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!pending_updates_security-only' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!pending_updates_security-only!8f643388-a8e4-40dc-80f1-c57c97fdbce3' for checkable 'xyz-p1-ts2004!pending_updates_security-only'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!pending_updates_security-only!8f643388-a8e4-40dc-80f1-c57c97fdbce3' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!rdp-x224-status' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!rdp-x224-status!024afb09-1141-47a4-8398-de52c761102d' for checkable 'xyz-p1-ts2004!rdp-x224-status'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!rdp-x224-status!024afb09-1141-47a4-8398-de52c761102d' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!sentinelone-agent-status' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!sentinelone-agent-status!d1fd675e-8976-41cd-855e-14d557cf7cd6' for checkable 'xyz-p1-ts2004!sentinelone-agent-status'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!sentinelone-agent-status!d1fd675e-8976-41cd-855e-14d557cf7cd6' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!sentinelone_application_security!8ed4fb6c-989c-4040-8c72-d0e30f2e73a6' for checkable 'xyz-p1-ts2004!sentinelone_application_security'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!sentinelone_application_security!8ed4fb6c-989c-4040-8c72-d0e30f2e73a6' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!sentinelone_threats' has 2 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!sentinelone_threats!f3f5f9fe-2b39-41d1-8dbc-43601c96fba0' for checkable 'xyz-p1-ts2004!sentinelone_threats'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!sentinelone_threats!f3f5f9fe-2b39-41d1-8dbc-43601c96fba0' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-dcomlaunch' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-dcomlaunch!3d0ef4ab-a08d-4b44-961a-c6b512c13bd8' for checkable 'xyz-p1-ts2004!service-dcomlaunch'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-dcomlaunch!3d0ef4ab-a08d-4b44-961a-c6b512c13bd8' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-eventlog' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-eventlog!06e304df-196f-479f-8f1e-7a577bb46b01' for checkable 'xyz-p1-ts2004!service-eventlog'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-eventlog!06e304df-196f-479f-8f1e-7a577bb46b01' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-frxsvc' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-frxsvc!a321407e-ef41-43fb-a890-16e1d46d6a0f' for checkable 'xyz-p1-ts2004!service-frxsvc'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-frxsvc!a321407e-ef41-43fb-a890-16e1d46d6a0f' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-gpsvc' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-gpsvc!3dd5276b-a26c-43c0-96ea-fd3149409c6a' for checkable 'xyz-p1-ts2004!service-gpsvc'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-gpsvc!3dd5276b-a26c-43c0-96ea-fd3149409c6a' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-lanmanserver' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-lanmanserver!c0edc4d2-5989-4378-8e2f-d2c81d63ba84' for checkable 'xyz-p1-ts2004!service-lanmanserver'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-lanmanserver!c0edc4d2-5989-4378-8e2f-d2c81d63ba84' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-lanmanworkstation' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-lanmanworkstation!ba0dccab-c0a3-48ca-9bca-b4fe80d63b6d' for checkable 'xyz-p1-ts2004!service-lanmanworkstation'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-lanmanworkstation!ba0dccab-c0a3-48ca-9bca-b4fe80d63b6d' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-logprocessorservice' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-logprocessorservice!7f79485e-4410-4780-b444-67d4a1ed4200' for checkable 'xyz-p1-ts2004!service-logprocessorservice'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-logprocessorservice!7f79485e-4410-4780-b444-67d4a1ed4200' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-mpssvc' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-mpssvc!ffaf3b73-dda1-440c-93fb-8296341594a6' for checkable 'xyz-p1-ts2004!service-mpssvc'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-mpssvc!ffaf3b73-dda1-440c-93fb-8296341594a6' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-rdagentbootloader' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-rdagentbootloader!87c04bc6-c807-4bdd-ab85-e07d2878d1dc' for checkable 'xyz-p1-ts2004!service-rdagentbootloader'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-rdagentbootloader!87c04bc6-c807-4bdd-ab85-e07d2878d1dc' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-rpcss' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-rpcss!f14e490e-9a7e-4270-932b-d8fd7df81396' for checkable 'xyz-p1-ts2004!service-rpcss'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-rpcss!f14e490e-9a7e-4270-932b-d8fd7df81396' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-schedule' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-schedule!ca162fec-1260-43de-a524-cc17e2d2d869' for checkable 'xyz-p1-ts2004!service-schedule'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-schedule!ca162fec-1260-43de-a524-cc17e2d2d869' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-sentinelagent' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-sentinelagent!9fd29be4-2d24-41b1-83e0-3bf3063ecad4' for checkable 'xyz-p1-ts2004!service-sentinelagent'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-sentinelagent!9fd29be4-2d24-41b1-83e0-3bf3063ecad4' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-sentinelstaticengine' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-sentinelstaticengine!70b79481-44e0-4e8c-85f0-dd49b5c4de3c' for checkable 'xyz-p1-ts2004!service-sentinelstaticengine'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-sentinelstaticengine!70b79481-44e0-4e8c-85f0-dd49b5c4de3c' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-winmgmt' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-winmgmt!3b77eab9-b3f0-46f6-b5fd-462848b0c4ce' for checkable 'xyz-p1-ts2004!service-winmgmt'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-winmgmt!3b77eab9-b3f0-46f6-b5fd-462848b0c4ce' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!service-winrm' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!service-winrm!fc171c9b-2109-401c-a196-51844220b4f8' for checkable 'xyz-p1-ts2004!service-winrm'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!service-winrm!fc171c9b-2109-401c-a196-51844220b4f8' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!software_inventory!3334eae7-e6a7-4186-8345-890e5a069cc5' for checkable 'xyz-p1-ts2004!software_inventory'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!software_inventory!3334eae7-e6a7-4186-8345-890e5a069cc5' of type 'Downtime'.
[2024-08-07 18:02:48 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!userprofile-containers' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:02:48 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!userprofile-containers!422d1ce1-620f-45fd-b868-b996867d2610' for checkable 'xyz-p1-ts2004!userprofile-containers'.
[2024-08-07 18:02:48 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!userprofile-containers!422d1ce1-620f-45fd-b868-b996867d2610' of type 'Downtime'.
### NOTIFICATION WAS SENT ### 
[2024-08-07 18:08:23 +0200] information/Checkable: Checkable 'xyz-p1-ts2004' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-08-07 18:08:25 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!90d985c3-0bc3-40f4-8cfc-00fbb741fdbc' of type 'Comment'.
[2024-08-07 18:08:25 +0200] information/Checkable: Acknowledgement set for checkable 'xyz-p1-ts2004'.
### DOWNTIME WAS STARTED WITH A DELAY of ~ 50 minutes ###
[2024-08-07 18:51:11 +0200] information/Checkable: Checkable 'xyz-p1-ts2004' has 1 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:51:11 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!ef6293dc-916f-4db8-8e4b-aff0c37744ef' for checkable 'xyz-p1-ts2004'.
[2024-08-07 18:51:11 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!ef6293dc-916f-4db8-8e4b-aff0c37744ef' of type 'Downtime'.
[2024-08-07 18:51:11 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!cpu!b98242bc-a9ec-4561-ac64-3292c0779221' for checkable 'xyz-p1-ts2004!cpu'.
[2024-08-07 18:51:11 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!cpu!b98242bc-a9ec-4561-ac64-3292c0779221' of type 'Downtime'.
[2024-08-07 18:51:11 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!cpu-toplist!9efb9f31-2a70-4b6f-a454-92ba04977230' for checkable 'xyz-p1-ts2004!cpu-toplist'.
[2024-08-07 18:51:11 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!cpu-toplist!9efb9f31-2a70-4b6f-a454-92ba04977230' of type 'Downtime'.
[2024-08-07 18:51:11 +0200] information/Checkable: Checkable 'xyz-p1-ts2004!disk' has 2 notification(s). Checking filters for type 'DowntimeStart', sends will be logged.
[2024-08-07 18:51:11 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!disk!1619bb9b-96ed-432b-89f5-29774260bfd0' for checkable 'xyz-p1-ts2004!disk'.
[2024-08-07 18:51:11 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!disk!1619bb9b-96ed-432b-89f5-29774260bfd0' of type 'Downtime'.
[2024-08-07 18:51:11 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!icinga-agent-parent-service!e646196c-02dd-4ca6-a76c-28c87e3aa1cc' for checkable 'xyz-p1-ts2004!icinga-agent-parent-service'.
[2024-08-07 18:51:11 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!icinga-agent-parent-service!e646196c-02dd-4ca6-a76c-28c87e3aa1cc' of type 'Downtime'.
[2024-08-07 18:51:11 +0200] information/Downtime: Triggering downtime 'xyz-p1-ts2004!icinga-agent-version!bd76f856-586e-47ab-9d9e-880893090628' for checkable 'xyz-p1-ts2004!icinga-agent-version'.
[2024-08-07 18:51:11 +0200] information/ConfigObjectUtility: Created and activated object 'xyz-p1-ts2004!icinga-agent-version!bd76f856-586e-47ab-9d9e-880893090628' of type 'Downtime'.

log1-c commented 1 month ago

More occurrences of this. "Light mode" is a downtime set via API on host shutdown. "Dark mode" is a scheduled downtime.

image

image

Debug logs can be provided if helpful!