Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Notification for an acknowledged host gets sent as soon as the timeperiod starts #8680

Closed Foxeronie closed 3 years ago

Foxeronie commented 3 years ago

Hi everyone,

We currently have the problem that night-shift alarms get triggered even though the host is in downtime or acknowledged.

An example:
- The pager alarm timeperiod starts at 3pm.
- The host changes to DOWN state at 2pm.
- 24x7 alerts get sent as soon as the host is in a hard state (~2:10pm).
- The host gets acknowledged at 2:30pm.
- The pager alarm is sent at 3pm anyway.

An image of what happens: Screenshot_2021-03-12_15-23-14

To Reproduce

I'll try to give you all the needed configs. If I missed one, please let me know.

user

object User "Rufbereitschaft" {
    display_name = "Rufbereitschaft"
    email = "rz@domain.de"
    enable_notifications = true
    zone = "master"
}

timeperiod

object TimePeriod "Nachtalarm" {
    import "legacy-timeperiod"
    prefer_includes = false
    ranges = {
        "april 2"   = "00:00-24:00"
        "april 5"   = "00:00-24:00"
        "december 24"   = "00:00-24:00"
        "december 25"   = "00:00-24:00"
        "december 26"   = "00:00-24:00"
        "december 31"   = "00:00-24:00"
        "friday"    = "00:00-09:00,15:00-24:00"
        "january 1" = "00:00-24:00"
        "june 5"    = "00:00-24:00"
        "may 1" = "00:00-24:00"
        "may 13"    = "00:00-24:00"
        "may 24"    = "00:00-24:00"
        "monday"    = "00:00-09:00,17:00-24:00"
        "october 3" = "00:00-24:00"
        "october 31"    = "00:00-24:00"
        "saturday"  = "00:00-24:00"
        "sunday"    = "00:00-24:00"
        "thursday"  = "00:00-09:00,17:00-24:00"
        "tuesday"   = "00:00-09:00,17:00-24:00"
        "wednesday" = "00:00-09:00,17:00-24:00"
    }
}

host down notification

apply Notification "pager_host_down" to Host {
    command = "notification_pager_host"
    interval = 0s
    period = "Nachtalarm"
    zone = "master"
    assign where host.vars.monitoring_class == "nachtalarm" && host.vars.monitoring_class != "testserver"
    states = [ Down ]
    types = [ Problem ]
    users = [ "Rufbereitschaft" ]
}

host

object Host "hostA.domain.de" {
    display_name = "hostA"
    address = "192.168.2.25"
    check_command = "icmp"
    max_check_attempts = "10"
    check_interval = 1m
    retry_interval = 1m
    zone = "worker"
    notes_url = "https://confluence.domain.de/display/ITDOCS/AFS+Service"
    icon_image = "/img/sun.png"
    icon_image_alt = "SUN"
    groups = [ "it/afsfs/netappe" ]
    vars.has_agent = true
    vars.has_multipath = false
    vars.has_nrpe = true
    vars.icmp_upper_critical = "900,80%"
    vars.icmp_upper_warning = "600,60%"
    vars.impact = 100
    vars.location = "rz1/z-01-07/41"
    vars.manufacturer = "sun microsystems"
    vars.model = "sunfire x4140"
    vars.monitoring_class = "nachtalarm"
    vars.operating_system = "solaris"
    vars.parent_switch = "unknown"
    vars.support_group = [ "leitstand" ]
}

Expected behavior

User "Rufbereitschaft" should not get an alert, because the host is acknowledged.

Your Environment

Version used (icinga2 --version)

icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.3-1)

Copyright (c) 2012-2021 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Ubuntu
  Platform version: 20.04.2 LTS (Focal Fossa)
  Kernel: Linux
  Kernel version: 5.4.0-65-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 9.3.0
  Build host: runner-hh8q3bz2-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.1f  31 Mar 2020

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Operating System and version

:~# cat /etc/issue
Ubuntu 20.04.2 LTS \n \l

Enabled features (icinga2 feature list)

Disabled features: command compatlog debuglog elasticsearch gelf icingadb influxdb livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker graphite ido-mysql mainlog notification

Icinga Web 2 version and modules (System - About):

Icinga Web 2 Version
    2.8.2
Git commit
    293021b2000e9d459387153ca5690f97e0184aaa 
PHP Version
    7.4.3

Module | Version
-- | --
businessprocess | 2.3.0
director | master
generictts | 2.0.0
incubator | 0.6.0
ipl | v0.5.0
monitoring | 2.8.2
reactbundle | 0.8.0

Config validation (icinga2 daemon -C):

[2021-03-12 15:31:12 +0100] information/cli: Icinga application loader (version: r2.12.3-1)
[2021-03-12 15:31:12 +0100] information/cli: Loading configuration file(s).
[2021-03-12 15:31:13 +0100] information/ConfigItem: Committing config item(s).
[2021-03-12 15:31:13 +0100] information/ApiListener: My API identity: icinga-master10.domain.de
[2021-03-12 15:31:23 +0100] information/WorkQueue: #5 (GraphiteWriter, graphite) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-03-12 15:31:23 +0100] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 72/s (4320/min 4320/5min 4320/15min);
[2021-03-12 15:31:23 +0100] information/WorkQueue: #7 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-03-12 15:31:23 +0100] information/WorkQueue: #8 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-03-12 15:31:40 +0100] warning/ApplyRule: Apply rule 'HMC Event Log' (in /var/lib/icinga2/api/packages/director/86d4df53-f9b5-4a1f-a8b1-98b6cce5d237/zones.d/worker/service_apply.conf: 1986:1-1986:29) for type 'Service' does not match anywhere!
[2021-03-12 15:31:40 +0100] warning/ApplyRule: Apply rule 'HMC Event Log' (in /var/lib/icinga2/api/packages/director/86d4df53-f9b5-4a1f-a8b1-98b6cce5d237/zones.d/worker/service_apply.conf: 1997:1-1997:29) for type 'Service' does not match anywhere!
[2021-03-12 15:31:40 +0100] warning/ApplyRule: Apply rule 'HMC Power8 Temperatures' (in /var/lib/icinga2/api/packages/director/86d4df53-f9b5-4a1f-a8b1-98b6cce5d237/zones.d/worker/service_apply.conf: 2360:1-2360:39) for type 'Service' does not match anywhere!
[2021-03-12 15:31:40 +0100] warning/ApplyRule: Apply rule 'HMC Power8 Temperatures' (in /var/lib/icinga2/api/packages/director/86d4df53-f9b5-4a1f-a8b1-98b6cce5d237/zones.d/worker/service_apply.conf: 2371:1-2371:39) for type 'Service' does not match anywhere!
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 6136 Hosts.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 3191 Downtimes.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 1 GraphiteWriter.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 6 NotificationCommands.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 2667 Comments.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 22105 Notifications.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 423 HostGroups.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 56772 Dependencies.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 3241 Zones.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 3243 Endpoints.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 14 ApiUsers.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 131 CheckCommands.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 2 UserGroups.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 97 Users.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 80477 Services.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 43 ServiceGroups.
[2021-03-12 15:31:40 +0100] information/ConfigItem: Instantiated 39 ScheduledDowntimes.
[2021-03-12 15:31:40 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-03-12 15:31:40 +0100] information/cli: Finished validating the configuration file(s).

multiple Icinga 2 instances

object Endpoint "icinga-master10.domain.de" {
}

object Endpoint "icinga-master11.domain.de" {
  host = "192.168.194.116"
  log_duration = 0s
}

object Endpoint "icinga-monitor10.domain.de" {
  host = "192.168.64.241"
  log_duration = 0s
}

object Endpoint "icinga-monitor11.domain.de" {
  host = "192.168.64.242"
  log_duration = 0s
}

object Endpoint "icinga-monitor12.domain.de" {
  host = "192.168.64.243"
  log_duration = 0s
}

object Endpoint "icinga-monitor13.domain.de" {
  host = "192.168.64.244"
  log_duration = 0s
}

object Endpoint "icinga-worker10.domain.de" {
  host = "192.168.194.118"
  log_duration = 0s
}

object Endpoint "icinga-worker11.domain.de" {
  host = "192.168.194.119"
  log_duration = 0s
}

object Zone "director-global" {
  global = true
}

object Zone "global-templates" {
  global = true
}

object Zone "master" {
  endpoints = [ "icinga-master10.domain.de", "icinga-master11.domain.de", ]
}

object Zone "monitor-afs" {
  endpoints = [ "icinga-monitor12.domain.de", "icinga-monitor13.domain.de", ]
  parent = "master"
}

object Zone "monitor-general" {
  endpoints = [ "icinga-monitor10.domain.de", "icinga-monitor11.domain.de", ]
  parent = "master"
}

object Zone "worker" {
  endpoints = [ "icinga-worker10.domain.de", "icinga-worker11.domain.de", ]
  parent = "master"
}

Maybe this is somehow related to #8667?

Best regards, Patrick

Al2Klimov commented 3 years ago

Hello @Foxeronie and thank you for reporting!

To me it sounds like #8513. Could you please give the snapshot packages a try?

Best, AK

charr403 commented 3 years ago

Hi,

A workaround would be to set the timeperiod for the notification to 24x7, but assign the timeperiod "Nachtalarm" to the user "Rufbereitschaft". That works for our environment.
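
A minimal sketch of that workaround, reusing the objects from the original report. It assumes a TimePeriod named "24x7" is defined somewhere in the configuration (none is shown in this thread); leaving out `period` on the notification should have the same effect, since an unset period is not restricted. The only other change is the added `period` attribute on the user.

```
object User "Rufbereitschaft" {
    display_name = "Rufbereitschaft"
    email = "rz@domain.de"
    enable_notifications = true
    // The restriction to the on-call hours now lives on the user.
    period = "Nachtalarm"
    zone = "master"
}

apply Notification "pager_host_down" to Host {
    command = "notification_pager_host"
    interval = 0s
    // Assumption: a TimePeriod object named "24x7" exists.
    period = "24x7"
    zone = "master"
    assign where host.vars.monitoring_class == "nachtalarm" && host.vars.monitoring_class != "testserver"
    states = [ Down ]
    types = [ Problem ]
    users = [ "Rufbereitschaft" ]
}
```

With this layout the notification itself is never outside its period, so there is no period start at which a previously suppressed notification could be re-sent. Whether the user is notified is decided by the user's own period instead.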

We had two different notifications before, one for workdays and one for on-call duty, and we experienced the same behavior: we were notified again for acknowledged problems when the next timeperiod started.

Best regards, Sebastian

Tqnsls commented 3 years ago

Hi, we are also experiencing this problem with downtimes. We have defined time periods, e.g. from 8 am to 5 pm, on the hosts.

Screenshot: 2021_03_19_15_30_52_Window

In the debug logfile it says: [2021-03-19 08:00:00 +0100] notice/NotificationComponent: Attempting to re-send previously suppressed notification '<host>!host-mail-bronze'

According to the screenshot: our on-duty colleague saw that a host was down and set a downtime at 7:39. Unfortunately, the host notified at 8 am nevertheless. So this issue does not only occur when an ACK is set, but also with a downtime.

Al2Klimov commented 3 years ago

IMAO the lack of external feedback for a long time indicates that that feedback will never happen. Therefore closing this one.

Feel free to re-open if the problem persists with the latest Icinga 2 version as long as you provide the desired information.