Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

icinga2.8 - Notifications are sent even in downtime #6231

Closed: anan80 closed this issue 6 years ago

anan80 commented 6 years ago

We are running an Icinga master1/master2, satellite, and client setup on version 2.8. We still receive alert notifications during a downtime. The downtime is set through the web UI and is shown there, but Icinga still sends notification mails. I also observed that the number of servers does not match between master 1 and master 2 in the /var/lib/icinga2/api/packages/_api//conf.d/downtimes directory.

icinga: 2.8.0-1 icingaweb2: 2.4.1

On master 1, accept_config is set to false in api.conf, but on master 2 it is set to true.

dnsmichi commented 6 years ago

I would appreciate it if you would take the time to fill in the issue template and provide steps to reproduce the problem.

unix0r commented 6 years ago

I'm also seeing this issue.

There is a downtime for a service of a host (created via icingaweb2):

object Downtime "PLS-SERVER-xxxxx" ignore_on_error {
    author = "admin"
    comment = "node was removed"
    config_owner = ""
    duration = 0.000000
    end_time = 1839587992.000000
    entry_time = 1523968793.733191
    fixed = true
    host_name = "pls-goeteborg1_10.46.8.141_10.46.8.140"
    scheduled_by = ""
    service_name = "Kiosk_LastSeen"
    start_time = 1523968792.000000
    triggered_by = ""
    version = 1523968793.733220
    zone = "pls-goeteborg1"
}

But there are still notifications coming from this service:

Service Monitoring on PLS-SERVER

Last Order at Order Point on 10.46.8.140 is CRITICAL!

Info: 59 days 3 hours ago

When: 2018-04-19 14:45:06 +0200
Service: Kiosk_LastSeen
Host: pls-goeteborg1_10.46.8.141_10.46.8.140
IPv4: 10.46.8.141

System Information:

    root@PLS-SERVER:/etc/icinga2# icinga2 --version
    icinga2 - The Icinga 2 network monitoring daemon (version: r2.8.2-1)

    Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
    License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    Application information:
      Installation root: /usr
      Sysconf directory: /etc
      Run directory: /run
      Local state directory: /var
      Package data directory: /usr/share/icinga2
      State path: /var/lib/icinga2/icinga2.state
      Modified attributes path: /var/lib/icinga2/modified-attributes.conf
      Objects path: /var/cache/icinga2/icinga2.debug
      Vars path: /var/cache/icinga2/icinga2.vars
      PID path: /run/icinga2/icinga2.pid

    System information:
      Platform: Debian GNU/Linux
      Platform version: 9 (stretch)
      Kernel: Linux
      Kernel version: 4.9.0-4-amd64
      Architecture: x86_64

    Build information:
      Compiler: GNU 6.3.0
      Build host: 022328c363ac

    root@PLS-SERVER:/etc/icinga2# icinga2 feature list
    Disabled features: compatlog debuglog elasticsearch gelf influxdb livestatus opentsdb statusdata syslog
    Enabled features: api checker command graphite ido-pgsql mainlog notification perfdata

    root@PLS-SERVER:/etc/icinga2# icinga2 daemon -C
    information/cli: Icinga application loader (version: r2.8.2-1)
    information/cli: Loading configuration file(s).
    information/ConfigItem: Committing config item(s).
    information/ApiListener: My API identity: master
    warning/ApplyRule: Apply rule 'ping6' (in /etc/icinga2/conf.d/services.conf: 28:1-28:21) for type 'Service' does not match anywhere!
    information/ConfigItem: Instantiated 1 ApiListener.
    information/ConfigItem: Instantiated 10 Zones.
    information/ConfigItem: Instantiated 8 Endpoints.
    information/ConfigItem: Instantiated 1 FileLogger.
    information/ConfigItem: Instantiated 2 ApiUsers.
    information/ConfigItem: Instantiated 358 Notifications.
    information/ConfigItem: Instantiated 2 NotificationCommands.
    information/ConfigItem: Instantiated 236 CheckCommands.
    information/ConfigItem: Instantiated 139 Downtimes.
    information/ConfigItem: Instantiated 8 HostGroups.
    information/ConfigItem: Instantiated 1 IcingaApplication.
    information/ConfigItem: Instantiated 1 EventCommand.
    information/ConfigItem: Instantiated 510 Hosts.
    information/ConfigItem: Instantiated 2 UserGroups.
    information/ConfigItem: Instantiated 2 Users.
    information/ConfigItem: Instantiated 4 TimePeriods.
    information/ConfigItem: Instantiated 2991 Services.
    information/ConfigItem: Instantiated 16 ServiceGroups.
    information/ConfigItem: Instantiated 1 ScheduledDowntime.
    information/ConfigItem: Instantiated 1 ExternalCommandListener.
    information/ConfigItem: Instantiated 1 CheckerComponent.
    information/ConfigItem: Instantiated 1 GraphiteWriter.
    information/ConfigItem: Instantiated 1 PerfdataWriter.
    information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
    information/ConfigItem: Instantiated 1 NotificationComponent.
    information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
    information/cli: Finished validating the configuration file(s).

dnsmichi commented 6 years ago

Please extract the runtime state for this host/service and downtime via REST API endpoints /v1/objects/services and /v1/objects/downtimes.
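
For readers following along, here is a minimal Python sketch of how such queries could be put together. It only builds the two requests and prints them; nothing is sent. The base URL (default API port 5665 on localhost) and the helper name build_object_query are assumptions for illustration, and the host and service names are the ones that appear in this thread.

```python
import json
import urllib.request

# Assumption: the API listens on the default port 5665 on the master itself.
BASE_URL = "https://localhost:5665"


def build_object_query(object_type, filter_expr, filter_vars):
    """Build (without sending) a filtered query against /v1/objects/<object_type>."""
    body = json.dumps({"filter": filter_expr, "filter_vars": filter_vars})
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/objects/{object_type}",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            # A query with a JSON body is sent as POST and marked as a GET.
            "X-HTTP-Method-Override": "GET",
        },
    )


# Host and service names taken from the downtime object posted in this thread.
names = {"h": "pls-goeteborg1_10.46.8.141_10.46.8.140", "s": "Kiosk_LastSeen"}

service_query = build_object_query(
    "services", "host.name == h && service.name == s", names)
downtime_query = build_object_query(
    "downtimes", "downtime.host_name == h && downtime.service_name == s", names)

for query in (service_query, downtime_query):
    print(query.get_method(), query.full_url)
    print(query.data.decode("utf-8"))
```

Actually sending these requests additionally needs the credentials of an ApiUser (HTTP basic auth) and a TLS context that trusts the Icinga CA; both are left out of the sketch.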

unix0r commented 6 years ago

Downtime:

    {
        "attrs": {
            "__name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!PLS-SERVER-1523968793-7",
            "active": true,
            "author": "admin",
            "comment": "node was removed",
            "config_owner": "",
            "duration": 0.0,
            "end_time": 1839587992.0,
            "entry_time": 1523968793.733191,
            "fixed": true,
            "ha_mode": 0.0,
            "host_name": "pls-goeteborg1_10.46.8.141_10.46.8.140",
            "legacy_id": 70.0,
            "name": "PLS-SERVER-1523968793-7",
            "original_attributes": null,
            "package": "_api",
            "paused": false,
            "scheduled_by": "",
            "service_name": "Kiosk_LastSeen",
            "source_location": {
                "first_column": 0.0,
                "first_line": 1.0,
                "last_column": 56.0,
                "last_line": 1.0,
                "path": "/var/lib/icinga2/api/packages/_api/PLS-SERVER-1513681196-1/conf.d/downtimes/pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!PLS-SERVER-1523968793-7.conf"
            },
            "start_time": 1523968792.0,
            "templates": [
                "PLS-SERVER-1523968793-7"
            ],
            "trigger_time": 1523968793.737246,
            "triggered_by": "",
            "triggers": [],
            "type": "Downtime",
            "version": 1523968793.73322,
            "was_cancelled": false,
            "zone": "pls-goeteborg1"
        },
        "joins": {},
        "meta": {},
        "name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!PLS-SERVER-1523968793-7",
        "type": "Downtime"
    }

Service:

    {
        "attrs": {
            "__name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen",
            "acknowledgement": 0.0,
            "acknowledgement_expiry": 0.0,
            "action_url": "",
            "active": true,
            "check_attempt": 1.0,
            "check_command": "check_kiosk_lastseen",
            "check_interval": 600.0,
            "check_period": "",
            "check_timeout": null,
            "command_endpoint": "",
            "display_name": "Last Order at Order Point",
            "downtime_depth": 1.0,
            "enable_active_checks": true,
            "enable_event_handler": true,
            "enable_flapping": false,
            "enable_notifications": true,
            "enable_passive_checks": true,
            "enable_perfdata": true,
            "event_command": "",
            "flapping": false,
            "flapping_current": 0.0,
            "flapping_last_change": 0.0,
            "flapping_threshold": 0.0,
            "flapping_threshold_high": 30.0,
            "flapping_threshold_low": 25.0,
            "force_next_check": false,
            "force_next_notification": false,
            "groups": [
                "Kiosk_LastSeen"
            ],
            "ha_mode": 0.0,
            "host_name": "pls-goeteborg1_10.46.8.141_10.46.8.140",
            "icon_image": "",
            "icon_image_alt": "",
            "last_check": 1524150835.323797,
            "last_check_result": {
                "active": true,
                "check_source": "pls-goeteborg1",
                "command": [
                    "/usr/lib/nagios/plugins/check_lastseen",
                    "-c",
                    "2880",
                    "-v",
                    "2018-02-19T10:08:37.27438+01:00",
                    "-w",
                    "1440"
                ],
                "execution_end": 1524150835.323666,
                "execution_start": 1524150835.302616,
                "exit_status": 2.0,
                "output": "59 days 6 hours ago",
                "performance_data": [
                    "duration=85325;1440;2880"
                ],
                "schedule_end": 1524150835.323797,
                "schedule_start": 1524150835.302077,
                "state": 2.0,
                "type": "CheckResult",
                "vars_after": {
                    "attempt": 1.0,
                    "reachable": true,
                    "state": 2.0,
                    "state_type": 1.0
                },
                "vars_before": {
                    "attempt": 1.0,
                    "reachable": true,
                    "state": 2.0,
                    "state_type": 1.0
                }
            },
            "last_hard_state": 2.0,
            "last_hard_state_change": 1519204509.405459,
            "last_reachable": true,
            "last_state": 2.0,
            "last_state_change": 1519204509.405459,
            "last_state_critical": 1524150835.461949,
            "last_state_ok": 1519117653.056663,
            "last_state_type": 1.0,
            "last_state_unknown": 0.0,
            "last_state_unreachable": 0.0,
            "last_state_warning": 1519203909.451425,
            "max_check_attempts": 5.0,
            "name": "Kiosk_LastSeen",
            "next_check": 1524151430.251961,
            "notes": "",
            "notes_url": "",
            "original_attributes": null,
            "package": "_etc",
            "paused": false,
            "retry_interval": 60.0,
            "severity": 129.0,
            "source_location": {
                "first_column": 1.0,
                "first_line": 236.0,
                "last_column": 30.0,
                "last_line": 236.0,
                "path": "/etc/icinga2/zones.d/global-templates/P_services.conf"
            },
            "state": 2.0,
            "state_type": 1.0,
            "templates": [
                "Kiosk_LastSeen",
                "kiosk-service-urgent",
                "kiosk-service",
                "generic-service"
            ],
            "type": "Service",
            "vars": {
                "notification": {
                    "mail": {
                        "users": [
                            "v_user"
                        ]
                    }
                }
            },
            "version": 0.0,
            "volatile": false,
            "zone": "pls-goeteborg1"
        },
        "joins": {},
        "meta": {},
        "name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen",
        "type": "Service"
    }

dnsmichi commented 6 years ago

Hm, looks ok to me. Can you share the zones.conf on that master, and check which node is sending the notification email? I would suspect that it happens on the secondary master which maybe doesn't have the downtime applied.
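
One way to read "looks ok" against the posted dumps, as a small Python sketch: the downtime is fixed, active and not cancelled, its window contains the time of the notification mail, and the service reports a downtime_depth above zero. The helper name fixed_downtime_covers is made up for this illustration, and the check only covers fixed downtimes, not flexible ones.

```python
from datetime import datetime


def fixed_downtime_covers(downtime_attrs, when):
    """Return True if a fixed, active, non-cancelled downtime window contains `when`."""
    return (
        downtime_attrs["active"]
        and downtime_attrs["fixed"]
        and not downtime_attrs["was_cancelled"]
        and downtime_attrs["start_time"] <= when <= downtime_attrs["end_time"]
    )


# Values copied from the Downtime and Service dumps in this thread.
downtime_attrs = {
    "active": True,
    "fixed": True,
    "was_cancelled": False,
    "start_time": 1523968792.0,
    "end_time": 1839587992.0,
}
service_attrs = {"downtime_depth": 1.0, "enable_notifications": True}

# Timestamp of the notification mail quoted earlier in the thread.
mail_sent = datetime.fromisoformat("2018-04-19T14:45:06+02:00").timestamp()

in_window = fixed_downtime_covers(downtime_attrs, mail_sent)
in_downtime = service_attrs["downtime_depth"] > 0
print(in_window, in_downtime)
```

Both checks hold for the posted data, so the runtime state of the downtime and the service does not by itself explain the mails.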

unix0r commented 6 years ago

We only have one master and only the master is sending mails. Notifications and Downtimes are also applied via icingaweb2 of the master.

object Endpoint NodeName {
}

object Zone ZoneName {
    endpoints = [ NodeName ]
}

object Zone "global-templates" {
    global = true
}

object Zone "director-global" {
    global = true
}

object Endpoint "pls-goeteborg1" {
    host = "10.10.27.2"
}

object Zone "pls-goeteborg1" {
    endpoints = [ "pls-goeteborg1" ]
    parent = ZoneName
}

dnsmichi commented 6 years ago

So your problem is most likely different from what @anan80 described.

One more thing I would check via the REST API: the notification objects and their current state, e.g. the last_notification timestamp.
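
A rough Python sketch of that check, assuming the "results" list of a /v1/objects/notifications response has already been fetched and parsed. The helper name notified_during_downtime, the notification object names and the sample timestamps are hypothetical; only the downtime window comes from the dump above.

```python
def notified_during_downtime(notification_results, downtime_start, downtime_end):
    """Return the names of notification objects whose last_notification
    timestamp falls inside the given downtime window."""
    names = []
    for result in notification_results:
        last = result["attrs"].get("last_notification", 0.0)
        if downtime_start <= last <= downtime_end:
            names.append(result["name"])
    return names


# Hypothetical sample shaped like the "results" list of an API response.
sample_results = [
    {
        "name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!mail-a",
        "attrs": {"last_notification": 1524141906.0},
    },
    {
        "name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!mail-b",
        "attrs": {"last_notification": 1519204509.0},
    },
]

# Downtime window from the Downtime object posted in this thread.
suspects = notified_during_downtime(sample_results, 1523968792.0, 1839587992.0)
print(suspects)
```

Any notification object this returns has fired while the downtime was already in effect and is a candidate for a closer look.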