Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

specifying multiple dependencies #9361

Closed: errror closed this issue 9 months ago

errror commented 2 years ago

## Describe the bug

According to https://github.com/Icinga/icinga2/commit/d0c0beb8beb8a56fc61843b6dfa8b4b34c0a4ec1, a service that depends on more than one other service only stops causing notifications when ALL of its parent services fail. I did not find any discussion about that change, so I am not sure whether this is intentional or a bug.

How do I specify that a service should not cause notifications when only some of its parent services fail, rather than all of them?

## To Reproduce

  1. Add the following config to a fresh icinga2 install:
    
    template Service "file_exists-service" {
      import "generic-service"
      check_interval = 30s
      retry_interval = 5s
      check_command = "file_age"
      // we just want to check if the file exists
      vars.file_age_warning_time = 2147483647
      vars.file_age_critical_time = 2147483647
    }

    apply Service "service_a" {
      import "file_exists-service"
      vars.file_age_file = "/tmp/service_a"
      assign where host.name == NodeName
    }

    apply Service "service_b" {
      import "file_exists-service"
      vars.file_age_file = "/tmp/service_b"
      assign where host.name == NodeName
    }

    apply Service "service_c" {
      import "file_exists-service"
      vars.file_age_file = "/tmp/service_c"
      assign where host.name == NodeName
    }

    object Dependency "a2c" {
      ignore_soft_states = false
      parent_host_name = NodeName
      child_host_name = NodeName
      parent_service_name = "service_a"
      child_service_name = "service_c"
    }

    object Dependency "b2c" {
      ignore_soft_states = false
      parent_host_name = NodeName
      child_host_name = NodeName
      parent_service_name = "service_b"
      child_service_name = "service_c"
    }

3. Start icinga2 and wait for `service_a`, `service_b` and `service_c` to fail.
4. Two notifications will be sent: `service_a` and `service_b` but **not** `service_c`
5. Touch `/tmp/service_{a,b,c}` and wait for all three services to return to OK.
6. Remove `/tmp/service_{a,c}` and wait for `service_a` and `service_c` to fail.
7. Two notifications will be sent: `service_a` and `service_c`, although `service_c` depends on `service_a`.

## Expected behavior

In step 6, only a notification for `service_a` should be sent.

## Screenshots

## Your Environment

* Version used (`icinga2 --version`):

    icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.3-1)

    Copyright (c) 2012-2022 Icinga GmbH (https://icinga.com/)
    License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    System information:
      Platform: Debian GNU/Linux
      Platform version: 11 (bullseye)
      Kernel: Linux
      Kernel version: 5.15.32.1.amd64-smp
      Architecture: x86_64

    Build information:
      Compiler: GNU 10.2.1
      Build host: x86-ubc-01
      OpenSSL version: OpenSSL 1.1.1k 25 Mar 2021

    Application information:

    General paths:
      Config directory: /etc/icinga2
      Data directory: /var/lib/icinga2
      Log directory: /var/log/icinga2
      Cache directory: /var/cache/icinga2
      Spool directory: /var/spool/icinga2
      Run directory: /run/icinga2

    Old paths (deprecated):
      Installation root: /usr
      Sysconf directory: /etc
      Run directory (base): /run
      Local state directory: /var

    Internal paths:
      Package data directory: /usr/share/icinga2
      State path: /var/lib/icinga2/icinga2.state
      Modified attributes path: /var/lib/icinga2/modified-attributes.conf
      Objects path: /var/cache/icinga2/icinga2.debug
      Vars path: /var/cache/icinga2/icinga2.vars
      PID path: /run/icinga2/icinga2.pid

Enabled features (`icinga2 feature list`):

    Disabled features: api command compatlog debuglog elasticsearch gelf graphite icingadb influxdb livestatus opentsdb perfdata statusdata
    Enabled features: checker mainlog notification syslog

* Icinga Web 2 version and modules (System - About): not used here
* Config validation (`icinga2 daemon -C`):

    [2022-04-28 14:06:00 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
    [2022-04-28 14:06:00 +0200] information/cli: Loading configuration file(s).
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Committing config item(s).
    [2022-04-28 14:06:00 +0200] warning/ApplyRule: Apply rule 'ping6' (in /etc/icinga2/conf.d/services.conf: 34:1-34:21) for type 'Service' does not match anywhere!
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 SyslogLogger.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 User.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 UserGroup.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 3 TimePeriods.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 3 Zones.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 14 Services.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 Host.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 15 Notifications.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 2 HostGroups.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 Endpoint.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 Downtime.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 2 Dependencies.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 1 FileLogger.
    [2022-04-28 14:06:00 +0200] information/ConfigItem: Instantiated 235 CheckCommands.
    [2022-04-28 14:06:00 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
    [2022-04-28 14:06:00 +0200] information/cli: Finished validating the configuration file(s).


* If you run multiple Icinga 2 instances, the `zones.conf` file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes: only one instance

## Additional context
dgoetz commented 2 years ago

There is some discussion in https://github.com/Icinga/icinga2/issues/1869, which led to the current behaviour being chosen as the one most commonly needed. That issue was closed without a solution that fits all needs.

davixd commented 2 years ago

Today I ran into what I think is the same problem. Here is what I am trying to achieve:

- check_hypervisor_status: checks whether the RHEV hypervisor is running properly.
- check_nrpe_vm: checks whether the NRPE agent is running properly on the VM.
- check_nrpe_rhev_agent: Icinga connects to the VM via NRPE and checks whether the RHEV agent is running properly.

I'm using the following dependency rules:

1. If check_hypervisor_status fails, do not notify me about check_nrpe_rhev_agent.
2. If check_nrpe_vm fails, do not notify me about check_nrpe_rhev_agent.
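
The two rules above might be expressed roughly as the following apply rules. This is a sketch, not the configuration from the comment: the rule names and the hypervisor host name `rhev-hv01` are hypothetical, and it assumes check_nrpe_vm and check_nrpe_rhev_agent run on the same VM host object.

```
// Sketch only: rule names and the host "rhev-hv01" are hypothetical.
// Rule 1: the hypervisor check (on a separate host) is a parent of the RHEV agent check.
apply Dependency "hypervisor-to-rhev-agent" to Service {
  parent_host_name = "rhev-hv01"
  parent_service_name = "check_hypervisor_status"
  assign where service.name == "check_nrpe_rhev_agent"
}

// Rule 2: the NRPE check on the same VM is also a parent of the RHEV agent check.
apply Dependency "nrpe-vm-to-rhev-agent" to Service {
  parent_host_name = host.name
  parent_service_name = "check_nrpe_vm"
  assign where service.name == "check_nrpe_rhev_agent"
}
```

Because both rules target the same child, the versions discussed in this issue treat the two parents as redundant (per the commit referenced in the original report), so notifications for check_nrpe_rhev_agent are only suppressed once both parents fail.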

However, I still get notified about check_nrpe_rhev_agent when check_hypervisor_status or check_nrpe_vm fails, as @errror reported. I guess this comes from the multi-parent dependency behaviour discussed in https://github.com/Icinga/icinga2/issues/1869.

In a big monitoring environment, it is important to have both an "OR" and an "AND" relation available when matching dependency rules.

I'm really looking forward to being able to use both kinds of dependency relations.

davixd commented 1 year ago

Hi guys, is there an estimate of if or when this ticket will be worked on? This week we had an outage of our virtualization center which, with the current behaviour from https://github.com/Icinga/icinga2/issues/1869, led to about 3000 unnecessary notifications. As you can imagine, our head of department was not amused at all. Two or more dependencies are now interpreted as redundant by default. We also monitor network devices and LDAP servers, where this redundancy logic is welcome and used. But we also need a way to set up dependencies that are not redundant. So it is really necessary to have an option in the DSL to choose between an "AND" and an "OR" relation in the dependency settings. Otherwise the dependency feature will remain unfulfilled and not fully usable in bigger environments. My struggle is real, and I can only hope we don't have further outages. Otherwise I will have to go back to the deprecated check_multi and aggregate checks to get fewer notifications in case of an outage.
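
The mixed requirement described here (redundant LDAP servers, but a non-redundant virtualization layer) corresponds to the `redundancy_group` Dependency attribute introduced in Icinga v2.14, which the reply at the end of this thread points to. The following is a sketch that assumes the semantics described in the linked blog post: every dependency without a `redundancy_group` must be fulfilled, while among dependencies sharing a group a single fulfilled one is enough. All host, service, rule, and group names are hypothetical.

```
// Assumes Icinga 2 v2.14 or later (redundancy_group attribute).
// All host, service, rule and group names are hypothetical.

// Not in any redundancy group: required on its own, so "app-login"
// counts as unreachable as soon as the hypervisor check fails.
apply Dependency "app-to-hypervisor" to Service {
  parent_host_name = "rhev-hv01"
  parent_service_name = "check_hypervisor_status"
  assign where service.name == "app-login"
}

// Same redundancy group: "app-login" only counts as unreachable
// when BOTH LDAP parents fail.
apply Dependency "app-to-ldap1" to Service {
  parent_host_name = "ldap1"
  parent_service_name = "ldap"
  redundancy_group = "ldap"
  assign where service.name == "app-login"
}

apply Dependency "app-to-ldap2" to Service {
  parent_host_name = "ldap2"
  parent_service_name = "ldap"
  redundancy_group = "ldap"
  assign where service.name == "app-login"
}
```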

maggu commented 1 year ago

Just bit me too.

A colleague wanted to specify that eight remote checks depend on one remote check. It didn't work. It took me a long while to realize why: remote checks in our installation also have a dependency on the remote executor (in our case, NRPE checks depend on NRPE), and the remote executor wasn't down.

Al2Klimov commented 9 months ago

Hello @errror!

Please upgrade to v2.14 and consult https://icinga.com/blog/2023/10/11/dependency-redundancy-groups-in-icinga-2-14/

Best, A/K
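
Applied to the reproducer at the top of this issue, this is a sketch under the v2.14 semantics described in the blog post linked in the reply above. Under those semantics, dependencies without a `redundancy_group` are all required, so the original `a2c` and `b2c` objects would already suppress notifications for `service_c` when either parent fails. Putting both into one group would restore the pre-2.14 "all parents must fail" behaviour. The group name `service_c-parents` is hypothetical.

```
// Assumes Icinga 2 v2.14 or later.
// Without redundancy_group, both dependencies are required: service_c counts
// as unreachable as soon as service_a OR service_b fails.
object Dependency "a2c" {
  ignore_soft_states = false
  parent_host_name = NodeName
  child_host_name = NodeName
  parent_service_name = "service_a"
  child_service_name = "service_c"
  // Uncomment in both objects to make the parents redundant again
  // (service_c unreachable only when service_a AND service_b fail):
  // redundancy_group = "service_c-parents"
}

object Dependency "b2c" {
  ignore_soft_states = false
  parent_host_name = NodeName
  child_host_name = NodeName
  parent_service_name = "service_b"
  child_service_name = "service_c"
  // redundancy_group = "service_c-parents"
}
```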