[dev.icinga.com #10114] Service (or host) checks should allow SOFT status for OK as well

icinga-migration commented 9 years ago

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10114

Created by leo9641 on 2015-09-07 15:10:23 +00:00

Assignee: (none)
Status: New
Target Version: Backlog
Last Update: 2015-10-26 08:22:20 +00:00 (in Redmine)

Hi icinga team!

Please consider this request too:

https://github.com/NagiosEnterprises/nagioscore/issues/46

icinga-migration commented 9 years ago

Updated by mfriedrich on 2015-09-07 15:32:42 +00:00

Subject changed from [feature-request] Service (or host) checks should allow SOFT status for OK as well to Service (or host) checks should allow SOFT status for OK as well
Category set to Notifications

From a quick read, the feature request is to delay recovery notifications for some reason. I'm not really sure I get the problem itself, how would a soft recovery requiring additional steps in SOFT-OK then result in a HARD-OK triggering the recovery notification? That sounds pretty weird to me.

Probably you should come up with some drawing boards to illustrate the timing and intervals including all involved configuration attributes influencing the state machine.

Note: I would consider this for Icinga 2 only. We won't implement such (breaking) changes in 1.x.

icinga-migration commented 9 years ago

Updated by leo9641 on 2015-10-06 11:11:54 +00:00

dnsmichi wrote:

> From a quick read, the feature request is to delay recovery notifications for some reason. I'm not really sure I get the problem itself, how would a soft recovery requiring additional steps in SOFT-OK then result in a HARD-OK triggering the recovery notification? That sounds pretty weird to me.
>
> Probably you should come up with some drawing boards to illustrate the timing and intervals including all involved configuration attributes influencing the state machine.
>
> Note: I would consider this for Icinga 2 only. We won't implement such (breaking) changes in 1.x.

I wrote a simple wrapper for this feature. It applies only to the gw-host check and sets the UP state for a gw-host only after maxhostattempts successful retries in a row: https://gist.github.com/lvasiliev/6c847511e53509c8db51

'check-gw-alive-extadm' command definition:

    define command{
        command_name    check-gw-alive-extadm
        command_line    $USER3$/extadm/soft_recovery.py --hostname=$HOSTNAME$ --lasthoststate=$LASTHOSTSTATE$ --hostattempt=$HOSTATTEMPT$ --maxhostattempts=$MAXHOSTATTEMPTS$ $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
    }
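To make the idea of "UP only after N successful checks in a row" concrete, here is a minimal sketch of the decision such a wrapper could make. It is not the code from the linked gist: the function name `soft_recovery`, its parameters, and the idea that the wrapper keeps its own success counter are all assumptions made for illustration. Only the plugin exit codes (0 for OK/UP, 2 for CRITICAL/DOWN) are standard.

```python
# Hypothetical sketch; NOT the code from the linked gist.
STATE_UP = 0      # plugin exit code treated as OK/UP
STATE_DOWN = 2    # plugin exit code treated as CRITICAL/DOWN


def soft_recovery(was_down, ok_streak, required_ok, plugin_rc):
    """Return (exit_code_to_report, new_ok_streak).

    was_down    -- True if the host was last reported DOWN
    ok_streak   -- successful checks seen in a row so far (assumed to be
                   tracked by the wrapper itself)
    required_ok -- successes needed before UP is reported
    plugin_rc   -- exit code of the wrapped check (check_ping here)
    """
    if plugin_rc != STATE_UP:
        # Any failure resets the streak and is passed through unchanged.
        return plugin_rc, 0
    if not was_down:
        # Host was already UP: there is no recovery to delay.
        return STATE_UP, 0
    ok_streak += 1
    if ok_streak >= required_ok:
        # Enough successes in a row: report the recovery.
        return STATE_UP, 0
    # Keep reporting DOWN until the streak is long enough.
    return STATE_DOWN, ok_streak
```

With `required_ok=3`, the first two successful pings after a DOWN state would still be reported as DOWN, and only the third would be reported as UP.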

The transition to the UP state needs to be slowed down for gw hosts, because child hosts depend on them (parents -> gw-host).

Template for gw-host (only timing):

    define host{
        name                  generic-router-extadm
        check_interval        2                       ; Switches are checked every 2 minutes
        retry_interval        1                       ; Schedule host check retries at 1 minute intervals
        max_check_attempts    5                       ; Check each switch 5 times (max)
        check_command         check-gw-alive-extadm   ; Default command to check if routers are "alive"
    }

Template for hosts (only timing):

    define host{
        name                  freebsd-server-extadm   ; The name of this host template
        check_interval        4                       ; Actively check the host every 4 minutes
        retry_interval        1                       ; Schedule host check retries at 1 minute intervals
        max_check_attempts    10                      ; Check each FreeBSD host 10 times (max)
    }
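As a rough illustration of what these timings mean, the sketch below estimates how long a host stays in a SOFT state before it turns HARD. It is a simplification under stated assumptions: the first failed check counts as attempt 1, every retry is scheduled exactly `retry_interval` minutes later, and intervals are in minutes (the default `interval_length=60`). Real scheduling also involves check latency and on-demand checks, so treat the numbers as approximate. The helper name `minutes_to_hard_state` is made up for this example.

```python
def minutes_to_hard_state(max_check_attempts, retry_interval):
    """Approximate minutes from the first failed check to the HARD state.

    Assumes the first failure is attempt 1 and each further attempt is
    scheduled exactly retry_interval minutes after the previous one.
    """
    return (max_check_attempts - 1) * retry_interval


# Values taken from the two templates in this comment.
gw = minutes_to_hard_state(max_check_attempts=5, retry_interval=1)
child = minutes_to_hard_state(max_check_attempts=10, retry_interval=1)
print(gw, child)  # 4 9
```

Under these assumptions a gw-host reaches a HARD state after about 4 minutes of failures, while a child host needs about 9 minutes.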

I want the transition from DOWN to UP for gw hosts to be slower (for when the network is unstable or losing packets). During this period the child hosts are in the UNREACHABLE state and don't send notifications. Sometimes a gw-host goes from DOWN to UP quickly, but the checks of its child hosts still return non-OK states (WARNING, CRITICAL), and after max_check_attempts the host is in the DOWN state. Then the gw-host goes DOWN again...
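The flapping described here can be illustrated with a small simulation. This is only a sketch: the list of ping results is made up, the helper names are hypothetical, and the model ignores SOFT/HARD attempt counting for failures (every failed ping is reported as DOWN immediately). It compares immediate recovery (`required_ok=1`) with a recovery that needs five successes in a row.

```python
def reported_states(ping_ok, required_ok):
    """Map raw check results to reported UP/DOWN with delayed recovery."""
    reported, up, streak = [], True, 0
    for ok in ping_ok:
        if not ok:
            # A failed check marks the host DOWN and resets the streak.
            up, streak = False, 0
        elif not up:
            # Host is DOWN: count successes until the streak is long enough.
            streak += 1
            if streak >= required_ok:
                up, streak = True, 0
        reported.append("UP" if up else "DOWN")
    return reported


def transitions(states):
    """Count how often the reported state changes."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Made-up results from an unstable link: True = ping succeeded.
unstable = [False, True, False, True, True, False,
            True, True, True, True, True, True]

immediate = reported_states(unstable, required_ok=1)
delayed = reported_states(unstable, required_ok=5)
print(transitions(immediate), transitions(delayed))  # 5 1
```

In this made-up run the gw-host changes its reported state five times with immediate recovery, but only once when recovery is delayed, so the child hosts would stay UNREACHABLE until the link has been stable for five checks.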

I use the option soft_state_dependencies=1.

icinga-migration commented 9 years ago

Updated by mfriedrich on 2015-10-26 08:22:14 +00:00

Ok. If someone comes up with a patch which does not break the existing behaviour we might have a look into it.

icinga-migration commented 9 years ago

Updated by mfriedrich on 2015-10-26 08:22:21 +00:00

Target Version set to Backlog

Icinga / icinga-core

[dev.icinga.com #10114] Service (or host) checks should allow SOFT status for OK as well #1561