Open gvfnix opened 4 years ago
Wow, 4 years later and just stumbled onto this same issue. Any updates?
Just make a check that does check_nrpe with no arguments. If it comes back successfully, then NRPE is working on that host. Then make your NRPE-based checks dependent on that check not being in a CRITICAL state, and you've just solved the problem.
Whoops. Wrong editor. Hit "Comment" accidentally. To continue...
Since Nagios doesn't know what technology you're using to check something, it can't just suspend checks when NRPE is not working. You have to do that by hand. You know. With dependencies. :-)
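As a rough sketch of that approach (not a tested configuration): the command name check_nrpe_alive, the service name nrpe_alive, the dependent service "Root Partition", and the generic-service template are all assumptions made for illustration, and the failure criteria are simply copied from the examples elsewhere in this thread.

```
# Hypothetical command: check_nrpe with no -c argument only asks the agent to respond.
define command{
    command_name    check_nrpe_alive
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$
}

# Hypothetical "is NRPE reachable" service for one host.
define service{
    use                     generic-service       ; assumed template from the sample config
    host_name               uniquehostnamehere
    service_description     nrpe_alive
    check_command           check_nrpe_alive
}

# Make an existing NRPE-based service depend on the reachability check.
define servicedependency{
    host_name                       uniquehostnamehere
    service_description             nrpe_alive        ; master service
    dependent_host_name             uniquehostnamehere
    dependent_service_description   Root Partition    ; hypothetical NRPE-based service
    execution_failure_criteria      w,u,c,p
    notification_failure_criteria   w,u,c,p
}
```

One servicedependency like this would be needed for each NRPE-based service on the host.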
But what if the service_description is "check_nrpe" and the dependent_service_description is "aaa_check"? We'll always get alerted on "aaa_check" first, and then get an additional alert on "check_nrpe", because of the alphabetical issue @gvfnix discovered.
I just checked this on version 4.4.10, and can duplicate the issue. Meaning, this causes one alert, just the check_nrpe:
define servicedependency{
    host_name                       uniquehostnamehere
    service_description             check_nrpe
    dependent_service_description   aaa_check
    execution_failure_criteria      w,u,c,p
    notification_failure_criteria   w,u,c,p
}
This causes two alerts, both aaa_check and check_nrpe:
define servicedependency{
    host_name                       uniquehostnamehere
    service_description             aaa_check
    dependent_service_description   check_nrpe
    execution_failure_criteria      w,u,c,p
    notification_failure_criteria   w,u,c,p
}
So what solution would you like to see implemented?
Pretty much like @gvfnix said, "service description should not influence the dependency feature in that manner". How to go about that, I'm not sure. For now, if we want to use the dependency feature properly, we'll have to rename our service_description values with "0001", "0002", etc. in front of them and go from there. It would be great if display_name worked with the CGIs; then we wouldn't have the cosmetic issue this renaming will cause.
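A minimal sketch of that renaming workaround, assuming the master service has to sort first as in the original report. The prefixed names, the check_command values, and the generic-service template are hypothetical. display_name is a real service directive, but the stock CGIs may not show it.

```
# Master check: the numeric prefix makes it sort first.
define service{
    use                     generic-service       ; assumed template from the sample config
    host_name               uniquehostnamehere
    service_description     0001_check_nrpe
    display_name            check_nrpe            ; cosmetic name only
    check_command           check_nrpe_alive      ; hypothetical command name
}

# Dependent check: the prefix makes it sort after the master.
define service{
    use                     generic-service
    host_name               uniquehostnamehere
    service_description     0002_aaa_check
    display_name            aaa_check
    check_command           aaa_check_command     ; hypothetical command name
}

define servicedependency{
    host_name                       uniquehostnamehere
    service_description             0001_check_nrpe
    dependent_host_name             uniquehostnamehere
    dependent_service_description   0002_aaa_check
    execution_failure_criteria      w,u,c,p
    notification_failure_criteria   w,u,c,p
}
```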
So you want Nagios to process the service dependencies in the order in which they are listed in the dependency? What if you have multiple, separate dependencies; how would it know what to do, then? And this might be just for Nagios Core but Nagios XI needs to know how to create the Core config files to match the order you want when the time comes for that, so it's a sticky wicket.
Agreed, it gets a little tricky. The display_name fix would help with the overall issue in that case. It wouldn't matter what the service_description is at that point; the end user would still see the proper (descriptive) display_name in the web UI.
I'd like Nagios Core 4.4.5 to suspend NRPE-based checks when the NRPE port is unreachable on a host. I also set soft_state_dependencies=1 in nagios.cfg so that dependencies use the latest check result. To test this, I created a simple configuration in Nagios.
To apply this configuration, I stop Nagios, wipe out the retention.dat and objects.cache files in the /usr/local/nagios/var directory, and then start Nagios again.
After Nagios starts, both services are pending, but the fstab_mounted service check is scheduled before nrpe_agent_running. When Nagios executes the check for the fstab_mounted service, it puts the service in the CRITICAL state. Then the check for nrpe_agent_running gets executed and both services turn red.
I noticed that if I just rename fstab_mounted to z_fstab_mounted, so that this service description comes after nrpe_agent_running in alphabetical order, then everything works fine. Both services start out grey. Then nrpe_agent_running turns red. And then Nagios keeps rescheduling checks for z_fstab_mounted without executing them.
I believe that service description should not influence the dependency feature in that manner.