Updated by mfriedrich on 2015-04-26 07:27:26 +00:00
That kind of log output should never appear, because check alerts are logged using a defined, parsable pattern ("SERVICE ALERT: ..."). Can you provide a little more (log) context on what is happening in detail? I am not able to reproduce this one.
Updated by icilib0815 on 2015-04-27 12:16:16 +00:00
The service NRPE for host lbsf06 is a simple "/usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$", and the host lbsf06 is switched off. The host check is in state DOWN.
At first I thought it was a problem with the defined service dependency, but even when I remove the service dependency, I get the same log entries in icinga.log. The dependency is actually working: I get the weird log entry only for the service NRPE and not for the dependent services, which are also all checked via check_nrpe.
The service dependency:
define servicedependency{
    host_name                       lbsf06
    service_description             NRPE
    dependent_host_name             lbsf06
    dependent_service_description   CPU Load, Prozesse, Swap, Partition root, Partition home, Partition boot, Bonding IF, Cron, Syslogd, ntp time
    notification_failure_criteria   u,w,c
    execution_failure_criteria      u,w,c
}
Updated by berk on 2015-05-18 12:18:18 +00:00
Updated by mfriedrich on 2015-08-04 17:48:08 +00:00
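It seems hosts and services handle out-of-bounds return codes differently: the host branch logs a warning that names the host, while the service branch logs only the plugin output, without naming the host or the service.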
hosts
/* make sure the return code is within bounds */
else if (queued_check_result->return_code < 0 || queued_check_result->return_code > 3) {
    logit(NSLOG_RUNTIME_WARNING, TRUE,
          "Warning: Return code of %d for check of host '%s' was out of bounds.%s\n",
          queued_check_result->return_code, temp_host->name,
          (queued_check_result->return_code == 126 || queued_check_result->return_code == 127) ? " Make sure the plugin you're trying to run actually exists." : "");
    my_free(temp_host->plugin_output);
    my_free(temp_host->long_plugin_output);
    my_free(temp_host->perf_data);
    asprintf(&temp_host->plugin_output, "(Return code of %d is out of bounds%s)",
             queued_check_result->return_code,
             (queued_check_result->return_code == 126 || queued_check_result->return_code == 127) ? " - plugin may be missing" : "");
    result = STATE_CRITICAL;
}
services
/* make sure the return code is within bounds */
else if (queued_check_result->return_code < 0 || queued_check_result->return_code > 3) {
    if (queued_check_result->return_code == 126) {
        asprintf(&temp_service->plugin_output, "The command defined for service %s is not an executable\n", queued_check_result->service_description);
    } else if (queued_check_result->return_code == 127) {
        asprintf(&temp_service->plugin_output, "The command defined for service %s does not exist\n", queued_check_result->service_description);
    } else {
        asprintf(&temp_service->plugin_output, "Return code of %d is out of bounds", queued_check_result->return_code);
    }
    logit(NSLOG_RUNTIME_WARNING, TRUE, "%s", temp_service->plugin_output);
    temp_service->current_state = STATE_CRITICAL;
}
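To illustrate the mismatch, here is a minimal standalone sketch of a service warning worded like the host one, so that the log line names both the host and the service. This is not the code from the changeset; `format_service_oob_warning` and its parameters are hypothetical names chosen for this example, and the message wording is simply borrowed from the host branch above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Icinga): builds an out-of-bounds warning
 * for a service check that includes the host and service name, mirroring
 * the wording of the host branch.
 * Returns the value of snprintf(), i.e. the length the message needs.
 */
int format_service_oob_warning(char *buf, size_t len, int return_code,
                               const char *host_name,
                               const char *service_description)
{
    assert(buf != NULL && host_name != NULL && service_description != NULL);

    /* 126/127 are the shell's "not executable" / "not found" exit codes */
    const char *hint = (return_code == 126 || return_code == 127)
        ? " Make sure the plugin you're trying to run actually exists."
        : "";

    return snprintf(buf, len,
        "Warning: Return code of %d for check of service '%s' on host '%s' "
        "was out of bounds.%s",
        return_code, service_description, host_name, hint);
}
```

Called with return code 255, host "lbsf06" and service "NRPE", the buffer would hold "Warning: Return code of 255 for check of service 'NRPE' on host 'lbsf06' was out of bounds.", which is enough to identify the offending check without enabling debug logging.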
Updated by mfriedrich on 2015-08-04 18:16:10 +00:00
Test config attached.
Updated by mfriedrich on 2015-08-04 18:17:28 +00:00
Applied in changeset 82bff8ee85014f9536084bb4c4d649e16ffeb01f.
Updated by mfriedrich on 2016-04-07 21:17:31 +00:00
This issue has been migrated from Redmine: https://dev.icinga.com/issues/9157
Created by icilib0815 on 2015-04-22 12:28:29 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2015-08-04 18:17:28 +00:00)
Target Version: 1.14
Last Update: 2015-08-04 18:17:28 +00:00 (in Redmine)
check_nrpe (2.15) returns 255 when the target host is not reachable (earlier versions returned 2 in this case; I also opened a bug report there). Every time check_nrpe returns 255, I get only a timestamp and "Return code of 255 is out of bounds" in icinga.log, but no information about the host or service. I had to turn on debugging to find out which checks were to blame. So I think it is a bug in Icinga's logging, because host and service are missing in this case.

E.g. icinga.debug:
[1429607307.472741] [016.1] [pid=18124] HOST: lbsf06, SERVICE: NRPE, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 255, OUTPUT: connect to address a.b.c.d port 5666: No route to host\nconnect to host a.b.c.d port 5666: No route to host
[1429607307.472861] [016.1] [pid=18124] Service is in a non-OK state!
[1429607307.472874] [016.1] [pid=18124] Host is currently DOWN/UNREACHABLE.
[1429607307.472886] [016.1] [pid=18124] Assuming host is in same state as before...
[1429607307.472911] [032.0] [pid=18124] Host Notification Attempt Host: 'lbsf06', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Thu Jan 1 01:00:00 1970
[1429607307.472927] [001.0] [pid=18124] check_host_notification_viability()
[1429607307.472939] [001.0] [pid=18124] check_time_against_period()
[1429607307.472958] [032.1] [pid=18124] This host problem has already been acknowledged, so we won't send a notification out!
[1429607307.472971] [032.0] [pid=18124] Notification viability test failed. No notification will be sent out.
[1429607307.472983] [016.1] [pid=18124] Current/Max Attempt(s): 1/3
[1429607307.472994] [016.1] [pid=18124] Host isn't UP, so we won't retry the service check...
[1429607307.473014] [016.1] [pid=18124] Rescheduling next check of service at Tue Apr 21 11:18:18 2015
[1429607307.473026] [001.0] [pid=18124] get_next_valid_time()
[1429607307.473037] [001.0] [pid=18124] check_time_against_period()
[1429607307.473054] [001.0] [pid=18124] schedule_service_check()
[1429607307.473074] [016.0] [pid=18124] Scheduling a non-forced, active check of service 'NRPE' on host 'lbsf06' @ Tue Apr 21 11:18:18 2015

And the corresponding entry in icinga.log:
[1429607307] Return code of 255 is out of bounds
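As a workaround sketch for the situation described above, the debug log can be scanned for check results whose return code falls outside 0..3. The helper names `extract_field` and `find_oob_check` are hypothetical, and the code assumes the "HOST: ..., SERVICE: ..., RETURN CODE: ..." layout shown in the icinga.debug excerpt. It also assumes host and service names contain no commas, which does not hold for every configuration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies the text that follows `key` up to the next comma or newline.
 * Returns 1 if the key was found, 0 otherwise. Output is truncated to fit. */
static int extract_field(const char *line, const char *key,
                         char *out, size_t len)
{
    const char *start = strstr(line, key);
    if (start == NULL || len == 0)
        return 0;
    start += strlen(key);

    size_t n = strcspn(start, ",\n");
    if (n >= len)
        n = len - 1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}

/* Returns 1 and fills host, service and code if `line` is a check result
 * whose return code is outside the valid plugin range 0..3; returns 0 for
 * in-range results and for lines that lack the expected fields. */
int find_oob_check(const char *line, char *host, size_t hlen,
                   char *service, size_t slen, int *code)
{
    char rc[16];

    if (!extract_field(line, "HOST: ", host, hlen) ||
        !extract_field(line, "SERVICE: ", service, slen) ||
        !extract_field(line, "RETURN CODE: ", rc, sizeof rc))
        return 0;

    *code = atoi(rc);
    return *code < 0 || *code > 3;
}
```

Feeding each line of icinga.debug through `find_oob_check` would flag the first line of the excerpt (host lbsf06, service NRPE, return code 255) and skip the remaining lines, which carry no check result fields.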
Changesets
2015-08-04 18:16:30 +00:00 by mfriedrich 82bff8ee85014f9536084bb4c4d649e16ffeb01f