Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

[dev.icinga.com #9773] Add log for missing EventCommand for command_endpoints #3196

Closed icinga-migration closed 9 years ago

icinga-migration commented 9 years ago

This issue has been migrated from Redmine: https://dev.icinga.com/issues/9773

Created by emptywee on 2015-07-29 20:20:33 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2015-07-31 14:05:03 +00:00)
Target Version: 2.3.9
Last Update: 2015-08-12 08:34:00 +00:00 (in Redmine)

Icinga Version: 2.3.8
Backport?: Already backported
Include in Changelog: 1

Hello. I created a simple EventCommand:

object EventCommand "cmd_service_restart" {
  import "plugin-event-command"

  command = "/usr/bin/test $service.state_id$ -gt 0 && /usr/bin/sudo /sbin/service $service_name$ restart"
}

Created a service:

apply Service "crond" {
  import "generic-service"

  check_command = "procs"

  if (host.vars.remote_client) {
    command_endpoint = host.vars.remote_client
  }
  vars.procs_command = "crond"
  vars.procs_critical = "1:"

  event_command = "cmd_service_restart"
  vars.service_name = host.vars.crond_name

  assign where host.vars.os == "Linux"
}

Defined a host with the following template:

template Host "generic-linux-host" {
  import "generic-host"

  vars.os = "Linux"

  vars.disks["disk"] = {
  }

  vars.disks["disk /"] = {
    disk_partitions = "/"
  }

  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }

  vars.crond_name = "crond"
  enable_event_handler = true
}

I brought down crond on the remote host and see this on the checker node:

[2015-07-29 19:29:54 +0000] notice/Checkable: State Change: Checkable dc1udtlhtst02.stack.qadev.corp!crond soft state change from OK to CRITICAL detected.
[2015-07-29 19:29:54 +0000] notice/Checkable: Executing event handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-29 19:29:54 +0000] notice/ApiListener: Sending message to 'dc1udtlhtst02.stack.qadev.corp'
[2015-07-29 19:29:54 +0000] notice/ApiListener: Relaying 'event::CheckResult' message

EventCommand object on the checker:

Object 'cmd_service_restart' of type 'EventCommand':
  % declared in '/var/lib/icinga2/api/zones/global-templates/events.conf', lines 1:0-1:40
  * __name = "cmd_service_restart"
  * arguments = null
  * command = "/usr/bin/test $service.state_id$ -gt 0 && /usr/bin/sudo /sbin/service $service_name$ restart"
    % = modified in '/var/lib/icinga2/api/zones/global-templates/events.conf', lines 4:3-4:106
  * env = null
  * execute
    % = modified in '/usr/share/icinga2/include/command.conf', lines 47:2-47:22
    * type = "Function"
  * name = "cmd_service_restart"
  * templates = [ "cmd_service_restart", "plugin-event-command" ]
    % = modified in '/var/lib/icinga2/api/zones/global-templates/events.conf', lines 1:0-1:40
    % = modified in '/usr/share/icinga2/include/command.conf', lines 46:1-46:44
  * timeout = 60
  * type = "EventCommand"
  * vars = null
  * zone = "global-templates"

Apparently, no command is actually being executed anywhere. I even tried the "by_ssh" event example from the docs, with the same result. I'm not sure how to debug this further. It is really critical, because without it there is no way to react to events.

# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: v2.3.8)

Copyright (c) 2012-2015 Icinga Development Team (https://www.icinga.org)
License GPLv2+: GNU GPL version 2 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /var/run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid
  Application type: icinga/IcingaApplication

System information:
  Operating system: Linux
  Operating system version: 2.6.32-573.1.1.el6.x86_64
  Architecture: x86_64
  Distribution: Red Hat Enterprise Linux Server release 6.7 (Santiago)

I hope I am not missing anything myself here.

Changesets

2015-07-31 14:04:03 +00:00 by (unknown) 0712a02d1b3f769f06ef3c49108f48626a539f78

Add a warning if EventCommand is not found when using command_endpoint

fixes #9773

2015-08-12 08:33:44 +00:00 by (unknown) 1b3f377809a20a97656b17e5891665d7893fc229

Add a warning if EventCommand is not found when using command_endpoint

fixes #9773

icinga-migration commented 9 years ago

Updated by emptywee on 2015-07-30 14:15:24 +00:00

It seems the event handler is not fired when command_endpoint is set to a remote host. When I brought down the crond service on the checker itself, the EventCommand was executed. When the event occurs for a service with command_endpoint set to a remote client address, here's what happens (I have added more debug output on the checker):

Checker debug.log:

[2015-07-30 14:05:27 +0000] notice/Checkable: State Change: Checkable dc1udtlhtst02.stack.qadev.corp!crond soft state change from OK to CRITICAL detected.
[2015-07-30 14:05:27 +0000] notice/Checkable: Executing event handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/Checkable: Firing ec->Execute. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/ApiListener: Sending message to 'dc1udtlhicn01.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/Checkable: if endpoint true. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/Checkable: Params set for Host: dc1udtlhtst02.stack.qadev.corp. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/Checkable: Params set for Service: crond. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/ApiListener: Sending message to 'dc1udtlhtst02.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/Checkable: Listener true, sending message (sync). Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'

Remote client:

[2015-07-30 14:05:27 +0000] notice/ApiClient: Received 'event::ExecuteCommand' message from 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250': PID 28922
[2015-07-30 14:05:27 +0000] notice/Process: PID 28922 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250') terminated with exit code 2
[2015-07-30 14:05:27 +0000] notice/ApiListener: Sending message to 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/ApiClient: Received 'event::ExecuteCommand' message from 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:28 +0000] notice/ApiClient: Received 'log::SetLogPosition' message from 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:29 +0000] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 0; Checks/s: 0
[2015-07-30 14:05:29 +0000] debug/ApiListener: Not connecting to Endpoint 'dc1udtlhtst02.stack.qadev.corp' because that's us.
[2015-07-30 14:05:29 +0000] debug/ApiListener: Not connecting to Endpoint 'dc1udtlhtst01.stack.qadev.corp' because we're already connected to it.
[2015-07-30 14:05:29 +0000] notice/ApiListener: Setting log position for identity 'dc1udtlhtst01.stack.qadev.corp': 2015/07/29 13:16:12

It seems like the remote client receives the message, but ignores it for some reason. I am going to add more debug, maybe I'll find a clue.

icinga-migration commented 9 years ago

Updated by emptywee on 2015-07-30 15:20:35 +00:00

Yeah, I think I figured it out. The remote client was looking for the EventCommand 'cmd_service_restart' and could not find it:

[2015-07-30 15:16:52 +0000] notice/ApiEvents: *** command_type is event command
[2015-07-30 15:16:52 +0000] notice/ApiEvents: *** EventCommand::GetByname(cmd_service_restart) returned false.

So I have to register them on each remote client. Probably not a bug. Sorry, guys :)

icinga-migration commented 9 years ago

Updated by emptywee on 2015-07-30 15:26:46 +00:00

Yes, that was it. Could you please add this to the debug log with a meaningful message? It would help a lot and save time for somebody like me in the future :)

lib/icinga/apievents.cpp:

        } else if (command_type == "event_command") {
                if (!EventCommand::GetByName(command)) {
                        Log(LogNotice, "ApiEvents")
                            << "EventCommand::GetByname(" << command << ") returned false. Probably this EventCommand object is not defined on this Icinga2 instance.";

                        return Empty;
                }
        } else
                return Empty;

icinga-migration commented 9 years ago

Updated by mfriedrich on 2015-07-31 13:33:51 +00:00

I'll add such a log message as a warning, though you'll only see it on the remote instance. The check command is sent back; maybe we'll come up with a better approach similar to #9749.
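
For illustration, the lookup-and-warn behaviour discussed in this thread can be sketched in standalone C++. This is not the code from the changeset. `CommandRegistry`, `LogSink` and `ResolveCommand` are hypothetical names, and the `"check_command"` type string and the log wording are assumptions. Only the `"event_command"` type string is taken from the snippet in the previous comment.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the command objects configured on ONE instance
// (for example the remote client that receives the execute request).
struct CommandRegistry {
    std::map<std::string, std::string> check_commands;  // name -> command line
    std::map<std::string, std::string> event_commands;  // name -> command line
};

// Collected log lines, so the caller can inspect what was reported.
using LogSink = std::vector<std::string>;

// Returns true if the requested command object exists on this instance.
// If it does not, a warning naming the missing object is recorded instead
// of silently dropping the request.
bool ResolveCommand(const CommandRegistry& registry,
                    const std::string& command_type,
                    const std::string& command,
                    LogSink& log)
{
    const std::map<std::string, std::string>* table = nullptr;
    std::string kind;

    if (command_type == "check_command") {          // assumed type string
        table = &registry.check_commands;
        kind = "CheckCommand";
    } else if (command_type == "event_command") {   // type string from the thread
        table = &registry.event_commands;
        kind = "EventCommand";
    } else {
        log.push_back("warning/ApiEvents: Unknown command type '" + command_type + "'.");
        return false;
    }

    if (table->count(command) == 0) {
        // Illustrative wording only, not the message added by the real fix.
        log.push_back("warning/ApiEvents: " + kind + " '" + command +
                      "' does not exist on this instance.");
        return false;
    }

    return true;
}
```

Calling `ResolveCommand(client, "event_command", "cmd_service_restart", log)` on a registry that only contains the `procs` check command returns `false` and leaves one warning in `log`. That is the situation on the remote client in this report.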

icinga-migration commented 9 years ago

Updated by mfriedrich on 2015-07-31 13:34:01 +00:00

icinga-migration commented 9 years ago

Updated by Anonymous on 2015-07-31 14:05:03 +00:00

Applied in changeset 0712a02d1b3f769f06ef3c49108f48626a539f78.

icinga-migration commented 9 years ago

Updated by gbeutner on 2015-08-12 08:34:00 +00:00