Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

Add support for in-memory negation of CheckCommand results #6937

Open dnsmichi opened 5 years ago

dnsmichi commented 5 years ago

Expected Behavior

Current Behavior

One needs to incorporate the negate plugin into the current CheckCommand which is cumbersome.

Possible Solution

Add support for built-in (in-memory) status negation. Handling should be available in all CheckCommand objects and automatically inject the check result creation in libmethods. This involves not only pluginchecktask but also our internal in-memory checks.

  vars.negate_ok = Critical
  vars.negate_critical = OK

  vars.negate_replace = "substr"

Context

@dgoetz asked for this on NET Discourse.

Al2Klimov commented 5 years ago

Hey guys, I just thought:

Why not to abstract this to a general check result transformation configurable via our DSL (and implemented completely via our DSL?) and this negate stuff being a concrete use case.

E.g. I've a bunch of DNSBL checks in my private setup. They regularily time out, but I don't care and would like to just turn UNKNOWN to OK.

dnsmichi commented 5 years ago

I've removed this from 2.11, since the argument is valid. Needs re-iteration on the implementation details.

dnsmichi commented 5 years ago

@dgoetz rescheduling this for later this year. This needs a better design round after 2.11 has been released.

dnsmichi commented 5 years ago

Something like this:

object CheckCommand "..." {

  negate_state = true
  negate_output = {{ 
    //make the original output available in this scope
   return output.replace("a", "b")
  }}
}

The scope is a problem, since you may not have access to the original objects on a command endpoint. So this can only be used as command output replacement strategy.

Just a thought from yesterday evening, I'm not sure about it yet.

Al2Klimov commented 5 years ago

I've got the following idea on this:

Either Checkable and/or CheckCommand have a new config attribute "transform_check_result". This shall be either not set (null) or a function which takes one parameter – the check result – and may modify it:

transform_check_result = function(cr) {
  cr.exit_status = 3 - cr.exit_status
}
dnsmichi commented 5 years ago

Such an "event interceptor" could make other things possible, as we know the almighty "check_snmp" whose output is far too generic.

Still, I'd like to take another round here with a design story and follow-up decisions with @lippserd and @bobapple after 2.11 is done.

htriem commented 4 years ago

Let's look into the negate CheckCommand implementation and then decide on further iteration on check result transformation. Discussed this with @dnsmichi offline.

Al2Klimov commented 3 years ago

You're not working on this, right?

Isotop7 commented 2 years ago

Would also be useful as full-blown closures in the DSL to manipulate other returned values:

apply Service for (snmp_command => command in host.vars.snmp) {
  import "generic-service"

  check_command = "snmp"
  display_name = snmp_command

  vars.snmp_oid = command.oid
  vars.snmp_warn = command.warn
  vars.snmp_crit = command.crit

  vars.snmp_perfdata = {
    return command.perfdata / 10
  }

  assign where host.vars.snmp
}

This would eliminate a plethora of wrapper scripts.

+1 for this

julianbrost commented 2 years ago

For completeness to also have that references from here: this was brought up over at https://community.icinga.com/t/use-closure-like-statement-to-manipulate-response/8796 and I also suggested some alternative configuration over there (which I think would be more feasible to implement):

object Service ... {
  transform_checkresult = (cr) => {
    // update `cr.performance_data` here
  }
}
htriem commented 2 years ago

Unassigned myself, because I'm not working on this (and I don't see me working on this in the future.) I like the idea by @julianbrost.

Al2Klimov commented 1 year ago

For the record, I'd also benefit from https://github.com/Icinga/icinga2/issues/6937#issuecomment-509146502 . But -for the case not everyone likes DSL functions- I'm open to alternatives. E.g. to a per check command and/or checkable mapping of target exit codes by plugin's exit code. But now while writing these lines, I can't think of any more in that mapping beyond OK -> crit. and vice-versa. So a bool should do it as well.

Al2Klimov commented 1 year ago

💡 https://github.com/Icinga/icinga2/issues/6937#issuecomment-509146502 is already possible, at least theoretically. Create your service as usual, preferably -for your own sake- w/o notifications. Then create a dummy service taking its output from the first service and negating the exit code via DSL.