Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

check_timeout doesn't override CheckCommand’s timeout on remote command_endpoint checks #6992

Closed: sinky closed this issue 4 years ago

sinky commented 5 years ago

With the following config (generated by Icinga Director), the service check "services" timed out after 10s instead of the expected 15s (on Windows).

object command "check_services-windows" {
    import "plugin-check-command"
    command = [
        "C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\powershell.exe"
    ]
    timeout = 10s
    arguments += {
        "-ExecutionPolicy" = {
            order = -2
            value = "bypass"
        }
        "-exclude" = "$service_windows_exclude$"
        "-exclude_defaults" = "FooBar"
        "-file" = {
            order = -1
            required = true
            value = "C:\\icinga_checks\\check_services_wmi.ps1"
        }
    }
}

template Service "ts_default" {
    max_check_attempts = "2"
    check_period = "immer"
    check_interval = 5m
    retry_interval = 2m
    enable_notifications = true
    enable_active_checks = true
    enable_passive_checks = true
    enable_flapping = true
    enable_perfdata = true
}

template Service "ts_agent-check" {
    import "ts_default"

    command_endpoint = host_name
}

template Service "ts_services-windows" {
    import "ts_agent-check"

    check_command = "check_services-windows"
    check_timeout = 15s
}

apply Service "services" {
    import "ts_services-windows"

    assign where "th_windows-status" in host.templates
    vars.notification = "1"

    import DirectorOverrideTemplate
}

Expected Behavior

The service check should time out after 15s, as described for the configuration attribute check_timeout at https://icinga.com/docs/icinga2/latest/doc/09-object-types/#service

Current Behavior

The service check times out after 10s.

Steps to Reproduce (for bugs)

1. Create a CheckCommand with a timeout of 10s.
2. Create a Service template with a check_timeout of 15s.

Your Environment

Al2Klimov commented 5 years ago

Hello @sinky and thank you for reporting!

Your configuration is invalid. Please try CheckCommand instead of just command.

Best, AK

sinky commented 5 years ago

Sorry, I cannot change this. This config is generated by Icinga Director. I got it from the /icingaweb2/director/config/files page.


Al2Klimov commented 5 years ago

Please could you figure out where Icinga 2 stores this particular Director config...

grep -rnFwe 'object command' /var/lib/icinga2/api

... and change it temporarily?

sinky commented 5 years ago

The grep command returns no results, but the file /var/lib/icinga2/api/zones/director-global/director/commands.conf contains the correct object CheckCommand definitions. Possibly a display bug in Icinga Director?

But back to the real issue: the service and its CheckCommand work fine, except that the CheckCommand's timeout value isn't overridden by the check_timeout defined on the service (template).

Al2Klimov commented 5 years ago

@Thomas-Gelf @lazyfrosch Does this work as intended?

lazyfrosch commented 5 years ago

Yeah this seems to be a UI rendering bug when displaying config files. The text config should be fine.

Why timeout is not working correctly needs to be tested.

Al2Klimov commented 5 years ago

I've white-box tested the code and everything seems clean. Very strange.

Al2Klimov commented 5 years ago

Just remembered: Icinga 2 adjusts the actual check execution times by random values to emulate actual scheduling of checks with equal execution times.

Please could you re-try with somewhat larger timeouts?

sinky commented 5 years ago

Initially I used 30 seconds on the CheckCommand and 60 seconds on the service.

It timed out after 30.xx seconds, but I can check it again on Monday if necessary.

Is it possible that the service timeout can only be lower than the timeout on the CheckCommand?

Al2Klimov commented 5 years ago

Yes, please re-check. I've both white-box and black-box tested Icinga and it works for me on macOS (the code is the same across all OSes):

object CheckCommand "timeout30" {
    import "plugin-check-command"
    command = [ "sleep", "35" ]
    timeout = 30
}

object Host NodeName {
    check_command = "hostalive"
    enable_active_checks = false
}

object Service "timeout60" {
    host_name = NodeName
    check_command = "timeout30"
    check_timeout = 60
}

dnsmichi commented 5 years ago

check_timeout overrides the timeout variable of a CheckCommand object upon direct execution prior to spawning a check process. https://github.com/Icinga/icinga2/blob/master/lib/methods/pluginchecktask.cpp#L38

One thing that comes to mind: this applies to local execution only and, to my knowledge, is not passed along with command endpoint checks. To reproduce the issue, one explicitly needs a remote command endpoint check, possibly with a sleep command, and a lower timeout defined.

Additional info: command_endpoint checks use a virtual checkable host object on the agent where check_timeout isn't set at all. There is no service object available on the remote end.
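
The reproduction described above could be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested setup: the names "sleep-20", "agent-timeout-test" and "agent.example.com" are hypothetical, the agent is assumed to be a Unix-like host with a sleep binary, a matching Host and Endpoint object is assumed to exist, and the CheckCommand is assumed to be synced to the agent (for example via a global zone).

```
// Hypothetical CheckCommand; it must also exist on the agent.
object CheckCommand "sleep-20" {
    import "plugin-check-command"
    command = [ "sleep", "20" ]  // runs longer than the command timeout
    timeout = 10s                // lower than the service's check_timeout
}

// Hypothetical service executed remotely on the agent.
object Service "agent-timeout-test" {
    host_name = "agent.example.com"
    check_command = "sleep-20"
    command_endpoint = "agent.example.com"  // run the check on the agent
    check_timeout = 30s                     // should win, if it were passed along
}
```

If the behavior reported in this issue holds, the check would be expected to time out after roughly 10s on the agent rather than after 30s. Removing command_endpoint should turn it into the local case that Al2Klimov tested.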

Al2Klimov commented 5 years ago

@dnsmichi Shouldn't we let one component do both jobs, so that we don't have to adjust two of them every time?

Al2Klimov commented 5 years ago

Note: The Director (master) shows the commands as expected.

dnsmichi commented 5 years ago

check_timeout was invented to allow specific timeout overrides coming from the host/service object. The CheckCommand's timeout attribute cannot be easily overridden otherwise. That's the culprit here, as the CheckCommand exists on the remote host, but not the host/service object itself.

The patch shouldn't be that hard, but it involves changed cluster messages and as such it only qualifies for 2.11+.

Cheers, Michael
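
Until such a patch is available, one possible workaround is to put the desired timeout on a CheckCommand, since that is the object present on the agent. This is an assumption drawn from the explanation above and not something proposed in the thread. The command name "check_services-windows-15s" is hypothetical, and in a Director setup the equivalent change would have to be made in Director, not by hand.

```
// Hypothetical derived command; only the timeout differs from the original.
object CheckCommand "check_services-windows-15s" {
    import "check_services-windows"  // reuse the command line and arguments
    timeout = 15s                    // evaluated on the agent
}

template Service "ts_services-windows" {
    import "ts_agent-check"

    check_command = "check_services-windows-15s"
}
```

The trade-off is one extra CheckCommand per distinct timeout, which is the kind of duplication check_timeout was meant to avoid.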