Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 579 forks source link

ICINGA service object output is not correct #9723

Open TheCry opened 1 year ago

TheCry commented 1 year ago

Describe the bug

Good morning all, I am currently trying to optimize our call script. For this I use the objects from ICINGA Object Types https://icinga.com/docs/icinga-2/latest/doc/09-object-types/#service

state, last_state and last_hard_state

According to the documentation a number should be output here. The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)

At least that's how I understand it. With "state/last_state" no integer but OK, WARNING, CRITICAL, UNKNOWN is output.

Now I wanted to work with the object "last_hard_state". Here I noticed that integers are output. With a CRITICAL I receive accordingly a 2.

But the object last_hard_state always returns only CRITICAL. For a service that should immediately issue an alarm when it changes from OK to CRITICAL (1/1), the object last_hard_state contains the value CRITICAL although the service was previously in the hard state OK.

Expected behavior

If a service is in HARD state OK the object last_hard_state should output OK when the service changes to a new HARD state. Only when the service comes to a new hard state, the object last_hard_state in the previous hard state may change.

The objects state, last_state and last_hard_state should all have the same output. Either integer or alphanumeric

Your Environment

Icinga DB Web version (System - About): 1.0.2 Icinga Web 2 version (System - About):2.11.4 Web browser: Chrome 110.0.5481.100 Icinga 2 version (icinga2 --version): r2.13.7-1 Icinga DB version (icingadb --version): v1.1.0 PHP version used (php --version): 7.4 Server operating system and version: Debian 10

Al2Klimov commented 1 year ago

Have you tried the previous_hard_state attribute of the current check result?

TheCry commented 1 year ago

Have you tried the previous_hard_state attribute of the current check result?

I don't find the previous_hard_state object in the documentation. Does the object exist?

Al2Klimov commented 1 year ago

Host/Service has the last_check_result attribute which has the previous_hard_state attribute.

TheCry commented 1 year ago

Host/Service has the last_check_result attribute which has the previous_hard_state attribute.

Sorry. I'd checked the whole documentation for the service attributes and i don't find anything which is called previous_hard_state.

image

Al2Klimov commented 1 year ago
Icinga 2 (version: v2.13.0-653-g66b039df9)
Type $help to view available commands.
<1> => h = Host()
null
<2> => h.last_check_result = CheckResult()
null
<3> => h.last_check_result.previous_hard_state
0.000000
<4> =>
Al2Klimov commented 1 year ago

Sorry. I'd checked the whole documentation for the service attributes and i don't find anything which is called previous_hard_state.

Thanks for the report, we'll fix this missing documentation.

miken32 commented 1 year ago

the object last_hard_state contains the value CRITICAL although the service was previously in the hard state OK

We are experiencing the same thing using a notification command. The service's last state is always critical. We have tried using Service.last_hard_state (returns CRITICAL), Service.last_state (returns 2), and also Service.check_result.last_hard_state (returns 99.) We are currently testing with Service.check_result.previous_hard_state; why was it felt there was a need for two separate properties that sound like they should return exactly the same thing?

I can't say for sure but I think this notification command was working fine with the monitoring plugin, but started showing this behaviour once we migrated to icingadb.

yhabteab commented 1 year ago

I am currently trying to optimize our call script. For this I use the objects from ICINGA Object Types https://icinga.com/docs/icinga-2/latest/doc/09-object-types/#service

According to the documentation a number should be output here. The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)

At least that's how I understand it. With "state/last_state" no integer but OK, WARNING, CRITICAL, UNKNOWN is output.

Hi, please share a bit more context about your calling script. Are you retrieving the state attributes via an API request or using the macro function within some Icinga 2 command to retrieve them? I think there is a little confusion with Service Runtime Attributes and Service Runtime Macros.

miken32 commented 1 year ago

And just to follow up on my previous comment, Service.last_check_result.previous_hard_state (via macro("$service.last_check_result.previous_hard_state$")) is working for us, returning 0 as expected.

julianbrost commented 1 year ago

(via macro("service.check_result.previous_hard_state")) is working for us

I doubt that works, it should at least have $ signs around for it to do something useful, like in macro("$service.check_result.previous_hard_state$"). (Didn't test that, this is intended more as a hint for users potentially finding this issue who might run into problems with this.)

Al2Klimov commented 1 year ago

Btw. it's last_check_result I guess.

julianbrost commented 1 year ago

Regarding documentation inconsistencies:

Service Config Snippet ``` object Service "state-macro-test" { host_name = "master-1" check_command = "dummy" vars.dummy_text = {{{ service.state: $service.state$ service.state_id: $service.state_id$ service.state_type: $service.state_type$ service.last_state: $service.last_state$ service.last_state_id: $service.last_state_id$ service.last_state_type: $service.last_state_type$ service.last_state_change: $service.last_state_change$ }}}.replace("\t", "") } ```

Values returned by macro strings:

service.state:              OK
service.state_id:           0
service.state_type:         HARD
service.last_state:         OK
service.last_state_id:      0
service.last_state_type:    HARD
service.last_state_change:  1685612112

Which is consistent with https://icinga.com/docs/icinga-2/latest/doc/03-monitoring-basics/#service-runtime-macros

And runtime attributes returned via the API:

{
    "results": [
        {
            "attrs": {
                "last_hard_state": 0,
                "last_state": 0,
                "last_state_type": 1,
                "state": 0,
                "state_type": 1
            },
            "joins": {},
            "meta": {},
            "name": "master-1!state-macro-test",
            "type": "Service"
        }
    ]
}

Which is also consistent with https://icinga.com/docs/icinga-2/latest/doc/09-object-types/#service

So documentation should be correct. I understand that it can be confusing that on gives 0 for state while the other one says OK, but that's probably impossible to change now without breaking every second installation (number is a wild guess, but these are probably some of the more commonly accessed values).

julianbrost commented 1 year ago

But the object last_hard_state always returns only CRITICAL. For a service that should immediately issue an alarm when it changes from OK to CRITICAL (1/1), the object last_hard_state contains the value CRITICAL although the service was previously in the hard state OK.

Based on the current implementation, last_hard_state is to be interpret as the last (as in most recent one) state that was a hard state, so if the checkable is in a hard state, it represents the current state of the checkable.

Are these names great? Certainly not as they are even inconsistent with each other, for example the last in last_state always refers to the previous state.

What do we do from here on? I don't think changing individual values in some way or another would not result in any less confusion, instead, it will just be inconsistent between versions and break things (the current values aren't completely wrong, they're just not what some might expect given the names). There should be some concept of how to represent all useful values in an understandable format (which will be extra tricky if we'd want to keep the existing ones as-is for compatibility).

miken32 commented 1 year ago

@julianbrost @Al2Klimov thanks, updated my previous comment to fix the typos.