Open mohierf opened 7 years ago
As suggested on the IRC channel, using a new state (UNTESTED, -1) for the initial state would be of some interest:
More comments and ideas are welcome 😉
I would like to add a split in the 'unknown' status.
Two scenarios:
In both scenarios Alignak cannot determine the state of the service (while the service itself may be up/down/etc ... we just can't check ...) but the cause is different.
In scenario 1 the monitoring people should correct the plugin. In scenario 2 the sysadmins need to make sure the binaries/scripts are present on the target.
I feel that having both scenarios under 'unknown', without being able to tell the two apart, is not good. For reporting (SLA) specifically it makes a huge difference: scenario 1 is beyond the control of the owner of the service (application, sysadmin, etc.), while scenario 2 is the responsibility of the app owner.
Notifications for scenarios 1 and 2 go to different people/teams, which is not possible if both report the same state ...
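A rough sketch of the idea, with hypothetical names throughout (`UnknownCause`, the contact group names and `notify_group` are not part of Alignak): once the two causes are distinct values, picking the right recipients becomes a simple lookup.

```python
from enum import Enum


class UnknownCause(Enum):
    """Hypothetical sub-states of UNKNOWN, one per scenario."""
    PLATFORM = "platform"  # scenario 1: the plugin on the monitoring side is at fault
    TARGET = "target"      # scenario 2: binaries/scripts are missing on the target


# Assumed contact groups; the names are placeholders.
CONTACTS = {
    UnknownCause.PLATFORM: "monitoring-team",
    UnknownCause.TARGET: "sysadmins",
}


def notify_group(cause: UnknownCause) -> str:
    """Pick the contact group that should hear about this UNKNOWN."""
    return CONTACTS[cause]


print(notify_group(UnknownCause.PLATFORM))  # monitoring-team
print(notify_group(UnknownCause.TARGET))    # sysadmins
```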
I don't want another state: to see whether a check has been done, look at the last_check date. Adding this state will bring complexity and cases we have not thought of...
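A minimal sketch of that check, assuming the host/service object exposes a `last_check` timestamp that stays at 0 until the first check result arrives. The `Item` class is a hypothetical stand-in, not Alignak's own class.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a monitored host or service."""
    name: str
    last_check: float = 0.0  # assumption: 0 means "never checked"


def has_been_checked(item: Item) -> bool:
    """Return True once at least one check result has been recorded."""
    return item.last_check > 0


never_checked = Item("web-01")
checked = Item("db-01", last_check=1_500_000_000.0)
print(has_been_checked(never_checked), has_been_checked(checked))  # False True
```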
@ddurieux :
I think more "states" should be possible, for example to integrate new things in the future.
I would also allow the user to change the status mapping, e.g. OK = ERROR, ERROR = WARNING, ...
@spea1 is the last thing something like this?
http://shinken.readthedocs.io/en/latest/07_advanced/result-modulations.html
I agree with @fjvt on this. It is not the role of the framework to do such things.
The same answer applies to all the new states. Please have a look at the result modulations feature, which would probably allow such things.
@mohierf Result modulations (are they in Alignak? I assume they are, since Alignak is a fork) will only cover the modification of states.
Result modulation will not allow what both of us wanted (i.e. split unknown into more than one state, or have more states).
It only allows to switch between the 4 existing states (0 ok, 1 warning, 2 critical, 3 unknown)
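To illustrate the limitation, here is a rough sketch of what a result modulation amounts to: remapping one of the four exit codes to another. The `modulate` function and the rule format are illustrative assumptions, not the Shinken or Alignak configuration syntax.

```python
from typing import Dict

# The four plugin states named above.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def modulate(exit_code: int, rules: Dict[int, int]) -> int:
    """Remap an exit code using rules; the target must be an existing state."""
    target = rules.get(exit_code, exit_code)
    if target not in STATES:
        raise ValueError(f"{target} is not one of the four known states")
    return target


# Downgrade CRITICAL to WARNING; everything else passes through unchanged.
rules = {2: 1}
print(STATES[modulate(2, rules)])  # WARNING
print(STATES[modulate(3, rules)])  # UNKNOWN
```

A rule can only point at a state that already exists, so a remap cannot create a fifth state or split UNKNOWN in two.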
Having only 4 types of state is, in my opinion, not dynamic enough. In today's modern environments, and combined with business rules, you need more states ...
Services are not always 'up' or 'down' they can be somewhere in between.
See also my example about the 'unknown' status ... It is impossible to tell whether the unknown comes from the platform or from the target (well, it is possible to detect this in your plugin, BUT you can't have Alignak notify different people since warning/critical are already used ...)
I feel that adding more states would make alignak a lot more future proof ...
I thought there was a Pending state as in Shinken? The pending state (same as untested) is a valid state.
@xkilian : in Shinken, the Pending state is only used as long as a service check has not yet been launched. As far as I remember this state is only a running state because while in Pending state the host/service has its initial state, as defined in the configuration files.
But I like your idea to have a Pending state, indicating that the host/service has not yet been checked 😉
We have removed this pending state to use the initial_state, and we defined it by default as UNREACHABLE for host and service...
I don't really agree with adding a new state.
This issue sums up how Alignak manages states for hosts and services.
Initial state: A new host/service that has not been checked is set to its configured initial_state. Currently, the default initial state is UNREACHABLE (x or 4) for a host and UNREACHABLE (x or 4) for a service.
Check plugin states: When a check plugin is executed, its exit code determines the host/service state identifier (state_id).
For a host (plugin code -> state identifier -> state):
This tricky 1->2 is for passive checks... Indeed, only an exit code of 0 says that the host is UP 😉
For a service (plugin code -> state identifier -> state):
Note that the Nagios legacy plugins will never return 4 as an exit code... It is an Alignak internal value used when a service attached to a host is unreachable because the service's host is down.
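As a hedged sketch of the service side only: exit codes 0 to 3 follow the four states listed earlier in this thread, and 4 is the internal value described above. The `service_state` helper and the fallback for unexpected codes are assumptions made for illustration, not Alignak code.

```python
SERVICE_STATES = {
    0: "OK",
    1: "WARNING",
    2: "CRITICAL",
    3: "UNKNOWN",
    4: "UNREACHABLE",  # internal only, never returned by a plugin
}


def service_state(exit_code, host_is_down=False):
    """Return (state_id, state) for a service check result."""
    if host_is_down:
        # The host is down, so the service cannot be reached at all.
        return 4, SERVICE_STATES[4]
    if exit_code not in (0, 1, 2, 3):
        # Assumption for this sketch: unexpected codes count as UNKNOWN.
        exit_code = 3
    return exit_code, SERVICE_STATES[exit_code]


print(service_state(2))                     # (2, 'CRITICAL')
print(service_state(0, host_is_down=True))  # (4, 'UNREACHABLE')
```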
Freshness check: When the freshness check is enabled and the freshness threshold expires, the host/service state is set according to the configured freshness_state. Currently, the default freshness state is UNREACHABLE (x or 4) for a host and UNREACHABLE (x or 4) for a service.
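The freshness rule can be sketched roughly as follows. The function name, its parameters and the use of `last_check` as the reference time are assumptions made for illustration; only `freshness_state` and its UNREACHABLE default come from the description above.

```python
import time


def apply_freshness(state, last_check, freshness_threshold,
                    check_freshness=True, freshness_state="UNREACHABLE",
                    now=None):
    """Return the state to keep once freshness has been evaluated.

    The state is replaced by freshness_state only when the freshness check
    is enabled and the last result is older than the threshold (seconds).
    """
    if now is None:
        now = time.time()
    if check_freshness and now - last_check > freshness_threshold:
        return freshness_state
    return state


# Last result received at t=0, threshold of 60 seconds.
print(apply_freshness("OK", last_check=0, freshness_threshold=60, now=30))  # OK
print(apply_freshness("OK", last_check=0, freshness_threshold=60, now=61))  # UNREACHABLE
```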