centreon / centreon-plugins

Collection of standard plugins to discover and gather cloud-to-edge metrics and status across your whole IT infrastructure.
https://www.centreon.com
Apache License 2.0

[hardware] --critical-count-... --no-component proper detection #2062

Closed UrBnW closed 4 years ago

UrBnW commented 4 years ago

Hi,

Let's take this example, where one hardware component (out of 2 selected) is not detected:

--plugin=hardware::server::dell::idrac::snmp::plugin --mode=hardware --component='^(coolingunit|coolingdevice)$'
OK: All 8 components are ok [8/8 cooling devices].

With --critical-count-coolingdevice=10:, it returns: CRITICAL: '8' components 'coolingdevice' checked

With --critical-count-coolingunit=1:, it returns: OK: All 8 components are ok [8/8 cooling devices].

With --no-component, it returns: OK: All 8 components are ok [8/8 cooling devices].

In other words, when several components are selected, there's no way to get an alert if one of them is missing. I therefore think --warning-count- / --critical-count- should be processed before the undetected component is ignored. I even wonder whether --no-component should be applied per component rather than globally.
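The behavior described above can be modeled with a short Python sketch. This is only an illustration of the reported symptom, not the plugin's actual Perl code: the function `count_alerts` and its arguments are hypothetical names, and the assumption is that count thresholds are evaluated only for components that were detected.

```python
# Hypothetical model of the reported behavior; not the plugin's real implementation.
def count_alerts(detected, minimums):
    """Return CRITICAL messages for components whose count is below its minimum.

    detected: dict mapping component name -> number of instances found
    minimums: dict mapping component name -> minimum count (the N of an 'N:' threshold)
    """
    alerts = []
    # Only detected components are visited, so a component with zero
    # instances never reaches the comparison below.
    for name, count in detected.items():
        if name in minimums and count < minimums[name]:
            alerts.append(f"CRITICAL: '{count}' components '{name}' checked")
    return alerts


detected = {"coolingdevice": 8}  # coolingunit is absent from the device

# The threshold on a detected component triggers as expected.
print(count_alerts(detected, {"coolingdevice": 10}))

# The threshold on the missing component is silently skipped.
print(count_alerts(detected, {"coolingunit": 1}))
```

Under that assumption, the second call returns an empty list, which matches the OK status reported for --critical-count-coolingunit=1:.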

Thx 👍

garnier-quentin commented 4 years ago

You set the threshold on the number you should have. I don't understand.

garnier-quentin commented 4 years ago

You want an alert if one of the components is missing (so it's 9 or 8). The first threshold is ok.

UrBnW commented 4 years ago

No, here I want an alert if coolingunit is not detected. And it does not work, whichever option I use: --critical-count-coolingunit=1: or --no-component. My first example shows that --critical-count only works if the component exists (here with coolingdevice).

garnier-quentin commented 4 years ago

I understand. You want to get the perfdata for components even if the count is 0.

UrBnW commented 4 years ago

Yes, so that someone who monitors a new device using a predefined template will get an alert saying: "hey, I'm asked to monitor coolingunit, but there's no coolingunit on your device!" They can then take the required actions.

As said above, I even wonder whether --no-component should be applied per component (so that coolingunit would trigger here) rather than globally.

garnier-quentin commented 4 years ago

I won't change --no-component. I can add an option to check components with a 0 count.

UrBnW commented 4 years ago

In this case, if --critical-count-coolingunit works, it's OK for me 👍

garnier-quentin commented 4 years ago

Yes it will.

garnier-quentin commented 4 years ago

You can test with option: --no-component-count

UrBnW commented 4 years ago

Works perfectly, many thanks @garnier-quentin 👍

--critical-count-coolingdevice=1: --critical-count-coolingunit=1: --no-component-count
CRITICAL: '0' components 'coolingunit' checked |
--critical-count-coolingdevice=1: --critical-count-coolingunit=0 --no-component-count
OK: All 8 components are ok [8/8 cooling devices, 0/0 cooling units]. |
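The two results above follow from how range thresholds treat a count of 0. The Python sketch below illustrates this, assuming standard Nagios-style range notation (`N` means 0 to N, `N:` means N or more) and assuming that --no-component-count makes selected but undetected components appear with a count of 0. The names `parse_range`, `is_critical`, and `check` are hypothetical, and inverted `@` ranges are not handled.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range ('N', 'N:', 'N:M', '~:M') into (low, high)."""
    if ":" not in spec:
        return 0.0, float(spec)  # 'N' alone means the range 0..N
    low, high = spec.split(":", 1)
    low_value = -math.inf if low == "~" else float(low)
    high_value = math.inf if high == "" else float(high)
    return low_value, high_value


def is_critical(count, spec):
    """A value alerts when it falls outside the range."""
    low, high = parse_range(spec)
    return count < low or count > high


def check(detected, selected, thresholds, report_zero):
    """Return the names of components whose count violates its threshold.

    report_zero models the assumed effect of --no-component-count:
    selected components that were not detected get a count of 0.
    """
    counts = dict(detected)
    if report_zero:
        for name in selected:
            counts.setdefault(name, 0)
    return [
        name
        for name, count in counts.items()
        if name in thresholds and is_critical(count, thresholds[name])
    ]


detected = {"coolingdevice": 8}
selected = ["coolingdevice", "coolingunit"]

# '1:' requires at least one cooling unit, so a count of 0 is critical.
print(check(detected, selected, {"coolingdevice": "1:", "coolingunit": "1:"}, True))

# '0' accepts exactly zero cooling units, so nothing is critical.
print(check(detected, selected, {"coolingdevice": "1:", "coolingunit": "0"}, True))
```

In this model, a template for devices that should have cooling units would use `1:`, while a template for devices known to have none would use `0`.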