Napsty / check_esxi_hardware

Monitoring Plugin to check the hardware of VMware ESXi servers.
https://www.claudiokuenzler.com/monitoring-plugins/check_esxi_hardware.php

The --ignore option should support regex #35

Closed adoom42 closed 5 years ago

adoom42 commented 5 years ago

https://github.com/Napsty/check_esxi_hardware/blob/df56d7373deb5d915da0a6a61f8ccf93b34bc9b9/check_esxi_hardware.py#L733

It looks like the --ignore option expects a CSV list of static entries to ignore.

There are cases where it would be useful to ignore a range of entries.

Example:

20190430 08:46:07 Check classe VMware_StorageExtent
20190430 08:46:07   Element Name = Disk 1 on HPSA1 : Port 1I Box 1 Bay 1 : 186GB : Data Disk : Not Authenticated
20190430 08:46:07     Element HealthState = 10
20190430 08:46:07 Global exit set to WARNING
20190430 08:46:07   Element Name = Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 186GB : Data Disk : Not Authenticated
20190430 08:46:07     Element HealthState = 10
20190430 08:46:07 Global exit set to WARNING
20190430 08:46:07   Element Name = Disk 3 on HPSA1 : Port 2I Box 1 Bay 7 : 558GB : Data Disk : OK
20190430 08:46:07     Element HealthState = 5
20190430 08:46:07   Element Name = Disk 4 on HPSA1 : Port 2I Box 1 Bay 8 : 558GB : Data Disk : OK
20190430 08:46:07     Element HealthState = 5

The first two disks are reported as "Not Authenticated" because they are 3rd party disks, not the officially-supported HP models. I would like to ignore this particular error. The Element Name is dynamic, so the --ignore option doesn't work.

If it accepted regex input, it could match. In this case, something like:

--ignore 'Disk (.+?) Data Disk : Not Authenticated'
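
As a quick sanity check, the pattern above can be tried against the four element names from the verbose output. This is only a sketch: it assumes the plugin would apply each ignore entry with something like Python's `re.search`, which is an assumption and not confirmed by the plugin source.

```python
import re

# Pattern proposed above; element names copied from the verbose output.
pattern = r"Disk (.+?) Data Disk : Not Authenticated"

element_names = [
    "Disk 1 on HPSA1 : Port 1I Box 1 Bay 1 : 186GB : Data Disk : Not Authenticated",
    "Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 186GB : Data Disk : Not Authenticated",
    "Disk 3 on HPSA1 : Port 2I Box 1 Bay 7 : 558GB : Data Disk : OK",
    "Disk 4 on HPSA1 : Port 2I Box 1 Bay 8 : 558GB : Data Disk : OK",
]

# Assumption: an element is ignored when the pattern matches anywhere in its name.
ignored = [name for name in element_names if re.search(pattern, name)]
reported = [name for name in element_names if name not in ignored]

for name in ignored:
    print("ignored: ", name)
for name in reported:
    print("reported:", name)
```

Under that assumption, only the two "Not Authenticated" disks would be ignored, and the two healthy disks would still be checked.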

Napsty commented 5 years ago

@adoom42 Good idea, I will take a look at this in the next few days. Just out of curiosity, what kind of third-party disks are you using in that HP server? Are they not affected by the "high fan usage" issue (see http://dascomputerconsultants.com/HPCompaqServerDrives.htm)?

adoom42 commented 5 years ago

The fan & temperature readings look normal.

Napsty commented 5 years ago

@adoom42 This should be working now with commit https://github.com/Napsty/check_esxi_hardware/commit/8bcf7309d9680a6f4890a777f813e272763de983. Use the new "-r" parameter to treat each "ignore" element as a regular expression. Can you please test it?

Without regular expressions, the plugin makes a 1:1 comparison of each element name with the names given with the -i parameter:

# ./check_esxi_hardware.py -H esxhost -U root -P secret -V auto -v -i '.*Cache','CPU1 Level-1 Cache'
[...]
20190503 17:15:18 Check classe CIM_Memory
20190503 17:15:18   Element Name = CPU1 Level-1 Cache
20190503 17:15:18     (ignored)
20190503 17:15:18   Element Name = CPU1 Level-2 Cache
20190503 17:15:18     Element Op Status = 0
20190503 17:15:18   Element Name = CPU1 Level-3 Cache
20190503 17:15:18     Element Op Status = 0
20190503 17:15:18   Element Name = CPU2 Level-1 Cache
20190503 17:15:18     Element Op Status = 0
20190503 17:15:18   Element Name = CPU2 Level-2 Cache
20190503 17:15:18     Element Op Status = 0
20190503 17:15:18   Element Name = CPU2 Level-3 Cache
20190503 17:15:18     Element Op Status = 0
20190503 17:15:18   Element Name = Memory
20190503 17:15:18     Element Op Status = 2
[...]
OK - Server: Cisco Systems Inc UCSB-B200-M4 s/n: XXXXXXXXXXX Chassis S/N: XXXXXXXXXXX  System BIOS: B200M4.3.2.3a.0.0226182120 2018-02-26

As you can see, only one element (CPU1 Level-1 Cache) was ignored, because it matched one of the strings from the ignore list.

Now with regular expressions enabled (using -r):

# ./check_esxi_hardware.py -H esxhost -U root -P secret -V auto -v -i '.*Cache','CPU1 Level-1 Cache' -r
[...]
20190503 17:17:40 Check classe CIM_Memory
20190503 17:17:41   Element Name = CPU1 Level-1 Cache
20190503 17:17:41     (ignored through regex)
20190503 17:17:41     (ignored through regex)
20190503 17:17:41     (ignored)
20190503 17:17:41   Element Name = CPU1 Level-2 Cache
20190503 17:17:41     (ignored through regex)
20190503 17:17:41     (ignored)
20190503 17:17:41   Element Name = CPU1 Level-3 Cache
20190503 17:17:41     (ignored through regex)
20190503 17:17:41     (ignored)
20190503 17:17:41   Element Name = CPU2 Level-1 Cache
20190503 17:17:41     (ignored through regex)
20190503 17:17:41     (ignored)
20190503 17:17:41   Element Name = CPU2 Level-2 Cache
20190503 17:17:41     (ignored through regex)
20190503 17:17:41     (ignored)
20190503 17:17:41   Element Name = CPU2 Level-3 Cache
20190503 17:17:41     (ignored through regex)
20190503 17:17:41     (ignored)
20190503 17:17:41   Element Name = Memory
20190503 17:17:41     Element Op Status = 2
[...]
OK - Server: Cisco Systems Inc UCSB-B200-M4 s/n: XXXXXXXXXXX Chassis S/N: XXXXXXXXXXX  System BIOS: B200M4.3.2.3a.0.0226182120 2018-02-26

This time all elements matching ".*Cache" were ignored. The verbose output also shows an additional note, "(ignored through regex)", which helps to spot cases where a bad regex ignores too many elements.
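
The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: `is_ignored` and `matching_entries` are hypothetical helpers, and the use of `re.search` is an assumption about how each ignore entry is applied.

```python
import re

def is_ignored(element_name, ignore_list, use_regex=False):
    """Hypothetical helper: decide whether an element should be skipped."""
    if use_regex:
        # Regex mode (-r): every ignore entry is treated as a pattern.
        return any(re.search(entry, element_name) for entry in ignore_list)
    # Default mode: exact 1:1 comparison with the ignore entries.
    return element_name in ignore_list

def matching_entries(element_name, ignore_list):
    """Hypothetical helper: list the ignore entries that match as patterns."""
    return [entry for entry in ignore_list if re.search(entry, element_name)]

# Same ignore list as in the -i parameter of the commands above.
ignore_list = [".*Cache", "CPU1 Level-1 Cache"]
names = ["CPU1 Level-1 Cache", "CPU1 Level-2 Cache", "CPU2 Level-3 Cache", "Memory"]

exact_ignored = [n for n in names if is_ignored(n, ignore_list)]
regex_ignored = [n for n in names if is_ignored(n, ignore_list, use_regex=True)]

print("exact:", exact_ignored)
print("regex:", regex_ignored)
for name in names:
    print(name, "->", matching_entries(name, ignore_list))
```

In this sketch, "CPU1 Level-1 Cache" matches both ignore entries when they are read as patterns, while the other cache elements match only ".*Cache". That would be consistent with the verbose output above showing "(ignored through regex)" twice for that one element, although the sketch does not prove that the plugin works this way.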

Let me know if this works for your use case, too. I will merge and release this enhancement soon.

adoom42 commented 5 years ago

It works!

Thanks for the quick response.

Napsty commented 5 years ago

Official release in version 20190510.