Napsty / check_smart

Monitoring Plugin to check hard drives, solid state drives and NVMe drives using SMART
https://www.claudiokuenzler.com/monitoring-plugins/check_smart.php
GNU General Public License v3.0

Display messages for drives with UNKNOWN status #89

Closed by nabertrand 1 year ago

nabertrand commented 1 year ago

The plugin currently discards messages for drives that have an UNKNOWN status. This patch causes those messages to be included in the output. Example:

$ /usr/lib64/nagios/plugins/check_smart.pl -b 40 -g /dev/sg8 -i 'cciss,[1-21]' -q
UNKNOWN: [cciss,21] - No health status line found[cciss,21] - [cciss,21] -  --- Other drives OK|
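
The difference between discarding and including those messages can be sketched roughly as follows. This is a simplified Python model, not the plugin's actual Perl code: the function name `summarize`, the `show_unknown` switch, the severity ordering, and the exact output layout are all assumptions made for illustration.

```python
# Simplified model of combining per-drive results into one status line.
# Assumption: this does not mirror check_smart.pl internals.

# Hypothetical ordering used only to pick the overall status in this sketch.
SEVERITY = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}


def summarize(results, show_unknown=True):
    """Build a status line from (label, status, message) tuples, one per drive."""
    overall = max(
        (status for _, status, _ in results),
        key=SEVERITY.get,
        default="UNKNOWN",
    )
    # List non-OK drives first so the cause of the overall status leads the output.
    ordered = sorted(results, key=lambda r: r[1] == "OK")
    parts = []
    for label, status, message in ordered:
        if status == "UNKNOWN" and not show_unknown:
            continue  # old behaviour: the message is silently dropped
        parts.append(f"[{label}] - {message}")
    return f"{overall}: " + " --- ".join(parts)


results = [
    ("/dev/sda", "OK", "Device is clean"),
    ("/dev/sde", "UNKNOWN", "No health status line found"),
    ("/dev/sdf", "OK", "Device is clean"),
]
print(summarize(results, show_unknown=False))
print(summarize(results, show_unknown=True))
```

With `show_unknown=False` the overall status is still UNKNOWN, but nothing in the line says which drive caused it; with `show_unknown=True` the offending drive is named first.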

Napsty commented 1 year ago

I finally had a few minutes to look at this. I like the idea. But there might be users who use the glob expression to quickly parse through all drives and deliberately ignore the UNKNOWN status. If they update the plugin, they might get a ton of alerts on servers they monitor this way.

So I suggest we add a parameter to suppress the unknown drives in the output, e.g. --skip-unknown-drive or --ignore-unknown-drives. By default this should be enabled, though.

Napsty commented 1 year ago

OK, I misunderstood the PR. Now I see the need to implement this:

root@nas:~# ./check_smart.pl -g '/dev/sd[a-z]' -i auto
UNKNOWN: [/dev/sda] - Device is clean --- [/dev/sdb] - Device is clean --- [/dev/sdc] - Device is clean --- [/dev/sdd] - Device is clean --- [/dev/sdf] - Device is clean|

On this system, the drive /dev/sde is an MMC/Memory Stick. The plugin exits with status UNKNOWN, but the output does not indicate which drive caused it.

With your PR, this makes more sense now:

root@nas:~/check_smart# ./check_smart.pl -g '/dev/sd[a-z]' -i auto
UNKNOWN: [/dev/sde] - No health status line found[/dev/sde] - [/dev/sde] -  --- [/dev/sda] - Device is clean --- [/dev/sdb] - Device is clean --- [/dev/sdc] - Device is clean --- [/dev/sdd] - Device is clean --- [/dev/sdf] - Device is clean|

Thanks to the additional output, the drive /dev/sde is immediately shown as the cause of the UNKNOWN status. That saves troubleshooting time.

Awesome, thanks!
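
For anyone post-processing the plugin output, the drive behind a non-OK status can be picked out of a line like the ones above with a few lines of Python. This is only a sketch: it assumes the `[device] - message` segments are joined by ` --- ` and that healthy drives report exactly "Device is clean", as in the examples in this thread. The helper name `non_clean_drives` is hypothetical and not part of the plugin.

```python
import re


def non_clean_drives(output, ok_message="Device is clean"):
    """Return device labels whose message differs from ok_message."""
    # Drop the performance data after "|" and the leading status ("UNKNOWN: ").
    text = output.split("|", 1)[0]
    _, _, body = text.partition(": ")
    flagged = []
    for segment in body.split(" --- "):
        # Each segment is expected to look like "[label] - message".
        match = re.match(r"\[([^\]]+)\] - (.*)", segment)
        if match and match.group(2).strip() != ok_message:
            flagged.append(match.group(1))
    return flagged


line = (
    "UNKNOWN: [/dev/sde] - No health status line found[/dev/sde] - [/dev/sde] -  --- "
    "[/dev/sda] - Device is clean --- [/dev/sdb] - Device is clean|"
)
print(non_clean_drives(line))
```

Segments that do not start with a bracketed label, such as "Other drives OK" in quiet mode, are skipped rather than flagged.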