SteScho / manubulon-snmp

Set of Icinga/Nagios plugins to check hosts and hardware with the SNMP protocol.
GNU General Public License v2.0

No out of space check_snmp_storage warning on btrfs #83

Closed waja closed 1 year ago

waja commented 1 year ago

Hi,

The following issue was reported in Debian, together with the attached changes that fix it:

We noticed an out-of-space condition on one of the btrfs filesystems monitored with check_snmp_storage; the problem was not detected by check_snmp_storage.

Filesystem status:

    root@mysrv:~# btrfs fi df -b /mnt
    Data, single: total=2222194688, used=2222194688
    System, DUP: total=8388608, used=16384
    System, single: total=4194304, used=0
    Metadata, DUP: total=484835328, used=163921920
    Metadata, single: total=8388608, used=0
    GlobalReserve, single: total=16777216, used=0

    root@mysrv:~# df -B1 /mnt
    Filesystem        1B-blocks   Used        Available  Use%  Mounted on
    /dev/mapper/myvg  3221225472  2566848512  0          100%  /mnt
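The two outputs above can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the reported patch. It assumes that a DUP profile occupies twice its listed size on the device, and that df counts the global reserve as used; both assumptions happen to reproduce the df figures exactly for this filesystem.

```python
# Figures copied from the `btrfs fi df -b /mnt` output above:
# (type, profile, total bytes, used bytes)
chunks = [
    ("Data", "single", 2222194688, 2222194688),
    ("System", "DUP", 8388608, 16384),
    ("System", "single", 4194304, 0),
    ("Metadata", "DUP", 484835328, 163921920),
    ("Metadata", "single", 8388608, 0),
]
global_reserve = 16777216   # GlobalReserve total
device_size = 3221225472    # "1B-blocks" column from df
df_used = 2566848512        # "Used" column from df

# Assumption: DUP keeps two copies of every chunk on the device.
copies = {"single": 1, "DUP": 2}

raw_allocated = sum(total * copies[profile] for _, profile, total, _ in chunks)
raw_used = sum(used * copies[profile] for _, profile, _, used in chunks)

unallocated = device_size - raw_allocated
data_free = chunks[0][2] - chunks[0][3]

print(unallocated)                 # 0: every byte is already assigned to a chunk
print(data_free)                   # 0: the data chunks are completely full
print(raw_used + global_reserve)   # 2566848512, the same as df's Used
```

Under these assumptions, no space is left for new data even though roughly 20% of the device is not in use: it is tied up in metadata and system chunks.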

Net-SNMP information for this filesystem:

hrStorageTable (used by check_snmp_storage):

    iso.3.6.1.2.1.25.2.3.1.1.69 = INTEGER: 69 // hrStorageIndex
    iso.3.6.1.2.1.25.2.3.1.2.69 = OID: iso.3.6.1.2.1.25.2.1.4 // hrStorageType
    iso.3.6.1.2.1.25.2.3.1.3.69 = STRING: "/mnt" // hrStorageDescr
    iso.3.6.1.2.1.25.2.3.1.4.69 = INTEGER: 4096 // hrStorageAllocationUnits
    iso.3.6.1.2.1.25.2.3.1.5.69 = INTEGER: 786432 // hrStorageSize
    iso.3.6.1.2.1.25.2.3.1.6.69 = INTEGER: 626672 // hrStorageUsed
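For reference, this is what the hrStorageTable values above work out to when sizes are converted from allocation units to bytes. It is a minimal sketch of the arithmetic only; the variable names are made up for illustration and are not taken from the plugin.

```python
# Values copied from the hrStorageTable walk above.
alloc_unit_bytes = 4096   # hrStorageAllocationUnits
size_units = 786432       # hrStorageSize
used_units = 626672       # hrStorageUsed

total_bytes = size_units * alloc_unit_bytes
used_bytes = used_units * alloc_unit_bytes

# hrStorageTable has no "available" column, so free space can only be derived.
derived_free_bytes = total_bytes - used_bytes
used_pct = 100.0 * used_units / size_units

print(total_bytes)          # 3221225472
print(used_bytes)           # 2566848512
print(derived_free_bytes)   # 654376960
print(f"{used_pct:.2f}")    # 79.69
```

The derived free space is about 624 MiB, while df reports 0 bytes available for the same filesystem.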

dskTable (not used by check_snmp_storage):

    iso.3.6.1.4.1.2021.9.1.1.19 = INTEGER: 19 // dskIndex
    iso.3.6.1.4.1.2021.9.1.2.19 = STRING: "/mnt" // dskPath
    iso.3.6.1.4.1.2021.9.1.3.19 = STRING: "/dev/mapper/myvg" // dskDevice
    iso.3.6.1.4.1.2021.9.1.4.19 = INTEGER: -1 // dskMinimum
    iso.3.6.1.4.1.2021.9.1.5.19 = INTEGER: 10 // dskMinPercent
    iso.3.6.1.4.1.2021.9.1.6.19 = INTEGER: 3145728 // dskTotal (Total size of the disk/partion (kBytes))
    iso.3.6.1.4.1.2021.9.1.7.19 = INTEGER: 0 // dskAvail (Available space on the disk)
    iso.3.6.1.4.1.2021.9.1.8.19 = INTEGER: 2506688 // dskUsed (Used space on the disk)
    iso.3.6.1.4.1.2021.9.1.9.19 = INTEGER: 80 // dskPercent (Percentage of space used on disk)
    iso.3.6.1.4.1.2021.9.1.10.19 = INTEGER: 0 // dskPercentNode (Percentage of inodes used on disk)
    iso.3.6.1.4.1.2021.9.1.11.19 = Gauge32: 3145728 // dskTotalLow (Total size of the disk/partion (kBytes). Together with dskTotalHigh composes 64-bit number.)
    iso.3.6.1.4.1.2021.9.1.12.19 = Gauge32: 0 // dskTotalHigh (Total size of the disk/partion (kBytes). Together with dskTotalLow composes 64-bit number.)
    iso.3.6.1.4.1.2021.9.1.13.19 = Gauge32: 0 // dskAvailLow (Available space on the disk (kBytes). Together with dskAvailHigh composes 64-bit number.)
    iso.3.6.1.4.1.2021.9.1.14.19 = Gauge32: 0 // dskAvailHigh (Available space on the disk (kBytes). Together with dskAvailLow composes 64-bit number.)
    iso.3.6.1.4.1.2021.9.1.15.19 = Gauge32: 2506688 // dskUsedLow (Used space on the disk (kBytes). Together with dskUsedHigh composes 64-bit number.)
    iso.3.6.1.4.1.2021.9.1.16.19 = Gauge32: 0 // dskUsedHigh (Used space on the disk (kBytes). Together with dskUsedLow composes 64-bit number.)
    iso.3.6.1.4.1.2021.9.1.100.19 = INTEGER: 1 // dskErrorFlag (Error flag signaling that the disk or partition is under the minimum required space configured for it.)
    iso.3.6.1.4.1.2021.9.1.101.19 = STRING: "/mnt: less than 10% free (= 0%)" // dskErrorMsg (A text description providing a warning and the space left on the disk.)
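The Low/High pairs in the dskTable walk above are described as two halves of a 64-bit kByte counter. A minimal sketch of how they could be combined, and of what the available space then looks like, follows. The helper name combine_low_high is hypothetical and not taken from the plugin or the patch.

```python
def combine_low_high(low, high):
    """Join two unsigned 32-bit halves into one 64-bit value."""
    return (high << 32) | low

# Gauge32 values copied from the dskTable walk above (all in kBytes).
total_kb = combine_low_high(3145728, 0)   # dskTotalLow, dskTotalHigh
avail_kb = combine_low_high(0, 0)         # dskAvailLow, dskAvailHigh
used_kb = combine_low_high(2506688, 0)    # dskUsedLow, dskUsedHigh

free_pct = 100.0 * avail_kb / total_kb
# Space that is neither reported as used nor as available.
gap_kb = total_kb - used_kb - avail_kb

print(total_kb * 1024)   # 3221225472, matches df's 1B-blocks
print(free_pct)          # 0.0
print(gap_kb)            # 639040
```

The gap of 639040 kBytes is exactly the amount that a total-minus-used calculation would report as free.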

The cause of the problem is that on btrfs the free space can be less than total minus used. By default, check_snmp_storage checks used space, which in this case was about 80%, while at the same time 0% was available and the OS was returning out-of-space errors on write.

The solution is to configure the warn/crit levels for %free rather than %used, and to take avail from dskTable, because hrStorageTable does not provide this information (check_snmp_storage calculates free = total - used, which is wrong for btrfs, as shown above).
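To illustrate the difference between the two approaches, here is a minimal sketch that evaluates the same filesystem both ways. It is not the plugin's code. The function names and the threshold values are hypothetical, chosen only for this example (the 10% free warning level mirrors the dskMinPercent value reported above).

```python
def status_by_used(used_pct, warn=90.0, crit=95.0):
    """Alert when the used percentage rises above a threshold."""
    if used_pct >= crit:
        return "CRITICAL"
    if used_pct >= warn:
        return "WARNING"
    return "OK"

def status_by_free(free_pct, warn=10.0, crit=5.0):
    """Alert when the available percentage drops below a threshold."""
    if free_pct <= crit:
        return "CRITICAL"
    if free_pct <= warn:
        return "WARNING"
    return "OK"

# hrStorageTable view: only size and used are known.
hr_used_pct = 100.0 * 626672 / 786432

# dskTable view: the agent reports available space directly.
dsk_free_pct = 100.0 * 0 / 3145728

print(status_by_used(hr_used_pct))    # OK
print(status_by_free(dsk_free_pct))   # CRITICAL
```

With these example thresholds, the %used check stays green at roughly 80% used, while the %free check on dskAvail flags the filesystem that is already refusing writes.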

This patch works fine for us. It adds a new -u switch that makes the plugin use dskTable and its avail value instead of the default hrStorageTable with its free = total - used calculation. It also adds a few spaces to the plugin output to make the message more readable.