glensc / nagios-plugin-check_raid

Nagios/Icinga/Sensu plugin to check current server's RAID status ⛺

false CRITICAL on areca #167

Open · chanlists opened this issue 7 years ago

chanlists commented 7 years ago

Hi,

it seems that on my system the plugin produces a false CRITICAL with an areca controller:

Output of check_raid -d:

./check_raid.sh -p areca -d
check_raid 4.0.6-dev
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs

DEBUG EXEC: /usr/local/sbin/areca-cli rsf info at /root/check_raid/nagios-plugin-check_raid-master/lib/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.
DEBUG EXEC: /usr/local/sbin/areca-cli disk info at /root/check_raid/nagios-plugin-check_raid-master/lib/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.
CRITICAL: areca:[Array#1(SYSTEM): Normal, Array#2(ATOMOPTIK): Normal, Array#3(AG Morgner 2TB): Normal, Array#4(QUANTUS): Normal, Array#5(BACKUP): Normal, Drive Assignment: 9,10=Array#1 11,12=Array#2 13,14=Array#3 15,16=Array#4 17,18,23,24=Array#5 19,20,21,22=Free]

Output of each command from check_raid -d

/usr/local/sbin/areca-cli rsf info

 #  Name             Disks TotalCap  FreeCap MinDiskCap         State
===============================================================================
 1  SYSTEM               2 1000.0GB    0.0GB    500.0GB         Normal
 2  ATOMOPTIK            2 4000.0GB    0.0GB   2000.0GB         Normal
 3  AG Morgner 2TB       2 4000.0GB    0.0GB   2000.0GB         Normal
 4  QUANTUS              2 4000.0GB    0.0GB   2000.0GB         Normal
 5  BACKUP               4 8000.0GB    0.0GB   2000.0GB         Normal
===============================================================================
GuiErrMsg<0x00>: Success.

/usr/local/sbin/areca-cli disk info

  # Enc# Slot#   ModelName                        Capacity  Usage
===============================================================================
  1  01  Slot#1  N.A.                                0.0GB  N.A.
  2  01  Slot#2  N.A.                                0.0GB  N.A.
  3  01  Slot#3  N.A.                                0.0GB  N.A.
  4  01  Slot#4  N.A.                                0.0GB  N.A.
  5  01  Slot#5  N.A.                                0.0GB  N.A.
  6  01  Slot#6  N.A.                                0.0GB  N.A.
  7  01  Slot#7  N.A.                                0.0GB  N.A.
  8  01  Slot#8  N.A.                                0.0GB  N.A.
  9  02  SLOT 01 WDC WD5003ABYX-01WERA2            500.1GB  SYSTEM
 10  02  SLOT 02 ST3500418AS                       500.1GB  SYSTEM
 11  02  SLOT 03 SEAGATE ST2000NM0001             2000.4GB  ATOMOPTIK
 12  02  SLOT 04 SEAGATE ST2000NM0001             2000.4GB  ATOMOPTIK
 13  02  SLOT 05 WDC WD2002FYPS-02W3B0            2000.4GB  AG Morgner 2TB
 14  02  SLOT 06 WDC WD2002FYPS-02W3B0            2000.4GB  AG Morgner 2TB
 15  02  SLOT 07 TOSHIBA MK2001TRKB               2000.4GB  QUANTUS
 16  02  SLOT 08 TOSHIBA MK2001TRKB               2000.4GB  QUANTUS
 17  02  SLOT 09 SEAGATE ST32000444SS             2000.4GB  BACKUP
 18  02  SLOT 10 SEAGATE ST32000444SS             2000.4GB  BACKUP
 19  02  SLOT 11 TOSHIBA MK2001TRKB               2000.4GB  Free
 20  02  SLOT 12 TOSHIBA MK2001TRKB               2000.4GB  Free
 21  02  SLOT 13 Hitachi HUA722020ALA330          2000.4GB  Free
 22  02  SLOT 14 Hitachi HUA722020ALA330          2000.4GB  Free
 23  02  SLOT 15 SEAGATE ST32000444SS             2000.4GB  BACKUP
 24  02  SLOT 16 SEAGATE ST32000444SS             2000.4GB  BACKUP
 25  02  SLOT 17 N.A.                                0.0GB  N.A.
 26  02  SLOT 18 N.A.                                0.0GB  N.A.
 27  02  SLOT 19 N.A.                                0.0GB  N.A.
 28  02  SLOT 20 N.A.                                0.0GB  N.A.
===============================================================================
GuiErrMsg<0x00>: Success.

Additional environment details:

Using master. Thanks for looking into this,

Christian

ssamson-tis commented 7 years ago

I think the problem comes from "Free" disks. From the sources:

            # assume critical if Usage is not one of:
            # - existing Array name
            # - HotSpare
            # - Rebuild
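
In other words, the drive loop boils down to something like the sketch below. This is only a paraphrase to show the failure mode, not the plugin's actual code; the sample data is taken from the output above, and the handling of "N.A." slots is my assumption:

    #!/usr/bin/perl
    # Paraphrase of the Usage dispatch, not the actual plugin source.
    use strict;
    use warnings;

    # Array names as parsed from 'rsf info'; drives as [id, usage] from 'disk info'.
    my %arrays = map { $_ => 1 } ('SYSTEM', 'ATOMOPTIK', 'BACKUP');
    my @disks  = ([9, 'SYSTEM'], [19, 'Free'], [25, 'N.A.']);

    my (%drivestatus, $critical);
    for my $disk (@disks) {
        my ($id, $usage) = @$disk;
        if ($arrays{$usage}) {                    # existing array name: OK
            push @{ $drivestatus{$usage} }, $id;
        } elsif ($usage =~ /HotSpare|Rebuild/) {  # hot spare / rebuilding: OK
            push @{ $drivestatus{$usage} }, $id;
        } elsif ($usage eq 'N.A.') {              # empty slot: skipped (my assumption)
            next;
        } else {                                  # e.g. "Free" falls through => CRITICAL
            $critical = 1;
        }
    }
    print $critical ? "CRITICAL\n" : "OK\n";      # prints CRITICAL for the data above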

Indeed, I get a different output from "cli64 rsf info":

 [root@host ~]# cli64 rsf info
  #  Name                     Disks      Total       Free  State          
 ===============================================================================
  1  Raid Set # 000               8   4800.0GB      0.0GB  Normal        
 ===============================================================================
 GuiErrMsg<0x00>: Success.

The MinDiskCap column doesn't exist there, so I needed to patch check_raid.pl:

*** check_raid.pl.ori   2017-05-26 02:02:36.000000000 +0200
--- check_raid.pl   2017-05-29 14:16:27.797695357 +0200
*************** $fatpacked{"App/Monitoring/Plugin/CheckR
*** 1374,1380 ****
                \s+\d+      # Disks
                \s+\S+      # TotalCap
                \s+\S+      # FreeCap
-               \s+\S+      # MinDiskCap/DiskChannels
                \s+(\S+)\s* # State
            $}x);

--- 1374,1379 ----
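
For what it's worth, instead of deleting the line one could make the column optional, so a single pattern covers both layouts. An untested sketch against the two sample lines quoted in this thread (anchoring the capacity columns on their GB suffix is my own idea, not what the plugin currently does):

    #!/usr/bin/perl
    # Untested sketch: treat MinDiskCap/DiskChannels as optional so both
    # 'rsf info' layouts parse with one pattern.
    use strict;
    use warnings;

    my @samples = (
        ' 1  SYSTEM               2 1000.0GB    0.0GB    500.0GB         Normal',
        ' 1  Raid Set # 000               8   4800.0GB      0.0GB  Normal',
    );

    for my $line (@samples) {
        my ($state) = ($line =~ m{
            \s+\d+              # Disks
            \s+\d+\.\d+GB       # TotalCap
            \s+\d+\.\d+GB       # FreeCap
            (?:\s+\d+\.\d+GB)?  # MinDiskCap (absent on older cli64 output)
            \s+(\S+)\s*         # State
        $}x);
        print defined $state ? "state=$state\n" : "no match\n";
    }
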
chanlists commented 7 years ago

Ah, I see. Yes, I have checked that if I convert the free drives to hot spares, it does not complain. On the other hand, shouldn't "Free" be an acceptable state for a drive to be in? If I patch the plugin like this, it won't complain about free drives anymore...

                } elsif ($usage =~ /HotSpare/) {
                        # hotspare is OK
                        push(@{$drivestatus{$array_name}}, $id);
+               } elsif ($usage =~ /Free/) {
+                       # Free is OK
+                       push(@{$drivestatus{$array_name}}, $id);
                } elsif ($usage =~ /Pass Through/) {
                        # Pass Through is OK
                        push(@{$drivestatus{$array_name}}, $id);
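
One thing I am not sure about: like the surrounding branches, /Free/ is unanchored, so an array whose name merely contains "Free" would also take this branch. Anchoring the match might be safer; a quick illustration (the "FreeAgent" array name is made up):

    #!/usr/bin/perl
    # Anchored vs. unanchored match on the Usage column; "FreeAgent" is a
    # made-up array name, not from the output above.
    use strict;
    use warnings;

    for my $usage ('Free', 'FreeAgent', 'BACKUP') {
        printf "%-9s  /Free/=%s  /^Free\$/=%s\n", $usage,
            ($usage =~ /Free/   ? 'match' : 'no'),
            ($usage =~ /^Free$/ ? 'match' : 'no');
    }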

Would this be acceptable? Cheers,

Christian