glensc / nagios-plugin-check_raid

Nagios/Icinga/Sensu plugin to check current server's RAID status ⛺

Unknown status in dual HP raid controller (hpssacli plugin) #145

Closed IsmaelSF closed 7 years ago

IsmaelSF commented 8 years ago

HP Apollo 4510 with the following setup:

Running the plugin with debug enabled gives:

# check_raid.pl -d
DEBUG EXEC: /proc/mdstat at ./raid.pl line 474.
DEBUG EXEC: /sbin/hpssacli controller all show status at ./raid.pl line 474.
DEBUG EXEC: /sbin/hpssacli controller slot=1 (HBA Mode) logicaldrive all show at ./raid.pl line 474.
DEBUG EXEC: /sbin/hpssacli controller slot=0 logicaldrive all show at ./raid.pl line 474.
UNKNOWN: hpssacli:[Smart HBA H244br: Array A(OK)[LUN1:OK], Smart Array P840: ]

Content of every EXEC:

# cat /proc/mdstat
Personalities :
unused devices: <none>
# /sbin/hpssacli controller slot=1 (HBA Mode) logicaldrive all show
-bash: syntax error near unexpected token `('
# /sbin/hpssacli controller slot=0 logicaldrive all show

Smart HBA H244br in Slot 0 (Embedded)

   array A

      logicaldrive 1 (838.3 GB, RAID 1, OK)

glensc commented 8 years ago

The output of "controller all show status" is missing!

IsmaelSF commented 8 years ago

Oops... attached:

# /sbin/hpssacli controller all show status

Smart Array P840 in Slot 1 (HBA Mode)
   Controller Status: OK

Smart HBA H244br in Slot 0 (Embedded) (RAID Mode)
   Controller Status: OK
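
The first debug run passed the slot text "slot=1 (HBA Mode)" straight to the command line, which is what the shell syntax error above shows. A minimal sketch of splitting the controller header line into model, slot number, and the parenthesised annotations could look like this. It is written in Python for illustration, it is not the plugin's actual Perl code, and the names parse_controllers and HEADER_RE are made up for this example:

```python
import re

# Sample copied from the "controller all show status" output in this thread.
STATUS_OUTPUT = """
Smart Array P840 in Slot 1 (HBA Mode)
   Controller Status: OK

Smart HBA H244br in Slot 0 (Embedded) (RAID Mode)
   Controller Status: OK
"""

# Header line: "<model> in Slot <id>", then zero or more "(...)" annotations.
# (Hypothetical pattern for this sketch, not taken from check_raid.pl.)
HEADER_RE = re.compile(
    r"^(?P<model>\S.*?) in Slot (?P<slot>\w+)(?P<notes>(?:\s+\([^)]*\))*)\s*$"
)


def parse_controllers(text):
    """Return one dict per controller header found in the status output."""
    controllers = []
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if not m:
            continue  # skips blank lines and indented "Controller Status" lines
        notes = re.findall(r"\(([^)]*)\)", m.group("notes"))
        controllers.append({
            "model": m.group("model"),
            "slot": m.group("slot"),          # bare slot id, no "(HBA Mode)"
            "hba_mode": "HBA Mode" in notes,  # taken from annotations, not the model name
        })
    return controllers


controllers = parse_controllers(STATUS_OUTPUT)
for c in controllers:
    # Only the bare slot id goes into the follow-up command.
    print(f"hpssacli controller slot={c['slot']} logicaldrive all show")
```

Keeping the annotations separate from the slot id means the mode is still available for later decisions without leaking into the command line.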

Thank you for your work!!

glensc commented 8 years ago

Please try git master, or cherry-pick the ffcd5768c7bc0b2148bd9e1d1254190005de3616 change that's linked to this ticket.

IsmaelSF commented 8 years ago

Result:

Good, one of the errors has gone!!

[root@ tmp] # perl ./check_raid.pl -d
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs

DEBUG EXEC: /proc/mdstat at ./check_raid2.pl line 482.
DEBUG EXEC: /sbin/hpssacli controller all show status at ./check_raid2.pl line 482.
DEBUG EXEC: /sbin/hpssacli controller slot=0 logicaldrive all show at ./check_raid2.pl line 482.
DEBUG EXEC: /sbin/hpssacli controller slot=1 logicaldrive all show at ./check_raid2.pl line 482.
UNKNOWN: hpssacli:[Smart HBA H244br: Array A(OK)[LUN1:OK], Smart Array P840: ]

Now "/sbin/hpssacli controller slot=1 logicaldrive all show" executes OK after removing "(HBA Mode)", but the real problem is that the state is always UNKNOWN. In HBA mode no logical drive is configured at all; the physical drives are passed through untouched to the operating system. One option would be to skip the RAID health check in that mode and only check the physical drives, to see whether any of them is not OK.

Output of the physicaldrive show command for both controllers:

[root@tmp] # /sbin/hpssacli controller slot=0 physicaldrive all show

Smart HBA H244br in Slot 0 (Embedded)

   array A

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 900.1 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 900.1 GB, OK)

[root@tmp] # /sbin/hpssacli controller slot=1 physicaldrive all show

Smart Array P840 in Slot 1 (HBA Mode)

   HBA Drives

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:13 (port 1I:box 1:bay 13, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:14 (port 1I:box 1:bay 14, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:15 (port 1I:box 1:bay 15, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:16 (port 1I:box 1:bay 16, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:17 (port 1I:box 1:bay 17, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:18 (port 1I:box 1:bay 18, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:19 (port 1I:box 1:bay 19, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:20 (port 1I:box 1:bay 20, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:21 (port 1I:box 1:bay 21, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:22 (port 1I:box 1:bay 22, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:23 (port 1I:box 1:bay 23, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:24 (port 1I:box 1:bay 24, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:25 (port 1I:box 1:bay 25, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:26 (port 1I:box 1:bay 26, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:27 (port 1I:box 1:bay 27, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:28 (port 1I:box 1:bay 28, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:29 (port 1I:box 1:bay 29, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:30 (port 1I:box 1:bay 30, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:65 (port 1I:box 1:bay 65, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:66 (port 1I:box 1:bay 66, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:67 (port 1I:box 1:bay 67, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:68 (port 1I:box 1:bay 68, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:31 (port 2I:box 1:bay 31, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:32 (port 2I:box 1:bay 32, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:33 (port 2I:box 1:bay 33, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:34 (port 2I:box 1:bay 34, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:35 (port 2I:box 1:bay 35, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:36 (port 2I:box 1:bay 36, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:37 (port 2I:box 1:bay 37, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:38 (port 2I:box 1:bay 38, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:39 (port 2I:box 1:bay 39, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:40 (port 2I:box 1:bay 40, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:41 (port 2I:box 1:bay 41, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:42 (port 2I:box 1:bay 42, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:43 (port 2I:box 1:bay 43, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:44 (port 2I:box 1:bay 44, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:45 (port 2I:box 1:bay 45, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:46 (port 2I:box 1:bay 46, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:47 (port 2I:box 1:bay 47, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:48 (port 2I:box 1:bay 48, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:49 (port 2I:box 1:bay 49, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:50 (port 2I:box 1:bay 50, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:51 (port 2I:box 1:bay 51, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:52 (port 2I:box 1:bay 52, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:53 (port 2I:box 1:bay 53, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:54 (port 2I:box 1:bay 54, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:55 (port 2I:box 1:bay 55, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:56 (port 2I:box 1:bay 56, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:57 (port 2I:box 1:bay 57, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:58 (port 2I:box 1:bay 58, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:59 (port 2I:box 1:bay 59, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:60 (port 2I:box 1:bay 60, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:61 (port 2I:box 1:bay 61, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:62 (port 2I:box 1:bay 62, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:63 (port 2I:box 1:bay 63, SAS, 6001.1 GB, OK)
      physicaldrive 2I:1:64 (port 2I:box 1:bay 64, SAS, 6001.1 GB, OK)
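
As an illustration of the physical-drive-only check suggested above, the sketch below reads "physicaldrive" lines in the format shown and reports any drive whose last field is not OK. It is Python rather than the plugin's Perl, the function names are hypothetical, and the "Failed" status on the second sample drive is invented for the example (every drive in the real output above is OK):

```python
import re

# Two lines in the format shown above. The "Failed" status is made up
# for illustration; it does not come from the reporter's system.
SAMPLE = """
Smart Array P840 in Slot 1 (HBA Mode)

   HBA Drives

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 6001.1 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 6001.1 GB, Failed)
"""

# "physicaldrive <id> (<comma-separated fields>)"
PD_RE = re.compile(r"^\s*physicaldrive (\S+) \((.*)\)\s*$")


def physical_drive_states(text):
    """Map drive id -> status, where status is the last comma-separated field."""
    states = {}
    for line in text.splitlines():
        m = PD_RE.match(line)
        if m:
            fields = [f.strip() for f in m.group(2).split(",")]
            states[m.group(1)] = fields[-1]
    return states


def failed_drives(text):
    """Return the ids of all drives whose status is anything other than OK."""
    return [pd for pd, st in physical_drive_states(text).items() if st != "OK"]


states = physical_drive_states(SAMPLE)
bad = failed_drives(SAMPLE)
print("CRITICAL" if bad else "OK", bad)
```

Treating every non-OK status as critical is a simplification; a real check might want to distinguish between different drive states.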

Let me know if you need more information or traces about it.

glensc commented 7 years ago

@IsmaelSF the output of the following commands is still missing:

DEBUG EXEC: /sbin/hpssacli controller slot=0 logicaldrive all show at ./check_raid2.pl line 482.
DEBUG EXEC: /sbin/hpssacli controller slot=1 logicaldrive all show at ./check_raid2.pl line 482.

glensc commented 7 years ago

The current plugin doesn't check physical drives at all...

glensc commented 7 years ago

HBA mode is now parsed: f95c611

However, I need your output to decide further.

glensc commented 7 years ago

PS: HBA just means 'host bus adapter'.

IsmaelSF commented 7 years ago

Logicaldrive output:

[root@tmp] # /sbin/hpssacli controller slot=0 logicaldrive all show

Smart HBA H244br in Slot 0 (Embedded)

   array A

      logicaldrive 1 (838.3 GB, RAID 1, OK)

[root@tmp] # /sbin/hpssacli controller slot=1 logicaldrive all show

Error: The specified device does not have any logical drives.

You can find the physical drive output in my 30 Jul comment.

glensc commented 7 years ago

85423dd: controllers that return "Error: The specified device does not have any logical drives." are now marked with the --noraid=STATE status. You can also specify which slots to monitor with --plugin-option=hpacucli-target=slot=0

Same as #151.

Use check_raid.pl from the snapshot release once the build finishes, or from git master.

I plan to make a release soon anyway.
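
For readers following along, here is a rough sketch of the idea behind mapping the "no logical drives" error to a configurable state. It is Python for illustration only, not the code from 85423dd. The function name, the UNKNOWN default, and the rule that any non-OK logical drive is CRITICAL are all assumptions made for this example:

```python
import re

NO_LD_ERROR = "Error: The specified device does not have any logical drives."

# Outputs copied from the two "logicaldrive all show" runs in this thread.
SLOT0_OUTPUT = """
Smart HBA H244br in Slot 0 (Embedded)

   array A

      logicaldrive 1 (838.3 GB, RAID 1, OK)
"""
SLOT1_OUTPUT = """
Error: The specified device does not have any logical drives.
"""


def logicaldrive_state(output, noraid_state="UNKNOWN"):
    """Classify one controller's "logicaldrive all show" output.

    noraid_state stands in for the value given to --noraid=STATE; the
    default used here is an assumption, not the plugin's documented default.
    """
    if NO_LD_ERROR in output:
        return noraid_state
    # The status is the last comma-separated field inside the parentheses.
    statuses = re.findall(r"logicaldrive \d+ \(.*,\s*([^,)]+)\)", output)
    if not statuses:
        return "UNKNOWN"
    # Simplification for this sketch: any non-OK logical drive is CRITICAL.
    return "OK" if all(s == "OK" for s in statuses) else "CRITICAL"


print(logicaldrive_state(SLOT0_OUTPUT))
print(logicaldrive_state(SLOT1_OUTPUT))
print(logicaldrive_state(SLOT1_OUTPUT, noraid_state="OK"))
```

With this shape, a controller running purely in HBA mode no longer forces the overall result to UNKNOWN unless the operator asks for that.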