Open danci1973 opened 9 years ago
please share output of these two commands:
DEBUG EXEC: /usr/bin/lsscsi -g at ./check_raid.pl line 452.
DEBUG EXEC: >&2 /usr/sbin/cciss_vol_status -v at ./check_raid.pl line 448.
and what is your kernel version, and do you use cciss or hpsa driver?
as lsscsi is supposed to detect devices and fresh enough cciss_vol_status can report individual disks.
see 3.2.0 changelog
also be sure to save all outputs of your system now, i may need additional information, e.g. command outputs.
# /usr/bin/lsscsi -g
[0:0:0:0] storage HP P410i 6.60 - /dev/sg0
[0:0:0:1] disk HP LOGICAL VOLUME 6.60 /dev/sda /dev/sg1
# >&2 /usr/sbin/cciss_vol_status -v
cciss_vol_status version 1.09
The server is running RHEL 6.1 (no updates).
can you try cciss_vol_status 1.10+ ?
ps: you can write code block with triple backticks instead of indenting with 4 spaces. it's actually written in CONTRIBUTING.md
# /usr/sbin/cciss_vol_status -v
cciss_vol_status version 1.10
# ./check_raid.pl -d
usage: sudo -h | -K | -k | -L | -V
usage: sudo -v [-AknS] [-g groupname|#gid] [-p prompt] [-u user name|#uid]
usage: sudo -l[l] [-AknS] [-g groupname|#gid] [-p prompt] [-U user name] [-u
user name|#uid] [-g groupname|#gid] [command]
usage: sudo [-AbEHknPS] [-r role] [-t type] [-C fd] [-g groupname|#gid] [-p
prompt] [-u user name|#uid] [-g groupname|#gid] [VAR=value] [-i|-s]
[<command>]
usage: sudo -e [-AknS] [-r role] [-t type] [-C fd] [-g groupname|#gid] [-p
prompt] [-u user name|#uid] file ...
DEBUG EXEC: /proc/mdstat at ./check_raid.pl line 452.
DEBUG EXEC: /usr/bin/lsscsi -g at ./check_raid.pl line 452.
DEBUG EXEC: >&2 /usr/sbin/cciss_vol_status -v at ./check_raid.pl line 448.
DEBUG EXEC: /usr/sbin/cciss_vol_status -V /dev/sg0 at ./check_raid.pl line 452.
Unparsed[ Failed drives:] at ./check_raid.pl line 3490, <$fh> line 7.
Unparsed[ connector 1I box 1 bay 4 HP EF0300FARMU 6SJ8ND840000N5191GHP HPD6] at ./check_raid.pl line 3490, <$fh> line 8.
Unparsed[] at ./check_raid.pl line 3490, <$fh> line 9.
Unparsed[ Total of 1 failed physical drives detected on this logical drive.] at ./check_raid.pl line 3490, <$fh> line 10.
DEBUG EXEC: /usr/sbin/hpacucli controller all show status at ./check_raid.pl line 452.
DEBUG EXEC: /usr/sbin/hpacucli controller slot=0 logicaldrive all show at ./check_raid.pl line 452.
OK: cciss:[/dev/sda(Smart Array P410i): Volume 0 (RAID 5): OK, Drives(3): 1I-1-1,1I-1-2,1I-1-3=OK]; hpacucli:[Smart Array P410i: Array A(OK)[LUN1:OK]]
# /usr/sbin/cciss_vol_status -V /dev/sg0
Controller: Smart Array P410i
Board ID: 0x3245103c
Logical drives: 1
Running firmware: 6.60
ROM firmware: 6.60
/dev/sda: (Smart Array P410i) RAID 5 Volume 0 status: OK. At least one spare drive designated. At least one spare drive has failed.
Failed drives:
connector 1I box 1 bay 4 HP EF0300FARMU 6SJ8ND840000N5191GHP HPD6
Total of 1 failed physical drives detected on this logical drive.
Physical drives: 3
connector 1I box 1 bay 1 HP EF0300FATFD JXY1BLJN HPDB OK
connector 1I box 1 bay 2 HP EF0300FATFD JXXY6ZUN HPDB OK
connector 1I box 1 bay 3 HP EF0300FATFD JXY1MARN HPDB OK
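The "Unparsed[...]" debug lines earlier in the thread correspond to the "Failed drives:" section of this output. As a rough illustration only (this is not the code in check_raid.pl, and the function name `parse_failed_drives` is hypothetical), a minimal Python sketch of collecting the drives listed in that section could look like the following. It assumes the section always starts with a "Failed drives:" line and ends at the next non-empty line that is not a drive line, which is what the output above suggests but is not confirmed for other cciss_vol_status versions.

```python
import re

# Matches a physical-drive line as printed in the output above, e.g.
# "connector 1I box 1 bay 4 HP EF0300FARMU 6SJ8ND840000N5191GHP HPD6"
DRIVE_RE = re.compile(
    r"^\s*connector\s+(?P<connector>\S+)\s+box\s+(?P<box>\d+)\s+bay\s+(?P<bay>\d+)\s+(?P<rest>.*)$"
)


def parse_failed_drives(output):
    """Return drive ids (connector-box-bay) listed under 'Failed drives:'.

    Hypothetical helper; the section layout is assumed from the sample output.
    """
    failed = []
    in_failed = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "Failed drives:":
            in_failed = True
            continue
        if in_failed:
            m = DRIVE_RE.match(line)
            if m:
                failed.append("{connector}-{box}-{bay}".format(**m.groupdict()))
            elif stripped:
                # Any other non-empty line (e.g. "Total of 1 failed ...")
                # is assumed to end the section; blank lines are skipped.
                in_failed = False
    return failed


sample = """\
/dev/sda: (Smart Array P410i) RAID 5 Volume 0 status: OK. At least one spare drive designated. At least one spare drive has failed.
  Failed drives:
    connector 1I box 1 bay 4 HP EF0300FARMU 6SJ8ND840000N5191GHP HPD6

  Total of 1 failed physical drives detected on this logical drive.
  Physical drives: 3
    connector 1I box 1 bay 1 HP EF0300FATFD JXY1BLJN HPDB OK
"""

print(parse_failed_drives(sample))  # ['1I-1-4']
```

The `1I-1-4` id format mirrors the `1I-1-1,1I-1-2,1I-1-3` drive list that check_raid already prints for the healthy drives.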
i'm working on cciss_vol_status improvement (using hpacucli for monitoring is not recommended by the driver developers).
so, how do you want to represent this issue's problem?
just include messages about spare drives to output? these are spare drive status messages (two of them):
At least one spare drive designated.
At least one spare drive has failed.
or should the state be changed as well?
I think a failed spare drive is just as 'critical' as any other, so it should be reflected in the state.
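One way to picture "reflected in the state" is a small mapping from the volume status line to a plugin exit code. The sketch below is only an illustration of that idea, not how check_raid implements it: the function name `volume_state` is hypothetical, the exit codes follow the usual Nagios plugin convention, and treating every non-OK volume status as CRITICAL is a simplifying assumption.

```python
# Nagios plugin exit codes (standard convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def volume_state(status_line):
    """Hypothetical mapping from a cciss_vol_status volume line to a state."""
    # The text after "status:" holds the volume status followed by extra
    # sentences, such as the two spare-drive messages quoted above.
    _, sep, tail = status_line.partition("status:")
    if not sep:
        return UNKNOWN
    messages = [m.strip() for m in tail.split(".") if m.strip()]
    # Assumption: anything other than "OK" as the first message is CRITICAL.
    state = OK if messages and messages[0] == "OK" else CRITICAL
    # A failed spare raises the state even when the volume itself is OK.
    if "At least one spare drive has failed" in messages:
        state = max(state, CRITICAL)
    return state


line = (
    "/dev/sda: (Smart Array P410i) RAID 5 Volume 0 status: OK. "
    "At least one spare drive designated. At least one spare drive has failed."
)
print(volume_state(line))  # 2
```

With this mapping, a designated but healthy spare leaves the state at OK, while the failed spare from this report turns the check CRITICAL.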
Any update on when this will make into a new release?
HP Smart Array P400i with 4 drives - 3 are in RAID5 and one is a spare. The spare drive is in failed state, but check_raid doesn't report it.
The controller is detected by two plugins - cciss and hpacucli
cciss_vol_status actually detects a failed physical drive, but doesn't say exactly which one it is:
hpacucli, as it is used now, doesn't:
However, hpacucli can show status of each physical drive, where the failure is visible:
I suggest using this command line as an additional check to improve failed drive detection.