Open SLoeuillet opened 9 years ago
Only my soft RAID (Intel ICH) device is detected :
sloeuillet@Domicile:~/dev/GIT$ /home/sloeuillet/check_raid.pl -l lsscsi mdstat
sloeuillet@Domicile:~/dev/GIT$ /home/sloeuillet/check_raid.pl -p mdstat
OK: mdstat:[md125(643.21 GiB raid1):UU, md126(102.00 GiB raid1):UU, md127(5.16 MiB):]
sloeuillet@Domicile:~/dev/GIT$ /home/sloeuillet/check_raid.pl -p lsscsi
No RAID configuration found (tried: lsscsi)
lsscsi is not a real plugin, but which check_raid plugin do you expect to be used for your system? also, read contributing.md: you should run check_raid with -d to capture the commands it runs and their output.
i will have to close this with no resolution if you don't provide more information! can't even figure out which plugin this applies to!
the lsscsi plugin is used by the hpsa/cciss plugin. do you have the appropriate userspace tools installed? hpsa uses cciss_vol_status
Interested too in Marvell Support @glensc
The Marvell raid controllers use a userspace tool "mvcli", by default installed as "/opt/marvell/storage/cli/mvcli"
RPMs are available, for instance, from www.datoptic.com/Download/LINUXMSU4.1.0.2019.rar
Tested on CentOS 7.1. EL7.x installs need a workaround: "modprobe sg" (scsi_generic) needs to be run once before mvcli, otherwise mvcli finds no devices. See RH release note "sg3_utils component, BZ#1186462".
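A minimal sketch of that workaround as a pre-flight check, assuming the module shows up as `sg` in `/proc/modules` when loaded. The helper names `sg_loaded` and `ensure_sg_loaded` are made up for illustration and are not part of check_raid or mvcli.

```python
import subprocess


def sg_loaded(proc_modules_text: str) -> bool:
    """Return True if a module named exactly 'sg' is listed in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "sg":
            return True
    return False


def ensure_sg_loaded(modules_path: str = "/proc/modules") -> None:
    """Run 'modprobe sg' once if sg is not listed yet (needs root).

    Assumption: if sg is compiled into the kernel it will not be listed,
    and the extra modprobe call is expected to be harmless in that case.
    """
    with open(modules_path) as fh:
        if sg_loaded(fh.read()):
            return
    subprocess.run(["modprobe", "sg"], check=True)
```

Calling `ensure_sg_loaded()` before the first mvcli invocation would avoid the "no devices" case described above.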
[root@ovirt ~]# cat marvell.scr
info -o BLK
smart -p 2
smart -p 3
exit
[root@ovirt ~]# cat marvell.scr | /opt/marvell/storage/cli/mvcli
CLI Version: 4.1.0.9 RaidAPI Version: 5.0.0.1024
Welcome to Marvell RAID Command Line Interface.
> info -o BLK
Block Information
-----------------
Block id: 0
PD id: 2
VD id: 0
Block status: assigned
Size: 29977152 K
Starting offset: 16384 K
Block id: 4
PD id: 3
VD id: 0
Block status: assigned
Size: 29977152 K
Starting offset: 16384 K
> smart -p 2
Smart Info
ID Attribute Name Current Worst Threshhold RawValue
01 Read Error Rate 100 100 0 000000000000
05 Reallocated Sectors 100 100 0 000000000000
09 Power-On Hours Count 100 100 0 0000000005AC
0C Power Cycle Count 100 100 0 00000000000C
AB Unknown 100 100 0 000000000000
AC Unknown 100 100 0 000000000000
AD Unknown 100 100 0 000000000006
AE Unknown 100 100 0 000000000004
B4 Unused Reserved Block 0 0 0 00000000086F
B7 SATA Downshift Error 100 100 0 000000000000
B8 End-to-End error 100 100 0 000000000000
BB Reported Uncorrectable 100 100 0 000000000000
C2 HDA temperature 44 23 0 000000000038
C4 Reallocation count 100 100 0 000000000010
C5 Current pending 100 100 0 000000000000
C6 Offline scan wrong 100 100 0 000000000000
C7 UDMA CRC error rate 100 100 0 000000000000
CA Address Mark errors 100 100 0 000000000000
CE Flying height 100 100 0 000000000000
D2 Vibration During Write 100 100 0 000000000000
F6 Invalid 100 100 0 0000000042F0
F7 Invalid 100 100 0 000000007E33
F8 Invalid 100 100 0 000000009711
> smart -p 3
Smart Info
ID Attribute Name Current Worst Threshhold RawValue
01 Read Error Rate 100 100 0 000000000000
05 Reallocated Sectors 100 100 0 000000000000
09 Power-On Hours Count 100 100 0 0000000005AC
0C Power Cycle Count 100 100 0 00000000000C
AB Unknown 100 100 0 000000000000
AC Unknown 100 100 0 000000000000
AD Unknown 100 100 0 000000000007
AE Unknown 100 100 0 000000000004
B4 Unused Reserved Block 0 0 0 00000000086F
B7 SATA Downshift Error 100 100 0 000000000000
B8 End-to-End error 100 100 0 000000000000
BB Reported Uncorrectable 100 100 0 000000000000
C2 HDA temperature 75 21 0 000000000019
C4 Reallocation count 100 100 0 000000000010
C5 Current pending 100 100 0 000000000000
C6 Offline scan wrong 100 100 0 000000000000
C7 UDMA CRC error rate 100 100 0 000000000000
CA Address Mark errors 100 100 0 000000000000
CE Flying height 100 100 0 000000000000
D2 Vibration During Write 100 100 0 000000000000
F6 Invalid 100 100 0 000000006328
F7 Invalid 100 100 0 000000000C9F
F8 Invalid 100 100 0 000000003FDF
> exit
looks like a new plugin named mvcli needs to be created from scratch
Great @glensc, let me know if there is something i can test! THX a lot
does mvcli accept those commands via the command line as well? it would be easier to script that way than via a STDIN stream
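If only the STDIN mode were available, the transcript could still be scripted. This is a rough sketch, assuming mvcli echoes each command after a `> ` prompt as in the paste above. `build_script` and `split_transcript` are hypothetical helper names.

```python
def build_script(commands):
    """Join mvcli commands into the text piped to STDIN, always ending with 'exit'."""
    lines = [c.strip() for c in commands if c.strip()]
    if not lines or lines[-1].lower() != "exit":
        lines.append("exit")
    return "\n".join(lines) + "\n"


def split_transcript(output):
    """Split an interactive transcript into {command: [output lines]}.

    Lines before the first '> ' prompt (version banner, welcome text) are dropped.
    """
    sections = {}
    current = None
    for line in output.splitlines():
        if line.startswith("> "):
            current = line[2:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections
```

Each section can then be handed to its own parser, which is roughly what separate command line invocations would give for free.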
I will have a look
@olidietzel did you have a look?
anyway, there's a WIP plugin in the repo: lib/App/Monitoring/Plugin/CheckRaid/Plugins/mvcli.pm
Hi @glensc, I'm trying to use check_raid but don't get any RAID controller related output from the tool. Here is some information about my environment (Debian 9.2.1 Stretch with Proxmox VE on HP Microserver Gen10 X3421) and check_raid 4.0.8 (2017-09-01):
# uname -a
Linux zeus 4.13.8-2-pve #1 SMP PVE 4.13.8-28 (Wed, 29 Nov 2017 09:49:35 +0100) x86_64 GNU/Linux
# lspci | grep -i marvell
01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
# dmesg | grep -i marvell
[ 2.050088] ata9.00: ATAPI: MARVELL VIRTUAL, 1.09, max UDMA/66
[ 2.104320] ata2.00: ATA-7: MARVELL Raid VD, MV.R00-0, max UDMA7
[ 2.104986] scsi 1:0:0:0: Direct-Access ATA MARVELL Raid VD 00-0 PQ: 0 ANSI: 5
[ 2.140748] scsi 8:0:0:0: Processor Marvell Console 1.01 PQ: 0 ANSI: 5
# /bin/mvcli
SG driver version 3.5.36.
CLI Version: 4.1.0.30 RaidAPI Version: 5.0.0.1067
Welcome to RAID Command Line Interface.
# /bin/mvcli info -o blk
SG driver version 3.5.36.
Block Information
-----------------
Block id: 0
PD id: 0
VD id: 0
Block status: assigned
Size: 3906936640 K
Starting offset: 16384 K
Block id: 4
PD id: 1
VD id: 0
Block status: assigned
Size: 3906936640 K
Starting offset: 16384 K
# /bin/mvcli info -o vd
SG driver version 3.5.36.
Virtual Disk Information
-------------------------
id: 0
name: VD0
status: functional
Stripe size: 64
RAID mode: RAID1
Cache mode: Not Support
size: 3815367 M
BGA status: not running
Block ids: 0 4
# of PDs: 2
PD RAID setup: 0 1
Running OS: yes
Total # of VD: 1
# /bin/mvcli smart -p 0
SG driver version 3.5.36.
SMART STATUS RETURN: OK.
Smart Info
ID Attribute Name Current Worst Threshhold RawValue Status
01 Read Error Rate 200 200 51 000000000000 OK
03 Spin-Up Time 188 184 21 0000000015D7 OK
04 Start/Stop Count 100 100 0 000000000021 OK
05 Reallocated Sectors 200 200 140 000000000000 OK
07 Seek Error Rate 200 200 0 000000000000 OK
09 Power-On Hours Count 100 100 0 000000000023 OK
0A Spin Retry Count 100 253 0 000000000000 OK
0B Calibration retry 100 253 0 000000000000 OK
0C Power Cycle Count 100 100 0 000000000021 OK
C0 Power-off retract 200 200 0 00000000000C OK
C1 Load/Unload cycle 200 200 0 000000000026 OK
C2 HDA temperature 121 117 0 00000000001D OK
C4 Reallocation count 200 200 0 000000000000 OK
C5 Current pending 200 200 0 000000000000 OK
C6 Offline scan wrong 100 253 0 000000000000 OK
C7 UDMA CRC error rate 200 200 0 000000000000 OK
C8 Write error rate 100 253 0 000000000000 OK
And some check_raid output (run from cmd line as root for testing only):
# ./check_raid -d
check_raid 4.0.8
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs
Please include output of **ALL** commands in bugreport
DEBUG EXEC: /sbin/dmsetup status --noflush at ./check_raid line 484.
DEBUG EXEC: /proc/mdstat at ./check_raid line 484.
DEBUG EXEC: /bin/mvcli at ./check_raid line 484.
./check_raid --plugin=mvcli -d
check_raid 4.0.8
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs
Please include output of **ALL** commands in bugreport
DEBUG EXEC: /bin/mvcli at ./check_raid line 484.
# ./check_raid --plugin=mvcli
<no answer>
# ./check_raid --plugin=mvcli --plugin-option='mvcli blk'
<no answer>
# ./check_raid --plugin=mvcli --plugin-option='mvcli smart'
<no answer>
# ./check_raid --plugin=mvcli --plugin-option='info -o blk'
<no answer>
# ./check_raid --plugin=mvcli --plugin-option='smart -p 0'
<no answer>
Actually I'd just like to know whether the RAID1 is working fine or is damaged. But so far I don't get any reply from check_raid. Am I doing something wrong? Do you need more information? Anyway, thanks for building this plug-in!
@GuidoHaase the plugin was never finished, i don't own the hardware, and previous reporters failed to provide enough information to finish the plugin.
i've grabbed output of your commands, and added to unit test. so the parser is there. see https://github.com/glensc/nagios-plugin-check_raid/compare/9fd0697...46f08a5
however not sure how it would behave with multiple virtual disks. could you create two virtual disks and provide the output from mvcli?
6f72349: parses just VD and if status is not functional will trigger CRITICAL. don't have other test data to handle other values.
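For readers who want to see the rule spelled out, here is a small Python sketch of "anything other than functional is CRITICAL". It is not the Perl code from 6f72349. It assumes one `key: value` field per line as in the `info -o vd` paste above, and returning UNKNOWN when no VD is found is my own assumption.

```python
def parse_vd_info(text):
    """Parse `mvcli info -o vd` output into a list of dicts, one per virtual disk."""
    vds = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # banner, heading or separator line
        key, value = key.strip(), value.strip()
        if key == "id":
            current = {"id": int(value)}
            vds.append(current)
        elif key.startswith("Total #"):
            current = None  # trailer line, not part of any VD
        elif current is not None:
            current[key] = value
    return vds


def check_vds(vds):
    """Return (state, message): CRITICAL if any VD status is not 'functional'."""
    if not vds:
        return "UNKNOWN", "no virtual disks found"  # assumption, see lead-in
    bad = [vd for vd in vds if vd.get("status") != "functional"]
    state = "CRITICAL" if bad else "OK"
    message = ", ".join(
        f"{vd.get('name', vd['id'])}: {vd.get('status', 'unknown')}" for vd in vds
    )
    return state, message
```

Because every `id:` line starts a new record, the same loop should cope with several virtual disks, but that is untested against real multi-VD hardware.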
Hi @glensc,
thanks for your feedback and modifying the parser!
I'll provide a few more input strings for mvcli to test the parser soon. Please note that there are also two strings which don't work on RAID[0|1]. I'll add them too, just to let you know and for testing.
however not sure how it would behave with multiple virtual disks. could you provide create two virtual disks and provide output from mvcli?
I'll do as you suggested soon. But first I'll have to dive into my hw grove looking for two identical or very similar drives ;-)
6f72349: parses just VD and if status is not functional will trigger CRITICAL. don't have other test data to handle other values.
As far as I remember there are more states available, like 'rebuilding' (including a percentage), 'paused', etc. I'll have a look at home and share the result with you here.
Please allow a few days for feedback - I'm working on it. Thanks!
Hi @glensc,
here is some more data from the Marvell RAID controller.
As suggested by you I added two more (old) drives and built a further VD running as RAID1.
First, have a look at the mvcli commands:
# mvcli ?
SG driver version 3.5.36.
Legend:
[options] - the options within [] are optional.
<x|y|z> - choose one of the x, y or z.
[<x|y|z>] - choose none or one of the x, y or z.
Abbreviation:
VD - Virtual Disk, Array - Disk Array
PD - Physical Disk, BGA - BackGround Activity
Type '-output [filename]' to output to a file.
Type 'help' to display this page.
Type 'help command' to display the help page of 'command'.
Type 'command -h' to display help for 'command'.
Command name is not case sensitive and may be abbreviated if the abbreviation is unique.
Most commands support both short (-) and long (--) options.
Long option names may be abbreviated if the abbreviation is unique.
A long option may take a parameter of the form '--arg=param' or '--arg param'.
Option name is case sensitive, option parameter is not.
COMMAND BRIEF DESCRIPTION
? :Get brief help for all commands.
help :Get brief help for all commands or detail help for one command.
rebuild :Start, stop, pause, resume rebuilding VD.
smart :Display the smart info of physical disk.
flash :Update, backup or erase flash image and erase hba or pd page.
enc :Get enclosure, enclosure element or enclosure config information.
adapter :Default adapter the following CLI commands refers to.
rescan :Rescan the adapters
create :Create virtual disk.
delete :Delete virtual disk or spare drive.
event :Get the current events.
get :Get configuration information of VD, PD, Array, HBA or Driver.
info :Display adapter(hba), virtual disk(vd), disk array, physical disk(pd), Port multiplexer(pm), expander(exp), block disk(blk) or spare drive information.
set :Set configuration parameters of VD, PD or HBA.
import :Import a virtual disk.
locate :Locate the specified PD.
report :report a conflicted virtual disk to OS.
devmap :Map device ID to device magic number in the OS.
Here is the output of mvcli running with 2 RAID1 VDs and 4 drives:
# mvcli info -o blk
SG driver version 3.5.36.
Block Information
Block id: 0
PD id: 0
VD id: 0
Block status: assigned
Size: 3906936640 K
Starting offset: 16384 K
Block id: 4
PD id: 1
VD id: 0
Block status: assigned
Size: 3906936640 K
Starting offset: 16384 K
Block id: 8
PD id: 2
VD id: 1
Block status: assigned
Size: 244116608 K
Starting offset: 16384 K
Block id: 12
PD id: 3
VD id: 1
Block status: assigned
Size: 244116608 K
Starting offset: 16384 K
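The block list is what ties physical disks to virtual disks. A short sketch of deriving that mapping, assuming one `key: value` field per line as in the earlier `info -o blk` paste. The function names are illustrative only.

```python
from collections import defaultdict


def parse_blocks(text):
    """Parse `mvcli info -o blk` output into a list of dicts, one per block."""
    blocks = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # banner or heading line
        key, value = key.strip(), value.strip()
        if key == "Block id":
            current = {"Block id": int(value)}
            blocks.append(current)
        elif current is not None:
            current[key] = value
    return blocks


def pds_by_vd(blocks):
    """Group PD ids by VD id, looking only at blocks whose status is 'assigned'."""
    groups = defaultdict(list)
    for block in blocks:
        if block.get("Block status") == "assigned":
            groups[int(block["VD id"])].append(int(block["PD id"]))
    return {vd: sorted(pds) for vd, pds in groups.items()}
```

For the four blocks listed above this gives VD 0 on PDs 0 and 1, and VD 1 on PDs 2 and 3, which matches the "PD RAID setup" lines of `info -o vd`.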
# mvcli info -o hba
SG driver version 3.5.36.
Adapter ID: 0
Product: 1b4b-9230
Sub Product: 1b4b-9230
Chip revision: A1
slot number: 0
Max PCIe speed: 5Gb/s
Current PCIe speed: 5Gb/s
Max PCIe link: 2
Current PCIe link: 2
BIOS version: 1.0.0.1028
Firmware version: 2.3.0.1078
Boot loader version: 2.1.0.1009
# of ports: 4
Buzzer: Not supported
Supported port type: SATA
Supported RAID mode: RAID0 RAID1 RAID10 hc(hypper capacity) hs(hypper safe)
Maximum SSD in one HyperDuo: 3
Maximum SSD Segment: 3
Maximum disk in one VD: 4
PM: Supported
Expander: Not supported
Rebuild: Supported
Background init: Not supported
Sync: Not supported
Migrate: Not supported
Media patrol: Not supported
Foreground init: Not supported
Copy back: Not supported
Maximum supported disk: 7
Maximum supported VD: 4
Max total blocks: 128
Features: rebuild
Advanced features: event sense code,multi VD,spc 4,image health,timer,smart poll,bga rate,ata pass through,remap,access register
Max buffer size: 3
Stripe size supported: 32K 64K
Image health: Healthy
Autoload image health: Healthy
Boot loader image health: Healthy
Firmware image health: Healthy
Boot ROM image health: Healthy
HBA info image health: Healthy
# mvcli info -o pd
SG driver version 3.5.36.
Physical Disk Information
Adapter: 0
PD ID: 0
Type: SATA PD
Linked at: HBA port 0
Size: 3907018584 K
Write cache: not supported
SMART: supported (on)
NCQ: supported (on)
48 bits LBA: supported
supported speed: 1.5 3 6 Gb/s
Current speed: 6 Gb/s
model: WDC WD40EFRX-68N32N0
Serial: WD-WCC7K5KD3PFS
Firmware version: 82.00A82
Locate LED status: Not Support
Running OS: yes
block ids: 0
associated VDs: 0
PD valid size: 0 K
Adapter: 0
PD ID: 1
Type: SATA PD
Linked at: HBA port 1
Size: 3907018584 K
Write cache: not supported
SMART: supported (on)
NCQ: supported (on)
48 bits LBA: supported
supported speed: 1.5 3 6 Gb/s
Current speed: 6 Gb/s
model: WDC WD40EFRX-68N32N0
Serial: WD-WCC7K5KD3TSY
Firmware version: 82.00A82
Locate LED status: Not Support
Running OS: yes
block ids: 4
associated VDs: 0
PD valid size: 0 K
Adapter: 0
PD ID: 2
Type: SATA PD
Linked at: HBA port 2
Size: 244198584 K
Write cache: not supported
SMART: supported (on)
NCQ: supported (on)
48 bits LBA: supported
supported speed: 1.5 3 Gb/s
Current speed: 3 Gb/s
model: VB0250EAVER
Serial: 6VMVE70A
Firmware version: HPG0
Locate LED status: Not Support
Running OS: no
block ids: 8
associated VDs: 1
PD valid size: 0 K
Adapter: 0
PD ID: 3
Type: SATA PD
Linked at: HBA port 3
Size: 244198584 K
Write cache: not supported
SMART: supported (on)
NCQ: supported (on)
48 bits LBA: supported
supported speed: 1.5 3 Gb/s
Current speed: 3 Gb/s
model: VB0250EAVER
Serial: 6VMVC03W
Firmware version: HPG0
Locate LED status: Not Support
Running OS: no
block ids: 12
associated VDs: 1
PD valid size: 0 K
Total # of PD: 4
# mvcli info -o vd # VD1 degraded and rebuilding paused
SG driver version 3.5.36.
Virtual Disk Information
id: 0
name: VD0
status: functional
Stripe size: 64
RAID mode: RAID1
Cache mode: Not Support
size: 3815367 M
BGA status: not running
Block ids: 0 4
# of PDs: 2
PD RAID setup: 0 1
Running OS: yes
id: 1
name: VD1
status: degraded
Stripe size: 64
RAID mode: RAID1
Cache mode: Not Support
size: 238395 M
BGA status: paused
Block ids: 8 12
# of PDs: 2
PD RAID setup: 2 3
Running OS: no
BGA progress: rebuilding is paused at 0%
Total # of VD: 2
# mvcli info -o vd # VD1 degraded and rebuilding working
SG driver version 3.5.36.
Virtual Disk Information
id: 0
name: VD0
status: functional
Stripe size: 64
RAID mode: RAID1
Cache mode: Not Support
size: 3815367 M
BGA status: not running
Block ids: 0 4
# of PDs: 2
PD RAID setup: 0 1
Running OS: yes
id: 1
name: VD1
status: degraded
Stripe size: 64
RAID mode: RAID1
Cache mode: Not Support
size: 238395 M
BGA status: running
Block ids: 8 12
# of PDs: 2
PD RAID setup: 2 3
Running OS: no
BGA progress: rebuilding is 2% done
Total # of VD: 2
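These two samples show the only "BGA progress" wordings seen so far. Below is a hedged sketch of pulling the percentage out and grading the VD. The severity mapping (degraded while a rebuild is running counts as WARNING, everything else non-functional as CRITICAL) is purely my assumption, not something mvcli.pm is known to do, and `parse_bga_progress` and `classify` are made-up names.

```python
import re

# Matches "rebuilding is 2% done" and "rebuilding is paused at 0%".
_PROGRESS = re.compile(
    r"^(?P<activity>\w+) is (?:paused at )?(?P<pct>\d+)%(?: done)?$"
)


def parse_bga_progress(value):
    """Return (activity, percent) from a 'BGA progress' value, or None if unrecognised."""
    match = _PROGRESS.match(value.strip())
    if match is None:
        return None
    return match.group("activity"), int(match.group("pct"))


def classify(status, bga_status):
    """Map VD 'status' and 'BGA status' to a Nagios state name (assumed policy)."""
    if status == "functional":
        return "OK"
    if status == "degraded" and bga_status == "running":
        return "WARNING"  # rebuild in progress, assumed to be less severe
    return "CRITICAL"
```

With the samples above, the paused case comes out as CRITICAL at 0% and the running case as WARNING at 2%. Other states would need real output before they can be handled.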
# mvcli smart -p 0
SG driver version 3.5.36.
SMART STATUS RETURN: OK.
Smart Info
ID Attribute Name Current Worst Threshhold RawValue Status
01 Read Error Rate 200 200 51 000000000000 OK
03 Spin-Up Time 207 171 21 000000001208 OK
04 Start/Stop Count 100 100 0 000000000028 OK
05 Reallocated Sectors 200 200 140 000000000000 OK
07 Seek Error Rate 100 253 0 000000000000 OK
09 Power-On Hours Count 100 100 0 000000000046 OK
0A Spin Retry Count 100 253 0 000000000000 OK
0B Calibration retry 100 253 0 000000000000 OK
0C Power Cycle Count 100 100 0 000000000028 OK
C0 Power-off retract 200 200 0 00000000000F OK
C1 Load/Unload cycle 200 200 0 00000000002B OK
C2 HDA temperature 121 117 0 00000000001D OK
C4 Reallocation count 200 200 0 000000000000 OK
C5 Current pending 200 200 0 000000000000 OK
C6 Offline scan wrong 100 253 0 000000000000 OK
C7 UDMA CRC error rate 200 200 0 000000000000 OK
C8 Write error rate 100 253 0 000000000000 OK
# mvcli smart -p 1
SG driver version 3.5.36.
SMART STATUS RETURN: OK.
Smart Info
ID Attribute Name Current Worst Threshhold RawValue Status
01 Read Error Rate 200 200 51 000000000000 OK
03 Spin-Up Time 205 171 21 00000000126C OK
04 Start/Stop Count 100 100 0 000000000028 OK
05 Reallocated Sectors 200 200 140 000000000000 OK
07 Seek Error Rate 200 200 0 000000000000 OK
09 Power-On Hours Count 100 100 0 000000000045 OK
0A Spin Retry Count 100 253 0 000000000000 OK
0B Calibration retry 100 253 0 000000000000 OK
0C Power Cycle Count 100 100 0 000000000028 OK
C0 Power-off retract 200 200 0 000000000010 OK
C1 Load/Unload cycle 200 200 0 00000000002A OK
C2 HDA temperature 121 117 0 00000000001D OK
C4 Reallocation count 200 200 0 000000000000 OK
C5 Current pending 200 200 0 000000000000 OK
C6 Offline scan wrong 100 253 0 000000000000 OK
C7 UDMA CRC error rate 200 200 0 000000000000 OK
C8 Write error rate 100 253 0 000000000000 OK
# mvcli smart -p 2
SG driver version 3.5.36.
SMART STATUS RETURN: OK.
Smart Info
ID Attribute Name Current Worst Threshhold RawValue Status
01 Read Error Rate 111 99 6 000000004F09 OK
03 Spin-Up Time 97 97 0 000000000000 OK
04 Start/Stop Count 100 100 20 0000000000E9 OK
05 Reallocated Sectors 100 100 36 000000000001 OK
07 Seek Error Rate 74 60 30 00000000DEFC OK
09 Power-On Hours Count 66 66 0 000000007507 OK
0A Spin Retry Count 100 100 97 000000000000 OK
0C Power Cycle Count 100 100 20 000000000076 OK
B4 Used Reserved Block(Total) 100 100 0 00000000FFFF OK
B7 SATA Downshift Error 100 100 0 000000000000 OK
B8 End-to-End error 100 100 97 000000000000 OK
BB Reported Uncorrectable 100 100 0 000000000000 OK
BC Command Timeout 100 99 0 000000000005 OK
BD High Fly Writes 100 100 0 000000000000 OK
BE Temperature Diff 66 59 45 000000000022 OK
C2 HDA temperature 34 41 0 000000000022 OK
C3 ECC recovered 53 38 0 000000004F09 OK
C4 Reallocation count 100 100 36 000000000001 OK
C5 Current pending 100 100 0 000000000000 OK
C6 Offline scan wrong 100 100 0 000000000000 OK
C7 UDMA CRC error rate 200 200 0 000000000000 OK
# mvcli smart -p 3
SG driver version 3.5.36.
SMART STATUS RETURN: OK.
Smart Info
ID Attribute Name Current Worst Threshhold RawValue Status
01 Read Error Rate 117 99 6 0000000015F1 OK
03 Spin-Up Time 97 97 0 000000000000 OK
04 Start/Stop Count 100 100 20 0000000000E9 OK
05 Reallocated Sectors 100 100 36 000000000000 OK
07 Seek Error Rate 74 60 30 000000009DC4 OK
09 Power-On Hours Count 66 66 0 0000000074FB OK
0A Spin Retry Count 100 100 97 000000000000 OK
0C Power Cycle Count 100 100 20 000000000076 OK
B4 Used Reserved Block(Total) 100 100 0 00000000FFFF OK
B7 SATA Downshift Error 100 100 0 000000000000 OK
B8 End-to-End error 100 100 97 000000000000 OK
BB Reported Uncorrectable 100 100 0 000000000000 OK
BC Command Timeout 100 97 0 000000000085 OK
BD High Fly Writes 100 100 0 000000000000 OK
BE Temperature Diff 68 60 45 000000000020 OK
C2 HDA temperature 32 40 0 000000000020 OK
C3 ECC recovered 59 36 0 0000000015F1 OK
C4 Reallocation count 100 100 36 000000000000 OK
C5 Current pending 100 100 0 000000000000 OK
C6 Offline scan wrong 100 100 0 000000000000 OK
C7 UDMA CRC error rate 200 200 0 000000000000 OK
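The SMART tables could be parsed along these lines. This is only a sketch: it assumes one attribute per line, a two-digit hex ID, a 12-digit hex raw value and an optional trailing status column (the older CLI 4.1.0.9 paste has none). The failure rule used here is the usual SMART convention that a current value at or below a non-zero threshold is bad. `parse_smart` and `failing` are hypothetical names.

```python
import re

_ROW = re.compile(
    r"^(?P<id>[0-9A-F]{2})\s+(?P<name>.+?)\s+(?P<cur>\d+)\s+(?P<worst>\d+)\s+"
    r"(?P<thresh>\d+)\s+(?P<raw>[0-9A-F]{12})(?:\s+(?P<status>\S+))?$"
)


def parse_smart(text):
    """Parse `mvcli smart -p N` rows; heading and banner lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match is None:
            continue
        rows.append({
            "id": int(match.group("id"), 16),
            "name": match.group("name"),
            "current": int(match.group("cur")),
            "worst": int(match.group("worst")),
            "threshold": int(match.group("thresh")),
            "raw": int(match.group("raw"), 16),  # raw value is printed in hex
            "status": match.group("status"),     # None when the column is absent
        })
    return rows


def failing(rows):
    """Attributes whose current value is at or below a non-zero threshold."""
    return [r for r in rows if r["threshold"] > 0 and r["current"] <= r["threshold"]]
```

For PD 2, for example, the raw Power-On Hours value 000000007507 decodes to 29959, and none of the listed attributes are at or below their thresholds.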
# mvcli info -o spare
SG driver version 3.5.36.
Spare Disk Information
PD id: 0
Status: none spare
PD id: 1
Status: none spare
PD id: 2
Status: none spare
PD id: 3
Status: none spare
Commands not supported by "my" adapter:
# mvcli info -o array
SG driver version 3.5.36.
-o : array is not supported on this adapter
# mvcli info -o exp
SG driver version 3.5.36.
Expander is not supported on adapter 0.
# mvcli info -o pm
SG driver version 3.5.36.
No port multiplexer is found.
Commands with option "get":
# mvcli get -o VD
SG driver version 3.5.36.
virtual disk id: 0
cache mode: Not support
virtual disk id: 1
cache mode: Not support
# mvcli get -o PD
SG driver version 3.5.36.
hard disk id: 0
write cache: not support
drive speed: 6 Gb/s
hard disk id: 1
write cache: not support
drive speed: 6 Gb/s
hard disk id: 2
write cache: not support
drive speed: 3 Gb/s
hard disk id: 3
write cache: not support
drive speed: 3 Gb/s
# mvcli get -o HBA
SG driver version 3.5.36.
HBA Index: 0
Auto Rebuild: not support
Rebuild Rate: 50
Raw Update: on
I hope this will be useful for further development of mvcli.pm.
Greetinx
Guido
As you see, there are 2 SCSI devices : 6:0:0:0 which is the VirtualDisk (type=disk) and 13:0:0:0 which is a control device for the RAID array.