eLvErDe / hwraid

HWRaid: Free code from http://hwraid.le-vert.net
GNU General Public License v2.0

Partially degraded status with all disks apparently OK #63

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi, I have been fighting for a few weeks with my NAS returning a Partially Degraded status. It raises an alarm when I reboot it, but then shows all disks as apparently OK. Can someone help or point me to help?

My setup

Here is what megaclisas-status returns:

-- Controller information --
-- ID | H/W Model | RAM | Temp | BBU | Firmware
c0 | LSI MegaRAID SAS 9260-16i | 512MB | N/A | Absent | FW: 12.12.0-0111

-- Array information --
-- ID | Type | Size | Strpsz | Flags | DskCache | Status | OS Path | CacheCade | InProgress
c0u0 | RAID-6 | 12731G | 64 KB | RA,WB | Enabled | Partially Degraded | /dev/sda | None | None

-- Disk information --
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI Device ID
c0u0p0 | HDD | WD-WCC4M1583504WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 36C | [245:0] | 18
c0u0p1 | HDD | WD-WCC4M1495492WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:1] | 19
c0u0p2 | HDD | WD-WCC1T1404488WDC WD20EFRX-68AX9N0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:2] | 20
c0u0p3 | HDD | WD-WCC1T1397322WDC WD20EFRX-68AX9N0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:3] | 21
c0u0p4 | HDD | WD-WCC4M3080096WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 36C | [245:4] | 16
c0u0p5 | HDD | WD-WCC4M1584280WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:5] | 17
c0u0p6 | HDD | WD-WCC4M2SLXSDUWDC WD20EFRX-68EUZN0 82.00A82 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 36C | [245:6] | 22
c0u0p7 | HDD | WD-WCC4M3XRZX7YWDC WD20EFRX-68EUZN0 82.00A82 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:7] | 23

There is at least one disk/array in a NOT OPTIMAL state.
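As a rough cross-check of the disk table, a short Python sketch can tally the listed disks per array and flag any whose status is not Online. The nine-column layout is assumed from the output pasted in this post, and the function names are hypothetical; they are not part of megaclisas-status.

```python
from collections import Counter

# Disk rows copied from the megaclisas-status output in this post.
DISK_TABLE = """\
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI Device ID
c0u0p0 | HDD | WD-WCC4M1583504WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 36C | [245:0] | 18
c0u0p1 | HDD | WD-WCC4M1495492WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:1] | 19
c0u0p2 | HDD | WD-WCC1T1404488WDC WD20EFRX-68AX9N0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:2] | 20
c0u0p3 | HDD | WD-WCC1T1397322WDC WD20EFRX-68AX9N0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:3] | 21
c0u0p4 | HDD | WD-WCC4M3080096WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 36C | [245:4] | 16
c0u0p5 | HDD | WD-WCC4M1584280WDC WD20EFRX-68EUZN0 80.00A80 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:5] | 17
c0u0p6 | HDD | WD-WCC4M2SLXSDUWDC WD20EFRX-68EUZN0 82.00A82 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 36C | [245:6] | 22
c0u0p7 | HDD | WD-WCC4M3XRZX7YWDC WD20EFRX-68EUZN0 82.00A82 | 1.818 TB | Online, Spun Up | 6.0Gb/s | 37C | [245:7] | 23
"""


def parse_disk_rows(text):
    """Return one dict per disk row; header and malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if line.startswith("--") or len(fields) != 9:
            continue
        rows.append({"id": fields[0], "status": fields[4], "slot": fields[7]})
    return rows


def summarize(rows):
    """Count disks per array (the cXuY prefix) and collect disks that are not Online."""
    per_array = Counter(row["id"].rsplit("p", 1)[0] for row in rows)
    not_online = [row["id"] for row in rows if not row["status"].startswith("Online")]
    return per_array, not_online


per_array, not_online = summarize(parse_disk_rows(DISK_TABLE))
print(dict(per_array))  # disks listed per array
print(not_online)       # disks whose status does not start with "Online"
```

A tally like this only covers the disks the table actually lists, so it cannot reveal an array member that is configured but absent from the output.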

Happy to provide more information as needed. Thank you for your support, E

ElCoyote27 commented 7 years ago

Hi Egiac, can you check whether your issue is similar to the one described here: https://community.spiceworks.com/topic/1405290-megaraid-status-is-partially-degraded-but-all-drives-are-online It seems the array is 'partially degraded' because it is a RAID6 (or 6+0) that is missing one drive. If you have a 12731G RAID-6 made of 1.818 TB drives, there should be 9 drives in total, not 8 as in your output. Please check whether you have a dead drive somewhere. Regards, Vincent
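The drive-count estimate in this comment can be reproduced with a few lines of Python. This is only a sketch: it assumes the array size in G is the controller's TB figure scaled by `g_per_tb` (1000 by default, an assumption rather than something stated in the thread), and the function name is hypothetical.

```python
def raid6_drive_count(array_size_g, drive_size_tb, g_per_tb=1000.0):
    """Estimate the number of member drives behind a RAID-6 array.

    g_per_tb is an assumed conversion between the "G" unit of the array
    table and the "TB" unit of the disk table.
    """
    data_drives = round(array_size_g / (drive_size_tb * g_per_tb))
    return data_drives + 2  # RAID-6 spends two drives' worth of space on parity


# Figures taken from the megaclisas-status output in this thread.
print(raid6_drive_count(12731, 1.818))                   # decimal scaling
print(raid6_drive_count(12731, 1.818, g_per_tb=1024.0))  # binary scaling
```

Under either scaling the estimate comes out at seven data drives plus two parity drives, which is where the figure of 9 comes from.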

ghost commented 7 years ago

Hi Vincent, I have eight 2 TB HDDs in the RAID; minus 2 for RAID6 double parity, that makes 6 x 2 TB = 12 TB. I double-checked just in case: the only non-RAID disk is the SSD that runs the OS. E
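The 12 TB figure uses the drives' nominal decimal capacity, while the 1.818 TB per disk in the output is consistent with binary units (2 x 10^12 bytes is about 1.819 x 2^40 bytes). A small sketch of the same RAID-6 arithmetic in both units follows. The function name is hypothetical, and the idea that the controller reports binary terabytes is an assumption, not something confirmed in the thread.

```python
TB = 10 ** 12   # decimal terabyte, as printed on the drive label
TIB = 2 ** 40   # binary terabyte, assumed to be what the controller reports


def raid6_usable_bytes(drives, drive_bytes):
    """Usable capacity of a RAID-6 set: two drives' worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID-6 needs more than two drives")
    return (drives - 2) * drive_bytes


usable = raid6_usable_bytes(8, 2 * TB)
print(usable / TB)             # capacity in decimal TB
print(round(usable / TIB, 3))  # the same capacity in binary TB
```

If that assumption holds, eight drives give roughly 10.9 in binary units, which is noticeably less than the 12731G shown in the array table, so the reported size alone is worth a second look.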

ElCoyote27 commented 7 years ago

Hi Egiac, OK, my bad, sorry about that. I thought you were experiencing the same kind of issue.

Here are three things to try:

1) run megaclisas-status --debug, this will show you the various commands that are being run by the script. Check the output of the ones just before the HBA to see if there's anything relevant to your issue.

2) Ask the HBA to spit out its log, there might be information there:
MegaCli -FwTermLog -Dsply -aALL

3) Force a consistency check on the VD:
MegaCli -LDCC -Start -L0 -a0
MegaCli -LDCC -ShowProg -L0 -a0
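The three steps above could be wrapped in a small Python helper that saves the output of the two read-only commands to files for attaching to the issue. This is a sketch under assumptions: the binaries are taken to be on PATH under the names used in this comment (some installs name the binary differently), and the output file names and function names are illustrative. The consistency check commands are only printed, not started.

```python
import shutil
import subprocess

# Read-only diagnostics quoted in steps 1 and 2; file names are illustrative.
DIAGNOSTICS = {
    "megaclisas_status_debug.txt": ["megaclisas-status", "--debug"],
    "megacli_fwtermlog_output.txt": ["MegaCli", "-FwTermLog", "-Dsply", "-aALL"],
}


def consistency_check_commands(ld=0, adapter=0):
    """Build the start and progress commands from step 3 for one logical drive."""
    target = [f"-L{ld}", f"-a{adapter}"]
    return [
        ["MegaCli", "-LDCC", "-Start", *target],
        ["MegaCli", "-LDCC", "-ShowProg", *target],
    ]


def run_and_save(command, path):
    """Run a command and write its stdout and stderr to path.

    Returns the exit code, or None when the binary is not on PATH.
    """
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    with open(path, "w") as handle:
        handle.write(result.stdout)
        handle.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    for path, command in DIAGNOSTICS.items():
        code = run_and_save(command, path)
        status = "not found on PATH" if code is None else f"exit code {code}"
        print(f"{' '.join(command)}: {status}")
    # Print the consistency check commands so they can be started deliberately.
    for command in consistency_check_commands():
        print(" ".join(command))
```

Leaving the consistency check as a manual step is a deliberate choice here, since it changes controller activity rather than just reading state.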

ghost commented 7 years ago

Hi Vincent, thanks for your suggestions, and no problem. I am attaching the outputs of 1) and 2). I believe the issue comes from rows 113 and 114 of the megaclisas-status debug output, which show 8 drives in the system and 9 drives configured. Frankly, I don't know how to read or interpret the FwTermLog. As I type, the consistency check is in progress; it will take close to a full day. Logs are attached for your reference. Any further suggestions on how to move forward? Thank you again for your help, EG

megacli_fwtermlog_output.txt megaclisas_status_debug.txt

ghost commented 7 years ago

Hi, following up here: the system shows 9 drives configured, but only 8 are physically present in the machine. Can anybody kindly help? I'm trying to figure out how to show the ID of the 9th "ghost" drive so I can remove it. Hints welcome. Thanks a ton, E