geerlingguy closed this issue 2 years ago
Supposedly it's working on 32-bit Pi OS but not 64-bit. More to come.
Output of sudo lspci -vvv on a custom 32-bit kernel:
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC) (rev 02)
Subsystem: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 64
Region 0: Memory at 600200000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at 600300000 (64-bit, prefetchable) [size=1M]
Region 4: Memory at 600000000 (32-bit, non-prefetchable) [size=1M]
Region 5: I/O ports at <unassigned> [disabled]
[virtual] Expansion ROM at 600100000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [148 v1] Power Budgeting <?>
Capabilities: [158 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [168 v1] #19
Capabilities: [264 v1] #16
Capabilities: [294 v1] Vendor Specific Information: ID=0002 Rev=2 Len=100 <?>
Capabilities: [394 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [3cc v0] Virtual Channel
Caps: LPEVC=1 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128- ??6+
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32+ WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable- ID=0 ArbSelect=Fixed TC/VC=00
Status: NegoPending- InProgress-
Kernel driver in use: mpt3sas
Repo for the custom kernel: https://github.com/45Drives/linux
Steps for building:
cd linux/cross_compile
./build_cm4.sh 32
(Answer no when asked to customize the config.) Follow Jeff's instructions, and in the .config, set:
CONFIG_SCSI_MPT3SAS=y
(Not sure if this route will work, as we replaced the mpt3sas driver in the kernel source with the latest from Broadcom's downloads for the 9405W. It will need further testing to see whether the stock mpt3sas kernel driver works.)
We had a few 250 GB SSDs around, so we built a ZFS RAID with them. Unfortunately, ZFS only works with 64-bit kernels, but ZFS-FUSE works fine on 32-bit. lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 232.9G 0 disk
sdb 8:16 0 232.9G 0 disk
sdc 8:32 0 232.9G 0 disk
sdd 8:48 0 232.9G 0 disk
sde 8:64 0 232.9G 0 disk
sdf 8:80 0 232.9G 0 disk
sdg 8:96 0 232.9G 0 disk
sdh 8:112 0 232.9G 0 disk
sdi 8:128 0 232.9G 0 disk
sdj 8:144 0 232.9G 0 disk
mmcblk0 179:0 0 29.1G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 28.9G 0 part /
mmcblk0boot0 179:32 0 4M 1 disk
mmcblk0boot1 179:64 0 4M 1 disk
zpool status:
pool: weenie-hut-jr
state: ONLINE
scrub: scrub completed after 0h0m with 0 errors on Tue Aug 24 14:44:40 2021
config:
NAME STATE READ WRITE CKSUM
weenie-hut-jr ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
disk/by-id/wwn-0x5001b448b856e403 ONLINE 0 0 0
disk/by-id/wwn-0x588891410069f515 ONLINE 0 0 0
disk/by-id/ata-HDSTOR_-_HSAV25ST250AX_HS2001211005ECA56 ONLINE 0 0 0
disk/by-id/wwn-0x5001b448b6affbfa ONLINE 0 0 0
disk/by-id/wwn-0x5001b448b670ea7d ONLINE 0 0 0
raidz2-1 ONLINE 0 0 0
disk/by-id/wwn-0x5001b448b670d475 ONLINE 0 0 0
disk/by-id/ata-WDC_WDS250G2B0A-00SM50_191540800565 ONLINE 0 0 0
disk/by-id/wwn-0x5001b448b630fedb ONLINE 0 0 0
disk/by-id/ata-WDC_WDS250G2B0A_191934804052 ONLINE 0 0 0
disk/by-id/wwn-0x5001b448b8f9b97c ONLINE 0 0 0
errors: No known data errors
As for 64-bit usage, the kernel was built the same way as above, but with ./build_cm4.sh 64. This kernel works fine with nothing plugged into the PCIe slot, but when the 9405W is plugged in, there is a kernel panic at boot. Here is a screenshot of said kernel panic:
The error seems to happen in pci_generic_config_read(), seemingly part of the PCI driver, not the mpt3sas driver.
@joshuaboud - I seem to remember this bit of code leading to that crash: https://github.com/raspberrypi/linux/blob/2697f7403187bb2bb61cc716f33ee9f6cfb9af7c/drivers/scsi/megaraid/megaraid_sas_fusion.c#L262-L265
Can you try the following patch and see if that helps on 64-bit?
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index b0c01cf0428f..c4accee42e84 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -259,7 +259,8 @@ static void
megasas_write_64bit_req_desc(struct megasas_instance *instance,
union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc)
{
-#if defined(writeq) && defined(CONFIG_64BIT)
+//#if defined(writeq) && defined(CONFIG_64BIT)
+#if 0
u64 req_data = (((u64)le32_to_cpu(req_desc->u.high) << 32) |
le32_to_cpu(req_desc->u.low));
writeq(req_data, &instance->reg_set->inbound_low_queue_port);
That specific patch won't do much, as the megaraid driver isn't being compiled for my kernel. However, there is a very similar chunk of code in the mpt3sas driver, in drivers/scsi/mpt3sas/mpt3sas_base.c starting at line 5809:
/**
* _base_writeq - 64 bit write to MMIO
* @ioc: per adapter object
* @b: data payload
* @addr: address in MMIO space
* @writeq_lock: spin lock
*
* Glue for handling an atomic 64 bit word to MMIO. This special handling takes
* care of 32 bit environment where its not quarenteed to send the entire word
* in one transfer.
*/
#if defined(writeq) && defined(CONFIG_64BIT)
static inline void
_base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
{
writeq(b, addr);
}
#else
static inline void
_base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
{
unsigned long flags;
__u64 data_out = b;
spin_lock_irqsave(writeq_lock, flags);
writel((u32)(data_out), addr);
writel((u32)(data_out >> 32), (addr + 4));
spin_unlock_irqrestore(writeq_lock, flags);
}
#endif
I suppose if I apply the same change here, disabling writeq(), it may fix the issue. I will report back after trying.
Ah yes, sorry, I was looking at what I was working on for the other card and forgot it was using a different module. Let me know if it helps!
Just tried out that fix with the patch:
@@ -5817,7 +5817,8 @@ _base_mpi_ep_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_loc
* care of 32 bit environment where its not quarenteed to send the entire word
* in one transfer.
*/
-#if defined(writeq) && defined(CONFIG_64BIT)
+//#if defined(writeq) && defined(CONFIG_64BIT)
+#if 0
static inline void
_base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
{
Unfortunately, it did not fix the kernel panic issue.
I have been trying to track down any possible bugs in the kernel source code, but the second-to-last function call, pci_bus_read_config_byte(), does not seem to have a definition anywhere in the source. The function that actually causes the error, pci_generic_config_read(), could be having an issue with dereferencing pointers, though. I am going to try to get the full dmesg output from the failed boot.
@joshuaboud - That sounds eerily similar to some of the problems we've been encountering with AMD's GPU drivers in https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/4 — I wonder if the mpt3sas driver is missing some of the megaraid sas changes that were made for better ARM compatibility in general?
This is very strange. Removing quiet from /boot/cmdline.txt fixed the kernel panic issue with the Storinator JR custom 64-bit kernel.
We have no idea why. Maybe there is a timing issue in the PCI driver that is masked by the slowdown from printing console messages.
lspci -vvv:
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC) (rev 02)
Subsystem: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 49
Region 0: Memory at 600200000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at 600300000 (64-bit, prefetchable) [size=1M]
Region 4: Memory at 600000000 (32-bit, non-prefetchable) [size=1M]
Region 5: I/O ports at <unassigned> [disabled]
[virtual] Expansion ROM at 600100000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [148 v1] Power Budgeting <?>
Capabilities: [158 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [168 v1] #19
Capabilities: [264 v1] #16
Capabilities: [294 v1] Vendor Specific Information: ID=0002 Rev=2 Len=100 <?>
Capabilities: [394 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [3cc v0] Virtual Channel
Caps: LPEVC=1 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128- ??6+
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32+ WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable- ID=0 ArbSelect=Fixed TC/VC=00
Status: NegoPending- InProgress-
Kernel driver in use: mpt3sas
lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 232.9G 0 disk
sdb 8:16 0 232.9G 0 disk
sdc 8:32 0 232.9G 0 disk
sdd 8:48 0 232.9G 0 disk
sde 8:64 0 232.9G 0 disk
sdf 8:80 0 232.9G 0 disk
sdg 8:96 0 232.9G 0 disk
sdh 8:112 0 232.9G 0 disk
sdi 8:128 0 232.9G 0 disk
sdj 8:144 0 232.9G 0 disk
mmcblk0 179:0 0 29.1G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 28.9G 0 part /
mmcblk0boot0 179:32 0 4M 1 disk
mmcblk0boot1 179:64 0 4M 1 disk
Reason
We have no idea.
🤣 love it
Oooooo this is about to get interesting.. I have the following:
FILESYSTEM TYPE (=) USED FREE (-) %USED AVAILABLE TOTAL MOUNTED ON
/dev/sda1 ext4 [||||||||||||||||----] 77.4% 56.8G 251.0G /
/dev/sdb xfs [||||||||||||--------] 56.3% 1.4T 3.2T /mnt/ssdarray1
/dev/md0 xfs [|-------------------] 0.7% 10.8T 10.9T /mnt/ssdarray2
/dev/sdo ext4 [||||||||||||||||||||] 98.5% 60.7G 4.0T /mnt/plots
/dev/sdh ext4 [||||||||||||||||||||] 98.0% 80.5G 4.0T /mnt/plots2
/dev/sdi ext4 [||||||||||||||||||||] 98.0% 80.4G 4.0T /mnt/plots3
/dev/sdj ext4 [||||||||||||||||||||] 98.0% 80.3G 4.0T /mnt/plots4
/dev/sdk ext4 [||||||||||||||||||||] 98.0% 80.1G 4.0T /mnt/plots5
/dev/sdl ext4 [|||||||||||||||||||-] 94.2% 232.4G 4.0T /mnt/plots6
/dev/sdm ext4 [||||||||||||||||||||] 98.0% 80.3G 4.0T /mnt/plots7
/dev/sdn ext4 [||||||||||||||||||||] 98.0% 80.3G 4.0T /mnt/plots8
/dev/sdg ext4 [||||||||||||||||||||] 98.0% 80.2G 4.0T /mnt/plots9
/dev/sdp ext4 [||||||||||||||||||||] 98.2% 73.8G 4.0T /mnt/plots10
/dev/sdq ext4 [||||||||||||||||||||] 98.3% 67.5G 4.0T /mnt/plots11
/dev/sdr ext4 [|||||||||||||||||||-] 92.7% 291.2G 4.0T /mnt/plots12
/dev/sdt ext4 [||||||||||||||||||||] 98.7% 40.4G 3.0T /mnt/plots14
/dev/sdu ext4 [||||||||||||||||||||] 98.0% 80.5G 4.0T /mnt/plots15
/dev/sdv ext4 [||||||||||||||||||||] 98.0% 80.0G 4.0T /mnt/plots16
/dev/sdw ext4 [||||||||||||||||||||] 98.0% 80.4G 4.0T /mnt/plots17
/dev/sdx ext4 [||||||||||||||||||||] 98.0% 80.1G 4.0T /mnt/plots18
/dev/sdad ext4 [||||||||||||||||||||] 98.0% 39.5G 2.0T /mnt/plots19
/dev/sdy ext4 [||||||||||||||||||||] 99.1% 92.5G 10.0T /mnt/plots20
/dev/sdz ext4 [||||||||||||||||||||] 99.1% 93.5G 10.0T /mnt/plots21
/dev/sdaa ext4 [||||||||||||||||||||] 99.1% 92.4G 10.0T /mnt/plots22
/dev/sdab ext4 [||||||||||||||||||||] 99.8% 24.1G 12.0T /mnt/plots23
/dev/sdah ext4 [||||||||||||||||||||] 99.3% 52.8G 8.0T /mnt/plots24
/dev/sdag ext4 [||||||||||||||||||||] 99.8% 19.9G 8.0T /mnt/plots25
/dev/sdam ext4 [||||||||||||||||||||] 99.3% 54.0G 8.0T /mnt/plots26
/dev/sdac1 btrfs [||||||||||||||||||||] 99.3% 69.7G 10.0T /mnt/plots27
/dev/sdal1 btrfs [||||||||||||||||||||] 99.3% 66.0G 10.0T /mnt/plots28
/dev/sdak1 btrfs [||||||||||||||||||||] 99.3% 68.7G 10.0T /mnt/plots29
/dev/sdai ext4 [||||||||||||||||||||] 99.1% 93.5G 10.0T /mnt/plots30
/dev/sdaj ext4 [||||||||||||||||||||] 99.1% 93.6G 10.0T /mnt/plots31
/dev/sdaf ext4 [||||||||||||||||||||] 99.1% 92.4G 10.0T /mnt/plots32
/dev/sdbe f2fs [||||||||||||||||||||] 99.7% 27.9G 10.0T /mnt/plots33
/dev/sdbf ext4 [||||||||||||||||||||] 99.8% 12.4G 6.0T /mnt/plots34
/dev/sdbg ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots35
/dev/sdbh ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots36
/dev/sdbi f2fs [||||||||||||||||||||] 99.0% 103.1G 10.0T /mnt/plots37
/dev/sdbj ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots38
/dev/sdbk ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots39
/dev/sdbl btrfs [||||||||||||||||||||] 99.3% 69.8G 10.0T /mnt/plots40
/dev/sdbm ext4 [||||||||||||||||||||] 99.8% 12.9G 6.0T /mnt/plots41
/dev/sdbn ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots42
/dev/sdbo ext4 [||||||||||||||||||||] 98.0% 81.0G 4.0T /mnt/plots43
/dev/sdbp f2fs [||||||||||||||||||||] 99.7% 27.9G 10.0T /mnt/plots44
/dev/sdbq ext4 [||||||||||||||||||||] 99.8% 13.0G 6.0T /mnt/plots45
/dev/sdbr ext4 [||||||||||||||||||||] 98.0% 80.6G 4.0T /mnt/plots46
/dev/sdbs ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots47
/dev/sdbt ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots48
/dev/sdbu f2fs [||||||||||||||||||||] 99.7% 28.0G 10.0T /mnt/plots49
/dev/sdbv ext4 [||||||||||||||||||||] 99.8% 12.3G 6.0T /mnt/plots50
/dev/sdbw ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots51
/dev/sdbx ext4 [||||||||||||||||||||] 98.0% 81.0G 4.0T /mnt/plots52
/dev/sdby ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots53
/dev/sdbz f2fs [||||||||||||||||||||] 99.7% 27.9G 10.0T /mnt/plots54
/dev/sdca ext4 [||||||||||||||||||||] 98.5% 88.3G 6.0T /mnt/plots55
/dev/sdae f2fs [||||||||||||||||||||] 99.7% 28.3G 10.0T /mnt/plots56
/dev/sdcb ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots57
/dev/sdcc f2fs [||||||||||||||||||||] 99.7% 27.8G 10.0T /mnt/plots58
/dev/sdcd ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots59
/dev/sdce ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots60
/dev/sdcf ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots61
/dev/sdcg ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots62
/dev/sdch ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots63
/dev/sdci f2fs [||||||||||||||||||||] 99.7% 27.8G 10.0T /mnt/plots64
/dev/sdck f2fs [||||||||||||||||||||] 99.7% 27.8G 10.0T /mnt/plots65
/dev/sdcl ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots66
/dev/sdcm ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots67
/dev/sdco f2fs [||||||||||||||||||||] 99.7% 28.2G 10.0T /mnt/plots68
/dev/sdcp f2fs [||||||||||||||||||||] 99.0% 103.4G 10.0T /mnt/plots69
/dev/sdcq ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots70
/dev/sdcr ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots71
/dev/sdcs f2fs [||||||||||||||||||||] 99.7% 28.0G 10.0T /mnt/plots72
/dev/sdct ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots73
/dev/sdas ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots74
/dev/sdar ext4 [||||||||||||||||||||] 98.0% 80.6G 4.0T /mnt/plots75
/dev/sdaq ext4 [||||||||||||||||||||] 98.0% 80.6G 4.0T /mnt/plots76
/dev/sdap ext4 [||||||||||||||||||||] 98.0% 80.6G 4.0T /mnt/plots77
/dev/sdao ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots78
/dev/sdan ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots79
/dev/sday ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots80
/dev/sdax ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots81
/dev/sdaw ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots82
/dev/sdav ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots83
/dev/sdau ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots84
/dev/sdat ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots85
/dev/sdcu ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots86
/dev/sdcv ext4 [||||||||||||||||||||] 98.0% 80.5G 4.0T /mnt/plots87
/dev/sdcw f2fs [||||||||||||||||||||] 99.7% 28.0G 10.0T /mnt/plots88
/dev/sdcy ext4 [||||||||||||||||||||] 98.0% 80.9G 4.0T /mnt/plots89
/dev/sdcz ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots90
/dev/sdda ext4 [||||||||||||||||||||] 98.0% 80.7G 4.0T /mnt/plots91
/dev/sddb ext4 [||||||||||||||||||||] 99.8% 12.7G 6.0T /mnt/plots92
/dev/sddc ext4 [||||||||||||||||||||] 98.0% 80.8G 4.0T /mnt/plots93
/dev/sdbd ext4 [||||||||||||||||||||] 99.1% 93.6G 10.0T /mnt/plots94
/dev/sdcx ext4 [||||||||||||||||||||] 99.1% 93.9G 10.0T /mnt/plots95
/dev/sdcj ext4 [||||||||||||||||||||] 99.1% 94.0G 10.0T /mnt/plots96
/dev/sdcn ext4 [||||||||||||||||||||] 99.1% 93.6G 10.0T /mnt/plots97
SUM: [||||||||||||||||||||] 96.7% 19.3T 589.2T
Using the delicious breadcrumbs left by you wonderful folks in this GH issue, I'll be trying to get one of these working. Is there any interest in seeing my results shared somewhere? (AKA - where can I ask for help when I fail)
@wallentx - According to the Broadcom engineer I spoke with, none of the 93xx series cards will work on the Pi because drivers don't support ARM for that generation. Only 94xx/95xx and newer cards should be able to work (so your SAS 9440-8i has a fighting chance!).
@geerlingguy My SAS 9440-8i shows up in lspci just out of the box, though there are no mpt3sas modules available to be loaded. I built the kernel, but I'm a little lost with the way the CM4 bootloader works. Are you guys just renaming your compiled kernel image to kernel8.img and overwriting? I saw a writeup somewhere about setting kernel=<mykernel.img> in /boot/config.txt, but that just got me stuck at the rainbow screen.
@wallentx - Here's the exact process I follow: https://github.com/geerlingguy/raspberry-pi-pcie-devices/tree/master/extras/cross-compile
Note that there are a number of ways you can stick multiple kernels on the Pi and switch between them, but in my case, since I normally nuke the microSD card multiple times per day, I just overwrite the kernel in place (following the steps in the guide above).
@geerlingguy if I'm sharing more about my 9440-8i, should I put this in a separate issue, or are individual issues intended to be sort of exclusive for your own testing/tracking? If a new issue is needed, I'll edit this comment and migrate the details elsewhere.
Moved - https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/250
@wallentx - Can you open a new/separate issue for it (since it's a different model)?
I'll be doing a little more testing on one of these, now that I have it in my possession.
That's a lotta PCIe:
All right, so testing a bringup on a fork of the rpi-5.15.y branch:
- Ran menuconfig and enabled mpt3sas (option "LSI MPT Fusion SAS 3.0 & SAS 2.0 Device Driver", see below).
- Patched the drivers/scsi/mpt3sas/mpt3sas_base.c file with @joshuaboud's patch from the comment above.
The mpt3sas option is under:
-> Device Drivers
-> SCSI device support
-> SCSI low-level drivers (SCSI_LOWLEVEL [=y])
-> LSI MPT Fusion SAS 3.0 & SAS 2.0 Device Driver
After a reboot I'm seeing:
0f:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC) (rev 02)
Subsystem: Broadcom / LSI HBA 9405W-16e
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 255
Region 0: Memory at 600f00000 (64-bit, prefetchable) [disabled] [size=1M]
Region 2: Memory at 601000000 (64-bit, prefetchable) [disabled] [size=1M]
Region 4: Memory at 600600000 (32-bit, non-prefetchable) [disabled] [size=1M]
Region 5: I/O ports at <unassigned> [disabled]
Expansion ROM at 600700000 [virtual] [disabled] [size=1M]
Capabilities: <access denied>
Kernel modules: mpt3sas
So the module's loaded at least. I'm going to have to stop for the evening and pick it back up later!
Oh also, from dmesg:
[ 7.332681] mpt3sas 0000:0c:00.0: enabling device (0000 -> 0002)
[ 7.332756] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
...
[ 7.476877] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 7.476951] mpt3sas_cm0: MSI-X vectors supported: 128
[ 7.476968] no of cores: 4, max_msix_vectors: -1
[ 7.476982] mpt3sas_cm0: 0 4 4
...
[ 7.550816] mpt3sas_cm0: High IOPs queues : disabled
[ 7.550857] mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 64
[ 7.550871] mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 65
[ 7.550883] mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 66
[ 7.550895] mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 67
[ 7.550905] mpt3sas_cm0: iomem(0x0000000600900000), mapped(0x000000008bfd810c), size(1048576)
[ 7.550926] mpt3sas_cm0: ioport(0x0000000000000000), size(0)
[ 7.604287] checking generic (3e3cf000 7f8000) vs hw (0 ffffffffffffffff)
...
[ 7.770218] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 7.800730] mpt3sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[ 7.818705] mpt3sas_cm0: request pool(0x000000004d73eefe) - dma(0x41b600000): depth(7272), frame_size(128), pool_size(909 kB)
...
[ 10.479253] mpt3sas_cm0: sense pool(0x00000000107cc924) - dma(0x41bc00000): depth(7059), element_size(96), pool_size (661 kB)
[ 10.479279] mpt3sas_cm0: sense pool(0x00000000107cc924)- dma(0x41bc00000): depth(7059),element_size(96), pool_size(4 kB)
[ 10.479719] mpt3sas_cm0: reply pool(0x000000008920f035) - dma(0x41bd00000): depth(7336), frame_size(128), pool_size(917 kB)
[ 10.479835] mpt3sas_cm0: config page(0x00000000f394c332) - dma(0x44c765000): size(512)
[ 10.479842] mpt3sas_cm0: Allocated physical memory: size(31210 kB)
[ 10.479848] mpt3sas_cm0: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[ 10.479853] mpt3sas_cm0: Scatter Gather Elements per IO(128)
[ 10.599660] mpt3sas_cm0: _base_display_fwpkg_version: complete
[ 10.599669] mpt3sas_cm0: FW Package Ver(05.00.00.00)
[ 10.599812] mpt3sas_cm0: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[ 10.600313] mpt3sas_cm0: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[ 10.600322] NVMe
[ 10.600325] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[ 10.600447] mpt3sas_cm0: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 10.600477] mpt3sas_cm0: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 10.600618] mpt3sas 0000:0c:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[ 10.600630] scsi host0: Fusion MPT SAS Host
[ 10.626156] mpt3sas_cm0: sending port enable !!
[ 13.477768] mpt3sas_cm0: hba_port entry: 000000008adcd575, port: 0 is added to hba_port list
[ 13.479519] mpt3sas_cm0: hba_port entry: 00000000dde23a00, port: 8 is added to hba_port list
[ 13.481680] mpt3sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b00de7bd50), phys(17)
[ 13.482181] mpt3sas_cm0: handle(0x11) sas_address(0x510600b00de7bd50) port_type(0x0)
[ 13.483080] scsi 0:0:0:0: Enclosure LSI VirtualSES 01 PQ: 0 ANSI: 6
[ 13.483099] scsi 0:0:0:0: set ignore_delay_remove for handle(0x0011)
[ 13.483106] scsi 0:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00de7bd50), phy(16), device_name(0x510600b00de7bd50)
[ 13.483111] scsi 0:0:0:0: enclosure logical id (0x500605b00de7bd50), slot(16)
[ 13.483115] scsi 0:0:0:0: enclosure level(0x0000), connector name( )
[ 13.483121] scsi 0:0:0:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[ 13.483161] mpt3sas_cm0: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[ 13.483816] end_device-0:0: add: handle(0x0011), sas_addr(0x510600b00de7bd50)
...
[ 18.728604] mpt3sas_cm0: port enable: SUCCESS
[ 18.729364] pci 0000:0b:03.0: enabling device (0000 -> 0002)
[ 18.729393] mpt3sas 0000:0d:00.0: enabling device (0000 -> 0002)
[ 18.729437] mpt3sas_cm1: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
[ 18.787144] mpt3sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 18.787182] mpt3sas_cm1: MSI-X vectors supported: 128
[ 18.787187] no of cores: 4, max_msix_vectors: -1
[ 18.787192] mpt3sas_cm1: 0 4 4
[ 18.787558] mpt3sas_cm1: High IOPs queues : disabled
[ 18.787565] mpt3sas1-msix0: PCI-MSI-X enabled: IRQ 69
[ 18.787570] mpt3sas1-msix1: PCI-MSI-X enabled: IRQ 70
[ 18.787575] mpt3sas1-msix2: PCI-MSI-X enabled: IRQ 71
[ 18.787579] mpt3sas1-msix3: PCI-MSI-X enabled: IRQ 72
[ 18.787583] mpt3sas_cm1: iomem(0x0000000600b00000), mapped(0x0000000032381adf), size(1048576)
[ 18.787592] mpt3sas_cm1: ioport(0x0000000000000000), size(0)
[ 18.845932] mpt3sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 18.873717] mpt3sas_cm1: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[ 18.875031] mpt3sas_cm1: request pool(0x000000005b8761c2) - dma(0x41bf00000): depth(7272), frame_size(128), pool_size(909 kB)
[ 20.161005] mpt3sas_cm1: sense pool(0x000000003af0b070) - dma(0x41c000000): depth(7059), element_size(96), pool_size (661 kB)
[ 20.161029] mpt3sas_cm1: sense pool(0x000000003af0b070)- dma(0x41c000000): depth(7059),element_size(96), pool_size(4 kB)
[ 20.161490] mpt3sas_cm1: reply pool(0x000000002c451284) - dma(0x41c100000): depth(7336), frame_size(128), pool_size(917 kB)
[ 20.161613] mpt3sas_cm1: config page(0x0000000060a9007a) - dma(0x4524e0000): size(512)
[ 20.161621] mpt3sas_cm1: Allocated physical memory: size(31210 kB)
[ 20.161626] mpt3sas_cm1: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[ 20.161631] mpt3sas_cm1: Scatter Gather Elements per IO(128)
[ 20.281165] mpt3sas_cm1: _base_display_fwpkg_version: complete
[ 20.281176] mpt3sas_cm1: FW Package Ver(05.00.00.00)
[ 20.281321] mpt3sas_cm1: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[ 20.281821] mpt3sas_cm1: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(09.09.00.00)
[ 20.281830] NVMe
[ 20.281833] mpt3sas_cm1: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[ 20.281955] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 20.281984] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 20.282026] mpt3sas 0000:0d:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[ 20.282037] scsi host1: Fusion MPT SAS Host
[ 20.314803] mpt3sas_cm1: sending port enable !!
[ 22.684129] mpt3sas_cm1: hba_port entry: 00000000ea664476, port: 0 is added to hba_port list
[ 22.688538] mpt3sas_cm1: hba_port entry: 00000000869770b5, port: 8 is added to hba_port list
[ 22.693720] mpt3sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b00de7bf50), phys(17)
[ 22.694621] mpt3sas_cm1: handle(0x11) sas_address(0x510600b00de7bf50) port_type(0x0)
[ 22.695294] mpt3sas_cm1: handle(0x20) sas_address(0x300605b00de7bf59) port_type(0x1)
[ 28.412605] mpt3sas_cm1: port enable: SUCCESS
[ 28.905761] scsi 1:0:0:0: Direct-Access ATA WDC WD5000AVDS-6 0A01 PQ: 0 ANSI: 6
[ 28.905787] scsi 1:0:0:0: SATA: handle(0x0020), sas_addr(0x300605b00de7bf59), phy(9), device_name(0x0000000000000000)
[ 28.905792] scsi 1:0:0:0: enclosure logical id (0x500605b00de7bf50), slot(0)
[ 28.905797] scsi 1:0:0:0: enclosure level(0x0000), connector name( C0 )
[ 28.905875] scsi 1:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[ 28.905886] scsi 1:0:0:0: qdepth(128), tagged(1), scsi_level(7), cmd_que(1)
[ 28.912046] end_device-1:0: add: handle(0x0020), sas_addr(0x300605b00de7bf59)
[ 28.912085] sd 1:0:0:0: Power-on or device reset occurred
[ 28.914233] scsi 1:0:1:0: Enclosure LSI VirtualSES 01 PQ: 0 ANSI: 6
[ 28.914256] scsi 1:0:1:0: set ignore_delay_remove for handle(0x0011)
[ 28.914263] scsi 1:0:1:0: SES: handle(0x0011), sas_addr(0x510600b00de7bf50), phy(16), device_name(0x510600b00de7bf50)
[ 28.914268] scsi 1:0:1:0: enclosure logical id (0x500605b00de7bf50), slot(16)
[ 28.914272] scsi 1:0:1:0: enclosure level(0x0000), connector name( )
[ 28.914278] scsi 1:0:1:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[ 28.914317] mpt3sas_cm1: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[ 28.915018] end_device-1:1: add: handle(0x0011), sas_addr(0x510600b00de7bf50)
[ 28.915769] pci 0000:0b:05.0: enabling device (0000 -> 0002)
[ 28.915800] mpt3sas 0000:0e:00.0: enabling device (0000 -> 0002)
[ 28.915847] mpt3sas_cm2: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
[ 28.917936] sd 1:0:0:0: [sda] 975724592 512-byte logical blocks: (500 GB/465 GiB)
[ 28.924717] sd 1:0:0:0: [sda] Write Protect is off
[ 28.924729] sd 1:0:0:0: [sda] Mode Sense: 9b 00 10 08
[ 28.926756] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 28.971834] mpt3sas_cm2: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 28.971872] mpt3sas_cm2: MSI-X vectors supported: 128
[ 28.971879] no of cores: 4, max_msix_vectors: -1
[ 28.971884] mpt3sas_cm2: 0 4 4
[ 28.972276] mpt3sas_cm2: High IOPs queues : disabled
[ 28.972283] mpt3sas2-msix0: PCI-MSI-X enabled: IRQ 73
[ 28.972288] mpt3sas2-msix1: PCI-MSI-X enabled: IRQ 74
[ 28.972293] mpt3sas2-msix2: PCI-MSI-X enabled: IRQ 75
[ 28.972297] mpt3sas2-msix3: PCI-MSI-X enabled: IRQ 76
[ 28.972302] mpt3sas_cm2: iomem(0x0000000600d00000), mapped(0x000000009160f51c), size(1048576)
[ 28.972310] mpt3sas_cm2: ioport(0x0000000000000000), size(0)
[ 28.986402] sd 1:0:0:0: [sda] Attached SCSI disk
[ 29.029063] mpt3sas_cm2: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 29.056799] mpt3sas_cm2: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[ 29.058033] mpt3sas_cm2: request pool(0x0000000078770a9d) - dma(0x41c200000): depth(7272), frame_size(128), pool_size(909 kB)
[ 30.211333] mpt3sas_cm2: sense pool(0x00000000fd83fac7) - dma(0x41c300000): depth(7059), element_size(96), pool_size (661 kB)
[ 30.211360] mpt3sas_cm2: sense pool(0x00000000fd83fac7)- dma(0x41c300000): depth(7059),element_size(96), pool_size(4 kB)
[ 30.212561] mpt3sas_cm2: reply pool(0x00000000c747b5de) - dma(0x41c400000): depth(7336), frame_size(128), pool_size(917 kB)
[ 30.213000] mpt3sas_cm2: config page(0x00000000ca0f15e1) - dma(0x45652d000): size(512)
[ 30.213010] mpt3sas_cm2: Allocated physical memory: size(31210 kB)
[ 30.213015] mpt3sas_cm2: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[ 30.213019] mpt3sas_cm2: Scatter Gather Elements per IO(128)
[ 30.332717] mpt3sas_cm2: _base_display_fwpkg_version: complete
[ 30.332725] mpt3sas_cm2: FW Package Ver(05.00.00.00)
[ 30.332879] mpt3sas_cm2: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[ 30.333378] mpt3sas_cm2: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[ 30.333387] NVMe
[ 30.333391] mpt3sas_cm2: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[ 30.333513] mpt3sas_cm2: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 30.333542] mpt3sas_cm2: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 30.333749] mpt3sas 0000:0e:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[ 30.333760] scsi host2: Fusion MPT SAS Host
[ 30.358905] mpt3sas_cm2: sending port enable !!
[ 33.153649] mpt3sas_cm2: hba_port entry: 00000000296158ae, port: 0 is added to hba_port list
[ 33.155900] mpt3sas_cm2: hba_port entry: 00000000cf3a1228, port: 8 is added to hba_port list
[ 33.158714] mpt3sas_cm2: host_add: handle(0x0001), sas_addr(0x500605b00f3dfa80), phys(17)
[ 33.159204] mpt3sas_cm2: handle(0x11) sas_address(0x510600b00f3dfa80) port_type(0x0)
[ 33.160212] scsi 2:0:0:0: Enclosure LSI VirtualSES 01 PQ: 0 ANSI: 6
[ 33.160233] scsi 2:0:0:0: set ignore_delay_remove for handle(0x0011)
[ 33.160239] scsi 2:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00f3dfa80), phy(16), device_name(0x510600b00f3dfa80)
[ 33.160244] scsi 2:0:0:0: enclosure logical id (0x500605b00f3dfa80), slot(16)
[ 33.160248] scsi 2:0:0:0: enclosure level(0x0000), connector name( )
[ 33.160255] scsi 2:0:0:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[ 33.160294] mpt3sas_cm2: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[ 33.161232] end_device-2:0: add: handle(0x0011), sas_addr(0x510600b00f3dfa80)
[ 33.764634] cam-dummy-reg: disabling
[ 38.404635] mpt3sas_cm2: port enable: SUCCESS
[ 38.406201] pci 0000:0b:07.0: enabling device (0000 -> 0002)
[ 38.406259] mpt3sas 0000:0f:00.0: enabling device (0000 -> 0002)
[ 38.406336] mpt3sas_cm3: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
[ 38.465052] mpt3sas_cm3: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 38.465115] mpt3sas_cm3: MSI-X vectors supported: 128
[ 38.465129] no of cores: 4, max_msix_vectors: -1
[ 38.465141] mpt3sas_cm3: 0 4 4
[ 38.465923] mpt3sas_cm3: High IOPs queues : disabled
[ 38.465939] mpt3sas3-msix0: PCI-MSI-X enabled: IRQ 77
[ 38.465952] mpt3sas3-msix1: PCI-MSI-X enabled: IRQ 78
[ 38.465963] mpt3sas3-msix2: PCI-MSI-X enabled: IRQ 79
[ 38.465974] mpt3sas3-msix3: PCI-MSI-X enabled: IRQ 80
[ 38.465985] mpt3sas_cm3: iomem(0x0000000600f00000), mapped(0x0000000085d0d7dc), size(1048576)
[ 38.466004] mpt3sas_cm3: ioport(0x0000000000000000), size(0)
[ 38.524899] mpt3sas_cm3: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 38.552664] mpt3sas_cm3: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[ 38.555881] mpt3sas_cm3: request pool(0x000000003b3bae2a) - dma(0x41c600000): depth(7272), frame_size(128), pool_size(909 kB)
[ 39.716969] mpt3sas_cm3: sense pool(0x00000000a81f00bf) - dma(0x41c700000): depth(7059), element_size(96), pool_size (661 kB)
[ 39.716996] mpt3sas_cm3: sense pool(0x00000000a81f00bf)- dma(0x41c700000): depth(7059),element_size(96), pool_size(4 kB)
[ 39.718228] mpt3sas_cm3: reply pool(0x00000000e72d3333) - dma(0x41c800000): depth(7336), frame_size(128), pool_size(917 kB)
[ 39.718385] mpt3sas_cm3: config page(0x0000000043a8ee65) - dma(0x45a2e9000): size(512)
[ 39.718392] mpt3sas_cm3: Allocated physical memory: size(31210 kB)
[ 39.718397] mpt3sas_cm3: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[ 39.718402] mpt3sas_cm3: Scatter Gather Elements per IO(128)
[ 39.838113] mpt3sas_cm3: _base_display_fwpkg_version: complete
[ 39.838123] mpt3sas_cm3: FW Package Ver(05.00.00.00)
[ 39.838268] mpt3sas_cm3: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[ 39.838768] mpt3sas_cm3: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[ 39.838777] NVMe
[ 39.838781] mpt3sas_cm3: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[ 39.838903] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 39.838932] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 39.838971] mpt3sas 0000:0f:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[ 39.838982] scsi host3: Fusion MPT SAS Host
[ 39.865436] mpt3sas_cm3: sending port enable !!
[ 42.653845] mpt3sas_cm3: hba_port entry: 00000000cd02c1a1, port: 0 is added to hba_port list
[ 42.656344] mpt3sas_cm3: hba_port entry: 00000000d693f100, port: 8 is added to hba_port list
[ 42.659492] mpt3sas_cm3: host_add: handle(0x0001), sas_addr(0x500605b00f3df7f0), phys(17)
[ 42.659983] mpt3sas_cm3: handle(0x11) sas_address(0x510600b00f3df7f0) port_type(0x0)
[ 42.661220] scsi 3:0:0:0: Enclosure LSI VirtualSES 01 PQ: 0 ANSI: 6
[ 42.661240] scsi 3:0:0:0: set ignore_delay_remove for handle(0x0011)
[ 42.661247] scsi 3:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00f3df7f0), phy(16), device_name(0x510600b00f3df7f0)
[ 42.661252] scsi 3:0:0:0: enclosure logical id (0x500605b00f3df7f0), slot(16)
[ 42.661256] scsi 3:0:0:0: enclosure level(0x0000), connector name( )
[ 42.661262] scsi 3:0:0:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[ 42.661302] mpt3sas_cm3: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[ 42.662322] end_device-3:0: add: handle(0x0011), sas_addr(0x510600b00f3df7f0)
[ 47.904637] mpt3sas_cm3: port enable: SUCCESS
[ 48.094448] scsi 0:0:0:0: Attached scsi generic sg0 type 13
[ 48.095649] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 48.095747] scsi 1:0:1:0: Attached scsi generic sg2 type 13
[ 48.095842] scsi 2:0:0:0: Attached scsi generic sg3 type 13
[ 48.096113] scsi 3:0:0:0: Attached scsi generic sg4 type 13
And hey, look at that!
pi@sas:~ $ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.3G 0 disk
mmcblk0 179:0 0 29.7G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 29.5G 0 part /
Hello, lonely little WD Green that I was willing to sacrifice if this entire thing went kaboom!
Next test is to get one hard drive off each card and see if that works too.
I don't have enough HD Mini SAS (SFF-8643) to SATA adapter cables (I'm using these from CableCreation) to test all four cards... but two cards are working with two drives each:
pi@sas:~ $ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
sdb 8:16 0 111.8G 0 disk
sdc 8:32 0 465.3G 0 disk
sdd 8:48 0 111.8G 0 disk
I followed my guide to create an mdadm RAID array in Linux and set up a 4-disk RAID 0 array.
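For reference, the create step in that guide boils down to a single mdadm invocation. Here's a sketch wrapped in a tiny helper that just prints the command (the helper name is mine, and the `/dev/sd[a-d]` names are placeholders; run the printed command with sudo against your actual drives):

```shell
# Hypothetical helper: print the mdadm command for a RAID 0 array over
# the given member devices (copy/paste the output to run it with sudo).
raid0_create_cmd() {
  md_dev="$1"; shift
  printf 'mdadm --create --verbose %s --level=0 --raid-devices=%s %s\n' \
    "$md_dev" "$#" "$*"
}

# Placeholder devices; substitute the drives hanging off your HBAs.
raid0_create_cmd /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd
```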
And the array seems to be working:
pi@sas:~ $ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Apr 27 21:55:08 2022
Raid Level : raid0
Array Size : 1210157056 (1154.10 GiB 1239.20 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Wed Apr 27 21:55:08 2022
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : original
Chunk Size : 512K
Consistency Policy : none
Name : sas:0 (local to host sas)
UUID : 221f1350:c9590fd4:03deaae8:d09bae04
Events : 0
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 8 49 3 active sync /dev/sdd1
And after mounting the array:
pi@sas:~ $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 29G 1.5G 27G 6% /
devtmpfs 1.7G 0 1.7G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 759M 976K 758M 1% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
/dev/mmcblk0p1 253M 45M 208M 18% /boot
tmpfs 380M 0 380M 0% /run/user/1000
/dev/md0 1.2T 28K 1.2T 1% /mnt/raid0
And here are the results of disk-benchmark.sh:
Running fio sequential read test...
fio-rand-read-sequential: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.25
Starting 4 processes
Jobs: 4 (f=4): [R(4)][27.3%][r=404MiB/s][r=403 IOPS][eta 00m:08s]
Jobs: 4 (f=4): [R(4)][36.4%][r=402MiB/s][r=401 IOPS][eta 00m:07s]
Jobs: 4 (f=4): [R(4)][45.5%][r=389MiB/s][r=388 IOPS][eta 00m:06s]
Jobs: 4 (f=4): [R(4)][54.5%][r=392MiB/s][r=391 IOPS][eta 00m:05s]
Jobs: 4 (f=4): [R(4)][63.6%][r=408MiB/s][r=408 IOPS][eta 00m:04s]
Jobs: 4 (f=4): [R(4)][72.7%][r=394MiB/s][r=393 IOPS][eta 00m:03s]
Jobs: 4 (f=4): [R(4)][81.8%][r=396MiB/s][r=396 IOPS][eta 00m:02s]
Jobs: 4 (f=4): [R(4)][90.9%][r=397MiB/s][r=396 IOPS][eta 00m:01s]
Jobs: 4 (f=4): [R(4)][24.4%][r=385MiB/s][r=384 IOPS][eta 00m:34s]
fio-rand-read-sequential: (groupid=0, jobs=4): err= 0: pid=2530: Wed Apr 27 22:01:57 2022
read: IOPS=396, BW=397MiB/s (416MB/s)(4188MiB/10559msec)
slat (usec): min=75, max=100046, avg=4600.93, stdev=12722.72
clat (msec): min=134, max=1243, avg=634.65, stdev=121.09
lat (msec): min=136, max=1243, avg=639.25, stdev=121.80
clat percentiles (msec):
| 1.00th=[ 155], 5.00th=[ 456], 10.00th=[ 575], 20.00th=[ 609],
| 30.00th=[ 625], 40.00th=[ 634], 50.00th=[ 642], 60.00th=[ 651],
| 70.00th=[ 659], 80.00th=[ 676], 90.00th=[ 701], 95.00th=[ 743],
| 99.00th=[ 1083], 99.50th=[ 1133], 99.90th=[ 1200], 99.95th=[ 1217],
| 99.99th=[ 1250]
bw ( KiB/s): min=91961, max=452374, per=95.61%, avg=388307.83, stdev=19725.13, samples=83
iops : min= 86, max= 441, avg=378.25, stdev=19.40, samples=83
lat (msec) : 250=2.67%, 500=2.87%, 750=89.68%, 1000=3.01%, 2000=1.77%
cpu : usr=0.17%, sys=3.29%, ctx=3682, majf=0, minf=65634
IO depths : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=3.1%, >=64=94.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=4188,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=397MiB/s (416MB/s), 397MiB/s-397MiB/s (416MB/s-416MB/s), io=4188MiB (4391MB), run=10559-10559msec
Disk stats (read/write):
md0: ios=16752/0, merge=0/0, ticks=9172540/0, in_queue=9172540, util=99.19%, aggrios=4188/1, aggrmerge=0/0, aggrticks=2296652/0, aggrin_queue=2296652, aggrutil=98.43%
sdd: ios=4188/1, merge=0/0, ticks=1898746/0, in_queue=1898746, util=97.04%
sdc: ios=4188/1, merge=0/0, ticks=2574014/0, in_queue=2574015, util=97.74%
sdb: ios=4188/1, merge=0/0, ticks=2302127/0, in_queue=2302127, util=98.43%
sda: ios=4188/1, merge=0/0, ticks=2411723/0, in_queue=2411723, util=97.87%
Running iozone 1024K random read and write tests...
Iozone: Performance Test of File I/O
Version $Revision: 3.492 $
Compiled for 64 bit mode.
Build: linux-arm
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
Vangel Bojaxhi, Ben England, Vikentsi Lapa,
Alexey Skidanov, Sudhir Kumar.
Run began: Wed Apr 27 22:01:58 2022
Include fsync in write timing
O_DIRECT feature enabled
Auto Mode
File size set to 102400 kB
Record Size 1024 kB
Command line used: ./iozone -e -I -a -s 100M -r 1024k -i 0 -i 2 -f /mnt/raid0/iozone
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
random random bkwd record stride
kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
102400 1024 258650 181828 92813 167495
iozone test complete.
Running iozone 4K random read and write tests...
Iozone: Performance Test of File I/O
Version $Revision: 3.492 $
Compiled for 64 bit mode.
Build: linux-arm
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
Vangel Bojaxhi, Ben England, Vikentsi Lapa,
Alexey Skidanov, Sudhir Kumar.
Run began: Wed Apr 27 22:02:01 2022
Include fsync in write timing
O_DIRECT feature enabled
Auto Mode
File size set to 102400 kB
Record Size 4 kB
Command line used: ./iozone -e -I -a -s 100M -r 4k -i 0 -i 2 -f /mnt/raid0/iozone
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
random random bkwd record stride
kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
102400 4 25394 32450 5270 5323
iozone test complete.
So we can still put through 400 MiB/s (416 MB/s) using multiple HBAs (two, in this case), which is a positive sign that the Pi won't be hampered too much by multiple cards on a PCIe switch. I hoped that would be the case, but was prepared for disappointment. Luckily I'm not disappointed, lol.
The random IO would be a lot faster if all the drives were faster. As it is, I have a 120 GB MakerDisk SSD, a 120 GB Kingston A400 SSD, and two WD GreenPower WD5000AVDS drives that are very, very slow and drag the whole array down.
But I wanted to see how a mixed environment would work.
I just ordered another 2-pack of the HD Mini SAS cables, and hopefully I'll see the same result with the disks spread one per card.
Created a PR with the patch: https://github.com/geerlingguy/linux/pull/4
Glad to see this worked out for you. Have you tried recreating the kernel panic issue when quiet is turned on?
@joshuaboud Current /boot/cmdline.txt:
console=serial0,115200 console=tty1 root=PARTUUID=1b8530a1-02 rootfstype=ext4 fsck.repair=yes rootwait
I just modified the file to:
console=serial0,115200 console=tty3 root=PARTUUID=1b8530a1-02 rootfstype=ext4 fsck.repair=yes loglevel=3 quiet rootwait logo.nologo
After reboot, it does indeed lock up, with the following kernel panic:
It's so strange; I have no idea what could cause that, though at least it isn't just me.
@joshuaboud Yeah... I'm perplexed. I didn't look too deep, but it's good to see it's consistent and still happening with 5.15.y and the latest firmware. I just reset cmdline.txt and rebooted again, and everything's working as normal.
Something weird with the behavior of PCIe initialization if you try silencing the console...
Finally got the extra cables in, and at this point I can confirm I can split the drives one per controller; they're all accessible and perform similarly (afaict):
Run status group 0 (all jobs):
READ: bw=397MiB/s (416MB/s), 397MiB/s-397MiB/s (416MB/s-416MB/s), io=4186MiB (4389MB), run=10557-10557msec
random random bkwd record stride
kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
102400 1024 177821 267258 91817 181764
random random bkwd record stride
kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
102400 4 26498 37931 5299 5682
I downloaded StorCLI (from here: https://www.broadcom.com/support/download-search?pg=&pf=&pn=&pa=&po=&dk=storcli&pl= - MR 7.20) and ran the arm64 version:
pi@sas:~ $ ./storcli64 show
CLI Version = 007.2007.0000.0000 Feb 11, 2022
Operating system = Linux 5.15.35-v8+
Status Code = 0
Status = Success
Description = None
Number of Controllers = 0
Host Name = sas
Operating System = Linux 5.15.35-v8+
Do I need to do anything else special to get storcli to work with these cards through the PCIe switches?
Oh... lol:
pi@sas:~ $ sudo ./storcli64 show
CLI Version = 007.2103.0000.0000 Dec 08, 2021
Operating system = Linux 5.15.35-v8+
Status Code = 0
Status = Success
Description = None
Number of Controllers = 4
Host Name = sas
Operating System = Linux 5.15.35-v8+
StoreLib IT Version = 07.2103.0200.0000
IT System Overview :
==================
----------------------------------------------------------------------------
Ctl Model AdapterType VendId DevId SubVendId SubDevId PCI Address
----------------------------------------------------------------------------
0 HBA 9405W-16i SAS3616(B0) 0x1000 0xD1 0x1000 0x3080 00:0c:00:00
1 HBA 9405W-16i SAS3616(B0) 0x1000 0xD1 0x1000 0x3080 00:0d:00:00
2 HBA 9405W-16i SAS3616(B0) 0x1000 0xD1 0x1000 0x3080 00:0e:00:00
3 HBA 9405W-16i SAS3616(B0) 0x1000 0xD1 0x1000 0x3080 00:0f:00:00
----------------------------------------------------------------------------
And then all the drives:
$ sudo ./storcli64 /c0 show
...
------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
------------------------------------------------------------------------------
0:0 1 JBOD - 111.790 GB SATA SSD - - 512B KINGSTON SA400S37120G -
------------------------------------------------------------------------------
$ sudo ./storcli64 /c1 show
...
------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
------------------------------------------------------------------------------
0:0 1 JBOD - 465.261 GB SATA HDD - - 512B WDC WD5000AVDS-63U7B1 -
------------------------------------------------------------------------------
$ sudo ./storcli64 /c2 show
...
-----------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
-----------------------------------------------------------------
0:0 1 JBOD - 111.790 GB SATA SSD - - 512B SATA SSD -
-----------------------------------------------------------------
$ sudo ./storcli64 /c3 show
...
------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
------------------------------------------------------------------------------
0:0 1 JBOD - 465.761 GB SATA HDD - - 512B WDC WD5000AVDS-61U7B1 -
------------------------------------------------------------------------------
Besides the weird issue with quiet not working, I think we've explored this card enough to give it a thumbs up overall.
Still testing, in a sense.
$ ls /dev/sd*[a-z] | wc -l
60
heh... First test is RAID 0 using my mdadm guide:
# Partition all 60 disks (optional... I just obliterate the partitioning when I create the array).
$ for i in `ls /dev/sd*[a-z]`; do sudo sgdisk -n 1:0:0 $i; done
# Get list of all the drives to copy out to next command.
$ ls /dev/sd*[a-z]
# Create a RAID0 array.
$ sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=60 /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb /dev/sdbe /dev/sdd /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf /dev/sdk /dev/sdp /dev/sdu /dev/sdz
# Verify the array is working.
$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu May 12 15:12:43 2022
Raid Level : raid0
Array Size : 1171901583360 (1117612.44 GiB 1200027.22 GB)
Raid Devices : 60
Total Devices : 60
Persistence : Superblock is persistent
Update Time : Thu May 12 15:12:43 2022
State : clean
Active Devices : 60
Working Devices : 60
Failed Devices : 0
Spare Devices : 0
Layout : -unknown-
Chunk Size : 512K
Consistency Policy : none
Name : sas:0 (local to host sas)
UUID : f8b39a60:357b0007:acc18066:aa6fdf97
Events : 0
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 65 224 1 active sync /dev/sdae
2 66 48 2 active sync /dev/sdaj
3 66 128 3 active sync /dev/sdao
4 66 208 4 active sync /dev/sdat
5 67 32 5 active sync /dev/sday
6 67 96 6 active sync /dev/sdbc
7 67 176 7 active sync /dev/sdbh
8 8 96 8 active sync /dev/sdg
9 8 176 9 active sync /dev/sdl
10 65 0 10 active sync /dev/sdq
11 65 80 11 active sync /dev/sdv
12 65 160 12 active sync /dev/sdaa
13 65 240 13 active sync /dev/sdaf
14 66 64 14 active sync /dev/sdak
15 66 144 15 active sync /dev/sdap
16 66 224 16 active sync /dev/sdau
17 67 48 17 active sync /dev/sdaz
18 67 112 18 active sync /dev/sdbd
19 8 32 19 active sync /dev/sdc
20 8 112 20 active sync /dev/sdh
21 8 192 21 active sync /dev/sdm
22 65 16 22 active sync /dev/sdr
23 65 96 23 active sync /dev/sdw
24 65 176 24 active sync /dev/sdab
25 66 0 25 active sync /dev/sdag
26 66 80 26 active sync /dev/sdal
27 66 160 27 active sync /dev/sdaq
28 66 240 28 active sync /dev/sdav
29 8 16 29 active sync /dev/sdb
30 67 128 30 active sync /dev/sdbe
31 8 48 31 active sync /dev/sdd
32 8 128 32 active sync /dev/sdi
33 8 208 33 active sync /dev/sdn
34 65 32 34 active sync /dev/sds
35 65 112 35 active sync /dev/sdx
36 65 192 36 active sync /dev/sdac
37 66 16 37 active sync /dev/sdah
38 66 96 38 active sync /dev/sdam
39 66 176 39 active sync /dev/sdar
40 67 0 40 active sync /dev/sdaw
41 67 64 41 active sync /dev/sdba
42 67 144 42 active sync /dev/sdbf
43 8 64 43 active sync /dev/sde
44 8 144 44 active sync /dev/sdj
45 8 224 45 active sync /dev/sdo
46 65 48 46 active sync /dev/sdt
47 65 128 47 active sync /dev/sdy
48 65 208 48 active sync /dev/sdad
49 66 32 49 active sync /dev/sdai
50 66 112 50 active sync /dev/sdan
51 66 192 51 active sync /dev/sdas
52 67 16 52 active sync /dev/sdax
53 67 80 53 active sync /dev/sdbb
54 67 160 54 active sync /dev/sdbg
55 8 80 55 active sync /dev/sdf
56 8 160 56 active sync /dev/sdk
57 8 240 57 active sync /dev/sdp
58 65 64 58 active sync /dev/sdu
59 65 144 59 active sync /dev/sdz
# Format the array.
$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0
# Mount the array.
$ sudo mkdir /mnt/raid0
$ sudo mount /dev/md0 /mnt/raid0
When I started the formatting operation, the kernel printed the following call trace (triggered from the mdadm process):
[ 1506.932433] ------------[ cut here ]------------
[ 1506.932450] WARNING: CPU: 1 PID: 1405 at lib/vsprintf.c:2742 vsnprintf+0x54c/0x6e0
[ 1506.932468] Modules linked in: raid0 md_mod sg cmac algif_hash aes_arm64 algif_skcipher af_alg bnep hci_uart btbcm bluetooth ecdh_generic ecc hid_logitech_hidpp 8021q garp stp llc joydev snd_soc_hdmi_codec hid_logitech_dj brcmfmac brcmutil bcm2835_codec(C) v3d cfg80211 bcm2835_isp(C) vc4 bcm2835_v4l2(C) gpu_sched v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops cec videobuf2_v4l2 drm_kms_helper rfkill videobuf2_common raspberrypi_hwmon snd_soc_core i2c_brcmstb mpt3sas videodev raid_class snd_compress scsi_transport_sas vc_sm_cma(C) snd_pcm_dmaengine snd_bcm2835(C) snd_pcm snd_timer mc snd syscopyarea sysfillrect sysimgblt rpivid_mem nvmem_rmem fb_sys_fops uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 1506.932622] CPU: 1 PID: 1405 Comm: mdadm Tainted: G C 5.15.35-v8+ #1
[ 1506.932629] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 1506.932633] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1506.932639] pc : vsnprintf+0x54c/0x6e0
[ 1506.932644] lr : snprintf+0x60/0x88
[ 1506.932649] sp : ffffffc018b0b880
[ 1506.932652] x29: ffffffc018b0b880 x28: ffffff8121707580 x27: 00000000000000ca
[ 1506.932661] x26: 000000000000002d x25: ffffffd939eb5168 x24: ffffffd9b808b000
[ 1506.932670] x23: ffffffc018b0bb1a x22: 00000000ffffffd8 x21: ffffffd939eb5178
[ 1506.932678] x20: ffffffc018b0b9b0 x19: ffffffd9b808b048 x18: 0000000000000001
[ 1506.932686] x17: 0000000000000001 x16: ffffffd9b785f308 x15: 00000221b58e2000
[ 1506.932694] x14: ffffffd9b808b048 x13: ffffff8121707418 x12: ffffff8121707414
[ 1506.932702] x11: 000000000000003c x10: ffffffc018b0b9b0 x9 : 00000000ffffffd8
[ 1506.932711] x8 : ffffffc018b0b980 x7 : 0000000000000003 x6 : 00000000ffffffff
[ 1506.932718] x5 : 0000000000000000 x4 : ffffffd9b808b048 x3 : ffffffc018b0b930
[ 1506.932726] x2 : ffffffd939eb5178 x1 : fffffffffffffffe x0 : ffffffc018b0b9b0
[ 1506.932735] Call trace:
[ 1506.932737] vsnprintf+0x54c/0x6e0
[ 1506.932742] snprintf+0x60/0x88
[ 1506.932747] dump_zones.isra.17+0x100/0x190 [raid0]
[ 1506.932758] raid0_run+0x148/0x250 [raid0]
[ 1506.932764] md_run+0x488/0xb18 [md_mod]
[ 1506.932791] do_md_run+0x80/0x178 [md_mod]
[ 1506.932807] md_ioctl+0xd48/0x1640 [md_mod]
[ 1506.932824] blkdev_ioctl+0x23c/0x3d0
[ 1506.932830] block_ioctl+0x54/0x70
[ 1506.932835] __arm64_sys_ioctl+0xb0/0xf0
[ 1506.932842] invoke_syscall+0x4c/0x110
[ 1506.932849] el0_svc_common.constprop.3+0xfc/0x120
[ 1506.932854] do_el0_svc+0x2c/0x90
[ 1506.932860] el0_svc+0x24/0x60
[ 1506.932867] el0t_64_sync_handler+0x90/0xb8
[ 1506.932872] el0t_64_sync+0x180/0x184
[ 1506.932877] ---[ end trace 6a720dbd06819c8f ]---
[ 1506.933188] md0: detected capacity change from 0 to 2343803166720
But it seemed to proceed normally.
EXT4 initialization started at 3:15 pm, took until 5:23 pm, so total time of 2 hours, 8 minutes (it was writing around 150 MiB/sec the entire time, and it can get a bit toasty!).
I also noticed a few more errors in dmesg during the formatting:
[ 3514.441499] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 3889.553149] perf: interrupt took too long (3158 > 3138), lowering kernel.perf_event_max_sample_rate to 63250
[ 4506.851519] perf: interrupt took too long (3956 > 3947), lowering kernel.perf_event_max_sample_rate to 50500
[ 5603.499470] perf: interrupt took too long (4948 > 4945), lowering kernel.perf_event_max_sample_rate to 40250
It looks like others have reported similar messages during heavy disk I/O (e.g. on older systems running a btrfs scrub, like here). The messages come from this patch.
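Those perf lines have a fixed format, so if you want to watch how far perf throttles itself during a long format, a quick awk over dmesg pulls the values out (just a sketch; the function name is my own):

```shell
# Print each value perf lowered kernel.perf_event_max_sample_rate to,
# reading dmesg-style log lines on stdin.
perf_throttle_rates() {
  awk '/lowering kernel.perf_event_max_sample_rate to/ {print $NF}'
}

# e.g.: dmesg | perf_throttle_rates
```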
I also want to try overclocking to 2.2 GHz after running initial benchmarks since it looks like I have plenty of headroom (CPU temp is around 34-37°C with those high-CFM fans blowing directly over the CM4 heatsink).
Hmm...
$ sudo mount /dev/md0 /mnt/raid0
mount: /mnt/raid0: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error.
$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu May 12 15:12:43 2022
Raid Level : raid0
Array Size : 1171901583360 (1117612.44 GiB 1200027.22 GB)
Raid Devices : 60
Total Devices : 60
Persistence : Superblock is persistent
Update Time : Thu May 12 15:12:43 2022
State : broken
Active Devices : 60
Working Devices : 60
Failed Devices : 0
Spare Devices : 0
Layout : -unknown-
Chunk Size : 512K
Consistency Policy : none
Name : sas:0 (local to host sas)
UUID : f8b39a60:357b0007:acc18066:aa6fdf97
Events : 0
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 65 224 1 active sync /dev/sdae
2 66 48 2 active sync /dev/sdaj
3 66 128 3 active sync /dev/sdao
4 66 208 4 active sync /dev/sdat
5 67 32 5 active sync
6 67 96 6 active sync /dev/sdbc
7 67 176 7 active sync /dev/sdbh
8 8 96 8 active sync /dev/sdg
9 8 176 9 active sync /dev/sdl
10 65 0 10 active sync /dev/sdq
11 65 80 11 active sync /dev/sdv
12 65 160 12 active sync /dev/sdaa
13 65 240 13 active sync /dev/sdaf
14 66 64 14 active sync /dev/sdak
15 66 144 15 active sync /dev/sdap
16 66 224 16 active sync /dev/sdau
17 67 48 17 active sync /dev/sdaz
18 67 112 18 active sync /dev/sdbd
19 8 32 19 active sync /dev/sdc
20 8 112 20 active sync /dev/sdh
21 8 192 21 active sync /dev/sdm
22 65 16 22 active sync /dev/sdr
23 65 96 23 active sync /dev/sdw
24 65 176 24 active sync /dev/sdab
25 66 0 25 active sync /dev/sdag
26 66 80 26 active sync /dev/sdal
27 66 160 27 active sync /dev/sdaq
28 66 240 28 active sync /dev/sdav
29 8 16 29 active sync /dev/sdb
30 67 128 30 active sync /dev/sdbe
31 8 48 31 active sync /dev/sdd
32 8 128 32 active sync /dev/sdi
33 8 208 33 active sync /dev/sdn
34 65 32 34 active sync /dev/sds
35 65 112 35 active sync /dev/sdx
36 65 192 36 active sync /dev/sdac
37 66 16 37 active sync /dev/sdah
38 66 96 38 active sync /dev/sdam
39 66 176 39 active sync /dev/sdar
40 67 0 40 active sync /dev/sdaw
41 67 64 41 active sync /dev/sdba
42 67 144 42 active sync /dev/sdbf
43 8 64 43 active sync /dev/sde
44 8 144 44 active sync /dev/sdj
45 8 224 45 active sync /dev/sdo
46 65 48 46 active sync /dev/sdt
47 65 128 47 active sync /dev/sdy
48 65 208 48 active sync /dev/sdad
49 66 32 49 active sync /dev/sdai
50 66 112 50 active sync /dev/sdan
51 66 192 51 active sync /dev/sdas
52 67 16 52 active sync /dev/sdax
53 67 80 53 active sync /dev/sdbb
54 67 160 54 active sync /dev/sdbg
55 8 80 55 active sync /dev/sdf
56 8 160 56 active sync /dev/sdk
57 8 240 57 active sync /dev/sdp
58 65 64 58 active sync /dev/sdu
59 65 144 59 active sync /dev/sdz
$ sudo cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sdz[59] sdu[58] sdp[57] sdk[56] sdf[55] sdbg[54] sdbb[53] sdax[52] sdas[51] sdan[50] sdai[49] sdad[48] sdy[47] sdt[46] sdo[45] sdj[44] sde[43] sdbf[42] sdba[41] sdaw[40] sdar[39] sdam[38] sdah[37] sdac[36] sdx[35] sds[34] sdn[33] sdi[32] sdd[31] sdbe[30] sdb[29] sdav[28] sdaq[27] sdal[26] sdag[25] sdab[24] sdw[23] sdr[22] sdm[21] sdh[20] sdc[19] sdbd[18] sdaz[17] sdau[16] sdap[15] sdak[14] sdaf[13] sdaa[12] sdv[11] sdq[10] sdl[9] sdg[8] sdbh[7] sdbc[6] sday[5] sdat[4] sdao[3] sdaj[2] sdae[1] sda[0]
1171901583360 blocks super 1.2 512k chunks
unused devices: <none>
So... it looks like the missing drive in that list is /dev/sday
(I grabbed the list and pressed F5 in Sublime Text to sort it alphabetically, then did my ABCs through it until I found the missing letter).
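The alphabet hunt above can also be scripted: sort both lists and let `comm` print whatever is in one but not the other. A minimal sketch (the four device names below are made-up stand-ins; on the live system you'd feed in `ls /dev/sd*` and the member list parsed out of /proc/mdstat):

```shell
# Hypothetical example: find the device that is present on the system
# but missing from the array's member list.
printf '%s\n' sda sdb sdc sdd | sort > all_devices.txt
printf '%s\n' sda sdb sdd | sort > array_members.txt
comm -23 all_devices.txt array_members.txt   # prints only lines unique to the first file: sdc
rm -f all_devices.txt array_members.txt
```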
Ah... dmesg is showing a bunch of errors: https://gist.github.com/geerlingguy/1004b7925de52aff730ecd84769d2b0d
...
[ 9226.938805] scsi 3:0:18:0: Attached scsi generic sg47 type 13
[ 9226.939107] end_device-3:18: add: handle(0x0011), sas_addr(0x510600b00f3df7f0)
[ 9226.939131] mpt3sas_cm3: AFTER adding end device: handle (0x0011), sas_addr(0x510600b00f3df7f0)
[ 9226.939567] mpt3sas_cm3: BEFORE adding end device: handle (0x0024), sas_addr(0x300605b00f3df7f7)
[ 9226.940346] mpt3sas_cm3: handle(0x24) sas_address(0x300605b00f3df7f7) port_type(0x1)
[ 9227.199460] sd 3:0:4:0: Power-on or device reset occurred
[ 9227.199613] sd 3:0:3:0: Power-on or device reset occurred
[ 9227.199658] sd 3:0:10:0: Power-on or device reset occurred
[ 9227.199676] sd 3:0:2:0: Power-on or device reset occurred
[ 9227.199692] sd 3:0:5:0: Power-on or device reset occurred
[ 9227.199706] sd 3:0:6:0: Power-on or device reset occurred
[ 9227.199721] sd 3:0:1:0: Power-on or device reset occurred
[ 9227.199737] sd 3:0:12:0: Power-on or device reset occurred
[ 9227.199752] sd 3:0:13:0: Power-on or device reset occurred
[ 9227.200013] sd 3:0:11:0: Power-on or device reset occurred
[ 9227.200041] sd 3:0:16:0: Power-on or device reset occurred
[ 9227.200056] sd 3:0:8:0: Power-on or device reset occurred
[ 9227.200070] sd 3:0:9:0: Power-on or device reset occurred
[ 9227.200084] sd 3:0:14:0: Power-on or device reset occurred
[ 9227.200098] sd 3:0:15:0: Power-on or device reset occurred
[ 9227.200853] scsi 3:0:19:0: Direct-Access ATA ST20000NM007D-3D SN01 PQ: 0 ANSI: 6
[ 9227.200893] scsi 3:0:19:0: SATA: handle(0x0024), sas_addr(0x300605b00f3df7f7), phy(7), device_name(0x0000000000000000)
[ 9227.200899] scsi 3:0:19:0: enclosure logical id (0x500605b00f3df7f0), slot(11)
[ 9227.200903] scsi 3:0:19:0: enclosure level(0x0000), connector name( C2 )
[ 9227.201090] scsi 3:0:19:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[ 9227.201105] scsi 3:0:19:0: qdepth(128), tagged(1), scsi_level(7), cmd_que(1)
[ 9227.205251] sd 3:0:19:0: Attached scsi generic sg54 type 0
[ 9227.205378] sd 3:0:19:0: Power-on or device reset occurred
[ 9227.205425] end_device-3:19: add: handle(0x0024), sas_addr(0x300605b00f3df7f7)
...
[ 9227.210798] mpt3sas_cm3: scan devices: complete
[ 9227.236642] sd 3:0:19:0: [sdbi] Write Protect is off
[ 9227.236663] sd 3:0:19:0: [sdbi] Mode Sense: 9b 00 10 08
[ 9227.237916] sd 3:0:19:0: [sdbi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 9227.326525] sd 3:0:19:0: [sdbi] Attached SCSI disk
[ 9227.806817] md: md0: raid0 array has a missing/failed member
[ 9229.139126] Buffer I/O error on dev md0, logical block 34, lost async page write
[ 9229.139180] Buffer I/O error on dev md0, logical block 514, lost async page write
[ 9229.139198] Buffer I/O error on dev md0, logical block 515, lost async page write
[ 9229.139214] Buffer I/O error on dev md0, logical block 516, lost async page write
[ 9229.139230] Buffer I/O error on dev md0, logical block 517, lost async page write
[ 9229.139246] Buffer I/O error on dev md0, logical block 518, lost async page write
[ 9229.139262] Buffer I/O error on dev md0, logical block 146487672831, lost async page write
[ 9229.139278] Buffer I/O error on dev md0, logical block 146487705600, lost async page write
[ 9229.139293] Buffer I/O error on dev md0, logical block 146487705601, lost async page write
[ 9229.139309] Buffer I/O error on dev md0, logical block 146487705602, lost async page write
[ 9234.156436] buffer_io_error: 266237 callbacks suppressed
[ 9234.156455] Buffer I/O error on dev md0, logical block 19326304256, lost async page write
...
[ 9249.168163] Buffer I/O error on dev md0, logical block 159016517632, lost async page write
[ 9445.238579] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 9445.238622] F2FS-fs (md0): Can't find valid F2FS filesystem in 1th superblock
[ 9445.239182] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 9445.239206] F2FS-fs (md0): Can't find valid F2FS filesystem in 2th superblock
...
Four reboots later, and:
$ sudo mdadm --misc --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu May 12 15:12:43 2022
Raid Level : raid0
Array Size : 1171901583360 (1117612.44 GiB 1200027.22 GB)
Raid Devices : 60
Total Devices : 60
Persistence : Superblock is persistent
Update Time : Thu May 12 15:12:43 2022
State : clean
Active Devices : 60
Working Devices : 60
Failed Devices : 0
Spare Devices : 0
Layout : -unknown-
Chunk Size : 512K
Consistency Policy : none
Name : sas:0 (local to host sas)
UUID : f8b39a60:357b0007:acc18066:aa6fdf97
Events : 0
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 65 224 1 active sync /dev/sdae
2 66 48 2 active sync /dev/sdaj
3 66 80 3 active sync /dev/sdal
4 66 224 4 active sync /dev/sdau
5 67 0 5 active sync /dev/sdaw
6 67 160 6 active sync /dev/sdbg
7 67 144 7 active sync /dev/sdbf
8 8 64 8 active sync /dev/sde
9 8 0 9 active sync /dev/sda
10 65 32 10 active sync /dev/sds
11 65 160 11 active sync /dev/sdaa
12 65 80 12 active sync /dev/sdv
13 66 32 13 active sync /dev/sdai
14 66 16 14 active sync /dev/sdah
15 66 128 15 active sync /dev/sdao
16 67 16 16 active sync /dev/sdax
17 67 80 17 active sync /dev/sdbb
18 67 176 18 active sync /dev/sdbh
19 8 80 19 active sync /dev/sdf
20 8 176 20 active sync /dev/sdl
21 8 128 21 active sync /dev/sdi
22 65 16 22 active sync /dev/sdr
23 65 144 23 active sync /dev/sdz
24 65 48 24 active sync /dev/sdt
25 65 240 25 active sync /dev/sdaf
26 66 112 26 active sync /dev/sdan
27 65 208 27 active sync /dev/sdad
28 66 240 28 active sync /dev/sdav
29 8 48 29 active sync /dev/sdd
30 67 128 30 active sync /dev/sdbe
31 8 112 31 active sync /dev/sdh
32 8 16 32 active sync /dev/sdb
33 8 208 33 active sync /dev/sdn
34 65 96 34 active sync /dev/sdw
35 65 64 35 active sync /dev/sdu
36 65 0 36 active sync /dev/sdq
37 66 176 37 active sync /dev/sdar
38 66 160 38 active sync /dev/sdaq
39 66 144 39 active sync /dev/sdap
40 66 192 40 active sync /dev/sdas
41 67 112 41 active sync /dev/sdbd
42 67 64 42 active sync /dev/sdba
43 8 144 43 active sync /dev/sdj
44 8 96 44 active sync /dev/sdg
45 8 192 45 active sync /dev/sdm
46 65 176 46 active sync /dev/sdab
47 65 112 47 active sync /dev/sdx
48 66 96 48 active sync /dev/sdam
49 66 0 49 active sync /dev/sdag
50 66 64 50 active sync /dev/sdak
51 66 208 51 active sync /dev/sdat
52 67 32 52 active sync /dev/sday
53 67 48 53 active sync /dev/sdaz
54 67 96 54 active sync /dev/sdbc
55 8 224 55 active sync /dev/sdo
56 8 160 56 active sync /dev/sdk
57 8 240 57 active sync /dev/sdp
58 65 192 58 active sync /dev/sdac
59 65 128 59 active sync /dev/sdy
About 50% of the time (especially on a fresh boot) it seems to error out after 2 or 3 cards. Not sure why.
But when I try to mount the array, I still get:
$ sudo mount /dev/md0 /mnt/raid0
mount: /mnt/raid0: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error.
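Incidentally, the "bad superblock" guess can be checked at the byte level: ext4's magic number 0xEF53 is stored little-endian (bytes 0x53 0xEF) at offset 1080 (the superblock starts at byte 1024, and s_magic sits 56 bytes in). A quick sketch against a scratch file, where `fakedisk` stands in for /dev/md0:

```shell
# Write the ext4 magic bytes (0x53 0xEF, given here in octal) at offset 1080
# of a scratch file, then read them back the way you would off a real device.
printf '\123\357' | dd of=fakedisk bs=1 seek=1080 conv=notrunc 2>/dev/null
od -A d -t x1 -j 1080 -N 2 fakedisk | head -1   # prints: 0001080 53 ef
rm -f fakedisk
```

On the real array, `od -A d -t x1 -j 1080 -N 2 /dev/md0` showing anything other than `53 ef` would mean no primary ext4 superblock survived the write errors.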
Trying to format the array again:
pi@sas:~ $ time sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 292975395840 4k blocks and 4291632000 inodes
Filesystem UUID: 4177e2fb-d90e-474e-9478-2179f7fad5db
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000, 3855122432, 5804752896, 12800000000, 17414258688,
26985857024, 52242776064, 64000000000, 156728328192, 188900999168
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
real 119m48.526s
user 3m0.349s
sys 11m14.895s
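At least the sizes agree before the I/O error: mke2fs reports 292975395840 4 KiB blocks, and mdadm reported the array as 1171901583360 blocks, which /proc/mdstat counts in 1 KiB units. A one-line sanity check:

```shell
# mke2fs block count (4 KiB units) converted to mdadm's 1 KiB units.
echo $((292975395840 * 4))   # 1171901583360, exactly the mdadm figure
```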
Seeing some messages like this again in dmesg:
[ 567.745607] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 567.745650] F2FS-fs (md0): Can't find valid F2FS filesystem in 1th superblock
[ 567.745855] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 567.745875] F2FS-fs (md0): Can't find valid F2FS filesystem in 2th superblock
[ 580.057185] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 580.057222] F2FS-fs (md0): Can't find valid F2FS filesystem in 1th superblock
[ 580.057399] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 580.057416] F2FS-fs (md0): Can't find valid F2FS filesystem in 2th superblock
[ 1246.861540] perf: interrupt took too long (2535 > 2500), lowering kernel.perf_event_max_sample_rate to 78750
And towards the end I got the same errors as earlier; it looks like a card just resets a bunch of drives, and this time two of them went AWOL.
Going to try a different filesystem, since mdadm RAID 0 plus ext4 doesn't seem to be happy. Might also try a smaller array.
After a reboot, mdadm reports the array is clean again. So something with writing the superblock across the 60 drives fails.
Trying to create the filesystem with lazy init... maybe slower I/O will help?
$ time sudo mkfs.ext4 -m 0 /dev/md0
Nope. Same error at the same point. Might switch gears to ZFS... we'll see.
I reset the array using:
$ sudo nano /etc/mdadm/mdadm.conf
$ sudo wipefs --all --force /dev/md0
$ sudo mdadm --stop /dev/md0
$ sudo mdadm --zero-superblock /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb /dev/sdbe /dev/sdd /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf /dev/sdk /dev/sdp /dev/sdu /dev/sdz
# Then delete the partition data.
$ for i in /dev/sd*[a-z]; do sudo wipefs --all --force "$i"; done
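Rather than typing all 60 device paths for `--zero-superblock`, the member list can be parsed out of `mdadm --detail` with awk. A sketch (the two sample lines below stand in for the real `sudo mdadm --detail /dev/md0` output, which needs root and a live array):

```shell
# Pull the device path (last field) from each 'active sync' line.
detail='    0       8       32        0      active sync   /dev/sdc
    1      65      224        1      active sync   /dev/sdae'
printf '%s\n' "$detail" | awk '/active sync/ {print $NF}'
# prints: /dev/sdc and /dev/sdae, one per line
```

On the live box that would become: `sudo mdadm --zero-superblock $(sudo mdadm --detail /dev/md0 | awk '/active sync/ {print $NF}')` (run before `--stop` removes the array, or cache the list first).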
ZFS won't install cleanly on my custom kernel, so I'm going to try Btrfs.
# Install BTRFS utilities.
$ sudo apt install btrfs-progs
# Create a RAID-0 Btrfs volume mounted at /btrfs.
$ sudo mkdir /btrfs
$ sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb /dev/sdbe /dev/sdd /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf /dev/sdk /dev/sdp /dev/sdu /dev/sdz
btrfs-progs v5.10.1
See http://btrfs.wiki.kernel.org for more information.
Label: btrfs
UUID: 82bc04c9-f954-4fe5-a2d5-f8f56021c904
Node size: 16384
Sector size: 4096
Filesystem size: 1.07PiB
Block group profiles:
Data: RAID0 10.00GiB
Metadata: RAID0 1.88GiB
System: RAID0 58.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Runtime features:
Checksum: crc32c
Number of devices: 60
Devices:
ID SIZE PATH
1 18.19TiB /dev/sda
2 18.19TiB /dev/sdae
3 18.19TiB /dev/sdaj
4 18.19TiB /dev/sdao
5 18.19TiB /dev/sdat
6 18.19TiB /dev/sday
7 18.19TiB /dev/sdbc
8 18.19TiB /dev/sdbh
9 18.19TiB /dev/sdg
10 18.19TiB /dev/sdl
11 18.19TiB /dev/sdq
12 18.19TiB /dev/sdv
13 18.19TiB /dev/sdaa
14 18.19TiB /dev/sdaf
15 18.19TiB /dev/sdak
16 18.19TiB /dev/sdap
17 18.19TiB /dev/sdau
18 18.19TiB /dev/sdaz
19 18.19TiB /dev/sdbd
20 18.19TiB /dev/sdc
21 18.19TiB /dev/sdh
22 18.19TiB /dev/sdm
23 18.19TiB /dev/sdr
24 18.19TiB /dev/sdw
25 18.19TiB /dev/sdab
26 18.19TiB /dev/sdag
27 18.19TiB /dev/sdal
28 18.19TiB /dev/sdaq
29 18.19TiB /dev/sdav
30 18.19TiB /dev/sdb
31 18.19TiB /dev/sdbe
32 18.19TiB /dev/sdd
33 18.19TiB /dev/sdi
34 18.19TiB /dev/sdn
35 18.19TiB /dev/sds
36 18.19TiB /dev/sdx
37 18.19TiB /dev/sdac
38 18.19TiB /dev/sdah
39 18.19TiB /dev/sdam
40 18.19TiB /dev/sdar
41 18.19TiB /dev/sdaw
42 18.19TiB /dev/sdba
43 18.19TiB /dev/sdbf
44 18.19TiB /dev/sde
45 18.19TiB /dev/sdj
46 18.19TiB /dev/sdo
47 18.19TiB /dev/sdt
48 18.19TiB /dev/sdy
49 18.19TiB /dev/sdad
50 18.19TiB /dev/sdai
51 18.19TiB /dev/sdan
52 18.19TiB /dev/sdas
53 18.19TiB /dev/sdax
54 18.19TiB /dev/sdbb
55 18.19TiB /dev/sdbg
56 18.19TiB /dev/sdf
57 18.19TiB /dev/sdk
58 18.19TiB /dev/sdp
59 18.19TiB /dev/sdu
60 18.19TiB /dev/sdz
Time to reset the counter on your shirt. Just rebuild the kernel one more time so we can try zfs 🥰
$ sudo btrfs filesystem show
Label: 'btrfs' uuid: 82bc04c9-f954-4fe5-a2d5-f8f56021c904
Total devices 60 FS bytes used 128.00KiB
devid 1 size 18.19TiB used 202.62MiB path /dev/sda
devid 2 size 18.19TiB used 202.62MiB path /dev/sdae
devid 3 size 18.19TiB used 203.62MiB path /dev/sdaj
devid 4 size 18.19TiB used 203.62MiB path /dev/sdao
...
And in dmesg:
[ 2847.425438] perf: interrupt took too long (4963 > 4912), lowering kernel.perf_event_max_sample_rate to 40250
[ 2944.666513] raid6: neonx8 gen() 3644 MB/s
[ 2944.734484] raid6: neonx8 xor() 2657 MB/s
[ 2944.802487] raid6: neonx4 gen() 3947 MB/s
[ 2944.870533] raid6: neonx4 xor() 2761 MB/s
[ 2944.938525] raid6: neonx2 gen() 3421 MB/s
[ 2945.006486] raid6: neonx2 xor() 2529 MB/s
[ 2945.074487] raid6: neonx1 gen() 2716 MB/s
[ 2945.142501] raid6: neonx1 xor() 2039 MB/s
[ 2945.210488] raid6: int64x8 gen() 2585 MB/s
[ 2945.278492] raid6: int64x8 xor() 1471 MB/s
[ 2945.346499] raid6: int64x4 gen() 2560 MB/s
[ 2945.414488] raid6: int64x4 xor() 1484 MB/s
[ 2945.482493] raid6: int64x2 gen() 2349 MB/s
[ 2945.550496] raid6: int64x2 xor() 1314 MB/s
[ 2945.618498] raid6: int64x1 gen() 1816 MB/s
[ 2945.686506] raid6: int64x1 xor() 979 MB/s
[ 2945.686524] raid6: using algorithm neonx4 gen() 3947 MB/s
[ 2945.686529] raid6: .... xor() 2761 MB/s, rmw enabled
[ 2945.686534] raid6: using neon recovery algorithm
[ 2945.707259] xor: measuring software checksum speed
[ 2945.708831] 8regs : 6352 MB/sec
[ 2945.710185] 32regs : 7318 MB/sec
[ 2945.714241] arm64_neon : 3072 MB/sec
[ 2945.714260] xor: using function: 32regs (7318 MB/sec)
[ 2945.875144] Btrfs loaded, crc32c=crc32c-generic, zoned=no, fsverity=no
[ 2945.880380] BTRFS: device label btrfs devid 1 transid 5 /dev/sda scanned by systemd-udevd (24578)
[ 2945.887736] BTRFS: device label btrfs devid 6 transid 5 /dev/sday scanned by systemd-udevd (24593)
[ 2945.914456] BTRFS: device label btrfs devid 5 transid 5 /dev/sdat scanned by systemd-udevd (24590)
...
And to mount:
$ sudo mount /dev/sda /btrfs
$ sudo btrfs filesystem usage /btrfs
Overall:
Device size: 1.07PiB
Device allocated: 11.93GiB
Device unallocated: 1.07PiB
Device missing: 0.00B
Used: 128.00KiB
Free (estimated): 1.07PiB (min: 1.07PiB)
Free (statfs, df): 1.07PiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 3.25MiB (used: 0.00B)
Multiple profiles: no
Data,RAID0: Size:10.00GiB, Used:0.00B (0.00%)
/dev/sda 170.62MiB
/dev/sdae 170.62MiB
...
Metadata,RAID0: Size:1.88GiB, Used:112.00KiB (0.01%)
/dev/sda 32.00MiB
/dev/sdae 32.00MiB
...
System,RAID0: Size:58.00MiB, Used:16.00KiB (0.03%)
/dev/sdaj 1.00MiB
/dev/sdao 1.00MiB
...
Unallocated:
/dev/sda 18.19TiB
/dev/sdae 18.19TiB
...
Quick disk-benchmark.sh result for btrfs RAID 0:
Benchmark | Result |
---|---|
fio 1M sequential read | 213 MB/s |
iozone 1M random read | 144.82 MB/s |
iozone 1M random write | 233.90 MB/s |
iozone 4K random read | 19.45 MB/s |
iozone 4K random write | 15.92 MB/s |
Testing network copy performance:
# Install Samba.
$ sudo apt install -y samba samba-common-bin
$ sudo mkdir /btrfs/shared
$ sudo chmod -R 777 /btrfs/shared
$ sudo nano /etc/samba/smb.conf
[shared]
path=/btrfs/shared
writeable=Yes
create mask=0777
directory mask=0777
public=yes
$ sudo systemctl restart smbd
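The same share stanza can be staged non-interactively (values copied from the section above; appending to /etc/samba/smb.conf and the `shared.conf` filename are assumptions for the sketch):

```shell
# Write the share stanza to a staging file instead of editing interactively.
cat <<'EOF' > shared.conf
[shared]
path = /btrfs/shared
writeable = yes
create mask = 0777
directory mask = 0777
public = yes
EOF
grep -c '=' shared.conf   # 5 settings in the stanza
rm -f shared.conf
```

On the live box you'd append it with `sudo tee -a /etc/samba/smb.conf < shared.conf`, run `testparm` to validate, then `sudo systemctl restart smbd`.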
So... I started a 70 GB copy of a ton of video files for my current project to the btrfs RAID 0 array, and it kicked off going from 100-119 MB/sec, but after a couple minutes, got a lot slower (30 MB/sec). Then it started stalling out, and after a while Finder threw an error.
Over on the server side, I found this in dmesg: https://gist.github.com/geerlingguy/90a25813dfcdc26c1d4ab503bd7550d4
And if I check the btrfs filesystem status I see:
$ sudo btrfs filesystem show
Label: 'btrfs' uuid: 82bc04c9-f954-4fe5-a2d5-f8f56021c904
Total devices 60 FS bytes used 138.09MiB
devid 2 size 18.19TiB used 202.62MiB path /dev/sdae
devid 3 size 18.19TiB used 203.62MiB path /dev/sdaj
devid 4 size 18.19TiB used 203.62MiB path /dev/sdao
devid 6 size 18.19TiB used 203.62MiB path /dev/sday
devid 7 size 18.19TiB used 203.62MiB path /dev/sdbc
devid 8 size 18.19TiB used 203.62MiB path /dev/sdbh
devid 9 size 18.19TiB used 203.62MiB path /dev/sdg
devid 10 size 18.19TiB used 203.62MiB path /dev/sdl
devid 11 size 18.19TiB used 203.62MiB path /dev/sdq
devid 12 size 18.19TiB used 203.62MiB path /dev/sdv
devid 13 size 18.19TiB used 203.62MiB path /dev/sdaa
devid 14 size 18.19TiB used 203.62MiB path /dev/sdaf
devid 15 size 18.19TiB used 203.62MiB path /dev/sdak
devid 16 size 18.19TiB used 203.62MiB path /dev/sdap
devid 18 size 18.19TiB used 203.62MiB path /dev/sdaz
devid 19 size 18.19TiB used 203.62MiB path /dev/sdbd
devid 20 size 18.19TiB used 203.62MiB path /dev/sdc
devid 21 size 18.19TiB used 203.62MiB path /dev/sdh
devid 22 size 18.19TiB used 203.62MiB path /dev/sdm
devid 23 size 18.19TiB used 203.62MiB path /dev/sdr
devid 24 size 18.19TiB used 203.62MiB path /dev/sdw
devid 26 size 18.19TiB used 203.62MiB path /dev/sdag
devid 27 size 18.19TiB used 203.62MiB path /dev/sdal
devid 28 size 18.19TiB used 203.62MiB path /dev/sdaq
devid 29 size 18.19TiB used 203.62MiB path /dev/sdav
devid 30 size 18.19TiB used 203.62MiB path /dev/sdb
devid 31 size 18.19TiB used 203.62MiB path /dev/sdbe
devid 32 size 18.19TiB used 203.62MiB path /dev/sdd
devid 33 size 18.19TiB used 203.62MiB path /dev/sdi
devid 34 size 18.19TiB used 203.62MiB path /dev/sdn
devid 35 size 18.19TiB used 203.62MiB path /dev/sds
devid 36 size 18.19TiB used 203.62MiB path /dev/sdx
devid 37 size 18.19TiB used 203.62MiB path /dev/sdac
devid 38 size 18.19TiB used 203.62MiB path /dev/sdah
devid 39 size 18.19TiB used 203.62MiB path /dev/sdam
devid 40 size 18.19TiB used 203.62MiB path /dev/sdar
devid 41 size 18.19TiB used 203.62MiB path /dev/sdaw
devid 42 size 18.19TiB used 203.62MiB path /dev/sdba
devid 43 size 18.19TiB used 203.62MiB path /dev/sdbf
devid 44 size 18.19TiB used 203.62MiB path /dev/sde
devid 45 size 18.19TiB used 203.62MiB path /dev/sdj
devid 47 size 18.19TiB used 203.62MiB path /dev/sdt
devid 48 size 18.19TiB used 203.62MiB path /dev/sdy
devid 49 size 18.19TiB used 203.62MiB path /dev/sdad
devid 50 size 18.19TiB used 203.62MiB path /dev/sdai
devid 51 size 18.19TiB used 203.62MiB path /dev/sdan
devid 52 size 18.19TiB used 203.62MiB path /dev/sdas
devid 53 size 18.19TiB used 203.62MiB path /dev/sdax
devid 54 size 18.19TiB used 203.62MiB path /dev/sdbb
devid 55 size 18.19TiB used 203.62MiB path /dev/sdbg
devid 56 size 18.19TiB used 203.62MiB path /dev/sdf
devid 57 size 18.19TiB used 203.62MiB path /dev/sdk
devid 58 size 18.19TiB used 203.62MiB path /dev/sdp
devid 59 size 18.19TiB used 203.62MiB path /dev/sdu
devid 60 size 18.19TiB used 203.62MiB path /dev/sdz
*** Some devices missing
So it looks like a similar error, where the HBAs just kind of jump offline and not all the drives come back. Something weird like that.
After reboot, the filesystem was intact.
I did a small file copy over Samba (< 100 MB), and it copied almost instantly and worked. Then I did a larger file (400 MB), and it failed in the same way, but this time I was watching dmesg; these are the initial failures, when it seems the HBAs or the driver get overloaded:
[ 278.151884] mpt3sas_cm1 fault info from func: mpt3sas_base_make_ioc_ready
[ 278.151904] mpt3sas_cm1: fault_state(0x2623)!
[ 278.151911] mpt3sas_cm1: sending diag reset !!
[ 278.727834] mpt3sas_cm3 fault info from func: mpt3sas_base_make_ioc_ready
[ 278.727851] mpt3sas_cm3: fault_state(0x2623)!
[ 278.727858] mpt3sas_cm3: sending diag reset !!
[ 278.999377] mpt3sas_cm1: diag reset: SUCCESS
[ 279.062419] mpt3sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 279.181737] mpt3sas_cm1: _base_display_fwpkg_version: complete
[ 279.181753] mpt3sas_cm1: FW Package Ver(05.00.00.00)
[ 279.181892] mpt3sas_cm1: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[ 279.182350] mpt3sas_cm1: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(09.09.00.00)
[ 279.182359] NVMe
[ 279.182363] mpt3sas_cm1: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[ 279.182477] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.182511] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.182544] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.182576] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.182615] mpt3sas_cm1: sending port enable !!
[ 279.577425] mpt3sas_cm3: diag reset: SUCCESS
[ 279.640569] mpt3sas_cm3: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 279.760314] mpt3sas_cm3: _base_display_fwpkg_version: complete
[ 279.760331] mpt3sas_cm3: FW Package Ver(05.00.00.00)
[ 279.760470] mpt3sas_cm3: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[ 279.760926] mpt3sas_cm3: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[ 279.760935] NVMe
[ 279.760939] mpt3sas_cm3: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[ 279.761052] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.761085] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.761118] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.761151] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[ 279.761189] mpt3sas_cm3: sending port enable !!
[ 302.003185] mpt3sas_cm1: port enable: SUCCESS
...
At that point, the cards seem to re-initialize. And what's really interesting is this time, the array recovered before the Finder file copy timed out, and so the copy finished successfully. And copying the 400 MB file back had no problem either. So it definitely seems to be a stability issue when writing bytes out to all 60 drives at once, continuously. CPU load maxes out at least one core whenever writes are saturating the bus.
I think my next plan is to do a JBOD-style array, so have the entire array set up sequentially (no RAID 0 striping), and see if writing in that manner is more efficient / less error-prone.
Deleted the existing btrfs array with:
$ sudo systemctl stop smbd
$ sudo umount /btrfs
$ sudo wipefs --all -t btrfs /dev/sda /dev/sdae ...
And creating a new one with the single profile for JBOD-style operation:
$ sudo mkfs.btrfs -L btrfs -d single -m single -f /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb /dev/sdbe /dev/sdd /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf /dev/sdk /dev/sdp /dev/sdu /dev/sdz
$ sudo btrfs filesystem show
Label: 'btrfs' uuid: b4a4b388-a948-4ca3-928c-469296e79e50
Total devices 60 FS bytes used 128.00KiB
devid 1 size 18.19TiB used 202.62MiB path /dev/sda
devid 2 size 18.19TiB used 202.62MiB path /dev/sdae
...
$ sudo mount /dev/sda /btrfs
$ sudo mkdir /btrfs/shared
$ sudo chmod 777 /btrfs/shared
$ sudo systemctl start smbd
Quick disk-benchmark.sh result for Btrfs 'single':
Benchmark | Result |
---|---|
fio 1M sequential read | 211 MB/s |
iozone 1M random read | 146.61 MB/s |
iozone 1M random write | 274.35 MB/s |
iozone 4K random read | 20.22 MB/s |
iozone 4K random write | 16.24 MB/s |
And after some more messing around, I am able to reliably get one of the cards (sometimes two) to do that cycle with heavy write activity.
And usually one or two drives don't reappear until after a full reboot of the system (this time it was just sdr).
$ sudo wipefs --all -t btrfs /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb /dev/sdbe /dev/sdd /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf /dev/sdk /dev/sdp /dev/sdu /dev/sdz
wipefs: error: /dev/sdr: probing initialization failed: No such file or directory
Going to try an mdadm linear array next...
$ sudo mdadm --create --verbose /dev/md0 --level=linear --raid-devices=60 /dev/sda ...
It's nice to be able to more easily see the speed at which it's building the ext4 filesystem on the linear array: while 'Writing inode tables' is going on, I can watch each disk getting written to sequentially via atop. And hopefully, since it's just writing to one disk after the other, whatever condition is triggering the card reset won't happen.
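The same per-disk activity can be confirmed without atop by reading /proc/diskstats directly: field 3 is the device name and field 10 is sectors written (512-byte sectors). A sketch, with a made-up sample line standing in for the real file:

```shell
# Parse one /proc/diskstats line: device name and total bytes written.
sample='   8       0 sda 1000 0 8000 100 2500 0 1048576 900 0 400 1000'
echo "$sample" | awk '{print $3, $10 * 512, "bytes written"}'
# prints: sda 536870912 bytes written
```

Polling that in a loop (against the real /proc/diskstats) shows which member of the linear array is currently taking writes.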
The LSI 9405W-16i HBA should be similar to the 9460-16i, and should hopefully be supported on ARM (to some extent), unlike older cards like the 9305-16i (see #195). (Adding the term 9405 so this will also pop up in search.)