geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test LSI 9405W-16i SAS/NVMe HBA #196

Closed geerlingguy closed 2 years ago

geerlingguy commented 2 years ago

The LSI 9405W-16i HBA should be similar to the 9460-16i, and should hopefully be supported on ARM (to some extent), unlike older cards such as the 9305-16i (see #195). (Adding the term 9405 so this will also pop up in search.)

9405w-16i_hba_angle

geerlingguy commented 2 years ago

Supposedly it's working on 32-bit Pi OS but not 64-bit. More to come.

joshuaboud commented 2 years ago

lspci

Output of sudo lspci -vvv on a custom 32-Bit Kernel:

01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC) (rev 02)
        Subsystem: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 64
        Region 0: Memory at 600200000 (64-bit, prefetchable) [size=1M]
        Region 2: Memory at 600300000 (64-bit, prefetchable) [size=1M]
        Region 4: Memory at 600000000 (32-bit, non-prefetchable) [size=1M]
        Region 5: I/O ports at <unassigned> [disabled]
        [virtual] Expansion ROM at 600100000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Power Budgeting <?>
        Capabilities: [158 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [168 v1] #19
        Capabilities: [264 v1] #16
        Capabilities: [294 v1] Vendor Specific Information: ID=0002 Rev=2 Len=100 <?>
        Capabilities: [394 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
        Capabilities: [3cc v0] Virtual Channel
                Caps:   LPEVC=1 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128- ??6+
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32+ WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable- ID=0 ArbSelect=Fixed TC/VC=00
                        Status: NegoPending- InProgress-
        Kernel driver in use: mpt3sas

Kernel

Our custom kernel

Repo for custom kernel: https://github.com/45Drives/linux Steps for building:

errors: No known data errors

joshuaboud commented 2 years ago

As for 64-bit usage, the kernel was built the same way as above, but using ./build-cm4.sh 64. This kernel works fine with nothing plugged into the PCIe slot, but when the 9405W is plugged in, there is a kernel panic at boot. Here is a screenshot of said kernel panic (cm4_64-bit_kernel_panic). The error seems to happen in pci_generic_config_read(), which appears to be part of the PCI subsystem, not the mpt3sas driver.

geerlingguy commented 2 years ago

@joshuaboud - I seem to remember this bit of code leading to that crash: https://github.com/raspberrypi/linux/blob/2697f7403187bb2bb61cc716f33ee9f6cfb9af7c/drivers/scsi/megaraid/megaraid_sas_fusion.c#L262-L265

Can you try the following patch and see if that helps on 64-bit?

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index b0c01cf0428f..c4accee42e84 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -259,7 +259,8 @@ static void
 megasas_write_64bit_req_desc(struct megasas_instance *instance,
        union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc)
 {
-#if defined(writeq) && defined(CONFIG_64BIT)
+//#if defined(writeq) && defined(CONFIG_64BIT)
+#if 0
    u64 req_data = (((u64)le32_to_cpu(req_desc->u.high) << 32) |
        le32_to_cpu(req_desc->u.low));
    writeq(req_data, &instance->reg_set->inbound_low_queue_port);

See: https://github.com/raspberrypi/linux/issues/4158

joshuaboud commented 2 years ago

That specific patch won't do much, as the megaraid driver isn't being compiled for my kernel. However, there is a very similar chunk of code in the mpt3sas driver, in drivers/scsi/mpt3sas/mpt3sas_base.c starting at line 5809:

/**
 * _base_writeq - 64 bit write to MMIO
 * @ioc: per adapter object
 * @b: data payload
 * @addr: address in MMIO space
 * @writeq_lock: spin lock
 *
 * Glue for handling an atomic 64 bit word to MMIO. This special handling takes
 * care of 32 bit environment where its not quarenteed to send the entire word
 * in one transfer.
 */
#if defined(writeq) && defined(CONFIG_64BIT)
static inline void
_base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
{
    writeq(b, addr);
}
#else
static inline void
_base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
{
    unsigned long flags;
    __u64 data_out = b;

    spin_lock_irqsave(writeq_lock, flags);
    writel((u32)(data_out), addr);
    writel((u32)(data_out >> 32), (addr + 4));
    spin_unlock_irqrestore(writeq_lock, flags);
}
#endif

I suppose if I apply the same change here, disabling writeq(), it may fix the issue. I will report back after trying.
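For anyone following along, the difference between the two branches can be sketched in user-space C. This is only an illustration under stated assumptions: fake_mmio_t, fake_writel, split_writeq and fake_readq are made-up names, ordinary memory stands in for the device's MMIO window, and the spinlock is described in a comment rather than implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's MMIO window. In the kernel this
 * would be an __iomem pointer, not ordinary memory. */
typedef struct {
    uint32_t words[2];   /* words[0] sits at offset 0, words[1] at offset 4 */
    unsigned writes;     /* number of 32-bit writes issued so far */
} fake_mmio_t;

/* Stand-in for writel(): one 32-bit store into the fake register window. */
static void fake_writel(fake_mmio_t *dev, unsigned index, uint32_t value)
{
    assert(index < 2);
    dev->words[index] = value;
    dev->writes++;
}

/* Mirrors the #else branch: the 64-bit payload goes out as two 32-bit
 * writes, low half first. The real driver wraps this pair in
 * spin_lock_irqsave() so another CPU cannot interleave its own halves. */
static void split_writeq(fake_mmio_t *dev, uint64_t payload)
{
    fake_writel(dev, 0, (uint32_t)payload);
    fake_writel(dev, 1, (uint32_t)(payload >> 32));
}

/* Reassemble the value the device would see once both halves have landed. */
static uint64_t fake_readq(const fake_mmio_t *dev)
{
    return ((uint64_t)dev->words[1] << 32) | dev->words[0];
}
```

Forcing the #else branch therefore trades one 64-bit bus transaction for two 32-bit ones. That only helps if the platform has trouble with 64-bit MMIO writes specifically.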

geerlingguy commented 2 years ago

Ah yes, sorry was looking at what I was working on for the other card and forgot it was using a different module. Let me know if it helps!

joshuaboud commented 2 years ago

Just tried out that fix with this patch:

@@ -5817,7 +5817,8 @@ _base_mpi_ep_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_loc
  * care of 32 bit environment where its not quarenteed to send the entire word
  * in one transfer.
  */
-#if defined(writeq) && defined(CONFIG_64BIT)
+//#if defined(writeq) && defined(CONFIG_64BIT)
+#if 0
 static inline void
 _base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
 {

but unfortunately it did not fix the kernel panic issue.

I have been trying to track down any possible bugs in the kernel source code, but the second-to-last function call, pci_bus_read_config_byte(), does not seem to have a definition anywhere in the source. The function that actually causes the error, pci_generic_config_read(), could be having an issue with dereferencing pointers, though. I am going to try to get the full dmesg output from the failed boot.
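A possible reason a plain text search comes up empty: as far as I can tell, mainline kernels generate the pci_bus_read_config_* helpers from a PCI_OP_READ macro in drivers/pci/access.c, so the full function name never appears literally in the source. The sketch below imitates that token-pasting pattern in user-space C. DEMO_OP_READ and the demo_read_config_* functions are hypothetical names for illustration, and a byte array stands in for config space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified imitation of the kernel pattern: one macro body
 * stamps out a reader per access width, so a name such as
 * demo_read_config_byte is only ever produced by the preprocessor. */
#define DEMO_OP_READ(size, type, len)                                   \
static int demo_read_config_##size(const uint8_t *cfg, int pos,        \
                                   type *value)                         \
{                                                                       \
    uint32_t data = 0;                                                  \
    for (int i = 0; i < (len); i++)                                     \
        data |= (uint32_t)cfg[pos + i] << (8 * i);                      \
    *value = (type)data;                                                \
    return 0;                                                           \
}

/* Each line below expands into a complete function definition. The bytes
 * are assembled in little-endian order, as PCI config space is laid out. */
DEMO_OP_READ(byte, uint8_t, 1)
DEMO_OP_READ(word, uint16_t, 2)
DEMO_OP_READ(dword, uint32_t, 4)
```

Searching for the macro name or for the `read_config_##` fragment, rather than the full function name, is one way to find definitions generated like this.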

geerlingguy commented 2 years ago

@joshuaboud - That sounds eerily similar to some of the problems we've been encountering with AMD's GPU drivers in https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/4 — I wonder if the mpt3sas driver is missing some of the megaraid sas changes that were made for better ARM compatibility in general?

joshuaboud commented 2 years ago

LSI 9405W-16i Now Working on Custom 64-bit Kernel

The Fix

This is very strange. Removing quiet from /boot/cmdline.txt fixed the kernel panic issue with the Storinator JR custom 64-bit kernel.

Reason

We have no idea. Maybe there is a timing issue in the PCI driver that is masked by the slowdown from printing console messages.

Use

lspci -vvv:

01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC) (rev 02)
        Subsystem: LSI Logic / Symbios Logic SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 49
        Region 0: Memory at 600200000 (64-bit, prefetchable) [size=1M]
        Region 2: Memory at 600300000 (64-bit, prefetchable) [size=1M]
        Region 4: Memory at 600000000 (32-bit, non-prefetchable) [size=1M]
        Region 5: I/O ports at <unassigned> [disabled]
        [virtual] Expansion ROM at 600100000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Power Budgeting <?>
        Capabilities: [158 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [168 v1] #19
        Capabilities: [264 v1] #16
        Capabilities: [294 v1] Vendor Specific Information: ID=0002 Rev=2 Len=100 <?>
        Capabilities: [394 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
        Capabilities: [3cc v0] Virtual Channel
                Caps:   LPEVC=1 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128- ??6+
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32+ WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable- ID=0 ArbSelect=Fixed TC/VC=00
                        Status: NegoPending- InProgress-
        Kernel driver in use: mpt3sas

lsblk:

NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda            8:0    0 232.9G  0 disk 
sdb            8:16   0 232.9G  0 disk 
sdc            8:32   0 232.9G  0 disk 
sdd            8:48   0 232.9G  0 disk 
sde            8:64   0 232.9G  0 disk 
sdf            8:80   0 232.9G  0 disk 
sdg            8:96   0 232.9G  0 disk 
sdh            8:112  0 232.9G  0 disk 
sdi            8:128  0 232.9G  0 disk 
sdj            8:144  0 232.9G  0 disk 
mmcblk0      179:0    0  29.1G  0 disk 
├─mmcblk0p1  179:1    0   256M  0 part /boot
└─mmcblk0p2  179:2    0  28.9G  0 part /
mmcblk0boot0 179:32   0     4M  1 disk 
mmcblk0boot1 179:64   0     4M  1 disk

geerlingguy commented 2 years ago

Reason

We have no idea.

🤣 love it

wallentx commented 2 years ago

Oooooo this is about to get interesting.. I have the following:

Using the delicious breadcrumbs left by you wonderful folks in this GH issue, I'll be trying to get one of these working. Is there any interest in seeing my results shared somewhere? (AKA - where can I ask for help when I fail)

geerlingguy commented 2 years ago

@wallentx - According to the Broadcom engineer I spoke with, none of the 93xx series cards will work on the Pi because drivers don't support ARM for that generation. Only 94xx/95xx and newer cards should be able to work (so your SAS 9440-8i has a fighting chance!).

wallentx commented 2 years ago

@geerlingguy My SAS 9440-8i shows up in lspci just out of the box. No mpt3sas modules available to be loaded though. I built the kernel, but I'm a little lost with the way the cm4 bootloader works.. Are you guys just renaming your compiled kernel image to kernel8.img and overwriting? I saw somewhere on another writeup about setting kernel=<mykernel.img> in /boot/config.txt, but that just got me stuck at the rainbow screen.

geerlingguy commented 2 years ago

@wallentx - Here's the exact process I follow: https://github.com/geerlingguy/raspberry-pi-pcie-devices/tree/master/extras/cross-compile

Note that there are a number of ways you can stick multiple kernels on the Pi and switch between them, but in my case, since I normally nuke the microSD card multiple times per day, I just overwrite the kernel in place (following the steps in the guide above).

wallentx commented 2 years ago

@geerlingguy if I'm sharing more about my 9440-8i, should I put this in a separate issue, or are individual issues intended to be sort of exclusive for your own testing/tracking? If a new issue is needed, I'll edit this comment and migrate the details elsewhere.

Moved - https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/250

geerlingguy commented 2 years ago

@wallentx - Can you open a new/separate issue for it (since it's a different model)?

geerlingguy commented 2 years ago

I'll be doing a little more testing on one of these, now that I have it in my possession.

geerlingguy commented 2 years ago

That's a lotta PCIe:

Screen Shot 2022-04-27 at 4 50 05 PM

geerlingguy commented 2 years ago

All right, so testing a bringup on a fork of the rpi-5.15.y branch:

  1. I cloned Linux, checked out a new branch.
  2. I ran menuconfig and enabled mpt3sas (option "LSI MPT Fusion SAS 3.0 & SAS 2.0 Device Driver"—see below)
  3. I patched the drivers/scsi/mpt3sas/mpt3sas_base.c file with @joshuaboud's patch from this comment above.
  4. I am cross-compiling the kernel and copying it over to the Pi.

The mpt3sas option is under:

-> Device Drivers
  -> SCSI device support
    -> SCSI low-level drivers (SCSI_LOWLEVEL [=y])
      -> LSI MPT Fusion SAS 3.0 & SAS 2.0 Device Driver

geerlingguy commented 2 years ago

After a reboot I'm seeing:

0f:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3616 Fusion-MPT Tri-Mode I/O Controller Chip (IOC) (rev 02)
    Subsystem: Broadcom / LSI HBA 9405W-16e
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 255
    Region 0: Memory at 600f00000 (64-bit, prefetchable) [disabled] [size=1M]
    Region 2: Memory at 601000000 (64-bit, prefetchable) [disabled] [size=1M]
    Region 4: Memory at 600600000 (32-bit, non-prefetchable) [disabled] [size=1M]
    Region 5: I/O ports at <unassigned> [disabled]
    Expansion ROM at 600700000 [virtual] [disabled] [size=1M]
    Capabilities: <access denied>
    Kernel modules: mpt3sas

So the module's loaded at least. I'm going to have to stop for the evening and pick it back up later!

geerlingguy commented 2 years ago

Oh also, from dmesg:

[    7.332681] mpt3sas 0000:0c:00.0: enabling device (0000 -> 0002)
[    7.332756] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
...
[    7.476877] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[    7.476951] mpt3sas_cm0: MSI-X vectors supported: 128
[    7.476968]   no of cores: 4, max_msix_vectors: -1
[    7.476982] mpt3sas_cm0:  0 4 4
...
[    7.550816] mpt3sas_cm0: High IOPs queues : disabled
[    7.550857] mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 64
[    7.550871] mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 65
[    7.550883] mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 66
[    7.550895] mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 67
[    7.550905] mpt3sas_cm0: iomem(0x0000000600900000), mapped(0x000000008bfd810c), size(1048576)
[    7.550926] mpt3sas_cm0: ioport(0x0000000000000000), size(0)
[    7.604287] checking generic (3e3cf000 7f8000) vs hw (0 ffffffffffffffff)
...
[    7.770218] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[    7.800730] mpt3sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[    7.818705] mpt3sas_cm0: request pool(0x000000004d73eefe) - dma(0x41b600000): depth(7272), frame_size(128), pool_size(909 kB)
...
[   10.479253] mpt3sas_cm0: sense pool(0x00000000107cc924) - dma(0x41bc00000): depth(7059), element_size(96), pool_size (661 kB)
[   10.479279] mpt3sas_cm0: sense pool(0x00000000107cc924)- dma(0x41bc00000): depth(7059),element_size(96), pool_size(4 kB)
[   10.479719] mpt3sas_cm0: reply pool(0x000000008920f035) - dma(0x41bd00000): depth(7336), frame_size(128), pool_size(917 kB)
[   10.479835] mpt3sas_cm0: config page(0x00000000f394c332) - dma(0x44c765000): size(512)
[   10.479842] mpt3sas_cm0: Allocated physical memory: size(31210 kB)
[   10.479848] mpt3sas_cm0: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[   10.479853] mpt3sas_cm0: Scatter Gather Elements per IO(128)
[   10.599660] mpt3sas_cm0: _base_display_fwpkg_version: complete
[   10.599669] mpt3sas_cm0: FW Package Ver(05.00.00.00)
[   10.599812] mpt3sas_cm0: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[   10.600313] mpt3sas_cm0: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[   10.600322] NVMe
[   10.600325] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[   10.600447] mpt3sas_cm0: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   10.600477] mpt3sas_cm0: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   10.600618] mpt3sas 0000:0c:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[   10.600630] scsi host0: Fusion MPT SAS Host
[   10.626156] mpt3sas_cm0: sending port enable !!
[   13.477768] mpt3sas_cm0: hba_port entry: 000000008adcd575, port: 0 is added to hba_port list
[   13.479519] mpt3sas_cm0: hba_port entry: 00000000dde23a00, port: 8 is added to hba_port list
[   13.481680] mpt3sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b00de7bd50), phys(17)
[   13.482181] mpt3sas_cm0: handle(0x11) sas_address(0x510600b00de7bd50) port_type(0x0)
[   13.483080] scsi 0:0:0:0: Enclosure         LSI      VirtualSES       01   PQ: 0 ANSI: 6
[   13.483099] scsi 0:0:0:0: set ignore_delay_remove for handle(0x0011)
[   13.483106] scsi 0:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00de7bd50), phy(16), device_name(0x510600b00de7bd50)
[   13.483111] scsi 0:0:0:0: enclosure logical id (0x500605b00de7bd50), slot(16) 
[   13.483115] scsi 0:0:0:0: enclosure level(0x0000), connector name(     )
[   13.483121] scsi 0:0:0:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[   13.483161] mpt3sas_cm0: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[   13.483816]  end_device-0:0: add: handle(0x0011), sas_addr(0x510600b00de7bd50)
...
[   18.728604] mpt3sas_cm0: port enable: SUCCESS
[   18.729364] pci 0000:0b:03.0: enabling device (0000 -> 0002)
[   18.729393] mpt3sas 0000:0d:00.0: enabling device (0000 -> 0002)
[   18.729437] mpt3sas_cm1: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
[   18.787144] mpt3sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
[   18.787182] mpt3sas_cm1: MSI-X vectors supported: 128
[   18.787187]   no of cores: 4, max_msix_vectors: -1
[   18.787192] mpt3sas_cm1:  0 4 4
[   18.787558] mpt3sas_cm1: High IOPs queues : disabled
[   18.787565] mpt3sas1-msix0: PCI-MSI-X enabled: IRQ 69
[   18.787570] mpt3sas1-msix1: PCI-MSI-X enabled: IRQ 70
[   18.787575] mpt3sas1-msix2: PCI-MSI-X enabled: IRQ 71
[   18.787579] mpt3sas1-msix3: PCI-MSI-X enabled: IRQ 72
[   18.787583] mpt3sas_cm1: iomem(0x0000000600b00000), mapped(0x0000000032381adf), size(1048576)
[   18.787592] mpt3sas_cm1: ioport(0x0000000000000000), size(0)
[   18.845932] mpt3sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
[   18.873717] mpt3sas_cm1: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[   18.875031] mpt3sas_cm1: request pool(0x000000005b8761c2) - dma(0x41bf00000): depth(7272), frame_size(128), pool_size(909 kB)
[   20.161005] mpt3sas_cm1: sense pool(0x000000003af0b070) - dma(0x41c000000): depth(7059), element_size(96), pool_size (661 kB)
[   20.161029] mpt3sas_cm1: sense pool(0x000000003af0b070)- dma(0x41c000000): depth(7059),element_size(96), pool_size(4 kB)
[   20.161490] mpt3sas_cm1: reply pool(0x000000002c451284) - dma(0x41c100000): depth(7336), frame_size(128), pool_size(917 kB)
[   20.161613] mpt3sas_cm1: config page(0x0000000060a9007a) - dma(0x4524e0000): size(512)
[   20.161621] mpt3sas_cm1: Allocated physical memory: size(31210 kB)
[   20.161626] mpt3sas_cm1: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[   20.161631] mpt3sas_cm1: Scatter Gather Elements per IO(128)
[   20.281165] mpt3sas_cm1: _base_display_fwpkg_version: complete
[   20.281176] mpt3sas_cm1: FW Package Ver(05.00.00.00)
[   20.281321] mpt3sas_cm1: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[   20.281821] mpt3sas_cm1: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(09.09.00.00)
[   20.281830] NVMe
[   20.281833] mpt3sas_cm1: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[   20.281955] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   20.281984] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   20.282026] mpt3sas 0000:0d:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[   20.282037] scsi host1: Fusion MPT SAS Host
[   20.314803] mpt3sas_cm1: sending port enable !!
[   22.684129] mpt3sas_cm1: hba_port entry: 00000000ea664476, port: 0 is added to hba_port list
[   22.688538] mpt3sas_cm1: hba_port entry: 00000000869770b5, port: 8 is added to hba_port list
[   22.693720] mpt3sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b00de7bf50), phys(17)
[   22.694621] mpt3sas_cm1: handle(0x11) sas_address(0x510600b00de7bf50) port_type(0x0)
[   22.695294] mpt3sas_cm1: handle(0x20) sas_address(0x300605b00de7bf59) port_type(0x1)
[   28.412605] mpt3sas_cm1: port enable: SUCCESS
[   28.905761] scsi 1:0:0:0: Direct-Access     ATA      WDC WD5000AVDS-6 0A01 PQ: 0 ANSI: 6
[   28.905787] scsi 1:0:0:0: SATA: handle(0x0020), sas_addr(0x300605b00de7bf59), phy(9), device_name(0x0000000000000000)
[   28.905792] scsi 1:0:0:0: enclosure logical id (0x500605b00de7bf50), slot(0) 
[   28.905797] scsi 1:0:0:0: enclosure level(0x0000), connector name( C0  )
[   28.905875] scsi 1:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[   28.905886] scsi 1:0:0:0: qdepth(128), tagged(1), scsi_level(7), cmd_que(1)
[   28.912046]  end_device-1:0: add: handle(0x0020), sas_addr(0x300605b00de7bf59)
[   28.912085] sd 1:0:0:0: Power-on or device reset occurred
[   28.914233] scsi 1:0:1:0: Enclosure         LSI      VirtualSES       01   PQ: 0 ANSI: 6
[   28.914256] scsi 1:0:1:0: set ignore_delay_remove for handle(0x0011)
[   28.914263] scsi 1:0:1:0: SES: handle(0x0011), sas_addr(0x510600b00de7bf50), phy(16), device_name(0x510600b00de7bf50)
[   28.914268] scsi 1:0:1:0: enclosure logical id (0x500605b00de7bf50), slot(16) 
[   28.914272] scsi 1:0:1:0: enclosure level(0x0000), connector name(     )
[   28.914278] scsi 1:0:1:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[   28.914317] mpt3sas_cm1: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[   28.915018]  end_device-1:1: add: handle(0x0011), sas_addr(0x510600b00de7bf50)
[   28.915769] pci 0000:0b:05.0: enabling device (0000 -> 0002)
[   28.915800] mpt3sas 0000:0e:00.0: enabling device (0000 -> 0002)
[   28.915847] mpt3sas_cm2: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
[   28.917936] sd 1:0:0:0: [sda] 975724592 512-byte logical blocks: (500 GB/465 GiB)
[   28.924717] sd 1:0:0:0: [sda] Write Protect is off
[   28.924729] sd 1:0:0:0: [sda] Mode Sense: 9b 00 10 08
[   28.926756] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   28.971834] mpt3sas_cm2: CurrentHostPageSize is 0: Setting default host page size to 4k
[   28.971872] mpt3sas_cm2: MSI-X vectors supported: 128
[   28.971879]   no of cores: 4, max_msix_vectors: -1
[   28.971884] mpt3sas_cm2:  0 4 4
[   28.972276] mpt3sas_cm2: High IOPs queues : disabled
[   28.972283] mpt3sas2-msix0: PCI-MSI-X enabled: IRQ 73
[   28.972288] mpt3sas2-msix1: PCI-MSI-X enabled: IRQ 74
[   28.972293] mpt3sas2-msix2: PCI-MSI-X enabled: IRQ 75
[   28.972297] mpt3sas2-msix3: PCI-MSI-X enabled: IRQ 76
[   28.972302] mpt3sas_cm2: iomem(0x0000000600d00000), mapped(0x000000009160f51c), size(1048576)
[   28.972310] mpt3sas_cm2: ioport(0x0000000000000000), size(0)
[   28.986402] sd 1:0:0:0: [sda] Attached SCSI disk
[   29.029063] mpt3sas_cm2: CurrentHostPageSize is 0: Setting default host page size to 4k
[   29.056799] mpt3sas_cm2: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[   29.058033] mpt3sas_cm2: request pool(0x0000000078770a9d) - dma(0x41c200000): depth(7272), frame_size(128), pool_size(909 kB)
[   30.211333] mpt3sas_cm2: sense pool(0x00000000fd83fac7) - dma(0x41c300000): depth(7059), element_size(96), pool_size (661 kB)
[   30.211360] mpt3sas_cm2: sense pool(0x00000000fd83fac7)- dma(0x41c300000): depth(7059),element_size(96), pool_size(4 kB)
[   30.212561] mpt3sas_cm2: reply pool(0x00000000c747b5de) - dma(0x41c400000): depth(7336), frame_size(128), pool_size(917 kB)
[   30.213000] mpt3sas_cm2: config page(0x00000000ca0f15e1) - dma(0x45652d000): size(512)
[   30.213010] mpt3sas_cm2: Allocated physical memory: size(31210 kB)
[   30.213015] mpt3sas_cm2: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[   30.213019] mpt3sas_cm2: Scatter Gather Elements per IO(128)
[   30.332717] mpt3sas_cm2: _base_display_fwpkg_version: complete
[   30.332725] mpt3sas_cm2: FW Package Ver(05.00.00.00)
[   30.332879] mpt3sas_cm2: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[   30.333378] mpt3sas_cm2: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[   30.333387] NVMe
[   30.333391] mpt3sas_cm2: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[   30.333513] mpt3sas_cm2: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   30.333542] mpt3sas_cm2: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   30.333749] mpt3sas 0000:0e:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[   30.333760] scsi host2: Fusion MPT SAS Host
[   30.358905] mpt3sas_cm2: sending port enable !!
[   33.153649] mpt3sas_cm2: hba_port entry: 00000000296158ae, port: 0 is added to hba_port list
[   33.155900] mpt3sas_cm2: hba_port entry: 00000000cf3a1228, port: 8 is added to hba_port list
[   33.158714] mpt3sas_cm2: host_add: handle(0x0001), sas_addr(0x500605b00f3dfa80), phys(17)
[   33.159204] mpt3sas_cm2: handle(0x11) sas_address(0x510600b00f3dfa80) port_type(0x0)
[   33.160212] scsi 2:0:0:0: Enclosure         LSI      VirtualSES       01   PQ: 0 ANSI: 6
[   33.160233] scsi 2:0:0:0: set ignore_delay_remove for handle(0x0011)
[   33.160239] scsi 2:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00f3dfa80), phy(16), device_name(0x510600b00f3dfa80)
[   33.160244] scsi 2:0:0:0: enclosure logical id (0x500605b00f3dfa80), slot(16) 
[   33.160248] scsi 2:0:0:0: enclosure level(0x0000), connector name(     )
[   33.160255] scsi 2:0:0:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[   33.160294] mpt3sas_cm2: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[   33.161232]  end_device-2:0: add: handle(0x0011), sas_addr(0x510600b00f3dfa80)
[   33.764634] cam-dummy-reg: disabling
[   38.404635] mpt3sas_cm2: port enable: SUCCESS
[   38.406201] pci 0000:0b:07.0: enabling device (0000 -> 0002)
[   38.406259] mpt3sas 0000:0f:00.0: enabling device (0000 -> 0002)
[   38.406336] mpt3sas_cm3: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3885552 kB)
[   38.465052] mpt3sas_cm3: CurrentHostPageSize is 0: Setting default host page size to 4k
[   38.465115] mpt3sas_cm3: MSI-X vectors supported: 128
[   38.465129]   no of cores: 4, max_msix_vectors: -1
[   38.465141] mpt3sas_cm3:  0 4 4
[   38.465923] mpt3sas_cm3: High IOPs queues : disabled
[   38.465939] mpt3sas3-msix0: PCI-MSI-X enabled: IRQ 77
[   38.465952] mpt3sas3-msix1: PCI-MSI-X enabled: IRQ 78
[   38.465963] mpt3sas3-msix2: PCI-MSI-X enabled: IRQ 79
[   38.465974] mpt3sas3-msix3: PCI-MSI-X enabled: IRQ 80
[   38.465985] mpt3sas_cm3: iomem(0x0000000600f00000), mapped(0x0000000085d0d7dc), size(1048576)
[   38.466004] mpt3sas_cm3: ioport(0x0000000000000000), size(0)
[   38.524899] mpt3sas_cm3: CurrentHostPageSize is 0: Setting default host page size to 4k
[   38.552664] mpt3sas_cm3: scatter gather: sge_in_main_msg(1), sge_per_chain(7), sge_per_io(128), chains_per_io(19)
[   38.555881] mpt3sas_cm3: request pool(0x000000003b3bae2a) - dma(0x41c600000): depth(7272), frame_size(128), pool_size(909 kB)
[   39.716969] mpt3sas_cm3: sense pool(0x00000000a81f00bf) - dma(0x41c700000): depth(7059), element_size(96), pool_size (661 kB)
[   39.716996] mpt3sas_cm3: sense pool(0x00000000a81f00bf)- dma(0x41c700000): depth(7059),element_size(96), pool_size(4 kB)
[   39.718228] mpt3sas_cm3: reply pool(0x00000000e72d3333) - dma(0x41c800000): depth(7336), frame_size(128), pool_size(917 kB)
[   39.718385] mpt3sas_cm3: config page(0x0000000043a8ee65) - dma(0x45a2e9000): size(512)
[   39.718392] mpt3sas_cm3: Allocated physical memory: size(31210 kB)
[   39.718397] mpt3sas_cm3: Current Controller Queue Depth(7056),Max Controller Queue Depth(7168)
[   39.718402] mpt3sas_cm3: Scatter Gather Elements per IO(128)
[   39.838113] mpt3sas_cm3: _base_display_fwpkg_version: complete
[   39.838123] mpt3sas_cm3: FW Package Ver(05.00.00.00)
[   39.838268] mpt3sas_cm3: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[   39.838768] mpt3sas_cm3: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[   39.838777] NVMe
[   39.838781] mpt3sas_cm3: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[   39.838903] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   39.838932] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[   39.838971] mpt3sas 0000:0f:00.0: Max SCSIIO MPT commands: 7056 shared with nr_hw_queues = 4
[   39.838982] scsi host3: Fusion MPT SAS Host
[   39.865436] mpt3sas_cm3: sending port enable !!
[   42.653845] mpt3sas_cm3: hba_port entry: 00000000cd02c1a1, port: 0 is added to hba_port list
[   42.656344] mpt3sas_cm3: hba_port entry: 00000000d693f100, port: 8 is added to hba_port list
[   42.659492] mpt3sas_cm3: host_add: handle(0x0001), sas_addr(0x500605b00f3df7f0), phys(17)
[   42.659983] mpt3sas_cm3: handle(0x11) sas_address(0x510600b00f3df7f0) port_type(0x0)
[   42.661220] scsi 3:0:0:0: Enclosure         LSI      VirtualSES       01   PQ: 0 ANSI: 6
[   42.661240] scsi 3:0:0:0: set ignore_delay_remove for handle(0x0011)
[   42.661247] scsi 3:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00f3df7f0), phy(16), device_name(0x510600b00f3df7f0)
[   42.661252] scsi 3:0:0:0: enclosure logical id (0x500605b00f3df7f0), slot(16) 
[   42.661256] scsi 3:0:0:0: enclosure level(0x0000), connector name(     )
[   42.661262] scsi 3:0:0:0: qdepth(1), tagged(0), scsi_level(7), cmd_que(0)
[   42.661302] mpt3sas_cm3: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[   42.662322]  end_device-3:0: add: handle(0x0011), sas_addr(0x510600b00f3df7f0)
[   47.904637] mpt3sas_cm3: port enable: SUCCESS
[   48.094448] scsi 0:0:0:0: Attached scsi generic sg0 type 13
[   48.095649] sd 1:0:0:0: Attached scsi generic sg1 type 0
[   48.095747] scsi 1:0:1:0: Attached scsi generic sg2 type 13
[   48.095842] scsi 2:0:0:0: Attached scsi generic sg3 type 13
[   48.096113] scsi 3:0:0:0: Attached scsi generic sg4 type 13

geerlingguy commented 2 years ago

And hey, look at that!

pi@sas:~ $ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 465.3G  0 disk 
mmcblk0     179:0    0  29.7G  0 disk 
├─mmcblk0p1 179:1    0   256M  0 part /boot
└─mmcblk0p2 179:2    0  29.5G  0 part /

Hello, lonely little WD Green that I was willing to sacrifice if this entire thing went kaboom!

Next test is to get one hard drive off each card and see if that works too.

geerlingguy commented 2 years ago

I don't have enough HD Mini SAS (SFF-8643) to SATA adapter cables (I'm using these from CableCreation) to test all four cards... but two cards are working with two drives each:

pi@sas:~ $ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 465.8G  0 disk 
sdb           8:16   0 111.8G  0 disk 
sdc           8:32   0 465.3G  0 disk 
sdd           8:48   0 111.8G  0 disk 

I followed my guide on creating an mdadm RAID array in Linux to build a 4-disk RAID 0 array.

And the array seems to be working:

pi@sas:~ $ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Wed Apr 27 21:55:08 2022
        Raid Level : raid0
        Array Size : 1210157056 (1154.10 GiB 1239.20 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Wed Apr 27 21:55:08 2022
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : original
        Chunk Size : 512K

Consistency Policy : none

              Name : sas:0  (local to host sas)
              UUID : 221f1350:c9590fd4:03deaae8:d09bae04
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

And after mounting the array:

pi@sas:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G  1.5G   27G   6% /
devtmpfs        1.7G     0  1.7G   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           759M  976K  758M   1% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
/dev/mmcblk0p1  253M   45M  208M  18% /boot
tmpfs           380M     0  380M   0% /run/user/1000
/dev/md0        1.2T   28K  1.2T   1% /mnt/raid0

And here are the results of disk-benchmark.sh:

Running fio sequential read test...
fio-rand-read-sequential: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.25
Starting 4 processes
Jobs: 4 (f=4): [R(4)][27.3%][r=404MiB/s][r=403 IOPS][eta 00m:08s]
Jobs: 4 (f=4): [R(4)][36.4%][r=402MiB/s][r=401 IOPS][eta 00m:07s]
Jobs: 4 (f=4): [R(4)][45.5%][r=389MiB/s][r=388 IOPS][eta 00m:06s]
Jobs: 4 (f=4): [R(4)][54.5%][r=392MiB/s][r=391 IOPS][eta 00m:05s]
Jobs: 4 (f=4): [R(4)][63.6%][r=408MiB/s][r=408 IOPS][eta 00m:04s]
Jobs: 4 (f=4): [R(4)][72.7%][r=394MiB/s][r=393 IOPS][eta 00m:03s]
Jobs: 4 (f=4): [R(4)][81.8%][r=396MiB/s][r=396 IOPS][eta 00m:02s]
Jobs: 4 (f=4): [R(4)][90.9%][r=397MiB/s][r=396 IOPS][eta 00m:01s]
Jobs: 4 (f=4): [R(4)][24.4%][r=385MiB/s][r=384 IOPS][eta 00m:34s]
fio-rand-read-sequential: (groupid=0, jobs=4): err= 0: pid=2530: Wed Apr 27 22:01:57 2022
  read: IOPS=396, BW=397MiB/s (416MB/s)(4188MiB/10559msec)
    slat (usec): min=75, max=100046, avg=4600.93, stdev=12722.72
    clat (msec): min=134, max=1243, avg=634.65, stdev=121.09
     lat (msec): min=136, max=1243, avg=639.25, stdev=121.80
    clat percentiles (msec):
     |  1.00th=[  155],  5.00th=[  456], 10.00th=[  575], 20.00th=[  609],
     | 30.00th=[  625], 40.00th=[  634], 50.00th=[  642], 60.00th=[  651],
     | 70.00th=[  659], 80.00th=[  676], 90.00th=[  701], 95.00th=[  743],
     | 99.00th=[ 1083], 99.50th=[ 1133], 99.90th=[ 1200], 99.95th=[ 1217],
     | 99.99th=[ 1250]
   bw (  KiB/s): min=91961, max=452374, per=95.61%, avg=388307.83, stdev=19725.13, samples=83
   iops        : min=   86, max=  441, avg=378.25, stdev=19.40, samples=83
  lat (msec)   : 250=2.67%, 500=2.87%, 750=89.68%, 1000=3.01%, 2000=1.77%
  cpu          : usr=0.17%, sys=3.29%, ctx=3682, majf=0, minf=65634
  IO depths    : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=3.1%, >=64=94.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=4188,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=397MiB/s (416MB/s), 397MiB/s-397MiB/s (416MB/s-416MB/s), io=4188MiB (4391MB), run=10559-10559msec

Disk stats (read/write):
    md0: ios=16752/0, merge=0/0, ticks=9172540/0, in_queue=9172540, util=99.19%, aggrios=4188/1, aggrmerge=0/0, aggrticks=2296652/0, aggrin_queue=2296652, aggrutil=98.43%
  sdd: ios=4188/1, merge=0/0, ticks=1898746/0, in_queue=1898746, util=97.04%
  sdc: ios=4188/1, merge=0/0, ticks=2574014/0, in_queue=2574015, util=97.74%
  sdb: ios=4188/1, merge=0/0, ticks=2302127/0, in_queue=2302127, util=98.43%
  sda: ios=4188/1, merge=0/0, ticks=2411723/0, in_queue=2411723, util=97.87%

Running iozone 1024K random read and write tests...
    Iozone: Performance Test of File I/O
            Version $Revision: 3.492 $
        Compiled for 64 bit mode.
        Build: linux-arm 

    Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                 Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                 Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                 Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                 Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                 Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                 Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                 Alexey Skidanov, Sudhir Kumar.

    Run began: Wed Apr 27 22:01:58 2022

    Include fsync in write timing
    O_DIRECT feature enabled
    Auto Mode
    File size set to 102400 kB
    Record Size 1024 kB
    Command line used: ./iozone -e -I -a -s 100M -r 1024k -i 0 -i 2 -f /mnt/raid0/iozone
    Output is in kBytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 kBytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                                              random    random     bkwd    record    stride                                    
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
          102400    1024   258650   181828                      92813   167495                                                                

iozone test complete.

Running iozone 4K random read and write tests...
    Iozone: Performance Test of File I/O
            Version $Revision: 3.492 $
        Compiled for 64 bit mode.
        Build: linux-arm 

    Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                 Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                 Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                 Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                 Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                 Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                 Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                 Alexey Skidanov, Sudhir Kumar.

    Run began: Wed Apr 27 22:02:01 2022

    Include fsync in write timing
    O_DIRECT feature enabled
    Auto Mode
    File size set to 102400 kB
    Record Size 4 kB
    Command line used: ./iozone -e -I -a -s 100M -r 4k -i 0 -i 2 -f /mnt/raid0/iozone
    Output is in kBytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 kBytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                                              random    random     bkwd    record    stride                                    
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
          102400       4    25394    32450                       5270     5323                                                                

iozone test complete.

So we can still put through about 400 MiB/s (fio measured 397 MiB/s, or 416 MB/s) using multiple HBAs (two, in this case), which is a positive sign that the Pi won't be hampered too much by multiple cards on a PCIe switch. I hoped that would be the case, but was prepared for disappointment. Luckily I'm not disappointed, lol.
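
fio prints the same bandwidth in binary (MiB/s) and decimal (MB/s) units, which is easy to mix up when comparing against PCIe link rates quoted in Gbit/s. A minimal sketch of the conversion, assuming 1 MiB = 1048576 bytes; the helper names mib_to_mb and mib_to_gbit are made up for illustration:

```sh
# Convert a MiB/s figure to MB/s (decimal megabytes per second).
mib_to_mb() {
  awk -v mib="$1" 'BEGIN { printf "%.0f\n", mib * 1048576 / 1000000 }'
}

# Convert a MiB/s figure to Gbit/s.
mib_to_gbit() {
  awk -v mib="$1" 'BEGIN { printf "%.2f\n", mib * 1048576 * 8 / 1000000000 }'
}

mib_to_mb 397      # READ bandwidth reported by fio above
mib_to_gbit 397
```

For the 397 MiB/s run this prints 416 and 3.33, matching the 416 MB/s figure in the fio summary.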

The random IO would be a lot faster if I used all faster drives. As it is, I have a 120 GB MakerDisk SSD, a 120 GB Kingston A400 SSD, and two WD GreenPower WD5000AVDS drives that are very very slow, dragging the array down.

But I wanted to see how a mixed environment would work.

I just ordered another 2 pack of the HD Mini SAS cables, and hopefully I will see the same result with the disks spread out one per card.

geerlingguy commented 2 years ago

Created a PR with the patch: https://github.com/geerlingguy/linux/pull/4

joshuaboud commented 2 years ago

Glad to see this worked out for you. Have you tried recreating the kernel panic issue when quiet is turned on?

geerlingguy commented 2 years ago

@joshuaboud Current /boot/cmdline.txt:

console=serial0,115200 console=tty1 root=PARTUUID=1b8530a1-02 rootfstype=ext4 fsck.repair=yes rootwait

I just modified the file to:

console=serial0,115200 console=tty3 root=PARTUUID=1b8530a1-02 rootfstype=ext4 fsck.repair=yes loglevel=3 quiet rootwait logo.nologo

After reboot, it does indeed lock up, with the following kernel panic:

IMG_1207
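
The two command lines differ in more than just quiet, so it can help to list exactly which tokens changed before bisecting the panic. A small sketch, assuming parameters are separated by single spaces as in the two lines quoted above; cmdline_added is a hypothetical helper name, and nothing here identifies which parameter is responsible:

```sh
# Print every token of the new command line that the old one lacks.
cmdline_added() (
  set -f                        # no glob expansion while word-splitting
  old=" $1 "
  for tok in $2; do
    case "$old" in
      *" $tok "*) ;;            # token already present in the old line
      *) printf '%s\n' "$tok" ;;
    esac
  done
)

old='console=serial0,115200 console=tty1 root=PARTUUID=1b8530a1-02 rootfstype=ext4 fsck.repair=yes rootwait'
new='console=serial0,115200 console=tty3 root=PARTUUID=1b8530a1-02 rootfstype=ext4 fsck.repair=yes loglevel=3 quiet rootwait logo.nologo'
cmdline_added "$old" "$new"
```

With the two lines from this comment it lists console=tty3, loglevel=3, quiet and logo.nologo, so each could be re-added one at a time to narrow down the trigger.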

joshuaboud commented 2 years ago

It's so strange. I have no idea what could cause that, though at least it isn't just me.

geerlingguy commented 2 years ago

@joshuaboud Yeah... I'm perplexed, didn't look too deep but it is good to see it's consistent and also still happening with 5.15.y and the latest firmware. Just reset the cmdline.txt back and rebooted again, and everything's working as normal.

Something weird with the behavior of PCIe initialization if you try silencing the console...

geerlingguy commented 2 years ago

Finally got the extra cables in, and at this point I can confirm I can split one drive per controller, and they're all accessible, and perform similarly (afaict):

Run status group 0 (all jobs):
   READ: bw=397MiB/s (416MB/s), 397MiB/s-397MiB/s (416MB/s-416MB/s), io=4186MiB (4389MB), run=10557-10557msec

                                                              random    random     bkwd    record    stride                                    
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
          102400    1024   177821   267258                      91817   181764    

                                                              random    random     bkwd    record    stride                                    
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
          102400       4    26498    37931                       5299     5682    

geerlingguy commented 2 years ago

I downloaded StorCLI (from here: https://www.broadcom.com/support/download-search?pg=&pf=&pn=&pa=&po=&dk=storcli&pl= - MR 7.20) and ran the arm64 version:

pi@sas:~ $ ./storcli64 show
CLI Version = 007.2007.0000.0000 Feb 11, 2022
Operating system = Linux 5.15.35-v8+
Status Code = 0
Status = Success
Description = None

Number of Controllers = 0
Host Name = sas
Operating System  = Linux 5.15.35-v8+

Do I need to do anything else special to get storcli to work with these cards through the PCIe switches?

geerlingguy commented 2 years ago

Oh... lol:

pi@sas:~ $ sudo ./storcli64 show
CLI Version = 007.2103.0000.0000 Dec 08, 2021
Operating system = Linux 5.15.35-v8+
Status Code = 0
Status = Success
Description = None

Number of Controllers = 4
Host Name = sas
Operating System  = Linux 5.15.35-v8+
StoreLib IT Version = 07.2103.0200.0000

IT System Overview :
==================

----------------------------------------------------------------------------
Ctl Model         AdapterType   VendId DevId SubVendId SubDevId PCI Address 
----------------------------------------------------------------------------
  0 HBA 9405W-16i   SAS3616(B0) 0x1000  0xD1    0x1000   0x3080 00:0c:00:00 
  1 HBA 9405W-16i   SAS3616(B0) 0x1000  0xD1    0x1000   0x3080 00:0d:00:00 
  2 HBA 9405W-16i   SAS3616(B0) 0x1000  0xD1    0x1000   0x3080 00:0e:00:00 
  3 HBA 9405W-16i   SAS3616(B0) 0x1000  0xD1    0x1000   0x3080 00:0f:00:00 
----------------------------------------------------------------------------
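
Note that the run without sudo still reported Status = Success while finding 0 controllers, so a script cannot rely on the status field to detect missing privileges. A rough sketch that reads the count instead, assuming the output format shown above; controller_count is a made-up helper name:

```sh
# Read `storcli64 show` output on stdin and print the controller count.
controller_count() {
  awk -F' = ' '/^Number of Controllers/ { print $2; exit }'
}

# Sample input in the same format as the output above.
printf 'Status = Success\nNumber of Controllers = 4\n' | controller_count
```

On the real system this would be used as sudo ./storcli64 show | controller_count, and a result of 0 is a hint to check permissions first.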

And then all the drives:

$ sudo ./storcli64 /c0 show
...
------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                 Sp 
------------------------------------------------------------------------------
0:0       1 JBOD  -  111.790 GB SATA SSD -   -  512B KINGSTON SA400S37120G -  
------------------------------------------------------------------------------

$ sudo ./storcli64 /c1 show
...
------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                 Sp 
------------------------------------------------------------------------------
0:0       1 JBOD  -  465.261 GB SATA HDD -   -  512B WDC WD5000AVDS-63U7B1 -  
------------------------------------------------------------------------------

$ sudo ./storcli64 /c2 show
...
-----------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model    Sp 
-----------------------------------------------------------------
0:0       1 JBOD  -  111.790 GB SATA SSD -   -  512B SATA SSD -  
-----------------------------------------------------------------

$ sudo ./storcli64 /c3 show
...
------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                 Sp 
------------------------------------------------------------------------------
0:0       1 JBOD  -  465.761 GB SATA HDD -   -  512B WDC WD5000AVDS-61U7B1 -  
------------------------------------------------------------------------------

geerlingguy commented 2 years ago

Besides the weird issue with quiet not working, I think we've explored this card enough to give it a thumbs up overall.

geerlingguy commented 2 years ago

Still testing, in a sense.

$ ls /dev/sd*[a-z] | wc -l
60

heh... First test is RAID 0 using my mdadm guide:

# Partition all 60 disks (optional... I just obliterate the partitioning when I create the array).
$ for i in /dev/sd*[a-z]; do sudo sgdisk -n 1:0:0 "$i"; done

# Get list of all the drives to copy out to next command.
$ ls /dev/sd*[a-z]

# Create a RAID0 array.
$ sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=60 /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb /dev/sdbe /dev/sdd /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf /dev/sdk /dev/sdp /dev/sdu /dev/sdz

# Verify the array is working.
$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu May 12 15:12:43 2022
        Raid Level : raid0
        Array Size : 1171901583360 (1117612.44 GiB 1200027.22 GB)
      Raid Devices : 60
     Total Devices : 60
       Persistence : Superblock is persistent

       Update Time : Thu May 12 15:12:43 2022
             State : clean 
    Active Devices : 60
   Working Devices : 60
    Failed Devices : 0
     Spare Devices : 0

            Layout : -unknown-
        Chunk Size : 512K

Consistency Policy : none

              Name : sas:0  (local to host sas)
              UUID : f8b39a60:357b0007:acc18066:aa6fdf97
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1      65      224        1      active sync   /dev/sdae
       2      66       48        2      active sync   /dev/sdaj
       3      66      128        3      active sync   /dev/sdao
       4      66      208        4      active sync   /dev/sdat
       5      67       32        5      active sync   /dev/sday
       6      67       96        6      active sync   /dev/sdbc
       7      67      176        7      active sync   /dev/sdbh
       8       8       96        8      active sync   /dev/sdg
       9       8      176        9      active sync   /dev/sdl
      10      65        0       10      active sync   /dev/sdq
      11      65       80       11      active sync   /dev/sdv
      12      65      160       12      active sync   /dev/sdaa
      13      65      240       13      active sync   /dev/sdaf
      14      66       64       14      active sync   /dev/sdak
      15      66      144       15      active sync   /dev/sdap
      16      66      224       16      active sync   /dev/sdau
      17      67       48       17      active sync   /dev/sdaz
      18      67      112       18      active sync   /dev/sdbd
      19       8       32       19      active sync   /dev/sdc
      20       8      112       20      active sync   /dev/sdh
      21       8      192       21      active sync   /dev/sdm
      22      65       16       22      active sync   /dev/sdr
      23      65       96       23      active sync   /dev/sdw
      24      65      176       24      active sync   /dev/sdab
      25      66        0       25      active sync   /dev/sdag
      26      66       80       26      active sync   /dev/sdal
      27      66      160       27      active sync   /dev/sdaq
      28      66      240       28      active sync   /dev/sdav
      29       8       16       29      active sync   /dev/sdb
      30      67      128       30      active sync   /dev/sdbe
      31       8       48       31      active sync   /dev/sdd
      32       8      128       32      active sync   /dev/sdi
      33       8      208       33      active sync   /dev/sdn
      34      65       32       34      active sync   /dev/sds
      35      65      112       35      active sync   /dev/sdx
      36      65      192       36      active sync   /dev/sdac
      37      66       16       37      active sync   /dev/sdah
      38      66       96       38      active sync   /dev/sdam
      39      66      176       39      active sync   /dev/sdar
      40      67        0       40      active sync   /dev/sdaw
      41      67       64       41      active sync   /dev/sdba
      42      67      144       42      active sync   /dev/sdbf
      43       8       64       43      active sync   /dev/sde
      44       8      144       44      active sync   /dev/sdj
      45       8      224       45      active sync   /dev/sdo
      46      65       48       46      active sync   /dev/sdt
      47      65      128       47      active sync   /dev/sdy
      48      65      208       48      active sync   /dev/sdad
      49      66       32       49      active sync   /dev/sdai
      50      66      112       50      active sync   /dev/sdan
      51      66      192       51      active sync   /dev/sdas
      52      67       16       52      active sync   /dev/sdax
      53      67       80       53      active sync   /dev/sdbb
      54      67      160       54      active sync   /dev/sdbg
      55       8       80       55      active sync   /dev/sdf
      56       8      160       56      active sync   /dev/sdk
      57       8      240       57      active sync   /dev/sdp
      58      65       64       58      active sync   /dev/sdu
      59      65      144       59      active sync   /dev/sdz

# Format the array.
$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0

# Mount the array.
$ sudo mkdir /mnt/raid0
$ sudo mount /dev/md0 /mnt/raid0
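
The 60 device names in the mdadm command were copied by hand from the ls output. As a rough sketch (not what was run here), the shell can pass the names to mdadm directly and count them at the same time. The helper name mdadm_create_cmd is made up for illustration, and it only prints the command so it can be checked before running it with sudo:

```sh
# Print an mdadm RAID 0 create command for the given member devices.
# --raid-devices is filled in from the number of arguments.
mdadm_create_cmd() {
  printf 'mdadm --create --verbose /dev/md0 --level=0 --raid-devices=%d' "$#"
  for dev in "$@"; do
    printf ' %s' "$dev"
  done
  printf '\n'
}

# Dry run with two example names; on the real system the arguments
# would be the glob /dev/sd*[a-z] instead.
mdadm_create_cmd /dev/sda /dev/sdb
```

Two caveats: glob expansion sorts the names differently from the pasted list, so the member order would not match the one above, and the pattern matches every sd* disk on the system, including any that should stay out of the array.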

When I started the formatting operation, I noticed the following kernel warning and call trace in dmesg, raised from the mdadm process:

[ 1506.932433] ------------[ cut here ]------------
[ 1506.932450] WARNING: CPU: 1 PID: 1405 at lib/vsprintf.c:2742 vsnprintf+0x54c/0x6e0
[ 1506.932468] Modules linked in: raid0 md_mod sg cmac algif_hash aes_arm64 algif_skcipher af_alg bnep hci_uart btbcm bluetooth ecdh_generic ecc hid_logitech_hidpp 8021q garp stp llc joydev snd_soc_hdmi_codec hid_logitech_dj brcmfmac brcmutil bcm2835_codec(C) v3d cfg80211 bcm2835_isp(C) vc4 bcm2835_v4l2(C) gpu_sched v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops cec videobuf2_v4l2 drm_kms_helper rfkill videobuf2_common raspberrypi_hwmon snd_soc_core i2c_brcmstb mpt3sas videodev raid_class snd_compress scsi_transport_sas vc_sm_cma(C) snd_pcm_dmaengine snd_bcm2835(C) snd_pcm snd_timer mc snd syscopyarea sysfillrect sysimgblt rpivid_mem nvmem_rmem fb_sys_fops uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 1506.932622] CPU: 1 PID: 1405 Comm: mdadm Tainted: G         C        5.15.35-v8+ #1
[ 1506.932629] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 1506.932633] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1506.932639] pc : vsnprintf+0x54c/0x6e0
[ 1506.932644] lr : snprintf+0x60/0x88
[ 1506.932649] sp : ffffffc018b0b880
[ 1506.932652] x29: ffffffc018b0b880 x28: ffffff8121707580 x27: 00000000000000ca
[ 1506.932661] x26: 000000000000002d x25: ffffffd939eb5168 x24: ffffffd9b808b000
[ 1506.932670] x23: ffffffc018b0bb1a x22: 00000000ffffffd8 x21: ffffffd939eb5178
[ 1506.932678] x20: ffffffc018b0b9b0 x19: ffffffd9b808b048 x18: 0000000000000001
[ 1506.932686] x17: 0000000000000001 x16: ffffffd9b785f308 x15: 00000221b58e2000
[ 1506.932694] x14: ffffffd9b808b048 x13: ffffff8121707418 x12: ffffff8121707414
[ 1506.932702] x11: 000000000000003c x10: ffffffc018b0b9b0 x9 : 00000000ffffffd8
[ 1506.932711] x8 : ffffffc018b0b980 x7 : 0000000000000003 x6 : 00000000ffffffff
[ 1506.932718] x5 : 0000000000000000 x4 : ffffffd9b808b048 x3 : ffffffc018b0b930
[ 1506.932726] x2 : ffffffd939eb5178 x1 : fffffffffffffffe x0 : ffffffc018b0b9b0
[ 1506.932735] Call trace:
[ 1506.932737]  vsnprintf+0x54c/0x6e0
[ 1506.932742]  snprintf+0x60/0x88
[ 1506.932747]  dump_zones.isra.17+0x100/0x190 [raid0]
[ 1506.932758]  raid0_run+0x148/0x250 [raid0]
[ 1506.932764]  md_run+0x488/0xb18 [md_mod]
[ 1506.932791]  do_md_run+0x80/0x178 [md_mod]
[ 1506.932807]  md_ioctl+0xd48/0x1640 [md_mod]
[ 1506.932824]  blkdev_ioctl+0x23c/0x3d0
[ 1506.932830]  block_ioctl+0x54/0x70
[ 1506.932835]  __arm64_sys_ioctl+0xb0/0xf0
[ 1506.932842]  invoke_syscall+0x4c/0x110
[ 1506.932849]  el0_svc_common.constprop.3+0xfc/0x120
[ 1506.932854]  do_el0_svc+0x2c/0x90
[ 1506.932860]  el0_svc+0x24/0x60
[ 1506.932867]  el0t_64_sync_handler+0x90/0xb8
[ 1506.932872]  el0t_64_sync+0x180/0x184
[ 1506.932877] ---[ end trace 6a720dbd06819c8f ]---
[ 1506.933188] md0: detected capacity change from 0 to 2343803166720

But it seemed to proceed normally.
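
As a sanity check, the capacity in the last kernel line lines up with the mdadm Array Size. A minimal sketch, assuming the kernel message counts 512-byte sectors and mdadm reports 1 KiB blocks (both assumptions, though the two results agree); the helper names are made up for illustration:

```sh
# Convert a count of 512-byte sectors to bytes.
sectors_to_bytes() {
  awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 512 }'
}

# Convert a count of 1 KiB blocks to bytes.
kib_to_bytes() {
  awk -v k="$1" 'BEGIN { printf "%.0f\n", k * 1024 }'
}

sectors_to_bytes 2343803166720   # from "detected capacity change"
kib_to_bytes 1171901583360       # from mdadm "Array Size"
```

Both print 1200027221360640 bytes, which is the 1200027.22 GB (about 1.2 PB) that mdadm shows.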

geerlingguy commented 2 years ago

EXT4 initialization started at 3:15 pm and ran until 5:23 pm, for a total time of 2 hours, 8 minutes (it was writing around 150 MiB/sec the entire time, and it can get a bit toasty!).
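
For a rough idea of how much data that represents, the elapsed time and write rate can be multiplied out. This is only a back-of-the-envelope sketch that assumes the 150 MiB/sec figure held constant for the whole run; elapsed_s and written_gib are made-up helper names:

```sh
# Elapsed seconds between two same-day HH:MM times (24-hour clock).
elapsed_s() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s, ":"); split(b, e, ":")
    print (e[1] * 3600 + e[2] * 60) - (s[1] * 3600 + s[2] * 60)
  }'
}

# GiB written at a constant rate (MiB/s) over a number of seconds.
written_gib() {
  awk -v r="$1" -v t="$2" 'BEGIN { printf "%.0f\n", r * t / 1024 }'
}

t=$(elapsed_s 15:15 17:23)
echo "$t"
written_gib 150 "$t"
```

That works out to 7680 seconds and roughly 1125 GiB written during initialization.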

I also noticed a few more messages in dmesg during the formatting:

[ 3514.441499] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 3889.553149] perf: interrupt took too long (3158 > 3138), lowering kernel.perf_event_max_sample_rate to 63250
[ 4506.851519] perf: interrupt took too long (3956 > 3947), lowering kernel.perf_event_max_sample_rate to 50500
[ 5603.499470] perf: interrupt took too long (4948 > 4945), lowering kernel.perf_event_max_sample_rate to 40250

It looks like others have reported similar messages during heavy disk I/O (e.g. on older systems running a btrfs scrub, like here). The message itself comes from this patch.

I also want to try overclocking to 2.2 GHz after running initial benchmarks since it looks like I have plenty of headroom (CPU temp is around 34-37°C with those high-CFM fans blowing directly over the CM4 heatsink).

geerlingguy commented 2 years ago

Hmm...

$ sudo mount /dev/md0 /mnt/raid0
mount: /mnt/raid0: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error.

$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu May 12 15:12:43 2022
        Raid Level : raid0
        Array Size : 1171901583360 (1117612.44 GiB 1200027.22 GB)
      Raid Devices : 60
     Total Devices : 60
       Persistence : Superblock is persistent

       Update Time : Thu May 12 15:12:43 2022
             State : broken 
    Active Devices : 60
   Working Devices : 60
    Failed Devices : 0
     Spare Devices : 0

            Layout : -unknown-
        Chunk Size : 512K

Consistency Policy : none

              Name : sas:0  (local to host sas)
              UUID : f8b39a60:357b0007:acc18066:aa6fdf97
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1      65      224        1      active sync   /dev/sdae
       2      66       48        2      active sync   /dev/sdaj
       3      66      128        3      active sync   /dev/sdao
       4      66      208        4      active sync   /dev/sdat
       5      67       32        5      active sync
       6      67       96        6      active sync   /dev/sdbc
       7      67      176        7      active sync   /dev/sdbh
       8       8       96        8      active sync   /dev/sdg
       9       8      176        9      active sync   /dev/sdl
      10      65        0       10      active sync   /dev/sdq
      11      65       80       11      active sync   /dev/sdv
      12      65      160       12      active sync   /dev/sdaa
      13      65      240       13      active sync   /dev/sdaf
      14      66       64       14      active sync   /dev/sdak
      15      66      144       15      active sync   /dev/sdap
      16      66      224       16      active sync   /dev/sdau
      17      67       48       17      active sync   /dev/sdaz
      18      67      112       18      active sync   /dev/sdbd
      19       8       32       19      active sync   /dev/sdc
      20       8      112       20      active sync   /dev/sdh
      21       8      192       21      active sync   /dev/sdm
      22      65       16       22      active sync   /dev/sdr
      23      65       96       23      active sync   /dev/sdw
      24      65      176       24      active sync   /dev/sdab
      25      66        0       25      active sync   /dev/sdag
      26      66       80       26      active sync   /dev/sdal
      27      66      160       27      active sync   /dev/sdaq
      28      66      240       28      active sync   /dev/sdav
      29       8       16       29      active sync   /dev/sdb
      30      67      128       30      active sync   /dev/sdbe
      31       8       48       31      active sync   /dev/sdd
      32       8      128       32      active sync   /dev/sdi
      33       8      208       33      active sync   /dev/sdn
      34      65       32       34      active sync   /dev/sds
      35      65      112       35      active sync   /dev/sdx
      36      65      192       36      active sync   /dev/sdac
      37      66       16       37      active sync   /dev/sdah
      38      66       96       38      active sync   /dev/sdam
      39      66      176       39      active sync   /dev/sdar
      40      67        0       40      active sync   /dev/sdaw
      41      67       64       41      active sync   /dev/sdba
      42      67      144       42      active sync   /dev/sdbf
      43       8       64       43      active sync   /dev/sde
      44       8      144       44      active sync   /dev/sdj
      45       8      224       45      active sync   /dev/sdo
      46      65       48       46      active sync   /dev/sdt
      47      65      128       47      active sync   /dev/sdy
      48      65      208       48      active sync   /dev/sdad
      49      66       32       49      active sync   /dev/sdai
      50      66      112       50      active sync   /dev/sdan
      51      66      192       51      active sync   /dev/sdas
      52      67       16       52      active sync   /dev/sdax
      53      67       80       53      active sync   /dev/sdbb
      54      67      160       54      active sync   /dev/sdbg
      55       8       80       55      active sync   /dev/sdf
      56       8      160       56      active sync   /dev/sdk
      57       8      240       57      active sync   /dev/sdp
      58      65       64       58      active sync   /dev/sdu
      59      65      144       59      active sync   /dev/sdz

$ sudo cat /proc/mdstat
Personalities : [raid0] 
md0 : active raid0 sdz[59] sdu[58] sdp[57] sdk[56] sdf[55] sdbg[54] sdbb[53] sdax[52] sdas[51] sdan[50] sdai[49] sdad[48] sdy[47] sdt[46] sdo[45] sdj[44] sde[43] sdbf[42] sdba[41] sdaw[40] sdar[39] sdam[38] sdah[37] sdac[36] sdx[35] sds[34] sdn[33] sdi[32] sdd[31] sdbe[30] sdb[29] sdav[28] sdaq[27] sdal[26] sdag[25] sdab[24] sdw[23] sdr[22] sdm[21] sdh[20] sdc[19] sdbd[18] sdaz[17] sdau[16] sdap[15] sdak[14] sdaf[13] sdaa[12] sdv[11] sdq[10] sdl[9] sdg[8] sdbh[7] sdbc[6] sday[5] sdat[4] sdao[3] sdaj[2] sdae[1] sda[0]
      1171901583360 blocks super 1.2 512k chunks

unused devices: <none>

geerlingguy commented 2 years ago

So... it looks like the missing drive in that list is /dev/sday (I pasted the list into Sublime Text, pressed F5 to sort it alphabetically, then went through my ABCs until I found the missing letter).
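
The same check can be scripted rather than done by eye. A minimal sketch, assuming the `mdadm --detail` row layout shown above (Number, Major, Minor, RaidDevice, State, device path) and the `name[slot]` notation in /proc/mdstat. The function names `empty_slots` and `slot_device` are made up for illustration:

```bash
# Print the RaidDevice number of every member row that has no /dev/ path.
# Reads `mdadm --detail` output on stdin.
empty_slots() {
  awk '$1 ~ /^[0-9]+$/ && $5 == "active" && $NF !~ /^\/dev\// { print $4 }'
}

# Print the device name that /proc/mdstat lists for a given slot number.
# Reads /proc/mdstat on stdin; $1 is the slot (e.g. 5).
slot_device() {
  grep -oE "[a-z]+\[$1\]" | sed -E 's/\[[0-9]+\]$//'
}
```

Piping `sudo mdadm --detail /dev/md0` into `empty_slots` and then running `slot_device 5 < /proc/mdstat` should, on the output above, point at slot 5 and sday.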

Ah... dmesg is showing a bunch of errors: https://gist.github.com/geerlingguy/1004b7925de52aff730ecd84769d2b0d


...
[ 9226.938805] scsi 3:0:18:0: Attached scsi generic sg47 type 13
[ 9226.939107]  end_device-3:18: add: handle(0x0011), sas_addr(0x510600b00f3df7f0)
[ 9226.939131] mpt3sas_cm3:     AFTER adding end device: handle (0x0011), sas_addr(0x510600b00f3df7f0)
[ 9226.939567] mpt3sas_cm3:     BEFORE adding end device: handle (0x0024), sas_addr(0x300605b00f3df7f7)
[ 9226.940346] mpt3sas_cm3: handle(0x24) sas_address(0x300605b00f3df7f7) port_type(0x1)
[ 9227.199460] sd 3:0:4:0: Power-on or device reset occurred
[ 9227.199613] sd 3:0:3:0: Power-on or device reset occurred
[ 9227.199658] sd 3:0:10:0: Power-on or device reset occurred
[ 9227.199676] sd 3:0:2:0: Power-on or device reset occurred
[ 9227.199692] sd 3:0:5:0: Power-on or device reset occurred
[ 9227.199706] sd 3:0:6:0: Power-on or device reset occurred
[ 9227.199721] sd 3:0:1:0: Power-on or device reset occurred
[ 9227.199737] sd 3:0:12:0: Power-on or device reset occurred
[ 9227.199752] sd 3:0:13:0: Power-on or device reset occurred
[ 9227.200013] sd 3:0:11:0: Power-on or device reset occurred
[ 9227.200041] sd 3:0:16:0: Power-on or device reset occurred
[ 9227.200056] sd 3:0:8:0: Power-on or device reset occurred
[ 9227.200070] sd 3:0:9:0: Power-on or device reset occurred
[ 9227.200084] sd 3:0:14:0: Power-on or device reset occurred
[ 9227.200098] sd 3:0:15:0: Power-on or device reset occurred
[ 9227.200853] scsi 3:0:19:0: Direct-Access     ATA      ST20000NM007D-3D SN01 PQ: 0 ANSI: 6
[ 9227.200893] scsi 3:0:19:0: SATA: handle(0x0024), sas_addr(0x300605b00f3df7f7), phy(7), device_name(0x0000000000000000)
[ 9227.200899] scsi 3:0:19:0: enclosure logical id (0x500605b00f3df7f0), slot(11) 
[ 9227.200903] scsi 3:0:19:0: enclosure level(0x0000), connector name( C2  )
[ 9227.201090] scsi 3:0:19:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[ 9227.201105] scsi 3:0:19:0: qdepth(128), tagged(1), scsi_level(7), cmd_que(1)
[ 9227.205251] sd 3:0:19:0: Attached scsi generic sg54 type 0
[ 9227.205378] sd 3:0:19:0: Power-on or device reset occurred
[ 9227.205425]  end_device-3:19: add: handle(0x0024), sas_addr(0x300605b00f3df7f7)
...
[ 9227.210798] mpt3sas_cm3: scan devices: complete
[ 9227.236642] sd 3:0:19:0: [sdbi] Write Protect is off
[ 9227.236663] sd 3:0:19:0: [sdbi] Mode Sense: 9b 00 10 08
[ 9227.237916] sd 3:0:19:0: [sdbi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 9227.326525] sd 3:0:19:0: [sdbi] Attached SCSI disk
[ 9227.806817] md: md0: raid0 array has a missing/failed member
[ 9229.139126] Buffer I/O error on dev md0, logical block 34, lost async page write
[ 9229.139180] Buffer I/O error on dev md0, logical block 514, lost async page write
[ 9229.139198] Buffer I/O error on dev md0, logical block 515, lost async page write
[ 9229.139214] Buffer I/O error on dev md0, logical block 516, lost async page write
[ 9229.139230] Buffer I/O error on dev md0, logical block 517, lost async page write
[ 9229.139246] Buffer I/O error on dev md0, logical block 518, lost async page write
[ 9229.139262] Buffer I/O error on dev md0, logical block 146487672831, lost async page write
[ 9229.139278] Buffer I/O error on dev md0, logical block 146487705600, lost async page write
[ 9229.139293] Buffer I/O error on dev md0, logical block 146487705601, lost async page write
[ 9229.139309] Buffer I/O error on dev md0, logical block 146487705602, lost async page write
[ 9234.156436] buffer_io_error: 266237 callbacks suppressed
[ 9234.156455] Buffer I/O error on dev md0, logical block 19326304256, lost async page write
...
[ 9249.168163] Buffer I/O error on dev md0, logical block 159016517632, lost async page write
[ 9445.238579] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 9445.238622] F2FS-fs (md0): Can't find valid F2FS filesystem in 1th superblock
[ 9445.239182] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[ 9445.239206] F2FS-fs (md0): Can't find valid F2FS filesystem in 2th superblock
...

geerlingguy commented 2 years ago

Four reboots later, and:

$ sudo mdadm --misc --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu May 12 15:12:43 2022
        Raid Level : raid0
        Array Size : 1171901583360 (1117612.44 GiB 1200027.22 GB)
      Raid Devices : 60
     Total Devices : 60
       Persistence : Superblock is persistent

       Update Time : Thu May 12 15:12:43 2022
             State : clean 
    Active Devices : 60
   Working Devices : 60
    Failed Devices : 0
     Spare Devices : 0

            Layout : -unknown-
        Chunk Size : 512K

Consistency Policy : none

              Name : sas:0  (local to host sas)
              UUID : f8b39a60:357b0007:acc18066:aa6fdf97
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1      65      224        1      active sync   /dev/sdae
       2      66       48        2      active sync   /dev/sdaj
       3      66       80        3      active sync   /dev/sdal
       4      66      224        4      active sync   /dev/sdau
       5      67        0        5      active sync   /dev/sdaw
       6      67      160        6      active sync   /dev/sdbg
       7      67      144        7      active sync   /dev/sdbf
       8       8       64        8      active sync   /dev/sde
       9       8        0        9      active sync   /dev/sda
      10      65       32       10      active sync   /dev/sds
      11      65      160       11      active sync   /dev/sdaa
      12      65       80       12      active sync   /dev/sdv
      13      66       32       13      active sync   /dev/sdai
      14      66       16       14      active sync   /dev/sdah
      15      66      128       15      active sync   /dev/sdao
      16      67       16       16      active sync   /dev/sdax
      17      67       80       17      active sync   /dev/sdbb
      18      67      176       18      active sync   /dev/sdbh
      19       8       80       19      active sync   /dev/sdf
      20       8      176       20      active sync   /dev/sdl
      21       8      128       21      active sync   /dev/sdi
      22      65       16       22      active sync   /dev/sdr
      23      65      144       23      active sync   /dev/sdz
      24      65       48       24      active sync   /dev/sdt
      25      65      240       25      active sync   /dev/sdaf
      26      66      112       26      active sync   /dev/sdan
      27      65      208       27      active sync   /dev/sdad
      28      66      240       28      active sync   /dev/sdav
      29       8       48       29      active sync   /dev/sdd
      30      67      128       30      active sync   /dev/sdbe
      31       8      112       31      active sync   /dev/sdh
      32       8       16       32      active sync   /dev/sdb
      33       8      208       33      active sync   /dev/sdn
      34      65       96       34      active sync   /dev/sdw
      35      65       64       35      active sync   /dev/sdu
      36      65        0       36      active sync   /dev/sdq
      37      66      176       37      active sync   /dev/sdar
      38      66      160       38      active sync   /dev/sdaq
      39      66      144       39      active sync   /dev/sdap
      40      66      192       40      active sync   /dev/sdas
      41      67      112       41      active sync   /dev/sdbd
      42      67       64       42      active sync   /dev/sdba
      43       8      144       43      active sync   /dev/sdj
      44       8       96       44      active sync   /dev/sdg
      45       8      192       45      active sync   /dev/sdm
      46      65      176       46      active sync   /dev/sdab
      47      65      112       47      active sync   /dev/sdx
      48      66       96       48      active sync   /dev/sdam
      49      66        0       49      active sync   /dev/sdag
      50      66       64       50      active sync   /dev/sdak
      51      66      208       51      active sync   /dev/sdat
      52      67       32       52      active sync   /dev/sday
      53      67       48       53      active sync   /dev/sdaz
      54      67       96       54      active sync   /dev/sdbc
      55       8      224       55      active sync   /dev/sdo
      56       8      160       56      active sync   /dev/sdk
      57       8      240       57      active sync   /dev/sdp
      58      65      192       58      active sync   /dev/sdac
      59      65      128       59      active sync   /dev/sdy

About 50% of the time (especially on a fresh boot) it seems to error out after 2 or 3 cards. Not sure why.

But when I try to mount the array, I still get:

$ sudo mount /dev/md0 /mnt/raid0
mount: /mnt/raid0: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error.

geerlingguy commented 2 years ago

Trying to format the array again:

pi@sas:~ $ time sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 292975395840 4k blocks and 4291632000 inodes
Filesystem UUID: 4177e2fb-d90e-474e-9478-2179f7fad5db
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848, 512000000, 550731776, 644972544, 1934917632, 
    2560000000, 3855122432, 5804752896, 12800000000, 17414258688, 
    26985857024, 52242776064, 64000000000, 156728328192, 188900999168

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system

real    119m48.526s
user    3m0.349s
sys     11m14.895s
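
As a sanity check on the numbers mke2fs printed, the block count follows directly from the array size mdadm reported, which is given in KiB. A small sketch; `blocks_4k` and `kib_to_tib` are illustrative helper names, not part of any tool:

```bash
# Convert a size in KiB to a count of 4 KiB filesystem blocks.
blocks_4k() {
  echo $(( $1 / 4 ))
}

# Convert a size in KiB to whole TiB (integer division, rounds down).
kib_to_tib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

blocks_4k 1171901583360     # array size from mdadm --detail
kib_to_tib 1171901583360
```

The first value lines up with the "292975395840 4k blocks" figure in the mke2fs output, and the second comes to 1091 TiB, a little over 1 PiB.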

Seeing some messages like this again in dmesg:

[  567.745607] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[  567.745650] F2FS-fs (md0): Can't find valid F2FS filesystem in 1th superblock
[  567.745855] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[  567.745875] F2FS-fs (md0): Can't find valid F2FS filesystem in 2th superblock
[  580.057185] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[  580.057222] F2FS-fs (md0): Can't find valid F2FS filesystem in 1th superblock
[  580.057399] F2FS-fs (md0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
[  580.057416] F2FS-fs (md0): Can't find valid F2FS filesystem in 2th superblock
[ 1246.861540] perf: interrupt took too long (2535 > 2500), lowering kernel.perf_event_max_sample_rate to 78750

Towards the end I got the same errors as earlier. It looks like a card just restarts a bunch of drives, and this time two went AWOL.

Going to try a different fs, since mdadm raid0 and ext4 don't seem to be happy together. Might also try a smaller array.

After a reboot, mdadm reports the array is clean again. So something with writing the superblock across the 60 drives fails.
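
To see which SCSI host is resetting drives without scrolling through dmesg, the reset lines can be tallied per host. A rough sketch, assuming the `sd H:C:T:L: Power-on or device reset occurred` line format shown earlier; `resets_by_host` is a hypothetical helper name:

```bash
# Count "Power-on or device reset occurred" lines per SCSI host number.
# Reads dmesg output on stdin and prints "host count" pairs, sorted by host.
resets_by_host() {
  grep -oE 'sd [0-9]+:[0-9]+:[0-9]+:[0-9]+: Power-on or device reset occurred' |
    awk '{ split($2, addr, ":"); count[addr[1]]++ }
         END { for (h in count) print h, count[h] }' |
    sort -n
}
```

Usage would be something like `sudo dmesg | resets_by_host`, run right after a failed mkfs.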

geerlingguy commented 2 years ago

Trying to create the fs with lazy init... slower IO might help?

$ time sudo mkfs.ext4 -m 0 /dev/md0

Nope. Same error at same point. Going to maybe switch gears to ZFS too... we'll see.

geerlingguy commented 2 years ago

I reset the array using:

$ sudo nano /etc/mdadm/mdadm.conf
$ sudo wipefs --all --force /dev/md0
$ sudo mdadm --stop /dev/md0
$ sudo mdadm --zero-superblock /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc  /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb  /dev/sdbe /dev/sdd  /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde  /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf  /dev/sdk /dev/sdp /dev/sdu /dev/sdz

# Then delete the partition data.
$ for i in /dev/sd*[a-z]; do sudo wipefs --all --force "$i"; done
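
Because the kernel can hand out different /dev/sdX names after a reboot (compare the two mdadm --detail listings above, where slot 0 moves from /dev/sda to /dev/sdc), a typed-out device list can go stale. One alternative is to read the member list from mdadm itself before stopping the array. A minimal sketch, assuming the `--detail` row format shown earlier; `member_devices` is an illustrative name:

```bash
# Print the member device paths from `mdadm --detail` output on stdin.
member_devices() {
  awk '$1 ~ /^[0-9]+$/ && $NF ~ /^\/dev\// { print $NF }'
}

# Example (destructive; only for an array you intend to wipe, and the
# list has to be captured BEFORE `mdadm --stop`):
#   devs=$(sudo mdadm --detail /dev/md0 | member_devices)
#   sudo mdadm --stop /dev/md0
#   sudo mdadm --zero-superblock $devs
```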

geerlingguy commented 2 years ago

ZFS won't install cleanly on my custom kernel, so I'm going to try Btrfs.

# Install Btrfs utilities.
$ sudo apt install btrfs-progs

# Create a RAID-0 Btrfs volume mounted at /btrfs.
$ sudo mkdir /btrfs
$ sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc  /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb  /dev/sdbe /dev/sdd  /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde  /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf  /dev/sdk /dev/sdp /dev/sdu /dev/sdz
btrfs-progs v5.10.1 
See http://btrfs.wiki.kernel.org for more information.

Label:              btrfs
UUID:               82bc04c9-f954-4fe5-a2d5-f8f56021c904
Node size:          16384
Sector size:        4096
Filesystem size:    1.07PiB
Block group profiles:
  Data:             RAID0            10.00GiB
  Metadata:         RAID0             1.88GiB
  System:           RAID0            58.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Runtime features:   
Checksum:           crc32c
Number of devices:  60
Devices:
   ID        SIZE  PATH
    1    18.19TiB  /dev/sda
    2    18.19TiB  /dev/sdae
    3    18.19TiB  /dev/sdaj
    4    18.19TiB  /dev/sdao
    5    18.19TiB  /dev/sdat
    6    18.19TiB  /dev/sday
    7    18.19TiB  /dev/sdbc
    8    18.19TiB  /dev/sdbh
    9    18.19TiB  /dev/sdg
   10    18.19TiB  /dev/sdl
   11    18.19TiB  /dev/sdq
   12    18.19TiB  /dev/sdv
   13    18.19TiB  /dev/sdaa
   14    18.19TiB  /dev/sdaf
   15    18.19TiB  /dev/sdak
   16    18.19TiB  /dev/sdap
   17    18.19TiB  /dev/sdau
   18    18.19TiB  /dev/sdaz
   19    18.19TiB  /dev/sdbd
   20    18.19TiB  /dev/sdc
   21    18.19TiB  /dev/sdh
   22    18.19TiB  /dev/sdm
   23    18.19TiB  /dev/sdr
   24    18.19TiB  /dev/sdw
   25    18.19TiB  /dev/sdab
   26    18.19TiB  /dev/sdag
   27    18.19TiB  /dev/sdal
   28    18.19TiB  /dev/sdaq
   29    18.19TiB  /dev/sdav
   30    18.19TiB  /dev/sdb
   31    18.19TiB  /dev/sdbe
   32    18.19TiB  /dev/sdd
   33    18.19TiB  /dev/sdi
   34    18.19TiB  /dev/sdn
   35    18.19TiB  /dev/sds
   36    18.19TiB  /dev/sdx
   37    18.19TiB  /dev/sdac
   38    18.19TiB  /dev/sdah
   39    18.19TiB  /dev/sdam
   40    18.19TiB  /dev/sdar
   41    18.19TiB  /dev/sdaw
   42    18.19TiB  /dev/sdba
   43    18.19TiB  /dev/sdbf
   44    18.19TiB  /dev/sde
   45    18.19TiB  /dev/sdj
   46    18.19TiB  /dev/sdo
   47    18.19TiB  /dev/sdt
   48    18.19TiB  /dev/sdy
   49    18.19TiB  /dev/sdad
   50    18.19TiB  /dev/sdai
   51    18.19TiB  /dev/sdan
   52    18.19TiB  /dev/sdas
   53    18.19TiB  /dev/sdax
   54    18.19TiB  /dev/sdbb
   55    18.19TiB  /dev/sdbg
   56    18.19TiB  /dev/sdf
   57    18.19TiB  /dev/sdk
   58    18.19TiB  /dev/sdp
   59    18.19TiB  /dev/sdu
   60    18.19TiB  /dev/sdz

Knight1 commented 2 years ago

Time to reset the counter on your shirt. Just rebuild the kernel one more time so we can try zfs 🥰

geerlingguy commented 2 years ago

$ sudo btrfs filesystem show
Label: 'btrfs'  uuid: 82bc04c9-f954-4fe5-a2d5-f8f56021c904
    Total devices 60 FS bytes used 128.00KiB
    devid    1 size 18.19TiB used 202.62MiB path /dev/sda
    devid    2 size 18.19TiB used 202.62MiB path /dev/sdae
    devid    3 size 18.19TiB used 203.62MiB path /dev/sdaj
    devid    4 size 18.19TiB used 203.62MiB path /dev/sdao
...

And in dmesg:

[ 2847.425438] perf: interrupt took too long (4963 > 4912), lowering kernel.perf_event_max_sample_rate to 40250
[ 2944.666513] raid6: neonx8   gen()  3644 MB/s
[ 2944.734484] raid6: neonx8   xor()  2657 MB/s
[ 2944.802487] raid6: neonx4   gen()  3947 MB/s
[ 2944.870533] raid6: neonx4   xor()  2761 MB/s
[ 2944.938525] raid6: neonx2   gen()  3421 MB/s
[ 2945.006486] raid6: neonx2   xor()  2529 MB/s
[ 2945.074487] raid6: neonx1   gen()  2716 MB/s
[ 2945.142501] raid6: neonx1   xor()  2039 MB/s
[ 2945.210488] raid6: int64x8  gen()  2585 MB/s
[ 2945.278492] raid6: int64x8  xor()  1471 MB/s
[ 2945.346499] raid6: int64x4  gen()  2560 MB/s
[ 2945.414488] raid6: int64x4  xor()  1484 MB/s
[ 2945.482493] raid6: int64x2  gen()  2349 MB/s
[ 2945.550496] raid6: int64x2  xor()  1314 MB/s
[ 2945.618498] raid6: int64x1  gen()  1816 MB/s
[ 2945.686506] raid6: int64x1  xor()   979 MB/s
[ 2945.686524] raid6: using algorithm neonx4 gen() 3947 MB/s
[ 2945.686529] raid6: .... xor() 2761 MB/s, rmw enabled
[ 2945.686534] raid6: using neon recovery algorithm
[ 2945.707259] xor: measuring software checksum speed
[ 2945.708831]    8regs           :  6352 MB/sec
[ 2945.710185]    32regs          :  7318 MB/sec
[ 2945.714241]    arm64_neon      :  3072 MB/sec
[ 2945.714260] xor: using function: 32regs (7318 MB/sec)
[ 2945.875144] Btrfs loaded, crc32c=crc32c-generic, zoned=no, fsverity=no
[ 2945.880380] BTRFS: device label btrfs devid 1 transid 5 /dev/sda scanned by systemd-udevd (24578)
[ 2945.887736] BTRFS: device label btrfs devid 6 transid 5 /dev/sday scanned by systemd-udevd (24593)
[ 2945.914456] BTRFS: device label btrfs devid 5 transid 5 /dev/sdat scanned by systemd-udevd (24590)
...

And to mount:

$ sudo mount /dev/sda /btrfs
$ sudo btrfs filesystem usage /btrfs
Overall:
    Device size:           1.07PiB
    Device allocated:         11.93GiB
    Device unallocated:        1.07PiB
    Device missing:          0.00B
    Used:            128.00KiB
    Free (estimated):          1.07PiB  (min: 1.07PiB)
    Free (statfs, df):         1.07PiB
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:        3.25MiB  (used: 0.00B)
    Multiple profiles:              no

Data,RAID0: Size:10.00GiB, Used:0.00B (0.00%)
   /dev/sda  170.62MiB
   /dev/sdae     170.62MiB
...

Metadata,RAID0: Size:1.88GiB, Used:112.00KiB (0.01%)
   /dev/sda   32.00MiB
   /dev/sdae      32.00MiB
...

System,RAID0: Size:58.00MiB, Used:16.00KiB (0.03%)
   /dev/sdaj       1.00MiB
   /dev/sdao       1.00MiB
...

Unallocated:
   /dev/sda   18.19TiB
   /dev/sdae      18.19TiB
...

geerlingguy commented 2 years ago

Quick disk-benchmark.sh result for btrfs RAID 0:

Benchmark                 Result
fio 1M sequential read    213 MB/s
iozone 1M random read     144.82 MB/s
iozone 1M random write    233.90 MB/s
iozone 4K random read     19.45 MB/s
iozone 4K random write    15.92 MB/s

geerlingguy commented 2 years ago

Testing network copy performance:

# Install Samba.
$ sudo apt install -y samba samba-common-bin
$ sudo mkdir /btrfs/shared
$ sudo chmod -R 777 /btrfs/shared
$ sudo nano /etc/samba/smb.conf

[shared]
path=/btrfs/shared
writeable=Yes
create mask=0777
directory mask=0777
public=yes

$ sudo systemctl restart smbd
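
If this setup has to be repeated, the share stanza can be appended by a script instead of an editor. A sketch under assumptions: `add_share` is a made-up helper, the config path is passed in as an argument (on Debian-based systems the main Samba config is normally /etc/samba/smb.conf), and the stanza is copied from the one above:

```bash
# Append the [shared] stanza to the given config file unless it is already there.
add_share() {
  conf=$1
  if grep -q '^\[shared\]' "$conf" 2>/dev/null; then
    return 0    # stanza already present, nothing to do
  fi
  cat >> "$conf" <<'EOF'

[shared]
path=/btrfs/shared
writeable=Yes
create mask=0777
directory mask=0777
public=yes
EOF
}
```

Writing to the real config needs root, so the function would have to run in a root shell. Running `testparm -s` afterwards is one way to check the file before restarting smbd.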

geerlingguy commented 2 years ago

So... I started a 70 GB copy of a ton of video files for my current project to the btrfs RAID 0 array. It kicked off at 100-119 MB/sec, but after a couple of minutes it got a lot slower (30 MB/sec). Then it started stalling out, and after a while Finder threw an error.

Over on the server side, I found this in dmesg: https://gist.github.com/geerlingguy/90a25813dfcdc26c1d4ab503bd7550d4

And if I check the btrfs filesystem status I see:

$ sudo btrfs filesystem show
Label: 'btrfs'  uuid: 82bc04c9-f954-4fe5-a2d5-f8f56021c904
    Total devices 60 FS bytes used 138.09MiB
    devid    2 size 18.19TiB used 202.62MiB path /dev/sdae
    devid    3 size 18.19TiB used 203.62MiB path /dev/sdaj
    devid    4 size 18.19TiB used 203.62MiB path /dev/sdao
    devid    6 size 18.19TiB used 203.62MiB path /dev/sday
    devid    7 size 18.19TiB used 203.62MiB path /dev/sdbc
    devid    8 size 18.19TiB used 203.62MiB path /dev/sdbh
    devid    9 size 18.19TiB used 203.62MiB path /dev/sdg
    devid   10 size 18.19TiB used 203.62MiB path /dev/sdl
    devid   11 size 18.19TiB used 203.62MiB path /dev/sdq
    devid   12 size 18.19TiB used 203.62MiB path /dev/sdv
    devid   13 size 18.19TiB used 203.62MiB path /dev/sdaa
    devid   14 size 18.19TiB used 203.62MiB path /dev/sdaf
    devid   15 size 18.19TiB used 203.62MiB path /dev/sdak
    devid   16 size 18.19TiB used 203.62MiB path /dev/sdap
    devid   18 size 18.19TiB used 203.62MiB path /dev/sdaz
    devid   19 size 18.19TiB used 203.62MiB path /dev/sdbd
    devid   20 size 18.19TiB used 203.62MiB path /dev/sdc
    devid   21 size 18.19TiB used 203.62MiB path /dev/sdh
    devid   22 size 18.19TiB used 203.62MiB path /dev/sdm
    devid   23 size 18.19TiB used 203.62MiB path /dev/sdr
    devid   24 size 18.19TiB used 203.62MiB path /dev/sdw
    devid   26 size 18.19TiB used 203.62MiB path /dev/sdag
    devid   27 size 18.19TiB used 203.62MiB path /dev/sdal
    devid   28 size 18.19TiB used 203.62MiB path /dev/sdaq
    devid   29 size 18.19TiB used 203.62MiB path /dev/sdav
    devid   30 size 18.19TiB used 203.62MiB path /dev/sdb
    devid   31 size 18.19TiB used 203.62MiB path /dev/sdbe
    devid   32 size 18.19TiB used 203.62MiB path /dev/sdd
    devid   33 size 18.19TiB used 203.62MiB path /dev/sdi
    devid   34 size 18.19TiB used 203.62MiB path /dev/sdn
    devid   35 size 18.19TiB used 203.62MiB path /dev/sds
    devid   36 size 18.19TiB used 203.62MiB path /dev/sdx
    devid   37 size 18.19TiB used 203.62MiB path /dev/sdac
    devid   38 size 18.19TiB used 203.62MiB path /dev/sdah
    devid   39 size 18.19TiB used 203.62MiB path /dev/sdam
    devid   40 size 18.19TiB used 203.62MiB path /dev/sdar
    devid   41 size 18.19TiB used 203.62MiB path /dev/sdaw
    devid   42 size 18.19TiB used 203.62MiB path /dev/sdba
    devid   43 size 18.19TiB used 203.62MiB path /dev/sdbf
    devid   44 size 18.19TiB used 203.62MiB path /dev/sde
    devid   45 size 18.19TiB used 203.62MiB path /dev/sdj
    devid   47 size 18.19TiB used 203.62MiB path /dev/sdt
    devid   48 size 18.19TiB used 203.62MiB path /dev/sdy
    devid   49 size 18.19TiB used 203.62MiB path /dev/sdad
    devid   50 size 18.19TiB used 203.62MiB path /dev/sdai
    devid   51 size 18.19TiB used 203.62MiB path /dev/sdan
    devid   52 size 18.19TiB used 203.62MiB path /dev/sdas
    devid   53 size 18.19TiB used 203.62MiB path /dev/sdax
    devid   54 size 18.19TiB used 203.62MiB path /dev/sdbb
    devid   55 size 18.19TiB used 203.62MiB path /dev/sdbg
    devid   56 size 18.19TiB used 203.62MiB path /dev/sdf
    devid   57 size 18.19TiB used 203.62MiB path /dev/sdk
    devid   58 size 18.19TiB used 203.62MiB path /dev/sdp
    devid   59 size 18.19TiB used 203.62MiB path /dev/sdu
    devid   60 size 18.19TiB used 203.62MiB path /dev/sdz
    *** Some devices missing

So it looks like a similar error, where HBAs just kinda jump offline, and not all drives come back. Something weird like that.
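
Spotting the gaps in a 55-line listing by eye is error-prone, so here is a small sketch that lists the devid numbers absent from `btrfs filesystem show` output. It assumes a single filesystem in the output and the "Total devices N" / "devid N" line formats shown above; `missing_devids` is a hypothetical name:

```bash
# Print devid numbers that do not appear in `btrfs filesystem show` output
# (read from stdin), using the "Total devices N" line as the expected range.
missing_devids() {
  awk '$1 == "Total" && $2 == "devices" { total = $3 }
       $1 == "devid" { seen[$2] = 1 }
       END { for (i = 1; i <= total; i++) if (!(i in seen)) printf "%d\n", i }'
}
```

On the listing above this should print 1, 5, 17, 25 and 46, which correspond to /dev/sda, /dev/sdat, /dev/sdau, /dev/sdab and /dev/sdo in the mkfs.btrfs device table.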

geerlingguy commented 2 years ago

After reboot, the filesystem was intact.

I did a small file copy over Samba (< 100 MB), and it copied almost instantly and worked. Then I tried a larger file (400 MB), and it failed in the same way, but this time I was watching dmesg, and these are the initial failures when it seems the HBAs or the driver get overloaded:

[  278.151884] mpt3sas_cm1 fault info from func: mpt3sas_base_make_ioc_ready
[  278.151904] mpt3sas_cm1: fault_state(0x2623)!
[  278.151911] mpt3sas_cm1: sending diag reset !!
[  278.727834] mpt3sas_cm3 fault info from func: mpt3sas_base_make_ioc_ready
[  278.727851] mpt3sas_cm3: fault_state(0x2623)!
[  278.727858] mpt3sas_cm3: sending diag reset !!
[  278.999377] mpt3sas_cm1: diag reset: SUCCESS
[  279.062419] mpt3sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
[  279.181737] mpt3sas_cm1: _base_display_fwpkg_version: complete
[  279.181753] mpt3sas_cm1: FW Package Ver(05.00.00.00)
[  279.181892] mpt3sas_cm1: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[  279.182350] mpt3sas_cm1: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(09.09.00.00)
[  279.182359] NVMe
[  279.182363] mpt3sas_cm1: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[  279.182477] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.182511] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.182544] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.182576] mpt3sas_cm1: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.182615] mpt3sas_cm1: sending port enable !!
[  279.577425] mpt3sas_cm3: diag reset: SUCCESS
[  279.640569] mpt3sas_cm3: CurrentHostPageSize is 0: Setting default host page size to 4k
[  279.760314] mpt3sas_cm3: _base_display_fwpkg_version: complete
[  279.760331] mpt3sas_cm3: FW Package Ver(05.00.00.00)
[  279.760470] mpt3sas_cm3: TimeSync Interval in Manuf page-11 is not enabled. Periodic Time-Sync will be disabled
[  279.760926] mpt3sas_cm3: SAS3616: FWVersion(05.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[  279.760935] NVMe
[  279.760939] mpt3sas_cm3: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[  279.761052] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.761085] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.761118] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.761151] mpt3sas_cm3: log_info(0x300301e0): originator(IOP), code(0x03), sub_code(0x01e0)
[  279.761189] mpt3sas_cm3: sending port enable !!
[  302.003185] mpt3sas_cm1: port enable: SUCCESS
...

At that point, the cards seem to re-initialize. And what's really interesting is this time, the array recovered before the Finder file copy timed out, and so the copy finished successfully. And copying the 400 MB file back had no problem either. So it definitely seems to be a stability issue when writing bytes out to all 60 drives at once, continuously. CPU load maxes out at least one core whenever writes are saturating the bus.
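
How long a card stays away during one of these resets can be read off the timestamps. A rough sketch, assuming the `mpt3sas_cmN: fault_state(...)` and `mpt3sas_cmN: port enable: SUCCESS` lines look like the ones above; `recovery_times` is an illustrative name:

```bash
# For each controller, print the seconds between its first fault_state line
# and the next "port enable: SUCCESS" line. Reads dmesg output on stdin.
recovery_times() {
  LC_ALL=C awk '
    match($0, /mpt3sas_cm[0-9]+/) {
      ctrl = substr($0, RSTART, RLENGTH)
      ts = $0
      sub(/^\[ */, "", ts)    # drop the leading "[" and padding
      sub(/\].*$/, "", ts)    # drop everything after the timestamp
      if ($0 ~ /fault_state/) {
        if (!(ctrl in start)) start[ctrl] = ts
      } else if ($0 ~ /port enable: SUCCESS/ && (ctrl in start)) {
        printf "%s %.1f\n", ctrl, ts - start[ctrl]
        delete start[ctrl]
      }
    }'
}
```

In the excerpt above, mpt3sas_cm1 faults at about 278.15 s and reports port enable at about 302.00 s, so the card is gone for roughly 24 seconds. That gap may be why a client copy sometimes survives and sometimes times out.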

I think my next plan is to do a JBOD-style array, so have the entire array set up sequentially (no RAID 0 striping), and see if writing in that manner is more efficient / less error-prone.
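
A simplified way to think about why striping might stress the cards more: with RAID 0, a large write is cut into chunks that are spread round-robin over the member disks, while a linear or JBOD layout fills one disk before moving on to the next. The sketch below is a back-of-the-envelope model only (aligned writes, no filesystem metadata). `disks_touched` is a hypothetical helper, and the 512 KiB chunk size is taken from the mdadm output earlier in this thread:

```bash
# Number of member disks a single aligned write touches under RAID 0.
# $1 = write size in KiB, $2 = chunk size in KiB, $3 = number of disks
disks_touched() {
  chunks=$(( ($1 + $2 - 1) / $2 ))    # round up to whole chunks
  if [ "$chunks" -lt "$3" ]; then
    echo "$chunks"
  else
    echo "$3"
  fi
}

disks_touched 1024 512 60      # 1 MiB write
disks_touched 409600 512 60    # 400 MiB write
```

Under this model a 400 MiB copy touches all 60 disks at once, whereas the same data on a linear array would land on one disk, or two if it straddles a boundary.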

geerlingguy commented 2 years ago

Deleted the existing btrfs array with:

$ sudo systemctl stop smbd
$ sudo umount /btrfs
$ sudo wipefs --all -t btrfs /dev/sda /dev/sdae ...

And creating a new one with single for JBOD-style:

$ sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc  /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb  /dev/sdbe /dev/sdd  /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde  /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf  /dev/sdk /dev/sdp /dev/sdu /dev/sdz

$ sudo btrfs filesystem show
Label: 'btrfs'  uuid: b4a4b388-a948-4ca3-928c-469296e79e50
    Total devices 60 FS bytes used 128.00KiB
    devid    1 size 18.19TiB used 202.62MiB path /dev/sda
    devid    2 size 18.19TiB used 202.62MiB path /dev/sdae
...

$ sudo mount /dev/sda /btrfs
$ sudo mkdir /btrfs/shared
$ sudo chmod 777 /btrfs/shared
$ sudo systemctl start smbd

geerlingguy commented 2 years ago

Quick disk-benchmark.sh result for Btrfs 'single':

Benchmark                 Result
fio 1M sequential read    211 MB/s
iozone 1M random read     146.61 MB/s
iozone 1M random write    274.35 MB/s
iozone 4K random read     20.22 MB/s
iozone 4K random write    16.24 MB/s

geerlingguy commented 2 years ago

And after some more messing around, I am able to reliably get one of the cards (sometimes two) to go through that reset cycle under heavy write activity.

And usually one or two drives don't reappear until after a full reboot of the system (this time it was just sdr).

$ sudo wipefs --all -t btrfs /dev/sda /dev/sdae /dev/sdaj /dev/sdao /dev/sdat /dev/sday /dev/sdbc /dev/sdbh /dev/sdg /dev/sdl /dev/sdq /dev/sdv /dev/sdaa /dev/sdaf /dev/sdak /dev/sdap /dev/sdau /dev/sdaz /dev/sdbd /dev/sdc  /dev/sdh /dev/sdm /dev/sdr /dev/sdw /dev/sdab /dev/sdag /dev/sdal /dev/sdaq /dev/sdav /dev/sdb  /dev/sdbe /dev/sdd  /dev/sdi /dev/sdn /dev/sds /dev/sdx /dev/sdac /dev/sdah /dev/sdam /dev/sdar /dev/sdaw /dev/sdba /dev/sdbf /dev/sde  /dev/sdj /dev/sdo /dev/sdt /dev/sdy /dev/sdad /dev/sdai /dev/sdan /dev/sdas /dev/sdax /dev/sdbb /dev/sdbg /dev/sdf  /dev/sdk /dev/sdp /dev/sdu /dev/sdz
wipefs: error: /dev/sdr: probing initialization failed: No such file or directory

Going to try an mdadm linear array next...

$ sudo mdadm --create --verbose /dev/md0 --level=linear --raid-devices=60 /dev/sda ...

It's nice to be able to see more easily how fast it's building the ext4 filesystem on the linear array: in atop I can watch each disk get written to in turn while 'Writing inode tables' is going on. And hopefully, since it's just writing to one disk after the other, whatever condition is triggering the card reset won't happen.
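
Since drives occasionally drop out (sdr in this case), a hard-coded `--raid-devices=60` can fall out of step with the device list that is actually passed. A small sketch that prints, but does not run, the create command with the count derived from the arguments; `mdadm_create_cmd` is a made-up helper and /dev/md0 is simply the array name used throughout this thread:

```bash
# Print (not run) an mdadm --create command for the given level and devices.
# $1 = RAID level (e.g. linear); remaining arguments = member devices
mdadm_create_cmd() {
  level=$1
  shift
  echo "mdadm --create --verbose /dev/md0 --level=$level --raid-devices=$# $*"
}

mdadm_create_cmd linear /dev/sda /dev/sdae /dev/sdaj
```

Printing the command first gives a chance to review the device list before anything is written to the disks.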