geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test LSI 9405W-16i SAS/NVMe HBA #196

Closed geerlingguy closed 2 years ago

geerlingguy commented 2 years ago

The LSI 9405W-16i HBA should be similar to the 9460-16i, and should hopefully be supported on ARM (to some extent), unlike older cards like the 9305-16i (see #195). (Adding the term 9405 so this will also pop up in search.)

9405w-16i_hba_angle

geerlingguy commented 2 years ago

That took quite some time—and mounting took a few minutes on its own, as each block device was slowly read!

Here are the benchmark results on the linear array (probably close to what an individual drive gets):

| Benchmark | Result |
| --- | --- |
| fio 1M sequential read | 214 MB/s |
| iozone 1M random read | 95.44 MB/s |
| iozone 1M random write | 130.94 MB/s |
| iozone 4K random read | 22.87 MB/s |
| iozone 4K random write | 17.48 MB/s |
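
For context, the fio 1M sequential read number comes from disk-benchmark.sh; a roughly equivalent standalone command looks like this (a sketch — the exact parameters the script uses may differ, and TARGET is a placeholder for the array's block device):

# Rough equivalent of the 1M sequential read test (parameters are assumptions;
# point TARGET at the array's block device before running).
TARGET=/dev/md0
sudo fio --name=seq-read --filename="$TARGET" --rw=read --bs=1M \
  --direct=1 --ioengine=libaio --iodepth=64 --runtime=60 --time_based \
  --readonly
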
geerlingguy commented 2 years ago

file-copy-samba-1

Testing a 73 GB SMB file copy from my Mac over the wired network, I averaged 100-110 MB/sec throughout, though there were a number of periods where the speed would dip to 10-30 MB/sec and get spiky for up to 20 seconds or so:

file-copy-spikey-samba-2

smbd on the Pi was running at 130-150% CPU, and IRQ load was high but only in the 50-80% range. I'm not 100% sure where the actual bottleneck was.
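
One way to see where that interrupt load lands is to watch the per-CPU interrupt counters directly (a sketch — the mpt3sas and eth0 names are assumptions based on this setup):

# Watch interrupt counts for the HBA driver and the NIC, refreshed every second
watch -n 1 "grep -E 'mpt3sas|eth0' /proc/interrupts"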

geerlingguy commented 2 years ago

In speaking with a Broadcom engineer, it sounds like the error I'm seeing in the other RAID configurations, namely:

[  278.151884] mpt3sas_cm1 fault info from func: mpt3sas_base_make_ioc_ready
[  278.151904] mpt3sas_cm1: fault_state(0x2623)!
[  278.151911] mpt3sas_cm1: sending diag reset !!

is a PCI Express message pull error indicating potential data corruption—and that's related either to the signal integrity of the cabling or to power.

#define IFAULT_IOP_PCI_EXPRESS_MSGPULLDMA_ERROR         (0x2623) /**< Message Pull State Machine encountered an error. */

I wouldn't be surprised by either, honestly... I'm using one of those GPU mining boards, and as I've learned in the past, they're not exactly paragons of excellence.

geerlingguy commented 2 years ago

That same engineer also recommended upgrading the cards' firmware. Right now they're all on version 05.00.00.00:

pi@sas:~ $ sudo ./storcli64 /c3 show
CLI Version = 007.2103.0000.0000 Dec 08, 2021
Operating system = Linux 5.15.35-v8+
Controller = 3
Status = Success
Description = None

Product Name = HBA 9405W-16i
Serial Number = SP93121358
SAS Address =  500605b00f3df7f0
PCI Address = 00:06:00:00
System Time = 05/13/2022 11:16:14
FW Package Build = 05.00.00.00
FW Version = 05.00.00.00
BIOS Version = 09.09.00.00_05.00.00.00
NVDATA Version = 04.03.00.03
PSOC Version = 00000000
Driver Name = mpt3sas
Driver Version = 39.100.00.00

Latest version is from January: https://docs.broadcom.com/docs/9405W_16i_Pkg_P22_SAS_SATA_NVMe_FW_BIOS_UEFI.zip (22.00.00.00).

geerlingguy commented 2 years ago

Hmm...

$ sudo /home/pi/storcli64 /c0 download file=/home/pi/9405W_16i_Pkg_P22_SAS_SATA_NVMe_FW_BIOS_UEFI/Firmware/HBA_9405W-16i_Mixed_Profile.bin
Downloading image.Please wait...

CLI Version = 007.2103.0000.0000 Dec 08, 2021
Operating system = Linux 5.15.35-v8+
Controller = 0
Status = Failure
Description = The firmware flash image is invalid
geerlingguy commented 2 years ago

Switching gears one more time... what if I create a RAID 6 array on the 16 drives attached to each storage controller, then stripe them together using mdadm?

./storcli64 /c0 add vd type=raid6 drives=0:0-15
./storcli64 /c1 add vd type=raid6 drives=0:0-15
./storcli64 /c2 add vd type=raid6 drives=0:0-15
./storcli64 /c3 add vd type=raid6 drives=0:0-15

However... it looks like the drives are all in JBOD mode right now, and if I try setting jbod=off with sudo ./storcli64 /c0 set jbod=off I get 'Un-supported command'. So I'm still trying to figure out whether this will even be possible.
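
If the per-controller RAID 6 virtual drives can be created, striping them together would be a standard mdadm RAID 0 on top (a sketch — the four /dev/sd* names are placeholders for whatever virtual drives the controllers end up exposing, and it assumes the JBOD issue above gets sorted out):

# Stripe the four hardware RAID 6 virtual drives into one md device
# (device names are placeholders for the VDs exposed by each controller)
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/sdbi /dev/sdbj /dev/sdbk /dev/sdbl
sudo mkfs.ext4 /dev/md0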

Also, I might not be able to flash the firmware on the Pi; it might have to be done on an x86 machine :(

geerlingguy commented 2 years ago

New video featuring this card is live here: https://www.youtube.com/watch?v=BBnomwpF_uY

geerlingguy commented 2 years ago

Both @Coreforge and dumbasPL suggested in YouTube comments (example) forcing PCIe Gen 1 speeds for better stability. I might have to try that and then run the various RAID setups again.

A few things I'd like to test before swapping back to the Xeon setup and pulling these HBAs:

slaygirlz commented 2 years ago

wait so this project started a year ago wow that's one day after my b day

MartijnVdS commented 2 years ago

A data point:

I have a 9405W-16i card in my Ryzen 1600 server, and I can't upgrade to any firmware newer than 14.0.0.0; I get that same error message (I tried it from Linux and from a UEFI shell).

geerlingguy commented 2 years ago

@MartijnVdS - Oh... the plot thickens. Also, one of the Broadcom engineers mentioned I might be able to do incremental upgrades, starting from an older version and slowly progressing. But it would also be interesting to see whether I can get to 14.x but no further. I'll test with an older revision early this week.

(To add a note since I didn't update this issue: I have tried upgrading to 22.x in my PC and it said the signature wasn't valid.)

MartijnVdS commented 2 years ago

That's how I got to version 14. But all firmware versions newer than that fail to install with that "The firmware flash image is invalid" message.

geerlingguy commented 2 years ago

Alrighty then... after far more debugging than I'd ever like to attempt again—but will, inevitably—I found out you can use at least the P14 version of StorCLI, I think from sometime in April 2020, and flash any of the images to the card, including the latest P23.

Just like a one-line patch, this simple fix has a backstory...

  1. Following the advice of some of the firmware maintainers, I upgraded one card only two releases at a time, and downloaded an older StorCLI version to do so.
  2. I started with P12 since that was a bit newer than P05 that I was running, but older than P14 which @MartijnVdS said he had running. That actually worked without a hitch using the latest storcli version (2022 release).
  3. I then did P14, and that worked too. Yay!
  4. I tried P15, P16, P17, and P18, and always got the "The firmware flash image is invalid" warning.
  5. I then downloaded the P14 version of StorCLI from April 2020 (I think, could be off a little on the date, but it was the P14 version), and tried again. Upgraded to P16. Success. P18. Success. Kept rebooting between each try.
  6. Windows Update decided it had had enough of me clicking ignore and forced me to sit and wait for 20 minutes while it did its thing.
  7. I switched back to the latest version of StorCLI and tried P20. Got "The firmware flash image is invalid". Switched back to the P14 version of StorCLI, and the P20 flash was successful.
  8. Rinse and repeat with P22, and finally P23. Works fine with the P14 version, but the latest StorCLI (and the previous revision) don't work.

So then I pulled the 2nd card out of the Storinator, and plugged it in, and decided to try going straight from P05 to P23 using the P14 revision of StorCLI. It worked!

So then I pulled the other two cards and quickly flashed them up to P23. I'm wondering at this point if the flashing might've worked on the Pi, too, if I had run an older version of StorCLI...
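
For anyone repeating this, the per-card sequence with the old StorCLI boils down to something like the following (a sketch — ./storcli64_p14 is a placeholder for wherever you unpack the P14 StorCLI binary, and the firmware path comes from the P23 package layout shown earlier):

# Flash the P23 image with the P14-era StorCLI, then verify the FW version
sudo ./storcli64_p14 /c0 download file=./Firmware/HBA_9405W-16i_Mixed_Profile.bin
sudo ./storcli64_p14 /c0 show
# Reboot (or power cycle) before moving on to the next controller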

Anyways, the proof is in the screenshot:

Powershell-Firmware-Version-23

MartijnVdS commented 2 years ago

Just to confirm: the above procedure works for me as well.

The hardest part was finding the old version of StorCLI on Broadcom's support web site.

After a lot of searching I ended up here: https://www.broadcom.com/support/download-search?pg=Storage+Adapters,+Controllers,+and+ICs&pf=SAS/SATA/NVMe+Host+Bus+Adapters&pn=&pa=&po=&dk=9405w&pl=&l=false

which has a heading "Management Software and Tools", which in turn has a tab "Archive", where you can find the older version.

You can also download the firmware there (make sure you get the correct one -- 16i or 16e).

geerlingguy commented 2 years ago

@MartijnVdS - Good to know I'm not alone there! And like you, I spent a while clicking around in vain trying to find older versions until I finally found the "Archive" link in each section. The results aren't really google-able either, since they're all buried in a javascript frontend :(

geerlingguy commented 2 years ago

All right, another Btrfs RAID 0 attempt:

pi@sas:~ $ sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas /dev/sdat /dev/sdau /dev/sdav /dev/sdaw /dev/sdax /dev/sday /dev/sdaz /dev/sdba /dev/sdbb /dev/sdbc /dev/sdbd /dev/sdbe /dev/sdbf /dev/sdbg /dev/sdbh
btrfs-progs v5.10.1 
See http://btrfs.wiki.kernel.org for more information.

Label:              btrfs
UUID:               23f7340d-1a34-46d0-acc4-58c9418a90f3
Node size:          16384
Sector size:        4096
Filesystem size:    1.07PiB
Block group profiles:
  Data:             RAID0            10.00GiB
  Metadata:         RAID0             1.88GiB
  System:           RAID0            58.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Runtime features:   
Checksum:           crc32c
Number of devices:  60
Devices:
   ID        SIZE  PATH
    1    18.19TiB  /dev/sda
    2    18.19TiB  /dev/sdb
    3    18.19TiB  /dev/sdc
    4    18.19TiB  /dev/sdd
    5    18.19TiB  /dev/sde
    6    18.19TiB  /dev/sdf
    7    18.19TiB  /dev/sdg
    8    18.19TiB  /dev/sdh
    9    18.19TiB  /dev/sdi
   10    18.19TiB  /dev/sdj
   11    18.19TiB  /dev/sdk
   12    18.19TiB  /dev/sdl
   13    18.19TiB  /dev/sdm
   14    18.19TiB  /dev/sdn
   15    18.19TiB  /dev/sdo
   16    18.19TiB  /dev/sdp
   17    18.19TiB  /dev/sdq
   18    18.19TiB  /dev/sdr
   19    18.19TiB  /dev/sds
   20    18.19TiB  /dev/sdt
   21    18.19TiB  /dev/sdu
   22    18.19TiB  /dev/sdv
   23    18.19TiB  /dev/sdw
   24    18.19TiB  /dev/sdx
   25    18.19TiB  /dev/sdy
   26    18.19TiB  /dev/sdz
   27    18.19TiB  /dev/sdaa
   28    18.19TiB  /dev/sdab
   29    18.19TiB  /dev/sdac
   30    18.19TiB  /dev/sdad
   31    18.19TiB  /dev/sdae
   32    18.19TiB  /dev/sdaf
   33    18.19TiB  /dev/sdag
   34    18.19TiB  /dev/sdah
   35    18.19TiB  /dev/sdai
   36    18.19TiB  /dev/sdaj
   37    18.19TiB  /dev/sdak
   38    18.19TiB  /dev/sdal
   39    18.19TiB  /dev/sdam
   40    18.19TiB  /dev/sdan
   41    18.19TiB  /dev/sdao
   42    18.19TiB  /dev/sdap
   43    18.19TiB  /dev/sdaq
   44    18.19TiB  /dev/sdar
   45    18.19TiB  /dev/sdas
   46    18.19TiB  /dev/sdat
   47    18.19TiB  /dev/sdau
   48    18.19TiB  /dev/sdav
   49    18.19TiB  /dev/sdaw
   50    18.19TiB  /dev/sdax
   51    18.19TiB  /dev/sday
   52    18.19TiB  /dev/sdaz
   53    18.19TiB  /dev/sdba
   54    18.19TiB  /dev/sdbb
   55    18.19TiB  /dev/sdbc
   56    18.19TiB  /dev/sdbd
   57    18.19TiB  /dev/sdbe
   58    18.19TiB  /dev/sdbf
   59    18.19TiB  /dev/sdbg
   60    18.19TiB  /dev/sdbh

Then:

pi@sas:~ $ sudo mount /dev/sda /btrfs
pi@sas:~ $ sudo btrfs filesystem usage /btrfs
Overall:
    Device size:           1.07PiB
    Device allocated:         11.93GiB
    Device unallocated:        1.07PiB
    Device missing:          0.00B
    Used:            128.00KiB
    Free (estimated):          1.07PiB  (min: 1.07PiB)
    Free (statfs, df):         1.07PiB
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:        3.25MiB  (used: 0.00B)
    Multiple profiles:              no

Data,RAID0: Size:10.00GiB, Used:0.00B (0.00%)
   /dev/sda  170.62MiB
   /dev/sdb  170.62MiB
...
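
As a side note, the 60-device mkfs.btrfs invocation above doesn't have to be typed by hand; a shell glob can build the list (a sketch — it assumes every /dev/sd? and /dev/sd?? node is one of the HBA-attached data disks and nothing else on the system matches those patterns):

# Build the device list for all 60 HBA-attached disks (sda..sdz, sdaa..sdbh)
DRIVES=$(ls /dev/sd? /dev/sd?? 2>/dev/null)
sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f $DRIVES
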
geerlingguy commented 2 years ago

Comparing to earlier btrfs raid0 using disk-benchmark.sh:

| Benchmark | Result (fw 05) | Result (fw 23) |
| --- | --- | --- |
| fio 1M sequential read | 213 MB/s | 238 MB/s |
| iozone 1M random read | 144.82 MB/s | 142.68 MB/s |
| iozone 1M random write | 233.90 MB/s | 245.33 MB/s |
| iozone 4K random read | 19.45 MB/s | 19.88 MB/s |
| iozone 4K random write | 15.92 MB/s | 16.35 MB/s |

geerlingguy commented 2 years ago

Doing a network file copy of 30 GB results in similar behavior to before—with faults like 0x5854 and 0x2623—but the system seems to recover in time for the file copy to keep progressing.

I could also manually cancel an in-progress file copy from macOS Finder, and once the array recovered, the cancellation completed cleanly. Nice!

Afterwards, without having to reboot, I still had a clean btrfs mount:

pi@sas:~ $ sudo btrfs filesystem show
Label: 'btrfs'  uuid: 23f7340d-1a34-46d0-acc4-58c9418a90f3
    Total devices 60 FS bytes used 2.68GiB
    devid    1 size 18.19TiB used 202.62MiB path /dev/sda
    devid    2 size 18.19TiB used 202.62MiB path /dev/sdb
    devid    3 size 18.19TiB used 203.62MiB path /dev/sdc
...
    devid   59 size 18.19TiB used 203.62MiB path /dev/sdbg
    devid   60 size 18.19TiB used 203.62MiB path /dev/sdbh

So it looks like things are more stable, but there's probably still a power issue for that PCIe riser, or a signaling issue with the USB 3.0 cable that goes from the Pi to the riser.

geerlingguy commented 2 years ago

Other things I'd still like to try:

geerlingguy commented 2 years ago

Testing the link speed switching using pcie_set_speed.sh:

pi@sas:~ $ sudo ./pcie-set-speed.sh 03:00.0 1
Link capabilities: 0173dc12
Max link speed: 2
Link status: 7012
Current link speed: 2
Configuring 0000:02:01.0...
Original link control 2: 00000002
Original link target speed: 2
New target link speed: 1
New link control 2: 00000001
Triggering link retraining...
Original link control: 70110040
New link control: 70110060
Link status: 7011
Current link speed: 1

(I repeated that for all of the devices on buses 03-06.) Then the device speeds according to sudo lspci -vvvv were:

        LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
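
Repeating the downgrade across all four HBAs can be scripted (a sketch — the 03:00.0-06:00.0 bus addresses are assumptions based on the lspci and storcli output above):

# Force all four HBAs (one per switch port) down to PCIe Gen 1
for dev in 03:00.0 04:00.0 05:00.0 06:00.0; do
  sudo ./pcie-set-speed.sh "$dev" 1
done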

Trying the large network file copy again...

geerlingguy commented 2 years ago

Network copy succeeded when the cards were all at Gen 1 link speeds, and the overall copy performance over SMB was about the same! I also re-ran the disk benchmark:

| Benchmark | Result (fw 23 Gen 2) | Result (fw 23 Gen 1) |
| --- | --- | --- |
| fio 1M sequential read | 238 MB/s | 113 MB/s |
| iozone 1M random read | 142.68 MB/s | 119.99 MB/s |
| iozone 1M random write | 245.33 MB/s | 142.53 MB/s |
| iozone 4K random read | 19.88 MB/s | 15.96 MB/s |
| iozone 4K random write | 16.35 MB/s | 12.77 MB/s |

As expected, raw performance is lower, so things like resilvering would be extremely slow on an array with disks this large. But as a slow network file copy destination—if you want actual RAID instead of linear disk storage—it's doable at PCIe Gen 1 link speeds.

I'm planning on testing banks of 15, 30, and 45 drives next, and for 15, testing with an HBA directly connected to the Pi, then through the PCIe switch, to see if the switch makes a difference even with just one card.

geerlingguy commented 2 years ago

Testing with RAID 0 on one card only, via PCIe switch:

pi@sas:~ $ sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn
Label:              btrfs
UUID:               a0cbd908-ead8-421e-a6f5-6a68963ed655
Node size:          16384
Sector size:        4096
Filesystem size:    254.67TiB
Block group profiles:
  Data:             RAID0            10.00GiB
  Metadata:         RAID0          1023.75MiB
  System:           RAID0            15.75MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Runtime features:   
Checksum:           crc32c
Number of devices:  14
Devices:
   ID        SIZE  PATH
    1    18.19TiB  /dev/sda
    2    18.19TiB  /dev/sdb
    3    18.19TiB  /dev/sdc
    4    18.19TiB  /dev/sdd
    5    18.19TiB  /dev/sde
    6    18.19TiB  /dev/sdf
    7    18.19TiB  /dev/sdg
    8    18.19TiB  /dev/sdh
    9    18.19TiB  /dev/sdi
   10    18.19TiB  /dev/sdj
   11    18.19TiB  /dev/sdk
   12    18.19TiB  /dev/sdl
   13    18.19TiB  /dev/sdm
   14    18.19TiB  /dev/sdn

| Benchmark | Result (btrfs RAID 0 single HBA, 15 drives) |
| --- | --- |
| fio 1M sequential read | 237.00 MB/s |
| iozone 1M random read | 119.64 MB/s |
| iozone 1M random write | 295.43 MB/s |
| iozone 4K random read | 24.45 MB/s |
| iozone 4K random write | 8.07 MB/s |

geerlingguy commented 2 years ago

The network copy was successful through the PCIe switch, too, so it's definitely some sort of issue with multiple cards behind the switch.

Doing the same benchmark, but with the card connected directly to the Pi:

| Benchmark | Result (single HBA, switch) | Result (single HBA, direct) |
| --- | --- | --- |
| fio 1M sequential read | 237.00 MB/s | 272.00 MB/s |
| iozone 1M random read | 119.64 MB/s | 114.77 MB/s |
| iozone 1M random write | 295.43 MB/s | 294.09 MB/s |
| iozone 4K random read | 24.45 MB/s | 24.09 MB/s |
| iozone 4K random write | 8.07 MB/s | 9.36 MB/s |

So the switch doesn't seem to make much difference, except maybe in the case of raw block access to a single drive (that's what the fio benchmark I'm using is actually testing... /dev/sda in this case). The other tests are running through the Btrfs RAID 0 array.

geerlingguy commented 2 years ago

Well, I may have spoken too soon. With the card connected directly, the network file copy ran at about 70 MB/sec instead of the 50-55 MB/sec I was getting when I had the HBA behind the switch. Also, file copies that are just reads (copying data from the Pi to my Mac) max out the throughput at about 110 MB/sec.

I noticed there is a cycle when writing the data to the drives:

  1. A large amount of data is copied at 900+ Mbps and IRQs go up to 70-80% according to atop
  2. RAM cache seems to fill
  3. Hard drive utilization jumps to busy / 60-70% as IRQs fall off and network speed drops to 300-400 Mbps
  4. RAM cache clears out a little, and network copy goes back to 900+ Mbps
  5. Repeat
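
That fill-then-flush cycle is classic dirty-page writeback behavior; the kernel thresholds that govern it can be inspected and lowered so flushing starts earlier (a sketch — the values shown are illustrative, not something I've tested here):

# Show the current dirty-page writeback thresholds
sysctl vm.dirty_background_ratio vm.dirty_ratio
# Start background writeback sooner and cap dirty memory lower (illustrative values)
sudo sysctl -w vm.dirty_background_ratio=2 vm.dirty_ratio=10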

I'm going to test once more through the switch to see if my network file copy testing was a fluke, or what.

geerlingguy commented 2 years ago

Okay, so my earlier test with one card behind the switch must've been an anomaly, because now I'm getting identical performance both through the switch and with the card connected directly to the Pi. Anyway, the next tests are with 30 drives, then 45, to see when we start hitting those weird errors.

geerlingguy commented 2 years ago

System power draw:

| Number of Drives | Idle draw | Benchmark draw | Maximum draw (boot) |
| --- | --- | --- | --- |
| 15 | 199W | 217W | 315W |
| 60 | 502W | 512W | 632W |

A few other measurements:

geerlingguy commented 2 years ago

Now trying a 30 drive RAID:

$ sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad

Label:              btrfs
UUID:               9c9023ac-5b97-44db-b2b0-a35b525854a0
Node size:          16384
Sector size:        4096
Filesystem size:    545.71TiB
Block group profiles:
  Data:             RAID0            10.00GiB
  Metadata:         RAID0          1023.75MiB
  System:           RAID0            30.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Runtime features:   
Checksum:           crc32c
Number of devices:  30
Devices:
   ID        SIZE  PATH
    1    18.19TiB  /dev/sda
...

And benchmark results (both configurations behind the PCIe switch):

| Benchmark | Result (single HBA, 15 drives) | Result (two HBA, 30 drives) |
| --- | --- | --- |
| fio 1M sequential read | 237.00 MB/s | 221.00 MB/s |
| iozone 1M random read | 119.64 MB/s | 144.93 MB/s |
| iozone 1M random write | 295.43 MB/s | 201.44 MB/s |
| iozone 4K random read | 24.45 MB/s | 23.63 MB/s |
| iozone 4K random write | 8.07 MB/s | 13.42 MB/s |

geerlingguy commented 2 years ago

The network copy definitely slows down with 30 drives, at least for writes. I averaged around 50-70 MB/sec write speeds, though reads are still at 110 MB/sec or so.

And with an overclock to 2.147 GHz, I was able to get back up to 80-100 MB/sec on writes. So CPU performance at the default 1.5 GHz clock definitely cripples things beyond one HBA. I'm going to test with 45 drives now.
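
(For reference, that overclock is applied via /boot/config.txt; a sketch below — the over_voltage value is illustrative and board-dependent:)

# Append the overclock settings to /boot/config.txt (illustrative values;
# higher clocks usually also need more over_voltage to stay stable)
echo -e "over_voltage=6\narm_freq=2147" | sudo tee -a /boot/config.txt
sudo reboot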

geerlingguy commented 2 years ago
pi@sas:~ $ sudo mkfs.btrfs -L btrfs -d raid0 -m raid0 -f /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas

Label:              btrfs
UUID:               2ff7d20c-1fd4-46e9-b40f-0ba489607be3
Node size:          16384
Sector size:        4096
Filesystem size:    818.57TiB
Block group profiles:
  Data:             RAID0            10.00GiB
  Metadata:         RAID0             1.41GiB
  System:           RAID0            45.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Runtime features:   
Checksum:           crc32c
Number of devices:  45
Devices:
   ID        SIZE  PATH
    1    18.19TiB  /dev/sda
...

And benchmark results:

| Benchmark | Result (1xHBA, 15 drives) | Result (2xHBA, 30 drives) | Result (3xHBA, 45 drives) |
| --- | --- | --- | --- |
| fio 1M sequential read | 237.00 MB/s | 221.00 MB/s | 218.00 MB/s |
| iozone 1M random read | 119.64 MB/s | 144.93 MB/s | 134.82 MB/s |
| iozone 1M random write | 295.43 MB/s | 201.44 MB/s | 228.53 MB/s |
| iozone 4K random read | 24.45 MB/s | 23.63 MB/s | 21.08 MB/s |
| iozone 4K random write | 8.07 MB/s | 13.42 MB/s | 15.59 MB/s |

That's without overclock. Overclock comparison:

| Benchmark | Result (45 drives, 1.5 GHz) | Result (45 drives, 2.2 GHz) |
| --- | --- | --- |
| fio 1M sequential read | 218.00 MB/s | 257.00 MB/s |
| iozone 1M random read | 134.82 MB/s | 177.17 MB/s |
| iozone 1M random write | 228.53 MB/s | 221.99 MB/s |
| iozone 4K random read | 21.08 MB/s | 20.85 MB/s |
| iozone 4K random write | 15.59 MB/s | 17.93 MB/s |

geerlingguy commented 2 years ago

All right, with or without overclock, we start hitting the random card resets/PCIe errors with 3 HBAs (45 drives) during SMB copies. I'm going to swap to the 3rd and 4th HBA and see if maybe it's just one bad HBA (though I've seen multiple cards reset when running 60 drives... I just want to verify it's the number of HBAs, and not necessarily a bad HBA).

geerlingguy commented 2 years ago

I swapped HBAs and still got the lockup—so at PCIe Gen 2 speeds, the Pi definitely starts having issues between 30 and 45 drives / 2-3 HBAs. Though I can't rule out the PCIe switch board I'm using either. The thing is... I've already invested at least a hundred or so hours into this (maybe more), and it's time to put a pin in it.

I think I can soundly recommend only running one HBA on a Raspberry Pi. 320 TB is good enough for anyone, right? Especially when you'll only reliably get 100 MB/sec of write speeds over the network, max.

I'll update the power specs in a bit. Not going to try a different USB 3.0 cable as I don't have a shorter one :P

geerlingguy commented 2 years ago

One more test running all 60 drives in btrfs RAID 0 with 2.2 GHz overclock and using PCIe Gen 1 link speed:

| Benchmark | Result (fw 23 Gen 2) | Result (fw 23 Gen 1) | Result (fw 23 Gen 1 OC) |
| --- | --- | --- | --- |
| fio 1M sequential read | 238 MB/s | 113 MB/s | 137 MB/s |
| iozone 1M random read | 142.68 MB/s | 119.99 MB/s | 149.06 MB/s |
| iozone 1M random write | 245.33 MB/s | 142.53 MB/s | 164.02 MB/s |
| iozone 4K random read | 19.88 MB/s | 15.96 MB/s | 20.19 MB/s |
| iozone 4K random write | 16.35 MB/s | 12.77 MB/s | 17.17 MB/s |

GhostDevCode commented 1 year ago

OK