geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Pi 5 HAT: Radxa Penta SATA HAT #615

Open geerlingguy opened 3 months ago

geerlingguy commented 3 months ago

Radxa sells an updated version of their Penta SATA HAT for $45. It includes four SATA drive connectors plus one edge connector for a 5th drive, 12V power inputs (Molex or barrel jack) that power both the drives and the Pi 5 via GPIO, a cable for the 5th drive, an FFC cable to connect the HAT to the Pi 5, and screws for mounting.

[image: radxa-penta-hat]

It looks like the SATA controller is a JMB585 (PCIe Gen 3 x2), so it could benefit from running the Pi 5's PCIe lane at Gen 3.0 speeds (setting dtparam=pciex1_gen=3 in /boot/firmware/config.txt). Radxa sent me a unit for testing.
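
For reference, that's a one-line addition to the Pi's config (a sketch; the setting isn't officially certified, and the link still has to train successfully at Gen 3):

    # /boot/firmware/config.txt on the Pi 5 -- reboot afterwards
    dtparam=pciex1_gen=3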

ThomasKaiser commented 2 months ago

could you please share some information about the tool you're using for I/O benchmarking?

He's using https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh, called as explained in any of his sbc-review issues, e.g. this

There are at least three problems with this script, one of them major:

To talk about disk performance, a switch to the performance governor is needed prior to execution [1]

Quick test on a Rock 5 ITX with a 256 GB EVO Plus A2 SD card comparing three different settings:

performance (this represents 'storage performance w/o settings involved'):

READ: bw=87.2MiB/s (91.4MB/s), 87.2MiB/s-87.2MiB/s (91.4MB/s-91.4MB/s), io=999MiB (1048MB), run=11459-11459msec
                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4     2848     2924                      12238     2971
      102400    1024    62283    62087                      77176    61358

In contrast, Radxa's defaults since 2022 and Armbian's defaults until 2024: ondemand with io_is_busy=1:

READ: bw=81.4MiB/s (85.4MB/s), 81.4MiB/s-81.4MiB/s (85.4MB/s-85.4MB/s), io=935MiB (980MB), run=11482-11482msec
                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4     2838     2940                      11663     2921 
      102400    1024    60790    62549                      77639    60492                                                                

We see small drops in performance everywhere, and also a bit of results variation: 2940 KB/s with ondemand compared to 2924 KB/s with performance can't be the result of settings, since no other governor can 'outperform' performance.
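
For anyone wanting to reproduce the ondemand + io_is_busy=1 combination above: it's a cpufreq sysfs tunable, set roughly like this (a sketch; the exact path depends on whether the kernel exposes per-policy governor tunables):

    # select ondemand and tell it to treat iowait as busy time
    for p in /sys/devices/system/cpu/cpufreq/policy* ; do
        echo ondemand >"${p}/scaling_governor"
        [ -w "${p}/ondemand/io_is_busy" ] && echo 1 >"${p}/ondemand/io_is_busy"
    done
    # older kernels expose a single global tunable instead:
    [ -w /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy ] && \
        echo 1 >/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy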

Retesting with schedutil, since it's the new Armbian default from 2024 on and also what many SBC vendors might be shipping, given that for their OS images they usually don't think a single second about kernel config but just ship the Android kernel the SoC vendor has thrown at them:

READ: bw=85.1MiB/s (89.3MB/s), 85.1MiB/s-85.1MiB/s (89.3MB/s-89.3MB/s), io=978MiB (1026MB), run=11490-11490msec
                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4     2062     2193                       8973     2165
      102400    1024    54671    53655                      61013    54159 

Compared to ondemand with the respective tweaks, the important 4K performance dropped by 25%. With larger block sizes it's not that drastic, and the fio test with the unrealistic 4 concurrent read jobs even improves (but since we haven't measured at least 3 times, we have no idea whether these different numbers are due to different settings or, more probably, 'results variation'. Running a benchmark only once is almost always wrong: it has to be repeated at least three times, then the standard deviation has to be calculated, and if that's too high either more measurements are needed or the results go in the trash).
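
Since the 'repeat at least three times and look at the standard deviation' point comes up a lot, here's a minimal sketch (not part of disk-benchmark.sh, just illustrative) that turns a handful of repeated results into a mean and deviation:

    #!/bin/sh
    # usage (hypothetical): ./stats.sh 91.4 85.4 89.3   (values in MB/s from repeated runs)
    printf '%s\n' "$@" | awk '
      { sum += $1; sumsq += $1 * $1; n++ }
      END {
        mean = sum / n
        sd   = (n > 1) ? sqrt((sumsq - sum * sum / n) / (n - 1)) : 0
        printf "n=%d  mean=%.2f  sd=%.2f (%.1f%% of mean)\n", n, mean, sd, 100 * sd / mean
      }'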

But what these synthetic benchmarks don't show anyway: real-world storage performance, which is easily halved by the switch to schedutil. Unlike benchmarks with continuous storage access, where the cpufreq driver has a chance to ramp up clockspeeds, in real-world situations the clockspeeds remain low when only short I/O accesses happen. That's what you get when you switch a central setting without any evaluation and obviously 'just for fun' :)
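
An easy way to see this effect for yourself is to watch the actual clockspeeds while generating short, bursty I/O in another terminal (a sketch using the standard cpufreq sysfs nodes):

    watch -n 1 'cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq'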

At least it should be obvious that disk-benchmark.sh in its current form is not able to report disk performance, only 'disk performance tampered with by some default settings'.

One might argue that using 'OS defaults' is the right thing since that's what vendors ship and users have to live with, but as someone who only does 'active reviews' (not just reporting numbers but improving them) I can't disagree more. The best approach is to run the test in both modes, OS image defaults vs. performance, then point the OS image makers at the difference and hint at how to fix it (worked every time, just not with the Banana Pi and Armbian guys).

[1] Switch every cpufreq policy to the performance governor:
    for Cluster in /sys/devices/system/cpu/cpufreq/policy* ; do [[ -e "${Cluster}" ]] || break; echo performance >"${Cluster}/scaling_governor"; done
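
(To confirm the switch stuck, something like the following should print 'performance' for every policy:)

    grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_governor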

geerlingguy commented 2 months ago

@ThomasKaiser - To properly benchmark storage solutions, you need to do a lot more than I think either of us does in a typical review cycle for a storage device.

In my case, when it actually matters, I will test across different OSes with 100+ GB files, with folders with 5,000+ small files, and average the eventual total time for the copy back and forth.

The disk-benchmark.sh script is a quick way to get a 'with the default OS image, in ideal circumstances, with smaller files, here's the kind of performance one can expect' result. There are huge differences depending on whether you use ext3/ext4, ZFS, Btrfs, Debian, Ubuntu, a board vendor's custom distro, or the performance or ondemand governors (behavior can change even depending on the distro / image you might be using). It's a fool's game making definitive statements based on any single benchmark, which is why I only use the disk-benchmark.sh script for a quick "here's what this board does" thing.

And I do think it's useful to not sit there tweaking and tuning the default distro image for best performance, because I want my tests to reflect what most users will see. If they buy a Raspberry Pi, they will go to the docs and see they should flash Pi OS to the card using Imager.

The docs don't mention setting performance, so I don't do that in my "official" benchmarks. I follow the vendor guides as closely as possible, and if their own images are poorly optimized, that's not a 'me' problem. And I'm happy to re-run the entire gauntlet if a vendor reaches out, like Turing Pi did with the RK1.

ThomasKaiser commented 2 months ago

@geerlingguy doesn't change anything wrt the different testing methodology for sequential reads and writes. In case you accept https://github.com/geerlingguy/pi-cluster/pull/12 this will become obvious with future testing, and then you might decide to adjust your reporting or not :)

geerlingguy commented 2 months ago

True; honestly my main concern is to have a few different tests since I know many people just throw hdparm at it and call it a day. I like fio and iozone a lot better, though I have yet to find a way to test all aspects of ZFS filesystems in a way that ZFS caching doesn't interfere (I wish there were a way to tell ZFS 'fill all caches, then run the test', instead of having to copy across tens of GB of files before starting to get more useful data).
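
One partial workaround I've seen suggested (a sketch only, with a hypothetical pool/dataset name; not something validated in this thread) is to run the benchmark against a throwaway dataset with data caching disabled, so reads actually hit the disks:

    # keep file data out of the ARC/L2ARC for the test dataset (metadata still cached)
    zfs create tank/bench
    zfs set primarycache=metadata tank/bench
    zfs set secondarycache=none tank/bench
    # ...run fio/iozone against /tank/bench, then clean up:
    zfs destroy tank/bench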

ThomasKaiser commented 2 months ago

True; honestly my main concern is to have a few different tests since I know many people just throw hdparm at it and call it a day.

Correct, that's the garbage the majority of 'Linux tech youtubers' rely on, ignoring (or not knowing) that hdparm uses a 128K block size, which was huge when it was hardcoded (last century) but is a joke today.

I like fio and iozone a lot better

Both are storage performance tests, unlike hdparm (whose benchmarking capabilities were a tool for kernel developers 30 years ago, when only spinning rust existed, attached via dog-slow interfaces)
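
To illustrate the block-size point, here's a hedged fio sketch that runs the same sequential read at hdparm's hardcoded 128K and again at 1M (the device name is a placeholder; --readonly keeps it non-destructive):

    fio --name=seq128k --filename=/dev/sda --rw=read --bs=128k --direct=1 \
        --ioengine=libaio --iodepth=1 --runtime=30 --time_based --readonly
    fio --name=seq1m --filename=/dev/sda --rw=read --bs=1M --direct=1 \
        --ioengine=libaio --iodepth=1 --runtime=30 --time_based --readonly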

though I have yet to find a way to test all aspects of ZFS filesystems in a way that ZFS caching doesn't interfere

Simple solution: avoid ZFS for benchmarks and try to educate your target audience about the ZFS benefits (spoiler alert: they don't want this content ;) )

geerlingguy commented 2 months ago

re: ZFS: Avoiding it is impossible if you want to show people what kind of performance you get on a modern NAS, since it seems like half the homelab world is focused on ZFS, and the other half is split between old-school RAID (mdadm), Btrfs, and all the weird unholy things proprietary vendors cobble together (like Unraid).

Also, if you don't mention ZFS when talking about storage, you end up with so many random comments about 'why not ZFS', it's the modern homelab equivalent to 'btw I use Arch' or 'why don't you use [vim|emacs|nano]?' :D

Unavoidable, unfortunately!

Anyway, I plan on deploying this HAT as a replica target for my main ZFS array... we'll see how that works out! Still looking to find a case for it. Too lazy to CAD my own heh
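
(For anyone wondering what 'replica target' means in practice: it's just zfs send/recv over the network. A minimal sketch with made-up pool, dataset, and host names:)

    zfs snapshot tank/data@weekly-1
    zfs send tank/data@weekly-1 | ssh penta-nas zfs recv -F backup/data
    # subsequent runs only send the delta between snapshots:
    zfs send -i tank/data@weekly-1 tank/data@weekly-2 | ssh penta-nas zfs recv backup/data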

axiopaladin commented 2 months ago

If you still have this on hand and would be willing to make a few measurements... how thick of a 2.5" drive can be mounted directly to the HAT? Modern 2.5" SSDs are typically 7mm thick, while (high-capacity) 2.5" HDDs (lower speed but much cheaper per TB) usually come in at 15mm thick. Are those too fat to stack in all 4 slots?

teodoryantcheff commented 2 months ago

@axiopaladin [image]

pfriedel commented 2 months ago

And rsync was also highly consistent, but slower still, at like 27 MB/sec (using -avz):

Just as with Finder, rsync in stock macOS is kind of old and kind of trash:

% /usr/bin/rsync --version
rsync  version 2.6.9  protocol version 29
Copyright (C) 1996-2006 by Andrew Tridgell, Wayne Davison, and others.

It's old enough to vote! That version transfers a random 3.5G folder off my Desktop in 2 minutes and 38 seconds:

/usr/bin/rsync -avz source_folder /Volumes/home/test1 105.90s user 4.97s system 70% cpu 2:38.20 total

Homebrew provides a much newer version:

% /opt/homebrew/bin/rsync --version
rsync  version 3.2.7  protocol version 31
Copyright (C) 1996-2022 by Andrew Tridgell, Wayne Davison, and others.

That version transfers the same folder off my Desktop in 1 minute and 23 seconds:

/opt/homebrew/bin/rsync -avz source_folder /Volumes/home/test2 18.36s user 8.67s system 32% cpu 1:23.53 total

I mean, that's still only like half of 1G line speed here, but better than 27 MB/s. I think I ended up having to use something like Get Backup Pro to really get the most performance out of my longer synchronization tasks - it's only another 10 seconds faster in this 3.5G test, but that compounds over hundreds of gigs of data.

ThomasKaiser commented 2 months ago

/usr/bin/rsync -avz source_folder /Volumes/home/test1 105.90s user 4.97s system 70% cpu 2:38.20 total vs. /opt/homebrew/bin/rsync -avz source_folder /Volumes/home/test2 18.36s user 8.67s system 32% cpu 1:23.53 total

Almost twice as fast with less than half the CPU utilization hints at the block sizes chosen by the 1st rsync run being a lot smaller than in the 2nd run.

I personally find it hard to 'measure' with tools that may adjust block size depending on defaults/algorithms that have changed over time. And while rsync has a -B/--block-size=BLOCKSIZE option to force a higher block size, the upper limit is still 128K, which is way too small to saturate modern networks.

lolren commented 1 month ago

Does anyone know if you can boot from an SSD on the HAT? I'm planning to buy one but I don't want to use a microSD card or USB. Regards

geerlingguy commented 1 month ago

@lolren - Right now, no.

Also, Michael Klements did a build with the Penta SATA HAT and released four variants of his 3D-printable case, which is pretty nice looking!

celly commented 1 month ago

@geerlingguy can you verify something I'm seeing?

When plugged in with the 12V DC barrel jack, the external ATX power supply pins are live.

geerlingguy commented 1 month ago

@celly - Yes, it looks like there's no backfeed protection on the board (tested with my multimeter just now; also confirmed live 12V on the 12V Molex pin), so the +12V DC just passes through from the Molex to the barrel plug. I would recommend against plugging two power supplies into the board at the same time!

pfriedel commented 1 month ago

Yeah, Radxa now specifically calls out on their Raspberry Pi page for the product that you shouldn't plug in the Pi and the HAT at the same time. I can't say whether that was always there or not, though.

celly commented 1 month ago

@geerlingguy thanks for checking. Wanted to make sure I didn't have a bad board before I plugged drives in.

@pfriedel Yeah, that makes sense. But I'd have figured they would put a diode or something on it to protect those pins from being live. As it stands, if the supply is plugged in, even if you think the Pi is off, those pins are still live, since they seem to be connected directly to the 12V input.

On the plus side, I may take advantage of this and use it to power a 12V fan.

Also, sorry to hijack, but this seems to be the best place for info on this right now, so one note about cases: in case anyone finds this while looking for a case for it, the official Radxa case is not ready for primetime. I have spent the last 3 days fighting it, and each part leads to more headaches and disappointment.

There are three major issues: there is no clearance for the PCIe cable, so to get the Pi and HAT into the case you'll damage the cable; the drive holder is clever but doesn't allow for any airflow; and finally, it is designed for the top fan to come from their fan / OLED board, which isn't available.

Not to mention, unless you are very experienced with your 3D printer, it is a very tough print, with lots of press fits that leave zero room for error, plus tabs and screw holes that are not meant for PLA. It really is meant more for mass production than hobby printing.

If you need a case, I'd start with Michael Klements' case from this comment until someone has a chance to tweak the official one.

pfriedel commented 1 month ago

Yeah, for what it's worth I think there are 3 options for connecting a fan:

  1. The Molex 12V header: no PWM and maybe a little clunky, but it's pretty straightforward.
  2. There's a little JST PH 1.25 5V connector on the bottom of the HAT. I mean, I think it's PH 1.25; I'm pretty sure the cable I borrowed the plug from was originally a Horizon Hobby adapter lead, which is PH 1.25. Your mileage may vary.
  3. Tapping into the official 2x5 JST PHD header (I think? I'll have a sample coming in a few days) and pulling one of the GPIO connectors for a 5V PWM fan, if you're feeling fancy.

And boy howdy, you want a fan if you put this in a case, for the Pi if nothing else. I stuck a random heatsink from a Pi 4 heatsink kit onto the SATA controller chip after chopping off two of the five fins, and it just barely clears two 9.5mm drives. Is it doing any good? Who knows, but it probably isn't hurting. On the other hand, I don't think this is a new chip for Radxa, and people have been building NASes off their hardware for a while, so maybe it just runs warm normally.

Oh, it should also be noted that 15mm drives do not fit. 9.5mm is fine, if tight. I don't have any 12.5mm drives to test whether they fit the spacing, unfortunately.

Update: That header is definitely JST PHD 2x5 and works like a charm, although the PWM fan I have is about as noisy at 50% as my Noctua 4010 5V is at 100%. I should have figured. And the 5V PWM 4010 is on backorder everywhere. Maybe the 4020 will fit...

celly commented 1 month ago

@pfriedel That is all great information. Thank you so much for taking the time.

Probing the JST and the 10-pin header looking for a solid 5V connection is what started all of this for me. The weird fan connector threw me for a loop; nothing I had fit it. I saw 5V, but I'm not sure how much power I could safely pull through it, as I want to use a larger fan.

The 10-pin is interesting since it has I2C and GPIO pass-through on it along with 5V, but I didn't want to tap into that yet, since adding a cheap OLED display is too tempting.

Once I saw that there was 12V on the Molex, I decided to go with a 12V 80mm ultra-quiet Noctua fan that can be run at 1000 RPM, and a female-to-3-pin Molex connector. It is stupid. But it'll be the best type of stupid: quiet stupid.

The case you designed is awesome -- really great job. I wish you had sent me that case a few hours ago, before I decided to design my own... 🤣 The one I just finished is a bit more "server", as I decided not to expose the HDMI ports in exchange for a wider case with a larger fan. If it works, I'll share it in the next few days after the fan gets here.

Update: The 12V Molex works like a dream for the fan. I published my case using it here: https://cults3d.com/en/3d-model/gadget/pi-5-nas-tower-for-radxa-hat-with-option-noctua-fan