And this is not directly to be put onto the site, but I'm planning on doing more testing with some PCI Express gear in building another version of the Pi NAS... and might also take a peek at seeing if I could fit the Pi board into the NAS enclosure (which is nice and small, and would be perfect for a Raspberry Pi board!)
With this setup we should get an apples-to-apples comparison. I like the approach :)
Setting up RAID 5 on the QVO 8TB SSDs, it looks like the performance here is right in line with what I got on the Radxa Taco—a sync at around 95 MB/sec, which means it's hitting the throughput limit on the PCIe x1 lane in that RTD1296 chip.
(Noting that on the Lockerstor 4, the sync is going at 194 MB/sec, which seems to indicate double the throughput vs the Drivestor 4.)
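(For reference, it's easy to confirm over SSH that the hardware, and not md's own throttle, is what's limiting the resync. A quick sketch:)

# Watch resync progress and the current speed for all md arrays
cat /proc/mdstat

# md's global resync speed limits (KB/s); worth checking these aren't the limiter
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max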
One observation about the fan: It seems to hover around 960 rpm at 'low'. I noticed the magnetic front panel covers up all the direct ventilation—it stands off a tiny bit, but that basically negates all the direct airflow.
If I pull it off, I feel a lot more air going between the drives. Probably wouldn't leave that panel on, especially if using large, hot HDDs (I think it looks cooler with it off, too).
But the Lockerstor 4 idles around 500 rpm... it's a bit quieter, but I think that's mostly down to the lower 'low' rpm.
High speed fan mode is quite loud and moves a good bit of air :)
I'm considering opening up the case and looking into a nice Noctua replacement fan. Maybe.
Contender #2 is going to be the Radxa Taco (which I previously tested in my 48 TB Pi NAS build). I have 4x 4TB Seagate IronWolf NAS drives in it. I'm going to install OMV and see how it fares.
First thing to note: all four drives spun up simultaneously (there was no staggered spinup), so I'm sure the initial power surge is kinda hefty... but they all spun up, so at least the board can pull through the power needed. With my Kill-A-Watt, I'm seeing:
iperf3 and fio sequential read: 25.5W

pi@omv:~ $ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 3.6T 0 disk
sdb 8:16 0 3.6T 0 disk
sdc 8:32 0 3.6T 0 disk
sdd 8:48 0 3.6T 0 disk
mmcblk0 179:0 0 14.8G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 14.6G 0 part /
nvme0n1 259:0 0 7.3T 0 disk
Heh... don't look too closely at that NVMe drive size. I'm going to see about using it as a cache.
Process for bringing up OMV:
1. sudo apt-get update && sudo apt-get upgrade, then sudo reboot
2. wget -O - https://github.com/OpenMediaVault-Plugin-Developers/installScript/raw/master/install | sudo bash
3. Log into the web UI with the default credentials admin / openmediavault.

Aside: OMV's UI has undergone quite a refresh, though the whole "you have changes to apply" thing that pops under the top of the screen is still annoying. Just save changes when I click save! :P Also, there's still no way to re-start an md RAID resync after a reboot in the UI; you still have to log in and run
sudo mdadm --readwrite /dev/md0
via SSH...
Initial thought for the storage array is to either use openmediavault-zfs and create a RAIDZ1 pool with the NVMe SSD as cache, or just do straight-up RAID 5 plus bcache (not sure if that's all supported in the OMV UI).
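If the UI doesn't cover it, creating the RAID 5 array from the CLI is simple enough. A minimal sketch, assuming three of the 4TB drives go in as whole disks (which matches the ~7.3T md0 in the lsblk output further down); adjust the device names and --raid-devices as needed:

# Create a 3-disk RAID 5 array and watch the initial sync
sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
cat /proc/mdstat

# Persist the array definition so it assembles on boot
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf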
Wish I could try out TrueNAS, but that's still x86-only (maybe that will change?), for some silly reasons like "everyone on the market currently runs on x86 hardware"... pfft.
Setting up bcache:
pi@omv:~ $ sudo apt-get install bcache-tools
...
pi@omv:~ $ sudo make-bcache -B /dev/md0
UUID: eb360a2d-4c62-451d-8549-a68621c633e5
Set UUID: c8b5c63c-0a44-49f3-bb65-cd4df9b751a0
version: 1
block_size: 1
data_offset: 16
pi@omv:~ $ sudo make-bcache -C /dev/nvme0n1
UUID: 15bf54e9-be21-4478-b676-a08dad937963
Set UUID: dea419ba-d795-4566-b01f-bb57fa96eb21
version: 0
nbuckets: 15261770
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1
I could run make-bcache using bcache-tools; I was thinking bcache might actually be enabled in the Pi OS kernel, but it looks like it's not. It would need a kernel recompile.
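A quick way to confirm that, for anyone checking their own kernel (a small sketch):

# If the module isn't found, the kernel was built without bcache support
sudo modprobe bcache
ls /sys/fs/bcache   # this directory appears once bcache support is loaded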
My idea is:
Time to recompile the kernel! Here are my menuconfig changes:
# First, enable bcache.
> Device Drivers
> Multiple devices driver support (RAID and LVM)
> Block device as cache (BCACHE)
# Second, add in RTL8125 2.5G Ethernet support.
Device Drivers
> Network device support
> Ethernet driver support
> Realtek devices
> Realtek 8169/8168/8101/8125 ethernet support
Note: RTL8125 support is already enabled upstream in the latest Pi kernel source; it just hasn't made its way into the default Pi OS image's kernel yet :( — you can also install the driver manually instead of recompiling the kernel, if you just need 2.5G support.
Recompiling now...
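For reference, the build itself follows the standard Raspberry Pi kernel build process. A sketch of a native 64-bit build (the rpi-5.10.y branch and bcm2711_defconfig are what I'd expect for this Pi OS image; double-check against the official docs before copying blindly):

# Dependencies and kernel source
sudo apt install -y git bc bison flex libssl-dev libncurses5-dev make
git clone --depth=1 --branch rpi-5.10.y https://github.com/raspberrypi/linux
cd linux

# Start from the Pi 4 / CM4 64-bit defconfig, then toggle the options above
make bcm2711_defconfig
make menuconfig

# Build and install (slow when built natively on the Pi)
make -j4 Image modules dtbs
sudo make modules_install
sudo cp arch/arm64/boot/Image /boot/kernel8.img
sudo cp arch/arm64/boot/dts/broadcom/*.dtb /boot/
sudo cp arch/arm64/boot/dts/overlays/*.dtb* /boot/overlays/
sudo reboot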
(Aside: I just noticed OMV must take over control of /etc/ssh/sshd_config, because it was wiped clean and now has some rather insecure defaults, like PermitRootLogin yes instead of prohibit-password, along with PasswordAuthentication yes.)
pi@omv:~ $ sudo mkfs.ext4 /dev/bcache0
...
pi@omv:~ $ sudo mount /dev/bcache0 /mnt
pi@omv:~ $ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 3.6T 0 disk
└─md0 9:0 0 7.3T 0 raid5
└─bcache0 254:0 0 7.3T 0 disk /mnt
sdb 8:16 0 3.6T 0 disk
└─md0 9:0 0 7.3T 0 raid5
└─bcache0 254:0 0 7.3T 0 disk /mnt
sdc 8:32 0 3.6T 0 disk
└─md0 9:0 0 7.3T 0 raid5
└─bcache0 254:0 0 7.3T 0 disk /mnt
mmcblk0 179:0 0 14.8G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 14.6G 0 part /
nvme0n1 259:0 0 7.3T 0 disk
pi@omv:~ $ cat /sys/block/bcache0/bcache/state
no cache
When I try to attach the NVMe drive, I get:
pi@omv:~ $ sudo make-bcache -C /dev/nvme0n1
Can't open dev /dev/nvme0n1: Device or resource busy
I had to unregister nvme0n1 following the directions in the bcache documentation under "Remove or replace a caching device".
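For reference, that unregister/wipe dance looks roughly like this (a sketch; the cache set UUID is whatever make-bcache printed as 'Set UUID'):

# Stop (unregister) the cache set so the NVMe device is no longer busy
echo 1 | sudo tee /sys/fs/bcache/<cache-set-uuid>/unregister

# Wipe the stale bcache superblock before re-running make-bcache
sudo wipefs -a /dev/nvme0n1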
Then I reattached it:
# umount /mnt
# make-bcache -C /dev/nvme0n1
UUID: 6d9e32ad-498a-4fe5-a0b7-86a66d01aaa6
Set UUID: 9e59a381-40cb-43d9-bd66-c77586977759
version: 0
nbuckets: 15261770
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1
# cd /sys/block/md0/bcache/
# echo 9e59a381-40cb-43d9-bd66-c77586977759 > attach
# cat state
clean
# mount /dev/bcache0 /mnt
(I think I just had never attached the cache device properly in the first place...).
Bcache tips:
# Get stats
tail /sys/block/bcache0/bcache/stats_total/*
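Also handy (standard bcache sysfs knobs, noted here since the benchmarks below switch between writeback and none):

# Check or change the cache mode (writethrough is the default)
cat /sys/block/bcache0/bcache/cache_mode
echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode
echo none | sudo tee /sys/block/bcache0/bcache/cache_mode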
Since I don't want to forget any of this, I wrote up a guide: Use bcache for SSD caching on a Raspberry Pi.
Apparently if you set up a volume the way I did via the CLI, OMV won't see it, and you can't manage it via the UI. Oopsie! Going to set up a Samba share via CLI.
I wonder if OMV causes some strange state to occur with the network controller—it seemed to take over interface management, so I had to add the 2.5G interface in OMV's UI:
And unlike my testing on Pi OS directly, I seemed to be hitting a wall where IRQ handling maxed out a CPU core on the 2.5G connection, limiting the bandwidth to 1.88 Gbps:
pi@omv:~ $ iperf3 -c 10.0.100.100
Connecting to host 10.0.100.100, port 5201
[ 5] local 10.0.100.199 port 42720 connected to 10.0.100.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 220 MBytes 1.84 Gbits/sec 0 501 KBytes
[ 5] 1.00-2.00 sec 225 MBytes 1.89 Gbits/sec 0 501 KBytes
[ 5] 2.00-3.00 sec 225 MBytes 1.89 Gbits/sec 0 523 KBytes
[ 5] 3.00-4.00 sec 225 MBytes 1.89 Gbits/sec 0 549 KBytes
[ 5] 4.00-5.00 sec 226 MBytes 1.89 Gbits/sec 0 549 KBytes
[ 5] 5.00-6.00 sec 225 MBytes 1.88 Gbits/sec 0 576 KBytes
[ 5] 6.00-7.00 sec 225 MBytes 1.89 Gbits/sec 0 576 KBytes
[ 5] 7.00-8.00 sec 225 MBytes 1.89 Gbits/sec 0 576 KBytes
[ 5] 8.00-9.00 sec 226 MBytes 1.90 Gbits/sec 0 576 KBytes
[ 5] 9.00-10.00 sec 225 MBytes 1.89 Gbits/sec 0 576 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.19 GBytes 1.88 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 2.19 GBytes 1.88 Gbits/sec receiver
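To dig into the IRQ bottleneck, these are the kinds of checks worth running (a sketch; eth1 and the exact IRQ number are assumptions, and mpstat comes from the sysstat package):

# Which CPU core is servicing the NIC's interrupts?
grep eth1 /proc/interrupts

# Per-core %irq / %soft while iperf3 is running
sudo apt install -y sysstat
mpstat -P ALL 2

# Optionally move the IRQ to another core (hex bitmask: 2 = CPU1)
echo 2 | sudo tee /proc/irq/<irq-number>/smp_affinity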
I wonder if I should just ditch OMV for the testing :/
Using my disk-benchmark.sh
script.
With bcache enabled, writeback
mode:
Benchmark | Result |
---|---|
fio 1M sequential read | 345 MB/s |
iozone 1M random read | 366.80 MB/s |
iozone 1M random write | 373.88 MB/s |
iozone 4K random read | 40.00 MB/s |
iozone 4K random write | 53.11 MB/s |
With bcache disabled, none
mode:
Benchmark | Result |
---|---|
fio 1M sequential read | 354 MB/s |
iozone 1M random read | 75.02 MB/s |
iozone 1M random write | 30.87 MB/s |
iozone 4K random read | 1.35 MB/s |
iozone 4K random write | 0.36 MB/s |
Using my rsync
network copy test.
With bcache enabled, writeback
mode:
# Mac to Pi (write)
sent 8.59G bytes received 35 bytes 70.72M bytes/sec
total size is 8.59G speedup is 1.00
# Pi to Mac (read)
sent 8.59G bytes received 35 bytes 110.86M bytes/sec
total size is 8.59G speedup is 1.00
With bcache disabled, none
mode:
# Mac to Pi (write)
sent 8.59G bytes received 35 bytes 70.14M bytes/sec
total size is 8.59G speedup is 1.00
# Pi to Mac (read)
sent 8.59G bytes received 35 bytes 105.42M bytes/sec
total size is 8.59G speedup is 1.00
iperf3 (while fio ongoing): 1.51 Gbps
fio standalone: 376 MB/s
fio (while iperf3 ongoing): 367 MB/s

(Used iperf3 -c 10.0.100.100 -tinf and sudo fio --filename=/dev/bcache0 --direct=1 --rw=read --bs=1024k --ioengine=libaio --iodepth=64 --size=16G --runtime=120 --numjobs=4 --group_reporting --name=fio-rand-read-sequential --eta-newline=1 --readonly).
I think I have the data I want from the Taco, on to the ASUSTOR!
Since it seems like some of the apps like iperf3
in App Central aren't available on aarch64/ARM64, I installed docker-ce
via App Central and ran tests through there.
These benchmarks were run inside a Docker container with: docker run -it -v /dev/md1:/dev/md1 -v /volume1/home/admin:/volume1/home/admin --privileged debian:bullseye /bin/bash
, then I downloaded my disk-benchmark.sh
script (modified to remove sudo
references), and ran it with: DEVICE_UNDER_TEST=/dev/md1 DEVICE_MOUNT_PATH=/volume1/home/admin ./disk-benchmark.sh
.
Benchmark | Result |
---|---|
fio 1M sequential read | 251 MB/s |
iozone 1M random read | 62.55 MB/s |
iozone 1M random write | 110.13 MB/s |
iozone 4K random read | 1.47 MB/s |
iozone 4K random write | 3.72 MB/s |
Using my rsync
network copy test.
# Mac to ASUSTOR (write)
sent 8.59G bytes received 35 bytes 89.97M bytes/sec
total size is 8.59G speedup is 1.00
# ASUSTOR to Mac (read)
sent 8.59G bytes received 35 bytes 144.40M bytes/sec
total size is 8.59G speedup is 1.00
Note: writes on the ASUSTOR were more consistent, with little of the fluctuation or 'dead time' (where it seemed interrupts were stacked up and queues/caches were clearing) that I saw elsewhere. Also, I re-tested a couple of giant copies with large video folders to confirm the speeds, and they seemed consistent with rsync
measurements in the Terminal.
iperf3 (while fio ongoing): 1.85 Gbps
fio standalone: 278 MB/s
fio (while iperf3 ongoing): 283 MB/s

For fio via Docker: used docker run -it -v /dev/md1:/dev/md1 --privileged manjo8/fio fio --filename=/dev/md1 --direct=1 --rw=read --bs=1024k --ioengine=libaio --iodepth=64 --size=8G --runtime=60 --numjobs=4 --group_reporting --name=fio-rand-read-sequential --eta-newline=1 --readonly (confirmed md1 was the proper volume with mdadm -D /dev/md1).

For iperf3 via Docker: used docker run -it ajoergensen/iperf3 -c 10.0.100.100.
Annoyances benchmarking the ASUSTOR:
- htop would quit with Error opening terminal: xterm-256-color. I had to specify the terminal in the command: TERM=xterm htop
- There were emboardmand processes running in htop, and that led me down a rabbit hole that ended at the post Reverse engineering and fine-tuning Asustor NAS fans, which discusses the process.
Hmm... looking at my performance numbers and comparing everything back to the Taco: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/268#issuecomment-965555364 — in that thread I used the RTL driver from Realtek's website, instead of the kernel module that I compiled by hand from the Pi linux tree... and I got 2.35 Gbps...
So maybe I need to do a little re-testing using Realtek's driver instead of the in-kernel driver. Maybe Realtek's driver has some other optimizations that are in the 5.11/12/13/14/15 source that aren't in 5.10?
I was also getting 80 MB/s writes and 125.43 MB/s reads on the Taco with the RTL driver instead of the in-kernel driver, which is faster than the 70/110 I got here. All these numbers seem to be 15-25% better with Realtek's driver :/
Getting the Realtek 2.5G NIC working using Realtek's own driver instead of the one in the 5.10 kernel source:
Download the 2.5G Ethernet LINUX driver r8125 for kernel up to 5.6 driver version 9.007.01
(had to solve an annoying math captcha first).
Install kernel headers: sudo apt-get install -y raspberrypi-kernel-headers
Run:
tar vjxf r8125-9.007.01.tar.bz2
cd r8125-9.007.01/
sudo ./autorun.sh
Install iperf3 again: sudo apt install -y iperf3
Run test:
pi@taco:~ $ iperf3 -c 10.0.100.100
Connecting to host 10.0.100.100, port 5201
[ 5] local 10.0.100.49 port 53116 connected to 10.0.100.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 274 MBytes 2.30 Gbits/sec 0 782 KBytes
[ 5] 1.00-2.00 sec 281 MBytes 2.36 Gbits/sec 0 782 KBytes
[ 5] 2.00-3.00 sec 280 MBytes 2.35 Gbits/sec 0 782 KBytes
[ 5] 3.00-4.00 sec 281 MBytes 2.35 Gbits/sec 0 782 KBytes
[ 5] 4.00-5.00 sec 280 MBytes 2.35 Gbits/sec 0 865 KBytes
[ 5] 5.00-6.00 sec 281 MBytes 2.35 Gbits/sec 0 865 KBytes
[ 5] 6.00-7.00 sec 280 MBytes 2.35 Gbits/sec 0 913 KBytes
[ 5] 7.00-8.00 sec 281 MBytes 2.35 Gbits/sec 0 913 KBytes
[ 5] 8.00-9.00 sec 280 MBytes 2.35 Gbits/sec 0 913 KBytes
[ 5] 9.00-10.00 sec 280 MBytes 2.35 Gbits/sec 0 913 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.73 GBytes 2.35 Gbits/sec 0 sender
[ 5] 0.00-10.01 sec 2.73 GBytes 2.34 Gbits/sec receiver
(2.2 Gbps in opposite direction.)
Well I'll be! Going to have to rebuild RAID 5 array and re-test everything, drat.
I pulled the array out of the Drivestor 4 Pro, plugged it straight into the Taco, then used mdadm
to recover and run the array on the Pi as /dev/md0
:
pi@taco:~ $ sudo mdadm -A /dev/md0 /dev/sd{a,b,c}4 --run
mdadm: /dev/md0 has been started with 3 drives.
pi@taco:~ $ sudo mount /dev/md0 /mnt
pi@taco:~ $ ls /mnt
aquota.user home lost+found Public Web
Going to re-run some tests now.
Re-testing Taco SMB copy tests with bcache disabled:
# Mac to Pi (write)
sent 8.59G bytes received 35 bytes 75.70M bytes/sec
total size is 8.59G speedup is 1.00
# Pi to Mac (read)
sent 8.59G bytes received 35 bytes 97.09M bytes/sec
total size is 8.59G speedup is 1.00
With bcache enabled (TODO).
iperf3 (while fio ongoing): 2.35 Gbps
fio standalone: 275 MB/s
fio (while iperf3 ongoing): 275 MB/s

(Used iperf3 -c 10.0.100.100 -tinf and sudo fio --filename=/dev/md0 --direct=1 --rw=read --bs=1024k --ioengine=libaio --iodepth=64 --size=16G --runtime=120 --numjobs=4 --group_reporting --name=fio-rand-read-sequential --eta-newline=1 --readonly).
More discussion on https://github.com/raspberrypi/linux/issues/4133 — seems like some strange things afoot with samba performance on the Pi, not sure what's going on there, but it should be faster.
I was also trying out the bcmstat script by @MilhouseVH today, found one little snafu... https://github.com/MilhouseVH/bcmstat/issues/23
On the Taco / Pi OS, I just rebuilt the kernel with the in-tree driver in rpi-5.15.y linux, and after a reboot:
pi@taco15:~ $ uname -a
Linux taco15 5.15.10-v8+ #1 SMP PREEMPT Mon Dec 20 03:16:02 UTC 2021 aarch64 GNU/Linux
pi@taco15:~ $ dmesg | grep r8169
[ 4.714152] r8169 0000:04:00.0: enabling device (0000 -> 0002)
[ 4.783518] r8169 0000:04:00.0: can't read MAC address, setting random one
[ 4.818143] libphy: r8169: probed
[ 4.822214] r8169 0000:04:00.0 eth1: RTL8125B, 9e:57:a0:b7:ee:01, XID 641, IRQ 73
[ 4.822247] r8169 0000:04:00.0 eth1: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[ 6.791432] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-400:00: attached PHY driver (mii_bus:phy_addr=r8169-0-400:00, irq=MAC)
[ 6.991585] r8169 0000:04:00.0 eth1: Link is Down
[ 47.452192] r8169 0000:04:00.0 eth1: Link is Up - 2.5Gbps/Full - flow control rx/tx
pi@taco15:~ $ iperf3 -c 10.0.100.100
Connecting to host 10.0.100.100, port 5201
[ 5] local 10.0.100.105 port 40554 connected to 10.0.100.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 164 MBytes 1.37 Gbits/sec 0 171 KBytes
[ 5] 1.00-2.00 sec 168 MBytes 1.40 Gbits/sec 0 171 KBytes
[ 5] 2.00-3.01 sec 168 MBytes 1.40 Gbits/sec 0 182 KBytes
[ 5] 3.01-4.00 sec 168 MBytes 1.41 Gbits/sec 0 182 KBytes
[ 5] 4.00-5.00 sec 169 MBytes 1.41 Gbits/sec 0 182 KBytes
[ 5] 5.00-6.01 sec 169 MBytes 1.41 Gbits/sec 0 201 KBytes
[ 5] 6.01-7.00 sec 166 MBytes 1.40 Gbits/sec 0 273 KBytes
[ 5] 7.00-8.01 sec 168 MBytes 1.40 Gbits/sec 0 273 KBytes
[ 5] 8.01-9.00 sec 165 MBytes 1.38 Gbits/sec 0 273 KBytes
[ 5] 9.00-10.00 sec 165 MBytes 1.38 Gbits/sec 0 273 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.63 GBytes 1.40 Gbits/sec 0 sender
[ 5] 0.00-10.01 sec 1.63 GBytes 1.40 Gbits/sec receiver
Very strange—I wonder if something in the rpi kernel fork for 5.15 is screwed up in terms of PCIe or networking that isn't an issue in the 5.10 kernel? I did a clean clone of the repo and checked out the tip of 5.15.y.
I also popped apart the ASUSTOR Drivestor 4 and explored its innards:
Video and blog post are coming up tomorrow :)
Video and blog post are up:
Can the ASUSTOR Drivestor 4 handle a 2 drive failure?
@formvoltron - That depends on which two drives and which RAID type you have set up :)
I was thinking of a raid 5 level.
@formvoltron - With RAID 5, you can have one drive failure. You have to replace the failed drive and wait for it to be incorporated into the degraded array. RAID 6 or RAID 10 are safer if you are worried about more than one drive failure. RAID 5 is often not recommended for very large hard drives nowadays.
First I'd heard of RAID 6. Reading up on it, it sounds like exactly what I'd want for ultra-reliable storage: handling 2 drive failures. Thank you for the excellent YT vid & review. If I were a younger man I'd go for the Pi. But seeing that I'm greying and crotchety, I'd certainly opt for the ready-made NAS.
Wrt your benchmarking:

- Do you know which block sizes rsync and 'the network stack' were using? This number has a huge impact on performance, therefore at least I prefer to test with tools that use predictable block sizes (like e.g. Helios Lantest)
- Did you monitor what else was going on at the same time, e.g. with sbc-bench -m?
- What does testparm report? I just noticed that a bunch of former optimisations that were in my 'OMV for SBC install script' have disappeared in the meantime...

@ThomasKaiser - For the benchmark monitoring, I ran the tests in three different conditions: 1. monitoring with atop
at 2s intervals, 2. monitoring with top
in 2s intervals, and 3. not running any tool to monitor resource usage at all during the benchmark. I do that third test because even though it should be minimal, injecting the calls to get monitoring data could conceivably impact performance (no matter how minimally).
Nothing else was running during the benchmarks (I actually ran them all again without OMV installed at all) besides what is preinstalled in the lite Pi OS 64-bit image.
For the network file copy I used this rsync command, and I also set up a separate comparison (that I didn't fully document in this issue) where I did the following: copied the same folder once by dragging it over in the Finder, and once using rsync to copy the folder.

And the final result between rsync and Finder was within about 1s of each other (which surprised me... it felt like rsync was slower, just watching my network graph in iStat Menus). I repeated that test twice.
I haven't done any more advanced checking of the SMB connection details—but it seems other people in the Pi community have noticed similar issues with samba file copies not being as fast as they were a year or two ago.
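(One quick check for next time: macOS can report the negotiated SMB version and capabilities for mounted shares, which would at least rule out protocol-level differences. A small sketch:)

# On the Mac, show negotiated SMB details for all mounted shares
smbutil statshares -a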
As for the rsync command, and you mentioning there's no easy way to time Finder copies: here's a q&d script I wrote for a colleague a year ago:
(** The server path – in this case /Volumes/Server/lantest/ - needs to be world
writeable and there need to be 2 files inside named 100M and 1G with appropriate
size. They must not consist of zeros but true data, e.g. from /dev/urandom **)
set FileSize to "100M"
-- set FileSize to "1G"
set OriginalFile to "Server:lantest:" & FileSize
set DestinationFolder to (path to desktop)
set StartTime to do shell script "perl -MTime::HiRes=time -e 'printf \"%.1f\", time'"
with timeout of 100000 seconds
tell application "Finder"
duplicate (OriginalFile as alias) to (DestinationFolder as alias) with replacing
end tell
end timeout
set EndTime to do shell script "perl -MTime::HiRes=time -e 'printf \"%.1f\", time'"
set DownTimeDifference to do shell script "echo " & EndTime & " - " & StartTime & " | bc"
with timeout of 100000 seconds
tell application "Finder"
duplicate (((DestinationFolder as string) & FileSize) as alias) to ("Server:lantest:" as alias) with replacing
end tell
end timeout
set StartTime to do shell script "perl -MTime::HiRes=time -e 'printf \"%.1f\", time'"
set UpTimeDifference to do shell script "echo " & StartTime & " - " & EndTime & " | bc"
set LogMessage to ((DownTimeDifference as string) & " Sec down, " & UpTimeDifference as string) & " Sec up."
log LogMessage
set the clipboard to LogMessage
The requirement for 'real data' instead of zeroes as you do it with mkfile
was due to testing through different VPN solutions with different compression algorithms / efficiency.
But still using either Finder or rsync
misses the fact that you want to know which block sizes are used when doing sequential transfer tests. As already mentioned: using Helios Lantest is a good idea for this.
Wrt monitoring. Yep, both atop
and top
can add significant load and AFAIK neither tool shows real CPU clockspeeds (just relative CPU utilisation which is somewhat meaningless since a Pi busy at 100% at 600 MHz has less computing power compared to a Pi at 50% / 1800 MHz)
Wrt monitoring. Yep, both
atop
andtop
can add significant load and AFAIK neither tool shows real CPU clockspeeds (just relative CPU utilisation which is somewhat meaningless since a Pi busy at 100% at 600 MHz has less computing power compared to a Pi at 50% / 1800 MHz)
You can try the new utility btop, which is a great replacement for top with extra features (C++ version; binaries for Linux are statically compiled):
https://github.com/aristocratos/btop
Using this tool, you can see the actual CPU clock speed on the RPi 4 too.
You can try new utility btop
Does this utility support querying ThreadX on the RPi? Otherwise it's rather useless here.
Even if btop tries to display the actual CPU clockspeeds on all Raspberries, you can't do this 'the Linux way' (relying on numbers from sysfs); you need to use a mailbox interface to ask the main OS (that's an RTOS called ThreadX running on the VideoCore CPU cores, not on the ARM cores). The Linux kernel on the RPi has no idea at which clockspeeds the ARM cores are running. That's why monitoring this is also important.
Sample output from sbc-bench -m
on a RPi 3B+:
Time fake/real load %cpu %sys %usr %nice %io %irq Temp VCore
17:35:30: 1570/1200MHz 4.26 20% 0% 17% 0% 1% 0% 64.5°C 1.3312V
17:36:00: 1570/1200MHz 3.92 72% 1% 71% 0% 0% 0% 69.3°C 1.3312V
17:36:30: 1570/1200MHz 3.56 82% 1% 81% 0% 0% 0% 70.9°C 1.3312V
17:37:01: 1570/1200MHz 3.88 89% 1% 87% 0% 0% 0% 73.1°C 1.3312V
17:37:31: 1570/1200MHz 3.63 75% 1% 74% 0% 0% 0% 72.5°C 1.3312V
(the user thinking he's done some nasty overclocking as the 1570 MHz are reported by all Linux tools while in reality ThreadX silently clocked the ARM cores down to 1.2GHz)
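For reference, the mailbox interface described above is what vcgencmd talks to on the Pi, so the 'fake' and 'real' clocks can be compared directly (a quick sketch):

# What Linux/sysfs reports (can be the 'fake' value)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

# What the VideoCore firmware reports for the ARM cores (in Hz)
vcgencmd measure_clock arm

# Throttling / under-voltage flags
vcgencmd get_throttled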
@ThomasKaiser - There are two different goals in benchmarking, I think—and I am usually targeting a different goal in my tests than I think you may be.
First goal: what kind of performance can be reasonably expected doing end-user tasks on a system set up by an end user, like dragging a file to a NAS in the Finder, or synchronizing two directories on the command line with rsync
.
Second goal: What is the reasonably stable measurable performance you can get with a known baseline.
I think your suggestions would help with the second, but my target is usually the first. Ideally you can meet both goals to give a full picture, but the tests that went into my review/comparison were more targeting the first, and I didn't take the time to target the second.
(And in reality, the two goals are usually mixed/intertwined a bit.)
I normally want whatever numbers I show people on screen and in my public blog posts to reflect the ground truth of what they'd get if they followed a tutorial and got all the default stuff running, then dragged one of their own files or folders over to a NAS.
There's definitely room for both numbers, though—and that's why I love digging into deeper articles from sites like anandtech, and benchmarks from you :)
(I just wanted to make that clear—I am always interested in expanding the benchmarks I run and being able to have a better understanding of surprising 'real world' numbers like those I've been seeing on the Pi with Samba.)
- noticed that a bunch of former optimisations that were in my 'OMV for SBC install script' have disappeared in the meantime...
@ThomasKaiser could you please elaborate on what optimizations you mean?
@mi-hol just a quick list:
Anyway, the most important bits are the smb.conf settings (check with testparm):
min receivefile size = 16384
socket options = TCP_NODELAY IPTOS_LOWDELAY
use sendfile = Yes
getwd cache = yes
write cache size = 524288
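(On a stock Debian/Samba setup those go into the [global] section of /etc/samba/smb.conf; in OMV there's an extra-options field under SMB/CIFS that serves the same purpose, if I remember right. Verifying is quick, a sketch:)

# Confirm the settings actually took effect, then reload Samba
testparm -s | grep -Ei 'sendfile|receivefile|socket options|getwd cache|write cache'
sudo systemctl reload smbd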
The stuff below doesn't help with benchmarks but does in real-life NAS situations when the ondemand governor is chosen, since it helps keep the CPU cores at high clockspeeds when the client is the bottleneck (e.g. copying thousands of small files):
echo 1 >/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
echo 25 >/sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 >/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
echo 200000 >/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
(it can take twice as long without io_is_busy
set to 1
but honestly I haven't looked into this for years after I added this stuff to Armbian).
Just did a quick check with my RPi 4 and Buster, 5.10.63-v8+ (aarch64), an armhf userland and Samba 4.9.5-Debian:
Nothing to complain about. Getting 90/100 MB/s with a single-threaded SMB copy with just 1MB block size is totally fine.
As already mentioned, block sizes matter. Quick testing through 1M, 4M and 16M:
Command line used: iozone -e -I -a -s 500M -r 1024k -r 4096k -r 16384k -i 0 -i 1
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
kB reclen write rewrite read reread
512000 1024 89421 88773 105749 9784603
512000 4096 102437 86328 105403 8770580
512000 16384 110293 111368 106379 10872846
And this is the stuff Finder and Windows Explorer do on their own. They auto-tune settings and increase block sizes more and more until there's no further benefit. Also they start multiple copies in parallel. As such it's useless to test for 'NAS performance' if you don't take this into account or at least check it. You might be testing one NAS with a small block size and the other with a huge one and tell your audience afterwards the difference would be caused by hardware (something that happens all the time with kitchen-sink benchmarking).
Speaking of Finder... using the AppleScript above with a file created by dd if=/dev/urandom of=1G bs=1M count=1024 on the RPi, the results are as follows (3 consecutive tests):
9.4 Sec down, 10.0 Sec up.
9.4 Sec down, 10.0 Sec up.
9.4 Sec down, 9.9 Sec up.
~100 MB/sec in both directions. Fine with me especially since the switch in between is some crappy ALLNET thingy that is the oldest GbE gear lying here around.
There are two different goals in benchmarking
True, there's passive benchmarking (also called generating/collecting numbers and graphs for a target audience that wants some entertainment) and there's active benchmarking which means a) getting a clue why numbers are as they are and b) getting an idea how to improve numbers.
As an example of passive benchmarking gone wrong: In your RPi Zero 2 W review you reported 221 Mbps maximum transfer rates for wired Ethernet. No idea how you generated that number but most probably you did not benchmark the Pi but your USB Ethernet dongle that is likely based on an ASIX AX88179 and not the only reasonable choice for the job: RTL8153B?
Reporting that 221 Mbps number matches the 'what kind of performance can be reasonably expected doing end-user tasks on a system set up by an end user' expectation since they could end up buying the 'wrong' Ethernet dongle.
But wouldn't it be better if said end-users learn that there are huge differences with those dongles and there's no need to stick with such low numbers since any thingy based on the RealTek chipset achieves ~100 Mbps more? :)
@ThomasKaiser - I believe you're implicating I'm throwing out meaningless numbers... but that's simply not the case. Unlike 99% of reviewers/entertainers, I thoroughly document every step in my process, every part number I use and test, every system I test on, and every command or script I run, so at a minimum you can reproduce my exact number (and I often do, dozens of times, before publishing any result).
That does not mean my 'entertainment' numbers are incorrect, or wrong. It may mean they are incomplete, or don't paint the whole picture when it comes to benchmarking—that's fine with me.
But they're not wrong ;)
Edit: Additionally, in my latest video, I did explicitly mention I'm not sure why the numbers have changed in the default kernel and OS configurations that ship with Pi OS / Debian.
Great content, Jeff! However I do agree with Thomas that the USB Ethernet adapter chipset might be important here too... It's like your findings with UAS mode on the RPi 4, where the JMicron chipset caused all sorts of issues compared to the ASM1153.
@FlyingHavoc - No doubt, and there are people who dive deep into testing every single chipset out there (and I'm actually doing something approaching that in this particular project, but only via PCIe, not USB-to-whatever)... but I have limited time and budget, so ideally other people can also do the tests and the information can be promulgated through GitHub issues, forum posts, blog posts, etc.
It would be great if there were more central resources (like my page for PCIe cards on the Pi) for USB chipset support for Network, SATA, etc., but there just isn't, so when I do my testing, I have to work within my means.
I basically have gone as far as I can this year, literally spending over $10,000 on different devices, test equipment, etc., and it's obvious (especially from the few comments above) that that is nowhere near enough to paint a complete picture of every type of device I test.
There are groups like the Linus Media Group investing hundreds of thousands (possibly millions) of dollars into benchmarking in a new lab, and they'll probably still be dwarfed by even a medium sized manufacturer's testing lab in terms of hours and resources.
All that to say, I'm doing my best, always trying to improve, but also realizing I'm one person, trying to help a community, in the best ways I can. And if my benchmarking is taken as being misleading, that's not my intention, and it's also often a matter of perspective.
Also, as this thread is going wildly off course, I'm considering locking it unless discussion stays on topic: SMB performance, the RTL8125B chip, or Realtek's or Broadcom's SoC performance are welcome. As are discussions around using a Pi as a NAS or the ASUSTOR's performance (especially regarding the Realtek driver).
If you want to point out flaws in myriad other devices (USB to SATA, USB to Ethernet, graphics, etc.), please either find a more relevant issue for it, or open a new issue or discussion (especially for general benchmarking).
you're implicating I'm throwing out meaningless numbers
Nope. Sorry for not being more clear or appearing rude (non native english speaker here always accused of the same).
It's not a matter of 'wrong' numbers but of methodology. Quoting one of my personal heroes: Casual benchmarking: you benchmark A, but actually measure B, and conclude you've measured C.
To stay on topic: as already mentioned you need to monitor and/or control the environment the benchmarks are running in. And with NAS performance measurements it's block size that matters. As such I tried to give some suggestions like Lantest, using iozone to test with different block sizes, an AppleScript snippet to time Finder copies and so on.
BTW: you do an amazing job with all your extremely well-documented testing, especially compared to those YT guys who ignore people still able to read. And you also do a great educational job (you're the one who introduced the concept of random I/O to the RPi world :) ). As such, please forgive me for criticising your methodology here and there... my goal is to get insights and improve the overall situation in this area :)
@ThomasKaiser - Okay, thanks :) — and like I said, I am always eager to do better. The AppleScript alone will save a bit of time—I thought I would be resigned to having to screen record and count frames forever :P
Just wanted to mention I got a new build of ADM today to test the Realtek driver. So I'm going to check with iperf3 if it's any faster.
Before (ADM 4.0.1.ROG1):
root@asustor-arm:/volume1/home/admin # docker run -it ajoergensen/iperf3 -c 10.0.100.100
Connecting to host 10.0.100.100, port 5201
[ 5] local 172.17.0.2 port 48634 connected to 10.0.100.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 227 MBytes 1.91 Gbits/sec 7 7.26 MBytes
[ 5] 1.00-2.00 sec 225 MBytes 1.89 Gbits/sec 0 7.26 MBytes
[ 5] 2.00-3.00 sec 225 MBytes 1.89 Gbits/sec 0 7.26 MBytes
[ 5] 3.00-4.00 sec 225 MBytes 1.89 Gbits/sec 0 7.26 MBytes
[ 5] 4.00-5.00 sec 225 MBytes 1.89 Gbits/sec 0 7.44 MBytes
[ 5] 5.00-6.00 sec 225 MBytes 1.89 Gbits/sec 0 7.44 MBytes
[ 5] 6.00-7.00 sec 224 MBytes 1.88 Gbits/sec 0 7.44 MBytes
[ 5] 7.00-8.00 sec 225 MBytes 1.89 Gbits/sec 0 7.44 MBytes
[ 5] 8.00-9.00 sec 225 MBytes 1.89 Gbits/sec 0 7.44 MBytes
[ 5] 9.00-10.00 sec 225 MBytes 1.89 Gbits/sec 0 7.44 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.20 GBytes 1.89 Gbits/sec 7 sender
[ 5] 0.00-10.01 sec 2.20 GBytes 1.89 Gbits/sec receiver
After (custom ADM build 4.0.2.BPE1):
root@asustor-arm:/volume1/home/admin # docker run -it ajoergensen/iperf3 -c 10.0.100.100
Connecting to host 10.0.100.100, port 5201
[ 5] local 172.17.0.2 port 44632 connected to 10.0.100.100 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 224 MBytes 1.88 Gbits/sec 45 7.19 MBytes
[ 5] 1.00-2.00 sec 222 MBytes 1.86 Gbits/sec 0 7.19 MBytes
[ 5] 2.00-3.00 sec 221 MBytes 1.86 Gbits/sec 0 7.49 MBytes
[ 5] 3.00-4.00 sec 221 MBytes 1.86 Gbits/sec 0 7.49 MBytes
[ 5] 4.00-5.00 sec 222 MBytes 1.87 Gbits/sec 0 7.49 MBytes
[ 5] 5.00-6.00 sec 221 MBytes 1.86 Gbits/sec 0 7.49 MBytes
[ 5] 6.00-7.00 sec 221 MBytes 1.86 Gbits/sec 0 7.49 MBytes
[ 5] 7.00-8.00 sec 222 MBytes 1.87 Gbits/sec 0 7.49 MBytes
[ 5] 8.00-9.00 sec 221 MBytes 1.86 Gbits/sec 0 7.49 MBytes
[ 5] 9.00-10.00 sec 222 MBytes 1.86 Gbits/sec 0 7.49 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.17 GBytes 1.86 Gbits/sec 45 sender
[ 5] 0.00-10.02 sec 2.17 GBytes 1.86 Gbits/sec receiver
Samba performance test (after):
mkfile 8G test.zip && \
rsync -h --stats test.zip /Volumes/Public/test.zip && \
rm -f test.zip && sleep 60 && \
rsync -h --stats /Volumes/Public/test.zip test.zip && \
rm -f /Volumes/Public/test.zip && \
rm -f test.zip
# Mac to ASUSTOR (write)
sent 8.59G bytes received 35 bytes 89.04M bytes/sec
total size is 8.59G speedup is 1.00
# ASUSTOR to Mac (read)
sent 8.59G bytes received 35 bytes 135.31M bytes/sec
total size is 8.59G speedup is 1.00
(Compare to earlier results: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/162#issuecomment-996906725)
I did notice with the new driver in place, it mentioned not using MSI/MSI-X—maybe due to the bus constraints on the Realtek SoC on the Drivestor 4 Pro, it switches into a mode that's as slow as the in-kernel driver (technically very slightly slower):
root@asustor-arm:/volume1/home/admin # dmesg | grep r8125
[ 14.052836] r8125 2.5Gigabit Ethernet driver 9.007.01-NAPI loaded
[ 14.059374] r8125 0001:01:00.0: enabling device (0000 -> 0003)
[ 14.067619] r8125 0001:01:00.0: no MSI/MSI-X. Back to INTx.
[ 14.089225] r8125: This product is covered by one or more of the following patents: US6,570,884, US6,115,776, and US6,327,625.
[ 14.107688] r8125 Copyright (C) 2021 Realtek NIC software team <nicfae@realtek.com>
[ 27.097654] r8125: eth0: link up
So this one should be a bit more interesting...
After seeing my earlier ASUSTOR vs Pi CM4 NAS videos, ASUSTOR sent me their Drivestor 4 Pro - AS3304T, which is even more directly comparable to the CM4-based NAS I built:
A quick specs comparison:
Both use ARM architecture CPUs, unlike the Lockerstor 4 that I tested previously (it was AMD64)—and this brings a few wrinkles. It can't do things like run VirtualBox VMs, and some software that may have worked on other models before might not work on this model due to the aarch64 platform.
A few other notable differences with ADM 4.0 (I am still on 3.x on my Lockerstor):
Dark mode support, yay! (Though it doesn't auto-detect.)
A few questions I'd like to answer: