geerlingguy / sbc-reviews

Jeff Geerling's SBC review data - Raspberry Pi, Radxa, Orange Pi, etc.
MIT License

Radxa Rock 5 model B #3

Open geerlingguy opened 1 year ago

geerlingguy commented 1 year ago

Basic information

Linux/system information

# output of `neofetch`
       _,met$$$$$gg.          rock@rock-5b 
    ,g$$$$$$$$$$$$$$$P.       ------------ 
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux 11 (bullseye) aarch64 
 ,$$P'              `$$$.     Host: Radxa ROCK 5B 
',$$P       ,ggs.     `$$b:   Kernel: 5.10.66-27-rockchip-gea60d388902d 
`d$$'     ,$P"'   .    $$$    Uptime: 3 mins 
 $$P      d$'     ,    $$P    Packages: 1195 (dpkg) 
 $$:      $$.   -    ,d$$'    Shell: bash 5.1.4 
 $$;      Y$b._   _,d$P'      Terminal: /dev/pts/0 
 Y$$.    `.`"Y$$$$P"'         CPU: (8) @ 1.800GHz 
 `$$b      "-.__              Memory: 353MiB / 3739MiB 
  `Y$$
   `Y$$.                                              
     `$$b.                                            
       `Y$$b.
          `"Y$b._
              `"""

# output of `uname -a`
Linux rock-5b 5.10.66-27-rockchip-gea60d388902d #rockchip SMP Mon Oct 24 08:25:47 UTC 2022 aarch64 GNU/Linux

Benchmark results

CPU

Power

Disk

SanDisk SD1NBDA4-32G eMMC module

| Benchmark | Result |
| --- | --- |
| fio 1M sequential read | 213 MB/s |
| iozone 1M random read | 235.76 MB/s |
| iozone 1M random write | 209.29 MB/s |
| iozone 4K random read | 13.27 MB/s |
| iozone 4K random write | 20.06 MB/s |

PiBenchmarks.com result: https://pibenchmarks.com/benchmark/66561/

KIOXIA XG6 1TB PCIe Gen 3 x4 SSD

| Benchmark | Result |
| --- | --- |
| fio 1M sequential read | 3.091 GB/s |
| iozone 1M random read | 1.037 GB/s |
| iozone 1M random write | 1.311 GB/s |
| iozone 4K random read | 35.023 MB/s |
| iozone 4K random write | 89.889 MB/s |

PiBenchmarks.com result: https://pibenchmarks.com/benchmark/66519/

SanDisk Extreme 128 GB microSD

| Benchmark | Result |
| --- | --- |
| fio 1M sequential read | 87.7 MB/s |
| iozone 1M random read | 82.22 MB/s |
| iozone 1M random write | 66.57 MB/s |
| iozone 4K random read | 5.28 MB/s |
| iozone 4K random write | 2.64 MB/s |

PiBenchmarks.com result: https://pibenchmarks.com/benchmark/66562/

curl https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh | sudo bash

Run benchmark on any attached storage device (e.g. eMMC, microSD, NVMe, SATA) and add results under an additional heading. Download the script with curl -o disk-benchmark.sh [URL_HERE] and run sudo DEVICE_UNDER_TEST=/dev/sda DEVICE_MOUNT_PATH=/mnt/sda1 ./disk-benchmark.sh (assuming the device is sda).
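The download-and-run steps above could be wrapped in a small guard that checks the target before starting. This is only a sketch: `run_disk_benchmark` is a hypothetical helper name, and it assumes `disk-benchmark.sh` has already been downloaded into the current directory and made executable.

```shell
#!/usr/bin/env bash
# run_disk_benchmark DEVICE MOUNT_PATH: check the target before handing it to
# disk-benchmark.sh. Hypothetical wrapper; it assumes the script is already
# present in the current directory and executable.
run_disk_benchmark() {
  local device="$1" mount_path="$2"
  if [ ! -b "$device" ]; then
    echo "not a block device: $device" >&2
    return 1
  fi
  if [ ! -d "$mount_path" ]; then
    echo "mount path not found: $mount_path" >&2
    return 1
  fi
  # Same environment variables as the command quoted above.
  sudo DEVICE_UNDER_TEST="$device" DEVICE_MOUNT_PATH="$mount_path" ./disk-benchmark.sh
}

# Example, assuming the device is sda mounted at /mnt/sda1:
#   run_disk_benchmark /dev/sda /mnt/sda1
```

The checks only catch a mistyped device or mount path; they do not verify that the device is actually mounted at that path.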

Also consider running the PiBenchmarks.com script.

Network

iperf3 results:

(Be sure to test all interfaces, noting any that are non-functional.)

GPU

Memory

tinymembench results:

```
tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                             :  9132.2 MB/s (2.2%)
 C copy backwards (32 byte blocks)            :  9063.5 MB/s
 C copy backwards (64 byte blocks)            :  9066.0 MB/s
 C copy                                       :  9735.1 MB/s
 C copy prefetched (32 bytes step)            :  9854.6 MB/s
 C copy prefetched (64 bytes step)            :  9884.5 MB/s
 C 2-pass copy                                :  4832.7 MB/s
 C 2-pass copy prefetched (32 bytes step)     :  7227.8 MB/s (0.1%)
 C 2-pass copy prefetched (64 bytes step)     :  7534.9 MB/s
 C fill                                       : 28274.8 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)       : 28267.1 MB/s (0.1%)
 C fill (shuffle within 32 byte blocks)       : 28271.7 MB/s (0.1%)
 C fill (shuffle within 64 byte blocks)       : 28201.9 MB/s
 NEON 64x2 COPY                               :  9999.8 MB/s
 NEON 64x2x4 COPY                             :  9886.3 MB/s
 NEON 64x1x4_x2 COPY                          :  7281.4 MB/s
 NEON 64x2 COPY prefetch x2                   :  9959.3 MB/s
 NEON 64x2x4 COPY prefetch x1                 : 10235.1 MB/s
 NEON 64x2 COPY prefetch x1                   :  9937.7 MB/s
 NEON 64x2x4 COPY prefetch x1                 : 10237.2 MB/s
 ---
 standard memcpy                              :  9964.5 MB/s
 standard memset                              : 28159.3 MB/s
 ---
 NEON LDP/STP copy                            : 10003.4 MB/s
 NEON LDP/STP copy pldl2strm (32 bytes step)  : 10007.5 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)  : 10032.5 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)  : 10137.0 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)  : 10129.5 MB/s
 NEON LD1/ST1 copy                            :  9888.0 MB/s
 NEON STP fill                                : 28219.8 MB/s (0.2%)
 NEON STNP fill                               : 28176.8 MB/s
 ARM LDP/STP copy                             :  9974.1 MB/s
 ARM STP fill                                 : 28187.6 MB/s (0.1%)
 ARM STNP fill                                : 28129.3 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.0 ns          /     0.0 ns
    131072 :    1.1 ns          /     1.5 ns
    262144 :    2.2 ns          /     2.8 ns
    524288 :    3.5 ns          /     4.0 ns
   1048576 :    9.4 ns          /    12.2 ns
   2097152 :   14.1 ns          /    15.4 ns
   4194304 :   64.7 ns          /   102.6 ns
   8388608 :  153.4 ns          /   215.7 ns
  16777216 :  202.0 ns          /   255.4 ns
  33554432 :  226.9 ns          /   269.7 ns
  67108864 :  241.1 ns          /   276.5 ns
```

Phoronix Test Suite

Results of the pi-general-benchmark.sh:

geerlingguy commented 1 year ago

I'm perplexed as to how the Orange Pi 5 (with the RK3588S) is benchmarking faster than the RK3588 in the Rock 5 model B, at least according to both my testing and Christopher Barnatt. Maybe an issue with the Debian 11 image provided by Radxa?

samuk commented 1 year ago

I'd be interested to see how it goes with the Armbian kernel/ISO. Both the Rock and the Orange are supported, so running the same kernel on each might even them out?

TheRemote commented 1 year ago

Curious. I tried this really quickly on my Orange Pi 5 vs. Rock 5B:

orangepi@orangepi5:~$ sysbench --threads=8 cpu run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second: 13758.30

General statistics:
    total time:                          10.0007s
    total number of events:              137621

Latency (ms):
         min:                                    0.38
         avg:                                    0.58
         max:                                   28.04
         95th percentile:                        1.10
         sum:                                79836.93

Threads fairness:
    events (avg/stddev):           17202.6250/6752.63
    execution time (avg/stddev):   9.9796/0.01

orangepi@orangepi5:~$ uname -a
Linux orangepi5 5.10.110-rockchip-rk3588 #1.1.0 SMP Fri Jan 6 15:58:17 CST 2023 aarch64 GNU/Linux

orangepi@orangepi5:~$ cat /etc/os-release
PRETTY_NAME="Orange Pi 1.1.0 Bullseye"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

And the Rock 5B:

rock@rock-5b:~$ sysbench --threads=8 cpu run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second: 14803.50

General statistics:
    total time:                          10.0006s
    total number of events:              148066

Latency (ms):
         min:                                    0.36
         avg:                                    0.54
         max:                                   11.03
         95th percentile:                        1.10
         sum:                                79954.50

Threads fairness:
    events (avg/stddev):           18508.2500/9344.12
    execution time (avg/stddev):   9.9943/0.00

rock@rock-5b:~$ uname -a
Linux rock-5b 5.10.66-27-rockchip-gea60d388902d #rockchip SMP Mon Oct 24 08:25:47 UTC 2022 aarch64 GNU/Linux

rock@rock-5b:~$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Great catch. I couldn't reproduce it quickly using sysbench, but I'd imagine that you're on to something. I know there are already a lot of revisions of the Rock 5B, whereas for the Orange Pi 5 we should all have more or less the same board. I wonder if some of this can be explained by board revisions or other differences on the Rock 5B.
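To put the two sysbench runs above in proportion, the relative gap in events per second can be worked out directly. This is a minimal sketch; `pct_diff` is a hypothetical helper name and not part of sysbench.

```shell
#!/usr/bin/env bash
# pct_diff BASE OTHER: print how much OTHER differs from BASE, in percent.
# Hypothetical helper for illustration; awk does the floating-point math.
pct_diff() {
  awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", (other - base) / base * 100 }'
}

# Events per second from the two runs above:
# Orange Pi 5 = 13758.30, Rock 5B = 14803.50
pct_diff 13758.30 14803.50
```

With those two figures the Rock 5B comes out about 7.6% ahead in this CPU-only test, the opposite of what the broader benchmarks showed.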

Which version of the board do you have? I have V1.42.2022.08.29. There are already several different versions of the Rock 5B out there.

Radxa doesn't make it clear what the changes between them are, but each version has its own spec sheets, schematics, etc. on the Radxa site.

EDIT: Actually I can see in your picture that you have the exact same revision as me. I might have a better explanation for this.

Check this out:

rock@rock-5b:/sys/devices/system/cpu/cpufreq/policy0$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
...
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected

Spectre v2 is reported as "Not affected" here, which suggests the security mitigations are in place. Now for the Orange Pi 5:

orangepi@orangepi5:~$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
...
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable: Unprivileged eBPF enabled

Oh yeah. Somebody is cheating here. The Orange Pi 5 has security mitigations disabled for Spectre v2. That is definitely going to have a performance impact! Having the mitigations shut off is an unfair advantage, however they're doing it (via the kernel cmdline, built into the kernel itself, etc.).

This setting did not affect my simple sysbench test, but your more comprehensive benchmarking exercises parts of the CPU where it will have an impact. That's definitely a point in favor of comprehensive benchmarking!

The Rock 5B is definitely configured a lot more securely here even though it's hurting it on the benchmarking!
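The status strings compared above can also be read straight from sysfs on Linux, at `/sys/devices/system/cpu/vulnerabilities/spectre_v2`. The sketch below sorts such a string into a short label; `classify_spectre_v2` and its labels are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# classify_spectre_v2 STATUS: map a kernel status string to a short label.
# Hypothetical helper; the patterns follow the lscpu output quoted above.
classify_spectre_v2() {
  case "$1" in
    "Not affected"*) echo "not-affected" ;;
    Mitigation*)     echo "mitigated" ;;
    Vulnerable*)     echo "vulnerable" ;;
    *)               echo "unknown" ;;
  esac
}

# Classify the running system's status; skip quietly if the file is absent.
status_file=/sys/devices/system/cpu/vulnerabilities/spectre_v2
if [ -r "$status_file" ]; then
  classify_spectre_v2 "$(cat "$status_file")"
fi
```

Running it on both boards before a benchmark session makes it easy to confirm they are being compared under the same reported status.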

TheRemote commented 1 year ago

I actually have a follow-up on this for you to try. I was able to update the Radxa Rock 5B kernel. You and I were both running the stock kernel 5.10.66-27.

The Radxa updates are broken in the stock image. If you try to do sudo apt update it will tell you you need to add the apt repository key. Here's how we fix it with a one-liner:

wget -O - apt.radxa.com/bullseye-stable/public.key | sudo apt-key add -

Now you can do:

sudo apt update && sudo apt upgrade

It will tell you that ~70 packages will be upgraded and a few packages will be downgraded. Let it. This took me all the way up to kernel 5.10.110 (after a reboot):

rock@rock-5b:~$ uname -a
Linux rock-5b 5.10.110-35-rockchip-g98c1daa32982 #rockchip SMP Thu Jan 12 03:36:12 UTC 2023 aarch64 GNU/Linux

Now our kernel is 5.10.110 just like the Orange Pi 5's. They are (very nearly) identical versions.
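As a quick way to compare the base kernel versions without the vendor suffixes, the part of the `uname -r` string before the first dash can be extracted. A minimal sketch, where `base_kernel_version` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# base_kernel_version RELEASE: strip the vendor suffix from a `uname -r`
# style string, keeping only the part before the first dash.
# Hypothetical helper for illustration.
base_kernel_version() {
  printf '%s\n' "${1%%-*}"
}

# Release strings taken from the uname output in this thread.
rock=$(base_kernel_version "5.10.110-35-rockchip-g98c1daa32982")
orange=$(base_kernel_version "5.10.110-rockchip-rk3588")

if [ "$rock" = "$orange" ]; then
  echo "same base kernel: $rock"
else
  echo "different base kernels: $rock vs $orange"
fi
```

A matching base version says nothing about vendor patches or kernel config, which is why the two kernels are only "very nearly" identical.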

That's not the interesting part though. Take a look at this:

rock@rock-5b:~$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
...
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable: Unprivileged eBPF enabled

The security settings now match! It's for the worse as far as vulnerabilities go since they're both now vulnerable (yay) but it's very good for benchmarking purposes that they're now the same.

Can you try updating your kernel after fixing the apt key for Radxa and retest and see if the scores are more reasonable?

geerlingguy commented 1 year ago

> The security settings now match! It's for the worse as far as vulnerabilities go since they're both now vulnerable (yay) but it's very good for benchmarking purposes that they're now the same.

Heh... that's not a great endorsement, that when you update your system, you end up with more vulnerabilities :P

But I will try.

RadxaYuntian commented 1 year ago

I don't recall us touching the mitigation configuration, so this might be a side effect of updating our kernel to a newer Rockchip SDK release. In any case, I'll add the mitigations to my kernel config override.

TheRemote commented 1 year ago

Absolutely! I would have tried against the Geekbench result but I didn't realize it needs a license. It's pretty cheap at least but it's hard to want to pay for someone else's benchmark when I am developing one because it makes me feel like "ugh, I should just add whatever features I want to pay someone else for to mine!".

It turns out you can toggle this setting too. On the Orange Pi 5:

root@orangepi5:/home/orangepi# sudo lscpu
Architecture:                    aarch64
...
Vulnerability Spectre v2:        Vulnerable: Unprivileged eBPF enabled

Now the magic to shut it off:

echo 1 > /proc/sys/kernel/unprivileged_bpf_disabled

root@orangepi5:/home/orangepi# sudo lscpu
Architecture:                    aarch64
...
Vulnerability Spectre v2:        Mitigation; CSV2, BHB

And now we have security mitigations enabled again (but only until a reboot). Interestingly, the results still don't quite match after toggling the setting. The only way I got them to match completely was updating the Rock 5B's kernel to the latest.

Here's an interesting reference. They actually added a new kernel flag to disable the BPF mitigations in the official kernel because, according to this article, the mitigations can impact benchmarks by up to 20%. This is specifically BHB (branch history buffer). It's also specifically high-performance ARM chips that are hit hard enough by the mitigations that new kernel flags are being added for them.

You can toggle the Rock 5B's mitigations in this same way but only after updating to the new kernel. This kernel flag won't exist in the -66 kernel. You can reenable the security mitigations with that flag temporarily until a reboot (or you could add it to sysctl.conf etc.).
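To keep the setting across reboots, the same value can go into a sysctl configuration file. The sketch below writes a drop-in under `/etc/sysctl.d/` rather than editing `sysctl.conf` directly; `write_bpf_sysctl` and the file name are assumptions made for illustration, while `kernel.unprivileged_bpf_disabled` is the sysctl key behind the `/proc/sys` path used above.

```shell
#!/usr/bin/env bash
# write_bpf_sysctl FILE [VALUE]: write a sysctl drop-in that sets
# kernel.unprivileged_bpf_disabled (default 1, as in the echo above).
# Hypothetical helper; the file name used below is an assumption.
write_bpf_sysctl() {
  local file="$1" value="${2:-1}"
  printf 'kernel.unprivileged_bpf_disabled = %s\n' "$value" > "$file"
}

# Usage (as root), then load it without rebooting:
#   write_bpf_sysctl /etc/sysctl.d/99-unprivileged-bpf.conf
#   sysctl --system
```

As noted above, this only applies on a kernel that actually exposes the setting, so on the Rock 5B it needs the updated kernel first.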

This might not be anything you guys did at Radxa. I'd imagine it's upstream. Maybe the settings match after updating simply because of upstream changes in the Rockchip kernel? It behaves identically to the Orange Pi 5 when they are both on Rockchip kernel 110 (although I understand there's a Radxa branch etc.).

It sounds like more changes are coming related to this. Official flags in 6.1 at least although maybe we're a ways from seeing that come all the way down to the consumer boards.

samuk commented 1 year ago

Is anyone interested in comparing with the 6.xx kernel? https://github.com/armbian/build/releases/tag/23.02.0-trunk.0191 @RadxaYuntian perhaps you could test and officially support Armbian on these devices?

RadxaYuntian commented 1 year ago

@TheRemote: The security config has been enabled: https://github.com/radxa-repo/bsp/commit/8584bc2648fd6a59a64f6fb8f416197b9bd92e67 This config sets unprivileged_bpf_disabled to 2, so an admin can either change it to 0 to enable unprivileged eBPF, or to 1 to disable it until the next reboot. You can check the linked SUSE KB article for some additional details.

@samuk: We won't officially support Armbian, since ultimately Armbian cannot provide the same hardware support as the Rockchip SDK (which is a good thing, because the Rockchip SDK comes with its own compromises), and we don't have enough resources to support 2 systems. We do submit PRs to Armbian because some of our customers want to use it, but they usually understand the limitations and are capable of developing their own solutions.
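The three values described above can be summarized in a small lookup. This is a sketch based only on the description in this comment; `describe_bpf_setting` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# describe_bpf_setting VALUE: explain a kernel.unprivileged_bpf_disabled
# value, following the description in the comment above.
describe_bpf_setting() {
  case "$1" in
    0) echo "unprivileged eBPF enabled" ;;
    1) echo "unprivileged eBPF disabled until next reboot" ;;
    2) echo "unprivileged eBPF disabled, admin may switch to 0 or 1" ;;
    *) echo "unrecognized value: $1" ;;
  esac
}

# Report the running system's value when the sysctl file is present.
setting_file=/proc/sys/kernel/unprivileged_bpf_disabled
if [ -r "$setting_file" ]; then
  describe_bpf_setting "$(cat "$setting_file")"
fi
```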

TheRemote commented 1 year ago

@RadxaYuntian That's fantastic. I'm familiar with the 2 setting for privileged users, and it's great that it's available. Using it in trusted/secure places would be my preference as well. It's definitely a more precise and elegant solution that still lets you take advantage of that performance when safe and appropriate.

Thank you for such a fast response and taking care of this!

samuk commented 1 year ago

@RadxaYuntian understood. My particular interest is the upcoming CM5 modules. Would you be able to provide $free hardware to the Armbian devs to assist their work?

RadxaYuntian commented 1 year ago

We have provided free hardware samples to known projects for quite a while. However, that also depends on whether they are interested. Last time, when I offered the Zero 2 to 2 Armbian developers, only one person took it.

rpardini commented 1 year ago

> Last time when I offered Zero 2 to 2 Armbian developers only one person took it.

True. I took it a few months ago, and it's a fantastic board. Armbian support for it is now in what I consider good shape, with all firmware and patches included, and fully working mainline u-boot 22.10 and kernel 6.2-rc5, thanks to Neil Armstrong & co. I just wish it was... actually released so it can be mainlined...

rpardini commented 1 year ago

Ref the rk3588(s) boards: Armbian has a dedicated legacy/vendor kernel with the 2.6.x/5.10.x stuff from rk/Radxa, plus extra patches/drivers/overlays etc. The OPi5 has joined that party and the Khadas Edge2 is heading there too. Adding the Radxa variants (CM5, A/B, etc) shouldn't be too hard.

We also have an edge kernel carrying some of sre's mainline patches for the rk3588. It's not really usable yet, but we're ready for when it is.

geerlingguy commented 10 months ago

FYI, I've run a few more tests (Phoronix Test Suite ones, added to the OP), running Armbian's latest release for the Orange Pi (because their download folder in Google Drive is empty) and the latest Debian 11 build (35, I think?) for the Rock 5 B. The OP5 is still faster in a couple of benchmarks, but others are neck and neck.

lukeomatik commented 8 months ago

Hi. Is it reasonable to expect roughly the same results for the Orange Pi 5 Plus? It has an RK3588 (non-S) with two independent 2.5 Gbps ports (they did a dirty job on the USB 3.0 ports instead; I've read the schematics).