ThomasKaiser / sbc-bench

Simple benchmark for single board computers
BSD 3-Clause "New" or "Revised" License

Proposal: add stockfish benchmark #55

Closed ThomasKaiser closed 2 years ago

ThomasKaiser commented 2 years ago

From cnx-software.

The first invocation on Rock 5B in lazy mode (phoronix-test-suite benchmark pts/stockfish-1.4.0) already ended with the board freezing during the 2nd stockfish run. After connecting the fan to power and repeating the test, the board froze again during the 2nd run of stockfish bench 128 8 24 default depth.

The general problem was already known: on some boards the highest DRAM clock isn't usable, and users needed to switch from 2112 MHz to 1560 MHz for stable operation.

My board hadn't seen any freezes at the highest DRAM clock so far, so this was a surprise. By updating my Armbian image to the latest version I was hoping to get the most recent boot BLOBs as part of the u-boot package. It now reads ii linux-u-boot-rock-5b-legacy 22.11.0-trunk.0106 arm64 Uboot loader 2017.09, but problems got even worse: the board now freezes at 2112 MHz DRAM clock already during the 1st benchmark execution. Maybe @amazingfate can comment on whether my OS image is expected to run on the latest BLOBs or not?

With lower DRAM clocks everything works as expected, but at 2112 MHz DRAM clock the board freezes regardless of the A76 cores' clockspeed (and as such DVFS/consumption), so it looks solely related to DRAM clock:

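| A76 clock | DRAM clock | Consumption | SoC temp | Nodes per second |
|-----------|------------|-------------|----------|------------------|
| 2360 MHz  | 528 MHz    | 8-9W        | 40°C     | 3238057          |
| 2360 MHz  | 1068 MHz   | 9-10W       | 43.5°C   | 4122771          |
| 2360 MHz  | 1560 MHz   | 10-11W      | 46°C     | 4653285          |
| 2360 MHz  | 2112 MHz   | 12W         | 46°C     | freeze           |
| 1800 MHz  | 2112 MHz   | 8-9W        | 39°C     | freeze           |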
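
For anyone wanting to repeat the walk through the DRAM clocks, here is a minimal sketch of pinning the memory controller to one frequency. It assumes the dmc node exposes the standard Linux devfreq attributes (available_frequencies, min_freq, max_freq) with values in Hz; pin_devfreq is a hypothetical helper and not part of sbc-bench.

```shell
#!/bin/sh
# Sketch: pin a devfreq device (e.g. the DRAM controller) to one frequency.
# pin_devfreq is a hypothetical helper, not part of sbc-bench.
pin_devfreq() {
    dir=$1   # devfreq directory, e.g. /sys/devices/platform/dmc/devfreq/dmc
    hz=$2    # target frequency in Hz, as listed in available_frequencies

    # Refuse values the driver does not advertise.
    if ! tr ' ' '\n' < "$dir/available_frequencies" | grep -qx "$hz"; then
        echo "unsupported frequency: $hz" >&2
        return 1
    fi

    # Write max first (a failure here is tolerated), then min, then max
    # again, so the result does not depend on whether the clock is being
    # raised or lowered relative to the current limits.
    echo "$hz" > "$dir/max_freq" 2>/dev/null || true
    echo "$hz" > "$dir/min_freq"
    echo "$hz" > "$dir/max_freq"
}
```

As root, something like `pin_devfreq /sys/devices/platform/dmc/devfreq/dmc 1560000000` should then hold the DRAM clock at 1560 MHz, provided that value appears in available_frequencies on the board in question.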

With other CPU benchmarks I haven't seen consumption exceeding 9W on Rock 5B, so stockfish is really a potent load generator / stability tester. On top of making heavy use of SIMD extensions it is also heavy on memory access: walking through the different DRAM clockspeeds resulted in significantly different scores: https://openbenchmarking.org/result/2211099-NE-2211093NE82

A quick check on an AMD EPYC 7232P (8C/16T) server also hints at stockfish being more demanding than both cpuminer and 7-zip:

The first chart is from a NetIO powermeter (measuring at the wall), the 2nd is from the server's internal BMC showing PSU1 (PSU2 is always in standby on this machine, so the whole consumption is on PSU1), and the last two are the BMC measurements for CPU and DRAM separately (though I have no idea which of the two the memory controller contributes to):

[Image: screenshot 2022-11-09 at 19:52:39 (copy)]

ThomasKaiser commented 2 years ago

And while we're at it, let's benchmark some benchmarks, here with regard to the influence of DRAM clockspeed: how it affects memory bandwidth and latency in particular, as well as the scores currently used by sbc-bench, plus stockfish.

The values are as follows:

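| DRAM (MHz) | 7-zip single | 7-zip multi | AES     | memcpy | memset | 4M ns     | 64M ns      | kH/s  | stockfish |
|------------|--------------|-------------|---------|--------|--------|-----------|-------------|-------|-----------|
| 528        | 2587         | 13050       | 1344830 | 3570   | 8450   | 63.2/99.3 | 235.8/271.3 | 22.06 | 3238057   |
| 1068       | 2940         | 15120       | 1344500 | 6270   | 16950  | 46.9/73.6 | 166.3/192.2 | 22.05 | 4122771   |
| 1560       | 3086         | 16040       | 1344060 | 8620   | 24390  | 38.6/58.8 | 139.9/158.0 | 22.03 | 4653285   |
| 2112       | 3167         | 16640       | 1343220 | 10850  | 29330  | 35.7/53.7 | 123.2/139.0 | 22.03 | freeze    |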
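
To put a number on how differently the benchmarks react to DRAM clock, the relative change between two rows can be computed directly. The following is only a sketch: pct_change is a hypothetical helper, and the inputs are the 528 MHz and 1560 MHz scores from the table above (1560 MHz being the highest clock that completed the stockfish run).

```shell
#!/bin/sh
# pct_change OLD NEW: percentage change from OLD to NEW, one decimal place.
# Hypothetical helper for illustration, not part of sbc-bench.
pct_change() {
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# Scores at 528 MHz vs. 1560 MHz DRAM clock, copied from the table.
echo "7-zip single: $(pct_change 2587 3086)%"
echo "7-zip multi:  $(pct_change 13050 16040)%"
echo "stockfish:    $(pct_change 3238057 4653285)%"
```

This works out to roughly 19% for 7-zip single, 23% for 7-zip multi and 44% for stockfish, so stockfish reacts to DRAM clock about twice as strongly as 7-zip does.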

To interpret the results (not talking about memory bandwidth/latency since these numbers are self-explanatory):

Speaking about the 7-zip multi scores: those above were all generated with the same kernel version (a smelly 5.10 Rockchip BSP kernel). But with different kernel versions multi-threaded behaviour can change significantly, as already outlined in my reasoning for using 7-zip as a benchmark.

Let's have a look at the effect of kernel version on ODROID-XU4:

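| Kernel / Compiler     | 7-zip single | 7-zip multi | CPU utilisation compression | CPU utilisation decompression |
|-----------------------|--------------|-------------|-----------------------------|-------------------------------|
| Kernel 4.9 / GCC 6.3  | 1622         | 6370        | 64%                         | 78%                           |
| Kernel 4.14 / GCC 7.3 | 1633         | 7100        | 64%                         | 78%                           |
| Kernel 5.4 / GCC 9.3  | 1604         | 8980        | 94%                         | 84%                           |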

The single-threaded score is roughly the same with all kernel versions, but the multi-threaded scores differ a lot, and so does the reported CPU utilisation. It's a scheduler problem, not a benchmark problem.

ThomasKaiser commented 2 years ago

Another suggestion from cnx-software: rule out the A55 cores:

root@rock-5b:/home/tk# echo performance >/sys/devices/platform/dmc/devfreq/dmc/governor
root@rock-5b:/home/tk# echo performance >/sys/devices/system/cpu/cpufreq/policy4/scaling_governor
root@rock-5b:/home/tk# echo performance >/sys/devices/system/cpu/cpufreq/policy6/scaling_governor
root@rock-5b:/home/tk# for i in 3 2 1 0 ; do echo 0 >/sys/devices/system/cpu/cpu${i}/online; done
root@rock-5b:/home/tk# htop (confirm that A55 cores are offline)
root@rock-5b:/home/tk# phoronix-test-suite benchmark pts/stockfish-1.4.0
...
Stockfish 15:
    pts/stockfish-1.4.0 [Total Time]
    Test 1 of 1
    Estimated Trial Run Count:    3                      
    Estimated Time To Completion: 14 Minutes [09:38 CET] 
        Started Run 1 @ 09:24:08
        Started Run 2 @ 09:28:58

Rock 5B froze after 4:45 min. Reported consumption 'at the wall': 9-10W (all measurements with an active fan, which contributes 700mW to the readings).
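
Instead of eyeballing htop, the offline state of the A55 cores can also be read back from sysfs. A minimal sketch, assuming each cpuN directory exposes the usual online attribute; list_offline_cpus is a hypothetical helper and not part of sbc-bench.

```shell
#!/bin/sh
# list_offline_cpus [CPU_ROOT]: print the number of every CPU whose
# "online" attribute reads 0. CPU_ROOT defaults to the real sysfs path.
# Hypothetical helper for illustration, not part of sbc-bench.
list_offline_cpus() {
    root=${1:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*; do
        # Skip CPUs that cannot be hotplugged and thus lack the attribute.
        [ -r "$d/online" ] || continue
        if [ "$(cat "$d/online")" = "0" ]; then
            echo "${d##*/cpu}"
        fi
    done
}
```

After the loop from the session above that writes 0 to cpu3 through cpu0, calling `list_offline_cpus` without arguments would be expected to print 0, 1, 2 and 3, one per line.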

ThomasKaiser commented 2 years ago

First implementation done: https://github.com/ThomasKaiser/sbc-bench/commit/bddc8d44c04c744ad0c341a480e3312a8dfce24e

amazingfate commented 2 years ago

Armbian has updated to the latest bl31 firmware since this commit. To see which firmware is currently in use, check the serial console output.

ThomasKaiser commented 2 years ago

@amazingfate sbc-bench -s reliably freezes my Rock 5B even with latest BLOBs on 2112 MHz DRAM clock.