ThomasKaiser / sbc-bench

Simple benchmark for single board computers
BSD 3-Clause "New" or "Revised" License

Results for Ampere Altra Dev Platform M96-28 with 96 GB RAM (16G x 6) #72

Closed geerlingguy closed 1 year ago

geerlingguy commented 1 year ago
$ sudo ./sbc-bench.sh -j
Starting to examine hardware/software for review purposes...

Average load and/or CPU utilization too high (too much background activity). Waiting...

Too busy for benchmarking: 22:00:35 up 35 min,  2 users,  load average: 0.20, 0.10, 0.03,  cpu: 0%
Too busy for benchmarking: 22:00:40 up 35 min,  2 users,  load average: 0.18, 0.10, 0.03,  cpu: 0%
Too busy for benchmarking: 22:00:45 up 36 min,  2 users,  load average: 0.17, 0.10, 0.03,  cpu: 0%
Too busy for benchmarking: 22:00:50 up 36 min,  2 users,  load average: 0.23, 0.11, 0.03,  cpu: 0%
Too busy for benchmarking: 22:00:55 up 36 min,  2 users,  load average: 0.30, 0.13, 0.04,  cpu: 0%
Too busy for benchmarking: 22:01:00 up 36 min,  3 users,  load average: 0.27, 0.13, 0.04,  cpu: 0%

...

I didn't see anything happen for a few minutes (does it have a 5 or 10 minute delay or something?). I'm letting it sit in another window. I see a load average of 0.00 for the past 5 min now.

Note: it took a while before the benchmark stopped spitting out 'Too busy for benchmarking'. Even though nothing was running on the system, it showed cppc_fie consuming 1-2% for a while... then after a while it settled down and got started:

   1390 root      rt   0       0      0      0 S   0.3   0.0   0:12.38 cppc_fie
geerlingguy commented 1 year ago

If I run it without -j, I get further: http://ix.io/4zGI

jgeerling@adlink-ampere:~/sbc-bench$ sudo ./sbc-bench.sh

sbc-bench v0.9.42

Installing needed tools: apt -f -qq -y install lm-sensors sysstat powercap-utils p7zip, tinymembench, ramlat, mhz. Done.
Checking cpufreq OPP. Done (results will be available in 7-10 minutes).
Executing tinymembench. Done.
Executing RAM latency tester. Done.
Executing OpenSSL benchmark. Done.
Executing 7-zip benchmark. Done.
Checking cpufreq OPP again. Done (8 minutes elapsed).

Results validation:

  * Measured clockspeed not lower than advertised max CPU clockspeed
  * No swapping
  * Background activity (%system) OK
  * Throttling occured

Memory performance
memcpy: 10132.6 MB/s
memset: 44753.7 MB/s

7-zip total scores (3 consecutive runs): 249521,248641,249970, single-threaded: 3858

OpenSSL results:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     796165.06k  1576202.24k  2023527.51k  2171779.07k  2227093.50k  2232833.37k
aes-128-cbc     791871.84k  1579594.97k  2023827.97k  2169844.05k  2226918.74k  2232669.53k
aes-192-cbc     750426.13k  1387396.22k  1718240.43k  1807296.85k  1857686.19k  1861642.92k
aes-192-cbc     744861.90k  1387979.75k  1717576.70k  1807480.15k  1857890.99k  1861675.69k
aes-256-cbc     715428.22k  1227382.25k  1484381.87k  1563873.62k  1593264.81k  1596145.66k
aes-256-cbc     716891.95k  1230795.03k  1486725.72k  1563989.67k  1593251.16k  1596080.13k

Full results uploaded to http://ix.io/4zGI
ThomasKaiser commented 1 year ago

Thanks Jeff for your time and letting me know of the issues. At least the 2.8 GHz of your system are confirmed. :)

Wrt the initial delay it should not happen since the conditions are as follows:

# Only continue if average load is less than 0.1 or averaged CPU utilization is lower
# than 2.5% for 30 sec. Please note that average load on Linux is *not* the same as CPU
# utilization: https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
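The condition quoted above can be sketched as a small predicate. This is only an illustration, not the code from sbc-bench.sh: the function name `is_idle` and its two arguments are hypothetical, and the caller is assumed to supply a CPU utilization figure already averaged over 30 seconds.

```shell
# Hypothetical helper, NOT taken from sbc-bench.sh.
# Succeeds (exit status 0) when the system counts as idle under the quoted rule:
# average load below 0.1 OR averaged CPU utilization below 2.5 percent.
is_idle() {
    local load="$1" cpu_pct="$2"
    LC_ALL=C awk -v l="$load" -v c="$cpu_pct" \
        'BEGIN { exit !((l < 0.1) || (c < 2.5)) }'
}

# Values from the first "Too busy for benchmarking" line: load 0.20, cpu 0%.
if is_idle 0.20 0; then
    echo "idle by the stated rule"
else
    echo "busy by the stated rule"
fi
```

With the logged values the predicate succeeds because the utilization branch is satisfied, which fits the remark that the delay should not have happened.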

Since I can't reproduce what went wrong I disabled the loadavg/utilization check on hosts with 64 or more cores for now.

As for the results there's an interesting bit. @wtarreau received the platform recently, though 'only' equipped with one Q80-26 (80 cores at 2.6 GHz). Though his RAM config should be slightly slower than yours (6 x 16GB at 2933 MT/s vs. yours at 3200 MT/s), his memory latency is slightly better than yours according to tinymembench, which is reflected in the 7-zip MIPS scores.

When looking at single-threaded results his 3748 is at 97.1% of the 3858 your system generates, while the clockspeed difference (2.6 vs. 2.8 GHz) should show something closer to 93%. Same with the multi-threaded results: his 80 cores at 2.6 GHz score 214390 vs. your 249380 (that's only 14% lower, but going by cores x clockspeed the difference should be closer to 23%). The Huaqin P6410 (2 x Ampere Altra Max) scores back this up, since unlike stuff like Geekbench the 7-zip benchmark scales well with the count of cores as long as the memory subsystem can keep up.
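The percentages in the comparison above can be re-derived with a few lines of shell. The helper names `pct` and `scaled_pct` are made up for this sketch, and the "expected" figures assume performance scales linearly with clockspeed (single-threaded) or with cores x clockspeed (multi-threaded).

```shell
# pct A B: print A as a percentage of B with one decimal place.
pct() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# scaled_pct CORES_A GHZ_A CORES_B GHZ_B: cores x clockspeed of A as a percentage of B.
scaled_pct() {
    LC_ALL=C awk -v ca="$1" -v fa="$2" -v cb="$3" -v fb="$4" \
        'BEGIN { printf "%.1f\n", 100 * (ca * fa) / (cb * fb) }'
}

pct 3748 3858             # single-threaded 7-zip, Q80-26 vs. M96-28: 97.1
pct 2.6 2.8               # clockspeed ratio: 92.9
pct 214390 249380         # multi-threaded 7-zip: 86.0, i.e. 14% lower
scaled_pct 80 2.6 96 2.8  # cores x clockspeed: 77.4, i.e. about 23% lower
```

The 249380 figure is the rounded mean of the three consecutive 7-zip runs (249521, 248641, 249970).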

So I wonder whether there's something available in the UEFI settings wrt memory timings that could improve results significantly?

wtarreau commented 1 year ago

At least on my side there's not much. Well, you can force the DDR freq if you want, but I wouldn't do that!

I managed to mostly "fix" the frequency issues by disabling CPPC and LPI in the BIOS. Now it boots at 2.6 but after some load it goes down to 2.3 and stays there. I documented my experience on their forum here: https://www.ipi.wiki/community/forum/topic/100157/cpu-frequency-capped-to-23-ghz-instead-of-26-ghz

Regarding FIE I've observed the same and even up to 10% sometimes! Once disabled in the BIOS it doesn't happen anymore. Note that you can also blacklist the cppc_cpufreq module. In this case the frequency set at boot is not changed and it should remain at full speed. However, if you unload the module after it has loaded, it will go down to 1 GHz. There's a patch that was merged in 6.1 that addresses this issue and supports disabling this FIE thing. I haven't tried it.

For now I'm running with CPPC disabled so that the machine boots at the nominal frequency.
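For the blacklisting route mentioned above, a minimal sketch could look like the following. The helper names and the file name `blacklist-cppc.conf` are assumptions for illustration; only the `blacklist <module>` directive and the `/etc/modprobe.d/` directory are standard modprobe conventions. Whether this is sufficient on a given distribution (for example if the module is already in the initramfs) is not something this thread confirms.

```shell
# Hypothetical helpers, for illustration only.

# write_blacklist MODULE FILE: write a modprobe "blacklist" directive to FILE.
write_blacklist() {
    local module="$1" conf="$2"
    printf 'blacklist %s\n' "$module" > "$conf"
}

# module_loaded MODULE [LISTING]: succeed if MODULE appears in a
# /proc/modules-style listing (defaults to /proc/modules itself).
module_loaded() {
    local module="$1" listing="${2:-/proc/modules}"
    grep -q "^${module} " "$listing"
}

# Example (needs root, file name is an assumption):
#   write_blacklist cppc_cpufreq /etc/modprobe.d/blacklist-cppc.conf
# After a reboot, check whether the module was still loaded:
#   module_loaded cppc_cpufreq && echo "cppc_cpufreq is loaded"
```

Note that this only keeps the module from being loaded at boot; it says nothing about the 6.1 patch, which is a separate mechanism.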

ThomasKaiser commented 1 year ago

https://www.ipi.wiki/community/forum/topic/100157/cpu-frequency-capped-to-23-ghz-instead-of-26-ghz

I wonder whether you'll ever get a reaction, and whether the 1st level supporters even understand what's happening.

To clarify... what do you refer to as FIE?

wtarreau commented 1 year ago

No reaction yet. But yesterday was a day off in the US, maybe that counts.

FIE: apparently it's "frequency invariance". You see in "top" that cppc_fie eats quite a bit of CPU all the time. There's an explanation and a patch here, that were merged: https://lore.kernel.org/lkml/20220818211619.4193362-2-jeremy.linton@arm.com/

5.15 doesn't have the option to disable it though. However when disabling both CPPC (the cpufreq driver) and LPI (low power idle) in the BIOS the module doesn't load anymore and we don't waste CPU cycles anymore.

What's frustrating is to see that if we had waited a bit more, for the same price we could have had the 96c at 2.8G version!