DietPi-Config | CPU performance benchmark

Fourdee commented 6 years ago

Add ability to benchmark CPU performance, using sysbench and a MAX prime value.

This will give users a rough idea of CPU performance, when we compare the total time against other systems.

Fourdee commented 6 years ago

Sysbench testing (removed in favor of bash + int benchmark): https://github.com/Fourdee/DietPi/issues/1253#issuecomment-346881878

Odroid C2 (clearly "cheating" this test?):

Z83-ii | Intel Atom Cherry Trail x5-Z8300

RPi 3:

BBB:

Fourdee commented 6 years ago

Completed.

ThomasKaiser commented 6 years ago

> Odroid C2 (clearly "cheating" this test?):

No, you are simply using exactly the wrong tool: sysbench can be used as a compiler benchmark, but not to compare the CPU performance of different systems/architectures.

An arm64 Stretch binary running on RPi 3 will finish in less than 3.1 seconds (while upstream arm64 binaries for Jessie take ~30% longer). With sysbench, all that matters are compiler switches, which is why you get different numbers depending on the GCC version the binary was built with (that's the distro dependency) and especially on which CPU features the build is allowed to use (that's why arm64 binaries on ARMv8 SoCs outperform armhf binaries by magnitudes). BTW: if you choose sysbench 0.5 instead of the 0.4.12 that Debian still uses, the numbers also get worse.

TL;DR: No other tool is less capable than sysbench of showing CPU performance across different platforms.
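Since the result depends on how the binary was built, it can help to record the build context next to any sysbench number. A minimal sketch, assuming a Debian-based system; the function name `bench_context` is hypothetical and not part of DietPi or sysbench:

```shell
#!/bin/bash
# Print the facts that influence a sysbench result besides the CPU itself.
# bench_context is a hypothetical helper name used for illustration only.
bench_context()
{
    # Architecture the kernel runs as (e.g. aarch64, armv7l, x86_64)
    echo "kernel_arch=$(uname -m)"

    # Architecture of the installed userland packages (e.g. arm64 vs armhf)
    if command -v dpkg > /dev/null 2>&1; then
        echo "userland_arch=$(dpkg --print-architecture)"
    fi

    # Version of the sysbench binary, if one is installed
    if command -v sysbench > /dev/null 2>&1; then
        echo "sysbench=$(sysbench --version)"
    fi
}

bench_context
```

Two results are only worth comparing when the userland architecture and sysbench version match.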

Fourdee commented 6 years ago

@ThomasKaiser

> No, you are simply using exactly the wrong tool: sysbench

What's the correct tool? 😄

> With sysbench, all that matters are compiler switches, which is why you get different numbers depending on the GCC version the binary was built with (that's the distro dependency) and especially on which CPU features the build is allowed to use (that's why arm64 binaries on ARMv8 SoCs outperform armhf binaries by magnitudes).

Thanks for the info 👍

So in theory, the only ways to ensure a consistent benchmark across platforms:

- Compile our own sysbench binaries, ensuring compile options and GCC versions match (where possible) for each CPU architecture.
- Find an alternative (but the GCC version and compile options will still be a factor across CPU architectures).
- Code my own bash benchmark, integer based only, as floating point would require bc, which again makes compile options and GCC versions a factor. This all assumes bash's own compile options are not a factor.

Fourdee commented 6 years ago

Multi-threaded bash integer benchmark:

DietPi users can simply run `dietpi-config` > Tools > Benchmark > CPU bench to run.

The work is split across all cores, which together count up to a max value of `1000000`. Lower time = faster.
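To illustrate the split, here is a minimal sketch of how the target is divided into one range per core. The function name `split_range` is hypothetical; the arithmetic mirrors the integer division used in the benchmark script:

```shell
#!/bin/bash
# Print "start end" for each worker when counting to $1 on $2 cores.
# split_range is a hypothetical helper name used for illustration only.
split_range()
{
    local target=$1 cores=$2
    local chunk=$(( target / cores ))   # integer division, remainder is dropped
    local i
    for (( i = 0; i < cores; i++ )); do
        echo "$(( i * chunk )) $(( (i + 1) * chunk ))"
    done
}

split_range 1000000 4
# 0 250000
# 250000 500000
# 500000 750000
# 750000 1000000
```

Because the division truncates, a core count that does not divide the target evenly leaves a few integers uncounted. With 6 cores, for example, the last range ends at 999996, which has a negligible effect on the total time.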

RPi Zero = 5 Minutes, 40 seconds ⏳
RPi 3 = 24.2 seconds
RPi 3 B+ = 20.6 seconds
Rock64 = 19.5 seconds
NanoPi K1+ (H5) = 18.76 seconds
Odroid C2 = 16.8 seconds
Z83-II = 15.5 seconds
NanoPi Fire 3 = 10.2 seconds
XU4 = 11.6 seconds
Asus TB = 11.1 seconds
NanoPC-T4 / RockPro64 = 9.2 seconds
Odroid N1 = 8.5 seconds
FX6300 (VM) = 3.6 seconds
Ryzen 5 2600 = 1.63 seconds 👏

Non-DietPi users: copy and paste all of the following:

cat << _EOF_ > /tmp/dietpi-bench
#!/bin/bash
target_max_int=1000000
cores=\$(nproc --all)
int_split=\$((\$target_max_int / \$cores ))
aStart_Int=()
aEnd_Int=()

#Split the max int target based on total cores
for (( i=0; i<\$cores; i++ ))
do

    aEnd_Int[\$i]=\$(( (\$i + 1) * \$int_split ))
    aStart_Int[\$i]=\$(( \${aEnd_Int[\$i]} - \$int_split ))

    echo \${aStart_Int[\$i]} \${aEnd_Int[\$i]}

done

Run_Bench()
{

    while (( \${aStart_Int[\$1]} < \${aEnd_Int[\$1]} ))
    do

        ((aStart_Int[\$1]++))

    done

}

#Launch benchmark threads
for (( i=0; i<\$cores; i++ ))
do

    Run_Bench \$i &

done

#Wait for all background jobs to finish
for job in \$(jobs -p)
do
    echo \$job
    wait \$job
done

#Free the arrays
unset aStart_Int
unset aEnd_Int

_EOF_
chmod +x /tmp/dietpi-bench

#Run
time /tmp/dietpi-bench
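A single run can be skewed by background load, so it may be worth repeating the benchmark a few times and comparing the runs. A minimal sketch, assuming GNU `date` (for `%N` nanoseconds); the helper name `time_runs` is hypothetical:

```shell
#!/bin/bash
# Run a command several times and print the elapsed wall time of each run in ms.
# time_runs is a hypothetical helper; it relies on GNU date for %N (nanoseconds).
time_runs()
{
    local runs=$1; shift
    local i start end
    for (( i = 1; i <= runs; i++ )); do
        start=$(date +%s%N)
        "$@" > /dev/null          # discard the benchmark's own output
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
    done
}
```

Usage: `time_runs 3 /tmp/dietpi-bench` prints one line per run.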

ThomasKaiser commented 6 years ago

> What's the correct tool?

There is none since it's all about use case. :)

I normally rely on `7z b` for a rough estimate of integer performance (memory bandwidth also influences the results slightly). However, on boards with little memory, 7z might get killed by the kernel for allocating too much memory, and some use cases aren't really affected by this kind of 'integer performance' anyway.
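One way to avoid the out-of-memory kill is to check the available memory before starting the benchmark. A minimal sketch, assuming a Linux kernel that reports `MemAvailable` in `/proc/meminfo`; the function names and the 512 MiB default threshold are assumptions, not documented 7z requirements:

```shell
#!/bin/bash
# Print the MemAvailable value (in kB) from a meminfo-style file.
mem_available_kb()
{
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Run the 7-Zip benchmark only if enough memory is available.
# min_kb is an assumed safety margin, not a documented 7z requirement.
run_7z_bench()
{
    local min_kb=${1:-524288}
    local avail
    avail=$(mem_available_kb)
    avail=${avail:-0}             # treat a missing MemAvailable line as 0
    if (( avail < min_kb )); then
        echo "Only ${avail} kB available, skipping 7z b" >&2
        return 1
    fi
    7z b
}
```

Usage: `run_7z_bench` uses the default threshold, while `run_7z_bench 262144` lowers it to 256 MiB.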

Or the other way around: on certain SoCs we deal with accelerators for this and that, so not taking them into account would be misleading. If someone is looking for full disk encryption or a board to use as a VPN endpoint, then all those 64-bit SoCs that have the ARMv8 AES crypto extensions licensed are magnitudes faster than those that don't (the latter group consists of all 32-bit SoCs, the Raspberry Pi 3 and the ODROID-C2).
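Whether a board exposes the AES instructions can be read from the CPU feature list. A minimal sketch, assuming a Linux `/proc/cpuinfo`; the helper name `has_cpu_feature` is hypothetical:

```shell
#!/bin/bash
# Succeed if the CPU feature list in a cpuinfo-style file contains the given flag.
# ARM kernels list features on "Features" lines, x86 kernels on "flags" lines.
# has_cpu_feature is a hypothetical helper name used for illustration only.
has_cpu_feature()
{
    local feature=$1 file=${2:-/proc/cpuinfo}
    grep -E '^(Features|flags)[[:space:]]*:' "$file" | grep -qw -- "$feature"
}

if has_cpu_feature aes; then
    echo "AES instructions available"
else
    echo "No AES instructions reported"
fi
```

To measure the practical effect, `openssl speed -evp aes-128-cbc` gives a rough throughput figure that can be compared between boards.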
