Open mithro opened 5 years ago
@mithro I got a port of embench working on hardware.
The CPU config I tried it on is:
I compared it against ri5cy (1.28 DMIPS/MHz) and got:
Benchmark Speed
--------- -----
aha-mont64 0.92
crc32 0.74
cubic 0.55
edn 0.86
huffbench 0.99
matmult-int 0.83
minver 0.77
nbody 0.74
nettle-aes 0.88
nettle-sha256 1.12
nsichneu 0.30
picojpeg 0.79
qrduino 0.85
sglib-combined 0.76
slre 0.79
st 0.81
statemate 0.96
ud 0.80
wikisort 0.93
--------- -----
Geometric mean 0.79
Geometric SD 1.31
Geometric range 0.43
All benchmarks run successfully
So a relative speed of 0.79 for embench, versus a relative speed of 0.84 for dhrystone, which seems reasonable.
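For anyone wanting to check the summary rows, here is a minimal sketch that recomputes them from the per-benchmark scores in the table above. The formulas are assumptions chosen because they are consistent with the reported figures, not a copy of the embench scripts: the SD is the exponential of the population standard deviation of the logs, and the range is taken as mean * SD minus mean / SD. Since the table values are rounded to two decimals, the result should land on or very near 0.79 / 1.31 / 0.43.

```python
import math

# Relative speeds copied from the table above (ri5cy baseline = 1.0).
scores = {
    "aha-mont64": 0.92, "crc32": 0.74, "cubic": 0.55, "edn": 0.86,
    "huffbench": 0.99, "matmult-int": 0.83, "minver": 0.77, "nbody": 0.74,
    "nettle-aes": 0.88, "nettle-sha256": 1.12, "nsichneu": 0.30,
    "picojpeg": 0.79, "qrduino": 0.85, "sglib-combined": 0.76, "slre": 0.79,
    "st": 0.81, "statemate": 0.96, "ud": 0.80, "wikisort": 0.93,
}

logs = [math.log(s) for s in scores.values()]
mean_log = sum(logs) / len(logs)

# Geometric mean: exponential of the mean of the logs.
geo_mean = math.exp(mean_log)

# Geometric SD: exponential of the population standard deviation of the logs
# (divide by n, not n - 1). This choice is an assumption.
var_log = sum((x - mean_log) ** 2 for x in logs) / len(logs)
geo_sd = math.exp(math.sqrt(var_log))

# Geometric range: span from mean / SD up to mean * SD (assumed definition).
geo_range = geo_mean * geo_sd - geo_mean / geo_sd

print(f"Geometric mean  {geo_mean:.2f}")
print(f"Geometric SD    {geo_sd:.2f}")
print(f"Geometric range {geo_range:.2f}")
```

The single low score (nsichneu at 0.30) accounts for most of the spread, which is why the geometric SD is fairly wide even though most benchmarks sit between 0.74 and 0.99.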
The port is here: https://github.com/SpinalHDL/embench-iot/tree/master/config/riscv32/boards/vexriscv
It was built and run with the following commands:
./build_all.py --clean --arch riscv32 --board vexriscv --cc riscv64-unknown-elf-gcc --cflags "-O2 -march=rv32im -mabi=ilp32" --ldflags "-march=rv32im -mabi=ilp32"
./benchmark_speed.py --target-module run_vexriscv_gdb --gdb-command riscv64-unknown-elf-gdb --timeout 100 --cpu-mhz 10
Note that while the system was running at 100 MHz, I specified 10 MHz to reduce the duration of the tests.
FYI - @kgugala @pgielda
A few things about the comparison,
@Dolu1990 -- I don't trust any benchmark not running on real hardware :-) -- Out of interest what does vexriscv get on a verilator simulated version?
@mithro Currently I have only ported embench to run on targets with GDB. I will port it for pure simulations and come back with results :)
Also, I forgot: CPUs with flash, such as the Cortex-M4 (stm32f4-discovery) benched in https://github.com/jeremybennett/embench-iot-results/blob/master/details/cortexm4-armv7m-gcc-9.2-o2.mediawiki, ran at only 16 MHz in the bench. The lower the frequency they run at, the smaller the penalty they pay on a branch prediction miss.
see https://www.st.com/resource/en/reference_manual/DM00031020-.pdf
Table 10. Number of wait states according to CPU clock (HCLK) frequency
So if the stm32f4 were running at full speed (168 MHz), there would be a flash wait state of 6 cycles, instead of 1 cycle at 16 MHz, which is likely to have a big impact on performance results.
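To make the wait-state argument concrete, here is a toy model, not a measurement. It assumes every fetch redirect (for example a mispredicted branch) pays one full flash access, and it ignores any prefetch or caching in front of the flash. `BASE_CPI` and `REDIRECT_RATE` are hypothetical values picked only for illustration; the 1-cycle and 6-cycle figures are the ones quoted above.

```python
def effective_cpi(base_cpi, redirect_rate, flash_cycles):
    """Cycles per instruction when each fetch redirect costs one flash access."""
    return base_cpi + redirect_rate * flash_cycles


BASE_CPI = 1.0        # hypothetical: ideal pipeline, one instruction per cycle
REDIRECT_RATE = 0.05  # hypothetical: 5% of instructions redirect the fetch

cpi_16mhz = effective_cpi(BASE_CPI, REDIRECT_RATE, 1)   # 1 cycle at 16 MHz
cpi_168mhz = effective_cpi(BASE_CPI, REDIRECT_RATE, 6)  # 6 cycles at 168 MHz

# Per-MHz slowdown: how much worse the score per clock gets at full speed.
slowdown = cpi_168mhz / cpi_16mhz

print(f"CPI at 16 MHz:  {cpi_16mhz:.2f}")
print(f"CPI at 168 MHz: {cpi_168mhz:.2f}")
print(f"Per-MHz slowdown: {slowdown:.2f}x")
```

Even with a small redirect rate, the per-MHz score drops noticeably under this model, so a result measured at 16 MHz probably flatters the part compared with what it would score per MHz at full speed.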
FYI - @jeremybennett
I found an avoidable critical path in VexRiscv, which opened a few possibilities to improve performance while keeping 100 MHz (mainly fewer fetch stages, and branches resolved in the execute stage instead of the memory stage). So on Litex ArtyA7 SMP, I got:
Benchmark Speed
--------- -----
aha-mont64 1.02
crc32 0.87
cubic 0.60
edn 0.88
huffbench 1.08
matmult-int 0.84
minver 0.87
nbody 0.84
nettle-aes 0.87
nettle-sha256 1.15
nsichneu 0.32
picojpeg 0.95
qrduino 0.98
sglib-combined 0.88
slre 0.91
st 0.90
statemate 1.36
ud 0.90
wikisort 1.06
--------- -----
Geometric mean 0.88
Geometric SD 1.33
Geometric range 0.51
All benchmarks run successfully
I still have to test it with a simulated zero-latency memory system.
(The above results again use ri5cy as the baseline.)
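To see which benchmarks moved the most between the two runs, the two tables can be compared line by line. This is a small sketch using only the scores posted in this thread; the variable names are illustrative.

```python
# Scores copied from the two result tables in this thread (ri5cy baseline).
before = {
    "aha-mont64": 0.92, "crc32": 0.74, "cubic": 0.55, "edn": 0.86,
    "huffbench": 0.99, "matmult-int": 0.83, "minver": 0.77, "nbody": 0.74,
    "nettle-aes": 0.88, "nettle-sha256": 1.12, "nsichneu": 0.30,
    "picojpeg": 0.79, "qrduino": 0.85, "sglib-combined": 0.76, "slre": 0.79,
    "st": 0.81, "statemate": 0.96, "ud": 0.80, "wikisort": 0.93,
}
after = {
    "aha-mont64": 1.02, "crc32": 0.87, "cubic": 0.60, "edn": 0.88,
    "huffbench": 1.08, "matmult-int": 0.84, "minver": 0.87, "nbody": 0.84,
    "nettle-aes": 0.87, "nettle-sha256": 1.15, "nsichneu": 0.32,
    "picojpeg": 0.95, "qrduino": 0.98, "sglib-combined": 0.88, "slre": 0.91,
    "st": 0.90, "statemate": 1.36, "ud": 0.90, "wikisort": 1.06,
}

# Ratio > 1.0 means the benchmark got faster with the reworked pipeline.
gain = {name: after[name] / before[name] for name in before}

# Print from largest gain to smallest.
for name, ratio in sorted(gain.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {ratio:.2f}x")

best = max(gain, key=gain.get)
worst = min(gain, key=gain.get)
```

With the posted numbers, `best` is statemate and `worst` is nettle-aes, the only benchmark that did not improve. Keep in mind the inputs are rounded to two decimals, so small ratios near 1.0 are within rounding noise.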
Hello.
I am unable to access https://github.com/SpinalHDL/embench-iot/tree/master/config/riscv32/boards/vexriscv. Was the port removed?
I want to see exactly how it was ported, as I'm trying to do the same for another board. Thanks.
https://www.embench.org/
https://www.sigarch.org/embench-recruiting-for-the-long-overdue-and-deserved-demise-of-dhrystone-as-a-benchmark-for-embedded-computing/