SpinalHDL / VexRiscv

A FPGA friendly 32 bit RISC-V CPU implementation
MIT License

Benchmark VexRISCV using embench #77

Open mithro opened 5 years ago

mithro commented 5 years ago

https://www.embench.org/

Dhrystone and Coremark have been the de facto standard microcontroller benchmark suites for the last thirty years, but these benchmarks no longer reflect the needs of modern embedded systems. Embench was explicitly designed to meet the requirements of modern connected embedded systems. The benchmarks are relevant, portable, and well implemented.

https://www.sigarch.org/embench-recruiting-for-the-long-overdue-and-deserved-demise-of-dhrystone-as-a-benchmark-for-embedded-computing/

Dolu1990 commented 4 years ago

@mithro I got a port of Embench working on hardware.

So, the CPU config I tried it on is:

I compared it against ri5cy (1.28 DMIPS/MHz) and got:

Benchmark           Speed
---------           -----
aha-mont64           0.92
crc32                0.74
cubic                0.55
edn                  0.86
huffbench            0.99
matmult-int          0.83
minver               0.77
nbody                0.74
nettle-aes           0.88
nettle-sha256        1.12
nsichneu             0.30
picojpeg             0.79
qrduino              0.85
sglib-combined       0.76
slre                 0.79
st                   0.81
statemate            0.96
ud                   0.80
wikisort             0.93
---------           -----
Geometric mean       0.79
Geometric SD         1.31
Geometric range      0.43
All benchmarks run successfully

So that is a relative speed of 0.79 for Embench versus a relative speed of 0.84 for Dhrystone, which seems reasonable.
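As a sanity check, the summary rows can be recomputed from the per-benchmark scores in the table above. This is only a sketch: the use of the population standard deviation of the logs, and the range defined as `gmean * gsd - gmean / gsd`, are assumptions about how the summary is derived, and the inputs are the two-decimal values from the table, so the results only match up to rounding.

```python
import math

# Per-benchmark relative speeds copied from the table above (ri5cy = 1.00).
scores = {
    "aha-mont64": 0.92, "crc32": 0.74, "cubic": 0.55, "edn": 0.86,
    "huffbench": 0.99, "matmult-int": 0.83, "minver": 0.77, "nbody": 0.74,
    "nettle-aes": 0.88, "nettle-sha256": 1.12, "nsichneu": 0.30,
    "picojpeg": 0.79, "qrduino": 0.85, "sglib-combined": 0.76, "slre": 0.79,
    "st": 0.81, "statemate": 0.96, "ud": 0.80, "wikisort": 0.93,
}


def geometric_stats(values):
    """Return (geometric mean, geometric SD, geometric range)."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    # Assumption: population (divide by n) standard deviation of the logs.
    sd_log = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / len(logs))
    gmean = math.exp(mean_log)
    gsd = math.exp(sd_log)
    # Assumption: range is the width of the [gmean / gsd, gmean * gsd] band.
    return gmean, gsd, gmean * gsd - gmean / gsd


gmean, gsd, grange = geometric_stats(list(scores.values()))
print(f"Geometric mean  {gmean:.2f}")
print(f"Geometric SD    {gsd:.2f}")
print(f"Geometric range {grange:.2f}")
```

The geometric mean is dominated by no single benchmark, but the low nsichneu score (0.30) accounts for most of the spread.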

The port is here: https://github.com/SpinalHDL/embench-iot/tree/master/config/riscv32/boards/vexriscv

It was built and run with the following commands:

./build_all.py --clean --arch riscv32 --board vexriscv --cc riscv64-unknown-elf-gcc --cflags "-O2 -march=rv32im -mabi=ilp32" --ldflags "-march=rv32im -mabi=ilp32"
./benchmark_speed.py --target-module run_vexriscv_gdb --gdb-command riscv64-unknown-elf-gdb --timeout 100 --cpu-mhz 10

Note that while the system was running at 100 MHz, I specified 10 MHz to reduce the duration of the tests.

mithro commented 4 years ago

FYI - @kgugala @pgielda

Dolu1990 commented 4 years ago

A few things about the comparison,

mithro commented 4 years ago

@Dolu1990 -- I don't trust any benchmark not running on real hardware :-) -- Out of interest what does vexriscv get on a verilator simulated version?

Dolu1990 commented 4 years ago

@mithro Currently I have only ported Embench to run on targets with GDB. I will port it to pure simulation and come back with results :)

Also, I forgot: flash-based CPUs such as the Cortex-M4 (stm32f4-discovery) benchmarked in https://github.com/jeremybennett/embench-iot-results/blob/master/details/cortexm4-armv7m-gcc-9.2-o2.mediawiki ran at only 16 MHz in the bench. The lower the frequency they run at, the smaller the penalty they pay on a branch prediction miss.

see https://www.st.com/resource/en/reference_manual/DM00031020-.pdf
Table 10. Number of wait states according to CPU clock (HCLK) frequency

So if the stm32f4 were running at full speed (168 MHz), a flash access would take 6 cycles instead of 1 cycle at 16 MHz, which is likely to have a big impact on performance results.
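To put rough numbers on this, here is a small back-of-the-envelope sketch using only the figures quoted above (1 cycle per flash access at 16 MHz, 6 cycles at 168 MHz). It assumes every access pays the full flash latency and ignores any prefetch or cache in front of the flash, so it is a worst-case illustration rather than a measurement; `flash_access_ns` is a helper name made up for this example.

```python
def flash_access_ns(cpu_mhz, cycles_per_access):
    """Wall-clock time of one flash access, in nanoseconds."""
    return cycles_per_access * 1000.0 / cpu_mhz


# Figures taken from the comment above.
slow_ns = flash_access_ns(cpu_mhz=16, cycles_per_access=1)
fast_ns = flash_access_ns(cpu_mhz=168, cycles_per_access=6)

clock_gain = 168 / 16          # how much faster the core clock is
flash_gain = slow_ns / fast_ns  # how much faster a flash access gets

print(f"flash access at 16 MHz : {slow_ns:.1f} ns")
print(f"flash access at 168 MHz: {fast_ns:.1f} ns")
print(f"clock gain {clock_gain:.2f}x, flash access gain {flash_gain:.2f}x")
```

Under these assumptions the clock is 10.5x faster but a flash access is only 1.75x faster, so per-MHz scores measured at 16 MHz would flatter the part compared with a run at full speed.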

Dolu1990 commented 4 years ago

https://github.com/embench/embench-iot-results/issues/4

mithro commented 4 years ago

FYI - @jeremybennett

Dolu1990 commented 4 years ago

I found an avoidable critical path in VexRiscv, which opened up a few possibilities to improve performance while keeping 100 MHz (mainly fewer fetch stages, and branches resolved in the execute stage instead of the memory stage). So on LiteX ArtyA7 SMP, I got:

Benchmark           Speed
---------           -----
aha-mont64           1.02
crc32                0.87
cubic                0.60
edn                  0.88
huffbench            1.08
matmult-int          0.84
minver               0.87
nbody                0.84
nettle-aes           0.87
nettle-sha256        1.15
nsichneu             0.32
picojpeg             0.95
qrduino              0.98
sglib-combined       0.88
slre                 0.91
st                   0.90
statemate            1.36
ud                   0.90
wikisort             1.06
---------           -----
Geometric mean       0.88
Geometric SD         1.33
Geometric range      0.51
All benchmarks run successfully

I still have to test it with a simulated zero-latency memory system.
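For a per-benchmark view of the change, the scores from the two tables in this thread can be divided against each other. This is only a sketch: both tables are rounded to two decimals, and the two runs may differ in more than the pipeline changes (for example the board or memory system), so the ratios are indicative at best.

```python
import math

# Relative speeds from the first table in this thread.
before = {
    "aha-mont64": 0.92, "crc32": 0.74, "cubic": 0.55, "edn": 0.86,
    "huffbench": 0.99, "matmult-int": 0.83, "minver": 0.77, "nbody": 0.74,
    "nettle-aes": 0.88, "nettle-sha256": 1.12, "nsichneu": 0.30,
    "picojpeg": 0.79, "qrduino": 0.85, "sglib-combined": 0.76, "slre": 0.79,
    "st": 0.81, "statemate": 0.96, "ud": 0.80, "wikisort": 0.93,
}
# Relative speeds from the second table (after the pipeline changes).
after = {
    "aha-mont64": 1.02, "crc32": 0.87, "cubic": 0.60, "edn": 0.88,
    "huffbench": 1.08, "matmult-int": 0.84, "minver": 0.87, "nbody": 0.84,
    "nettle-aes": 0.87, "nettle-sha256": 1.15, "nsichneu": 0.32,
    "picojpeg": 0.95, "qrduino": 0.98, "sglib-combined": 0.88, "slre": 0.91,
    "st": 0.90, "statemate": 1.36, "ud": 0.90, "wikisort": 1.06,
}

# Per-benchmark ratio: values above 1.0 mean the second run was faster.
ratios = {name: after[name] / before[name] for name in before}

# Geometric mean of the ratios, i.e. the overall gain across the suite.
overall = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {ratio:5.2f}x")
print(f"{'overall':<16} {overall:5.2f}x")
```

Sorting by ratio makes it easy to see which benchmarks benefit most; statemate shows the largest gain, while nettle-aes is essentially unchanged.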

Dolu1990 commented 4 years ago

(The results above again use ri5cy as the baseline.)

iamkarthikbk commented 2 years ago

Hello.

I am unable to access https://github.com/SpinalHDL/embench-iot/tree/master/config/riscv32/boards/vexriscv Was the port removed?

I want to see exactly how it was ported, as I'm trying to do the same for another board. Thanks.

pgielda commented 2 years ago

Just some renames:

https://github.com/SpinalHDL/embench-iot/tree/master/config/riscv32/boards/vexriscv_litex
https://github.com/SpinalHDL/embench-iot/tree/master/config/riscv32/boards/vexriscv_saxon
https://github.com/SpinalHDL/embench-iot/tree/master/config/riscv32/boards/vexriscv_sim