CompressedGen Resource Usage

SpinalHDL / VexRiscv

An FPGA-friendly 32-bit RISC-V CPU implementation

MIT License


Issue #239

Closed by robindust-ce 2 years ago

robindust-ce commented 2 years ago

This is more of a general question than an issue, but maybe someone can explain. I am testing the influence of "CompressedGen = true" on the resource usage of Murax. I got the following results on Cyclone V (ALM usage of the VexRiscv core itself):

Murax default config + Mul/Div + BarrelShift + bypassing + static branch prediction: 1020 ALMs
Same as above with compressedGen = true: 1063 ALMs

I was pleasantly surprised to see that compressed instruction support comes at such low cost. The following comparison, however, has a much bigger difference in ALM usage:

Murax default config: 465 ALMs
Murax default config + compressedGen: 638 ALMs

Can someone explain why these results make sense, or verify that they do?
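For reference, the overhead of compressedGen = true in each pair can be recomputed from the ALM counts above. This is a minimal Python sketch; the dictionary keys are labels chosen here for illustration (not VexRiscv or Quartus names), and the numbers are only the ones posted in this thread:

```python
# ALM counts for the VexRiscv core as posted above (Cyclone V).
# Keys are illustrative labels, not configuration names.
alms = {
    "high_perf": {"base": 1020, "rvc": 1063},
    "default": {"base": 465, "rvc": 638},
}

def rvc_overhead(base, rvc):
    """Return the absolute ALM cost of RVC and that cost as a percentage of the base build."""
    delta = rvc - base
    return delta, 100.0 * delta / base

for name, counts in alms.items():
    delta, pct = rvc_overhead(counts["base"], counts["rvc"])
    print(f"{name}: +{delta} ALMs ({pct:.1f} %)")
# high_perf: +43 ALMs (4.2 %)
# default: +173 ALMs (37.2 %)
```

The absolute cost differs by about 4x between the two pairs, and relative to the base build it is roughly 4 % versus 37 %, which is the discrepancy the question asks about.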

Dolu1990 commented 2 years ago

I would say the main impact of CompressedGen = true is on the combinatorial path delay (which can be mitigated with injectorStage = true, at the cost of one extra cycle of branch penalty).

Then, on the area side: +43 ALMs in one case versus +173 in the other. So:

How was the clock constrained in these tests?

Was the clock relaxed or stressed?

robindust-ce commented 2 years ago

I'm not sure exactly which information you are looking for, as I'm still learning and don't know the terminology yet. I basically just derive the PLL clock at 150 MHz when compressedGen = false and at 100 MHz when compressedGen = true (the compiler is set to "performance high effort").

Configuration: constrained clock frequency / reported maximum clock frequency (Fmax)

High Performance Murax (1020 ALMs): 150 / 152 MHz
High Performance Murax compressed (1063 ALMs): 100 / 129 MHz
Default Murax (465 ALMs): 150 / 162 MHz
Default Murax compressed (638 ALMs): 100 / 129 MHz
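The margin between each constraint and the reported Fmax can be expressed as an approximate setup slack. A minimal Python sketch, assuming slack is roughly 1/f_target - 1/Fmax (this ignores clock uncertainty, jitter and skew, which the timing analyzer would account for); the run labels and numbers are taken from the list above:

```python
# (configuration, constrained clock in MHz, reported Fmax in MHz) from the list above.
runs = [
    ("High Performance Murax", 150, 152),
    ("High Performance Murax compressed", 100, 129),
    ("Default Murax", 150, 162),
    ("Default Murax compressed", 100, 129),
]

def headroom(target_mhz, fmax_mhz):
    """Approximate setup slack in ns and Fmax headroom as a percentage of the target clock."""
    # Periods in ns: 1000 / f[MHz]. Slack = constrained period - minimum achievable period.
    slack_ns = 1000.0 / target_mhz - 1000.0 / fmax_mhz
    margin_pct = 100.0 * (fmax_mhz - target_mhz) / target_mhz
    return slack_ns, margin_pct

for name, target, fmax in runs:
    slack, margin = headroom(target, fmax)
    print(f"{name}: {slack:.2f} ns slack, {margin:.1f} % headroom")
```

With these numbers the uncompressed High Performance build has under 0.1 ns (about 1.3 %) of headroom at 150 MHz, while both compressed builds have about 2.25 ns (29 %) at 100 MHz.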

Dolu1990 commented 2 years ago

So I would say: set the PLL clock much lower, around 20 MHz, so that the synthesis tool does not attempt funky, area-heavy optimisations. This may be one reason why, in some cases, the area usage shows "jitter".

I would say that currently the outlier result is the High Performance Murax (1020 ALMs) one.

I would not be surprised if the synthesis tool tried hard to optimize something by spending area. Then, when RVC is enabled, RVC uses about 150 LUTs and moves the critical path, so the very expensive optimization done in the previous run no longer happens; the area saved offsets part of the 150 LUT cost.

robindust-ce commented 2 years ago

Good point; I didn't consider that introducing a new critical path might heavily influence the compiler's optimizations. I will test with low clock frequencies when I find some time and report the results here.

Thank you very much for the quick responses!

robindust-ce commented 2 years ago

So, at 20 MHz with the compiler set to balanced optimization, I got the following results:

High Performance Murax: 911 ALMs
High Performance Murax compressed: 975 ALMs
Default Murax: 460 ALMs
Default Murax compressed: 566 ALMs

So now the differences are 64 ALMs (High Performance) vs 106 ALMs (Default).
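Putting the two runs side by side makes the change easier to see. A minimal Python sketch using only the ALM counts posted in this thread (the keys are illustrative labels). Note that both the clock constraint and the optimization mode changed between the runs, so this comparison cannot attribute the difference to either one alone:

```python
# ALM counts from the thread. First run: 150/100 MHz, "performance high effort".
# Relaxed run: 20 MHz, balanced optimization. Keys are illustrative labels.
first = {"hp": 1020, "hp_rvc": 1063, "def": 465, "def_rvc": 638}
relaxed = {"hp": 911, "hp_rvc": 975, "def": 460, "def_rvc": 566}

def rvc_cost(run):
    """ALMs added by compressedGen = true for the High Performance and Default configs."""
    return run["hp_rvc"] - run["hp"], run["def_rvc"] - run["def"]

def area_change(before, after):
    """Per-build ALM difference between two synthesis runs (after - before)."""
    return {name: after[name] - before[name] for name in before}

print("RVC cost, first run:  ", rvc_cost(first))    # (43, 173)
print("RVC cost, relaxed run:", rvc_cost(relaxed))  # (64, 106)
print("Change per build:", area_change(first, relaxed))
```

The gap between the two RVC costs shrinks from 130 ALMs (173 - 43) to 42 ALMs (106 - 64), and the uncompressed High Performance build loses the most area (-109 ALMs), which is consistent with it being the outlier suggested above.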

Dolu1990 commented 2 years ago

Cool, thanks for the numbers :D

I think that's OK, right?

robindust-ce commented 2 years ago

Yes, that's good.

Congratulations on this great project btw! I'm having a lot of fun exploring it.