franktaTian opened 1 month ago
Hi,
Technically speaking, I think it "already" supports it; it's just that the ParamSimple class (the thing which provides an easy way to configure the CPU) doesn't support more than 2. It is the only place where things related to that are hardcoded. Note that I never tested anything with more than 2 issue lanes.
> Hi,
> Technically speaking, I think it "already" supports it; it's just that the ParamSimple class (the thing which provides an easy way to configure the CPU) doesn't support more than 2. It is the only place where things related to that are hardcoded. Note that I never tested anything with more than 2 issue lanes.
Cool!
I tried to get a 4-issue Vexii simply by copying and adding `if (lanes >= 3)` and `if (lanes >= 4)` blocks in Param.scala, just like:
https://github.com/SpinalHDL/VexiiRiscv/blob/05ed94c61b7042b7e5e5f8798a9b9e85f6d4d8c2/src/main/scala/vexiiriscv/Param.scala#L629-L653
and also increased the number of decoders to 4. There were no problems with generation or simulation.
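The modification described above can be sketched in plain Scala as follows. Note this is only a simplified stand-in: `LaneConfig` and `buildLanes` are hypothetical names, not the real SpinalHDL plugin wiring at the linked lines of Param.scala; only the `if (lanes >= 3)` / `if (lanes >= 4)` guard pattern comes from the thread.

```scala
// Simplified sketch of the ParamSimple pattern: guard per-lane plugin
// setup with lane-count checks. LaneConfig/buildLanes are hypothetical
// stand-ins for the real SpinalHDL plugin wiring.
object LaneSketch {
  case class LaneConfig(name: String, withLateAlu: Boolean)

  def buildLanes(lanes: Int, withLateAlu: Boolean): Seq[LaneConfig] = {
    val out = scala.collection.mutable.ArrayBuffer[LaneConfig]()
    out += LaneConfig("lane0", withLateAlu)                 // always present
    if (lanes >= 2) out += LaneConfig("lane1", withLateAlu)
    // The two guards below mirror the ones added by hand for 3/4-issue:
    if (lanes >= 3) out += LaneConfig("lane2", withLateAlu)
    if (lanes >= 4) out += LaneConfig("lane3", withLateAlu)
    out.toSeq
  }

  // The decoder count has to follow the issue width as well:
  def decoderCount(lanes: Int): Int = lanes
}
```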
I benchmarked 3-issue and 4-issue RV32IMC on Dhrystone and CoreMark:
The performance difference will be higher if you toggle more performance options like late-alu.
Anyway, I believe there is no big problem with multi-issue; you can modify Param.scala to get even more lanes XD
> and also increased the number of decoders to 4. There were no problems with generation or simulation.
LOL Nice :)
> 2-issue: 16149 Dhrystones/Second, 0.76 DMIPS/MHz. 1.53 CoreMark/MHz.
Hmm, that is weird; the performance is well below what it should be.
Did you enable the branch predictors as well? Did you have caches? One thing is that, by default, most performance-oriented features are disabled.
The one case where I can see that many lanes scaling is, for instance, AES and similarly well-optimized code, as GCC will likely generate tightly coupled code which does not take advantage of in-order execution across all those lanes.
> Hmm, that is weird; the performance is well below what it should be.
> Did you enable the branch predictors as well? Did you have caches? One thing is that, by default, most performance-oriented features are disabled.
I didn't enable anything beyond the defaults LOL. If those performance features are enabled, we can get a larger gap between 2-issue and 4-issue, like 4.16 CoreMark/MHz vs. 4.38 CoreMark/MHz (tested with late-alu, lsu-l1, fetch-l1 and predictors).
There are a few more:
- `withDispatcherBuffer = true` // may make a big difference
- `withAlignerBuffer = true` // will not make a big difference

> lsu-l1, fetch-l1

Did you increase the number of ways to at least 4?
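The tuning options discussed in this thread can be collected into one illustrative configuration record. The field names below follow the flag names mentioned in the thread (`withDispatcherBuffer`, `withAlignerBuffer`, late-alu, predictors, L1 ways), but this is a plain-Scala stand-in, not the real ParamSimple class or its actual signatures.

```scala
// Illustrative config record mirroring the ParamSimple-style options
// mentioned in this thread; a stand-in, not the real class.
object TuneSketch {
  case class PerfConfig(
    lanes: Int = 2,
    withDispatcherBuffer: Boolean = false, // may make a big difference
    withAlignerBuffer: Boolean = false,    // smaller effect
    withLateAlu: Boolean = false,
    withBranchPredictors: Boolean = false,
    lsuL1Ways: Int = 1,                    // raise to at least 4 for wide issue
    fetchL1Ways: Int = 1
  )

  // A "fully tuned" 4-issue configuration following the suggestions above:
  val tuned = PerfConfig(
    lanes = 4,
    withDispatcherBuffer = true,
    withAlignerBuffer = true,
    withLateAlu = true,
    withBranchPredictors = true,
    lsuL1Ways = 4,
    fetchL1Ways = 4
  )
}
```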
> There are a few more: `withDispatcherBuffer = true` // may make a big difference; `withAlignerBuffer = true` // will not make a big difference
>
> > lsu-l1, fetch-l1
>
> Did you increase the number of ways to at least 4?
Yeah, they do amplify the advantage of multi-issue! Got 4.32 CoreMark/MHz vs. 4.85 CoreMark/MHz after adding the dispatcher buffer, and 4.51 vs. 5.04 after enabling all of them and increasing the L1 to 4 ways. Looks like the dispatcher buffer matters more🤔.
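For comparison, the relative 2-issue to 4-issue gains implied by the CoreMark/MHz numbers reported in this thread work out as below (plain ratio arithmetic; the numbers come from the messages above, nothing else is assumed):

```scala
// Relative 4-issue gain over 2-issue from the reported CoreMark/MHz pairs.
object GainSketch {
  def gainPercent(base: Double, wide: Double): Double =
    (wide / base - 1.0) * 100.0

  val featuresOnly   = gainPercent(4.16, 4.38) // late-alu, L1s, predictors: ~5.3%
  val plusDispatcher = gainPercent(4.32, 4.85) // + dispatcher buffer: ~12.3%
  val allEnabled     = gainPercent(4.51, 5.04) // + aligner buffer, 4-way L1: ~11.8%
}
```

This matches the observation that the dispatcher buffer contributes the largest share of the multi-issue gain.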
Hi, will VexiiRiscv be extended to support configurable multi-issue? For example, 4-issue, not just 1 or 2 issues.