chipsalliance / Cores-VeeR-EH1

VeeR EH1 core
Apache License 2.0

Delay after fetching instructions when not using icache #85

Closed by crazy-catlady 3 years ago

crazy-catlady commented 3 years ago

I have the following program:

and x3, x1, x12      # instr 1
and x4, x5, x6       # instr 2
and x7, x8, x10      # instr 3
and x12, x14, x16    # instr 4

and x13, x11, x2     # instr 5
and x20, x21, x22    # instr 6
and x23, x24, x25    # instr 7
and x26, x27, x28    # instr 8

The execution trace looks like this: [image: execution trace]

What I notice here is that it takes 10 cycles for the new instructions (5 and 6) to arrive at the decoder. This hurts execution speed, since the CPU is essentially idle for 5 cycles between writing instructions 3 and 4 and reading instructions 5 and 6.

Are there configuration parameters to increase the instruction/fetch buffer size, so that more than 4 instructions can be in execution at once? I also couldn't work out whether this is due to the instruction buffer size in the decoder (which seems to have 4 instruction slots but somehow only uses 2) or the fetch buffer size in the IFU.

Any ideas?

agrobman commented 3 years ago

This core should be able to execute 2 integer instructions per clock. Make sure you are running code from the cache with fast external memory, or from ICCM. The MRAC CSR needs to be programmed to enable caching ...
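For readers following along, here is a rough sketch of how an MRAC value could be computed. It assumes the layout I recall from the EH1 programmer's reference manual: the 32-bit address space is split into sixteen 256 MB regions, each with two bits in mrac (even bit = cacheable, odd bit = side effect), and the CSR sits at address 0x7C0. Please check these against the PRM for your core version. The helper names (`region_of`, `mrac_value`, `mrac_asm`) are made up for illustration.

```python
# Sketch only. ASSUMPTIONS (verify against the EH1 PRM):
#   - 16 regions of 256 MB, selected by address bits [31:28]
#   - per region r: bit 2*r = cacheable, bit 2*r + 1 = side effect
#   - mrac lives at CSR address 0x7C0
MRAC_CSR_ADDR = 0x7C0  # assumed CSR address


def region_of(addr):
    """Return the 256 MB region index (0..15) that a 32-bit address falls in."""
    return (addr >> 28) & 0xF


def mrac_value(cacheable_regions, side_effect_regions=()):
    """Build a 32-bit mrac value from lists of region indices."""
    value = 0
    for r in cacheable_regions:
        if not 0 <= r < 16:
            raise ValueError("region index out of range: %r" % (r,))
        value |= 1 << (2 * r)
    for r in side_effect_regions:
        if not 0 <= r < 16:
            raise ValueError("region index out of range: %r" % (r,))
        value |= 1 << (2 * r + 1)
    return value


def mrac_asm(value, scratch="t0"):
    """Format the two instructions that would write the value to mrac."""
    return "li {0}, 0x{1:08x}\ncsrw 0x{2:x}, {0}".format(scratch, value, MRAC_CSR_ADDR)
```

For example, `mrac_asm(mrac_value([region_of(0x80000000)]))` produces an `li`/`csrw` pair that marks only the region containing 0x80000000 as cacheable. Which regions should be cacheable depends on your memory map, so treat the values as placeholders.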

crazy-catlady commented 3 years ago

I turned on the icache and now it works as expected after the respective instructions are cached. Thanks!

However, I do not need 16 kB of icache; I want far less.

Is it safe to change NUM_WAYS and NUM_BANKS (https://github.com/chipsalliance/Cores-SweRV/blob/7332edc0adaa7e9a0c842d169154429e8d987786/design/ifu/ifu_ic_mem.sv#L228:L229) to do so?
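As a back-of-the-envelope check on what shrinking the cache implies, total capacity is ways × sets × line size. The sketch below assumes a 4-way cache with 64-byte lines as defaults; both numbers are my assumptions, not taken from this thread, and `icache_sets` is a hypothetical helper. It says nothing about whether the RTL tolerates a given geometry.

```python
# Sketch only. ASSUMPTIONS: 4 ways and 64-byte lines as defaults;
# check the actual values in your build configuration.
def icache_sets(size_bytes, ways=4, line_bytes=64):
    """Return the number of sets for a set-associative cache of the given size."""
    bytes_per_set = ways * line_bytes
    if size_bytes <= 0 or size_bytes % bytes_per_set != 0:
        raise ValueError("size must be a positive multiple of ways * line_bytes")
    return size_bytes // bytes_per_set
```

Under those assumptions, `icache_sets(16 * 1024)` gives 64 sets and `icache_sets(2048)` gives 8, so a smaller cache mostly means fewer index bits and shallower tag and data memories.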

agrobman commented 3 years ago

You can set the cache size to 8K, I think. Run swerv.config -help to see the possible ranges of the build parameters. BTW, you can also check out the more modern, smaller EL2 core ...

crazy-catlady commented 3 years ago

OK, I managed to tweak it down to 2048 bytes. However, it is more of a hack: I had to add modules to mem_lib.sv and, of course, modify the config script.

The EL2 does not quite fit my purposes, since I need a core that is as deeply pipelined as possible, preferably dual-issue.

However, thank you very much for your help. I really appreciate it. All my questions are resolved now.