This PR proposes to register the fetched instruction in the fetch stage before it is consumed by the decoder.
This lets us use larger memories without timing violations, since the net delay between the memory and the decoder is considerable.
The initial implementation drops support for compressed instructions.
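The idea, roughly, is a pipeline register on the memory-to-decoder path. A minimal sketch (module and signal names here are illustrative, not the core's actual identifiers):

```systemverilog
// Hypothetical pipeline register between instruction memory and decoder.
// Breaks the long memory->decoder net into two shorter paths.
module fetch_reg (
    input  logic        clk,
    input  logic        rst,
    input  logic        fetch_valid,   // memory read data is valid this cycle
    input  logic [31:0] fetch_instr,   // raw word from instruction memory
    output logic        decode_valid,
    output logic [31:0] decode_instr   // registered copy seen by the decoder
);
    always_ff @(posedge clk) begin
        if (rst) begin
            decode_valid <= 1'b0;
        end else begin
            decode_valid <= fetch_valid;
            decode_instr <= fetch_instr; // one extra cycle of latency
        end
    end
endmodule
```

The extra flop stage is where the added FFs and the CoreMark drop below come from: decode now sees the instruction one cycle later.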
Preliminary results:
|          | Baseline | Registered |
|----------|----------|------------|
| Setup    | -0.079   | 0.015      |
| LUTs     | 3118     | 3133       |
| FFs      | 1353     | 1453       |
| CoreMark | 202      | 184        |
A small price to pay for a bit more timing headroom.
Maybe with this change we can re-add branch prediction and win the performance back?