n-kremeris opened this issue 1 year ago
On Thu, Aug 17, 2023 at 06:34:49AM -0700, Norbert Kremeris wrote:
I can confirm that it boots fine when used with Litex sim when the rocket is regenerated with the following configurations:
class LitexConfig_linux_1_1 extends Config( new WithNBigCores(1) ++ new WithEdgeDataBits(64) ++ new WithInclusiveCache() ++ new BaseLitexConfig )
class LitexConfig_linux_1_2 extends Config( new WithNBigCores(1) ++ new WithEdgeDataBits(128) ++ new WithInclusiveCache() ++ new BaseLitexConfig )
I have a very stupid, basic question: how did you "connect" the L2 InclusiveCache and RocketChip sources? I.e., where did you copy the rocket-chip-inclusive-cache git repo w.r.t. the rocket-chip repo, and what (if any) files did you edit to "link" them, before adding "WithInclusiveCache()" to your chosen class in rocket-chip/src/main/scala/system/Configs.scala and running the verilog generation/elaboration, presumably via make ... CONFIG=... ?
I have added it loosely following Chipyard's examples. I have checked out the inclusive cache repository in rocket-chip/src/main/scala/rocket-chip-inclusive-cache, and I have added the WithInclusiveCache() mixin to the relevant Litex configurations inside system/Configs.scala, as shown in the first post:
class LitexConfig_linux_1_1 extends Config(
new WithNBigCores(1) ++
new WithEdgeDataBits(64) ++
new WithInclusiveCache() ++
new BaseLitexConfig
)
class LitexConfig_linux_1_2 extends Config(
new WithNBigCores(1) ++
new WithEdgeDataBits(128) ++
new WithInclusiveCache() ++
new BaseLitexConfig
)
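For concreteness, the checkout step described above boils down to something like the following (a sketch only; the clone URL is the one from the opening post, the destination directory is the one named above, and the note about sbt is an assumption based on its default source layout):
cd rocket-chip/src/main/scala
git clone https://github.com/chipsalliance/rocket-chip-inclusive-cache
# sbt compiles any .scala files it finds under src/main/scala, so no further
# "linking" edits should be needed before referencing WithInclusiveCache()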
To be sure, I also deleted the generated-src folder that is included in pythondata-cpu-rocket.
I can confirm that the L2 cache is definitely included when building both the internal rocket-chip emulator and when running the verilog regeneration code in Litex's update.sh (the compilation output shows the L2 connections being made, it is included in the generated internal DTS, the L2 exists in the verilog, and flushing via a flush register works). From my understanding, there should be no additional modifications required for the cache to work (it should be able to directly replace rocket-chip's L2 coherence Broadcast Hub).
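A quick way to double-check that in a given build (a sketch; the directory is a placeholder to be adjusted to wherever your flow writes the generated DTS and verilog, and the strings are the node name and compatible prefix the SiFive inclusive cache normally uses):
# look for the L2 node in the generated device tree / sources
grep -R -E "cache-controller|inclusivecache" generated-src/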
To change the configuration, I edited the top-level Makefrag file (line 9: CONFIG ?= $(CFG_PROJECT).DefaultConfig), or, as you mentioned, the configuration can be passed to the simulator via make directly. I did not have to edit any other files inside the RocketChip sources.
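For reference, the command-line override mentioned above looks roughly like this (a sketch, assuming the in-tree Verilator emulator flow and one of the config classes shown earlier):
cd rocket-chip/emulator
make CONFIG=LitexConfig_linux_1_2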
Something to note is that without the L2 cache, targeting the Digilent Nexys Video, both cpu-mem-width 1 and 2 work fine.
On Thu, Aug 17, 2023 at 07:10:23AM -0700, Norbert Kremeris wrote:
I have checked out the inclusive cache repository in rocket-chip/src/main/scala/rocket-chip-inclusive-cache, and I have added the withInclusiveCache() mixin to the relevant Litex configurations inside system/Configs.scala
Thanks, that will help me build an L2-capable rocket variant to tinker around with on my end, and see what happens!
From my understanding, there should be no additional modifications required for the cache to work
If it's still exposing the three AXI ports (mem, mmio, and frontend_bus), then it should behave identically as far as the surrounding LiteX environment is concerned, whether it has an L2 internally or not.
MMIO is connected to a bus with all the LiteX device ports (referred to as CSRs); MEM is connected directly to the LiteDRAM data port in a point-to-point configuration, with nothing else in between; and DMA is routed to/from the frontend_bus port on the device side to the MEM port right through the Rocket chip, which is expected to maintain coherence between memory and internal caches.
None of that should change if L2 is added to Rocket, internally.
So if things don't behave as expected, I'd instinctively suspect the L2 cache as the "culprit"... :)
But let me actually try running some of this stuff first, before I could have an actual informed opinion...
I tried 4-cores and mem-width 2 (128 bit width) on my nexys-video board, and it did hang during memtest. However, it also failed to pass timing at the requested 50MHz frequency. Trying 2-cores now (compilation takes a while), and will try 1 core after that.
But it'd be interesting to know if your build passed or failed timing as well. If timing fails, we have no way to tell with any sort of confidence whether anything related to the L2 cache is or isn't buggy... :)
EDIT: I will then try using my sitlinv-stlv7325 (v2) board; it can run L2-less rocket cores at 100MHz, so maybe downgrading to 50MHz will allow for enough headroom to achieve timing closure, and get some useful test results. Can't use it with e.g. sata (yet), but it should be good for this test :)
@n-kremeris : I managed to test on the stlv7325, where at 75MHz we're fast enough for the DRAM to work at all (i.e., in the L2-cache-less version), and we can pass Vivado timing for the variant where the L2-inclusive-cache is enabled.
I could reproduce your observation (having a wider-than-64-bit MEM port and the L2-cache enabled will result in a hard hang any time we try to access any memory, but doing external-to-rocket MEM port width adaptation works).
IOW, an L2-cache enabled rocket is happy as long as its MEM port isn't changed from the 64-bit default width.
I've opened https://github.com/chipsalliance/rocket-chip-inclusive-cache/issues/25 to request confirmation of my hypothesis that the inclusive-cache assumes default mem port width, i.e., that this is a bug (well, rather an oversight) that fails to account for use cases where one would want Rocket's externally visible port to be wider than the default...
In the meantime, I should learn to read Chisel... :D
So, I decided to do some measurements, comparing a 64-bit wide MEM port with vs. without the L2 inclusive-cache:
| utilization | with-L2 | without-L2 |
|---|---|---|
| LUT as logic | 83345 (40.90%) | 78580 (38.56%) |
| Reg. as flip-flop | 42641 (10.46%) | 39983 (9.81%) |
| BRAM | 198 (44.61%) | 62 (14.04%) |
comparing an 8x-wide (512-bit) vs. a 64-bit MEM port (the latter using the LiteX-provided width adapter to the 512-bit wide LiteDRAM):
| utilization | 8x-wide internal MEM port | 64-bit port + LiteX adapter |
|---|---|---|
| LUT as logic | 81831 (40.15%) | 78580 (38.56%) |
| Reg. as flip-flop | 43663 (10.71%) | 39983 (9.81%) |
| BRAM * | 62 (14.04%) | 62 (14.04%) |
* no difference, since both have same (no) L2 cache
As it turns out, having LiteX do the MEM <-> LiteDRAM width adaptation results in fewer resources (LUT, FF) being utilized compared to when we have the width "conversion" internal to Rocket.
I'm going to have to re-assess the pros and cons (i.e., why did I "trust" or "prefer" Rocket's own internal width conversion over that provided externally by LiteX, and would it make sense to avoid tinkering with Rocket's native port width at all, thus eliminating some of the (way too many) sub-variants in litex-hub)...
EDIT: Test performed using:
litex-boards/litex_boards/targets/sitlinv_stlv7325_v2.py --build --cpu-type rocket \
--cpu-variant full --cpu-num-cores 2 --cpu-mem-width [1|8] \
--sys-clk-freq 75e6 --with-ethernet --with-sdcard --with-sata --sata-gen 1
I built a bitstream for ecpix5 (native LiteDRAM width 128-bit, or 2x) using the 1x (64-bit wide) rocket model, with a width adapter provided by LiteX.
After loading opensbi, kernel, and initrd from sdcard (something that used to work fine with bitstream built using a 2x-wide rocket variant), it gets stuck at "liftoff" -- which IMO means the data it copied from sdcard (using DMA) got corrupted somehow. (I think, and @enjoy-digital please correct me if I'm wrong, that booting over ethernet doesn't use DMA, whereas booting from sdcard or sata does).
Either way, there's something to be said for keeping the wider mem-port variants around until we have a better understanding of what's actually happening.
I'm also going to wait for the inclusive-cache issue to get some responses, hopefully at some point soon... :)
@gsomlo Thank you very much for your time spent investigating and writing up your results, I will be following the issues around this :)
I added L2 cache to the full 64-bit width variants via https://github.com/litex-hub/pythondata-cpu-rocket/commit/018e94119d8777f2934ce3fb3ba8b02947cacc33
64-bit mem port width seems to be the only configuration supported by upstream with L2 cache, so using L2-enabled rocket with 128-bit (or wider) LiteDRAM will have to be accomplished by fixing the AXI up-converter provided by LiteX (see also issue #1753), given that it's unlikely that any future rocket-chip fixes and improvements will make their way into LiteX (see https://github.com/chipsalliance/rocket-chip/issues/3483).
I found that removing https://github.com/enjoy-digital/litex/blob/master/litex/soc/integration/soc.py#L1635-L1650 (so that Wishbone is used to up-convert between Rocket and LiteDRAM) is a correct (if potentially suboptimal) workaround. One can then use 64-bit (single-width) rocket variants, on which the L2 cache just happens to work correctly...
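As a concrete example of that workaround, a build along the following lines keeps Rocket at its native 64-bit MEM width and lets LiteX handle the conversion to the wider LiteDRAM (a sketch only: it presumes the soc.py change above plus the L2-enabled "full" variant, and the target script name and option set are assumptions patterned on the commands used earlier in this thread):
./litex-boards/litex_boards/targets/lambdaconcept_ecpix5.py --build --cpu-type rocket \
    --cpu-variant full --cpu-num-cores 1 --cpu-mem-width 1 \
    --sys-clk-freq 50e6 --with-ethernet --with-sdcard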
Hi All!
I'm trying to integrate the L2 InclusiveCache from ChipsAlliance (https://github.com/chipsalliance/rocket-chip-inclusive-cache) with a single Rocket core to be used inside the Litex SoC.
I can confirm that it boots fine when used with Litex sim when the rocket is regenerated with the following configurations (the LitexConfig_linux_1_1 and LitexConfig_linux_1_2 classes shown earlier in this thread):
And below are the cmdlines used to launch Litex sim:
litex_sim --with-sdram --sdram-data-width 64 --cpu-type rocket --cpu-variant linux --cpu-num-cores 1 --cpu-mem-width 1 --jobs 12 --threads 12
litex_sim --with-sdram --sdram-data-width 128 --cpu-type rocket --cpu-variant linux --cpu-num-cores 1 --cpu-mem-width 2 --jobs 12 --threads 12
Both options boot to the BIOS without issues and successfully pass the built-in memory test.
Additionally, I have tried running a tiny baremetal program using the internal Verilator-based rocket-chip simulator (rocket-chip/emulator), and that also works with both the 64-bit and the 128-bit mem configuration (the application is loaded into the main_ram area)
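For context, a run of that in-tree emulator typically looks something like this (a sketch; the emulator-<project>-<config> binary name follows rocket-chip's usual naming convention, and baremetal.elf is just a placeholder, not a file from this thread):
cd rocket-chip/emulator
./emulator-freechips.rocketchip.system-LitexConfig_linux_1_1 baremetal.elf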
The design with the included L2 cache works when synthesized for a real FPGA targeting the Digilent Nexys Video board, but only when using --cpu-mem-width 1, which implicitly makes the Litex SoC builder generate a memory width adapter from 64 bits to 128 bits for LiteDRAM, as the Digilent Nexys Video board uses a 128-bit wide memory bus. Below is the command used to build this version of the bitstream:
./litex-boards/litex_boards/targets/digilent_nexys_video.py --build --cpu-type rocket --cpu-variant linux --cpu-num-cores 1 --cpu-mem-width 1 --sys-clk-freq 50e6 --with-ethernet --bus-data-width 64 --bus-address-width 32 --csr-csv ./csr.csv
However, when using the Rocket variant with 128-bit memory bus width (--cpu-mem-width 2), the BIOS memtest hangs upon the first write attempt and does not proceed, as can be seen below.
Based on the fact that litex_sim (and rocket's internal sim) works when using L2 with memory width set to either 64 or 128, I assume there is something strange happening on the Litex side.
I would really appreciate some advice on how to narrow down where the problem is, as I would like to avoid having to use the memory width adapter. Thanks in advance!