enjoy-digital / litex_agilex5_test

Initial Test/Support of LiteX on Intel Agilex5 FPGAs.

Improve AXI to LPDDR4 Access Efficiency. #12

Closed by enjoy-digital 2 months ago

enjoy-digital commented 2 months ago

Currently, initial LPDDR4 accesses are being handled by a simple AXI 32-bit to 256-bit UpConverter. While this solution functions correctly, it is inefficient because each 32-bit access is expanded to a 256-bit transfer. This results in suboptimal utilization of the memory bandwidth.
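As a rough illustration of the inefficiency described above, the sketch below computes how much of each DRAM transfer carries requested data. It assumes exactly one 256-bit transfer per 32-bit access, with no merging of adjacent accesses; the function name is illustrative and not part of LiteX.

```python
def useful_fraction(access_bits: int, transfer_bits: int) -> float:
    """Fraction of one memory transfer that carries requested data."""
    if access_bits <= 0 or transfer_bits < access_bits:
        raise ValueError("expected 0 < access_bits <= transfer_bits")
    return access_bits / transfer_bits

# 32-bit CPU access expanded to a 256-bit LPDDR4 transfer.
fraction = useful_fraction(32, 256)
print(f"{fraction:.1%} of each 256-bit transfer is useful")  # 12.5%
```

Under these assumptions, seven eighths of every transfer is discarded, which is the waste both proposals below aim to remove.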

To enhance access efficiency, we propose two possible improvements:

  1. Add an L2 Cache to the LiteX SoC:
    Introducing an L2 cache would not only act as a cache to reduce memory access latency but also handle the 32-bit to 256-bit conversion directly. This approach would improve efficiency for all CPUs.

    • We could consider using the core_axi_cache from ultraembedded for this purpose. It is an interesting option and worth testing, as it would also eliminate the dependency on verilog-axi.
  2. Increase the Memory Interface Width on the CPU:
    Alternatively, we could increase the memory interface width directly on the CPU to 256 bits. This involves a custom call to the CPU's add_memory_buses with a 256-bit width, connecting these interfaces directly to the DRAM in 256-bit mode.

    • This solution would work with CPUs like NaxRiscv and VexRiscv that utilize AXI interfaces.
    • Here’s an example of a similar approach from another project (VexRiscv-SMP, converting from LiteDRAM native ports to Wishbone):
    from litex.soc.interconnect import wishbone
    from litedram.frontend.wishbone import LiteDRAMNative2Wishbone

    # Ask the CPU to expose dedicated 64-bit memory buses.
    self.cpu.add_memory_buses(address_width=32, data_width=64)
    for n, port in enumerate(self.cpu.memory_buses):
        # Bridge each CPU memory port to a Wishbone master on the SoC bus.
        bus = wishbone.Interface(address_width=32, data_width=64, addressing="word")
        self.submodules += LiteDRAMNative2Wishbone(port, bus, base_address=0x40000000)
        self.bus.add_master(name=f"mem{n}", master=bus)

    This code would need to be adapted to connect the buses directly to the LPDDR4 core instead of the SoC's main bus.
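To illustrate why option 1 reduces DRAM traffic, here is a toy model comparing the number of 256-bit transfers with and without a cache in front of the memory. It is only a sketch: the direct-mapped, read-only organization and the 64-line size are assumptions chosen for simplicity and do not describe how core_axi_cache is actually built.

```python
LINE_BYTES = 32  # one 256-bit line, matching the DRAM port width
WORD_BYTES = 4   # one 32-bit CPU access

def transfers_without_cache(addresses):
    """Up-converter model: every 32-bit access becomes one 256-bit transfer."""
    return len(addresses)

def transfers_with_cache(addresses, num_lines=64):
    """Toy direct-mapped, read-only cache: one 256-bit transfer per miss."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        index = line % num_lines
        if tags[index] != line:
            tags[index] = line  # line fill from DRAM
            misses += 1
    return misses

# 1024 sequential 32-bit reads (4 KiB).
sequential = [i * WORD_BYTES for i in range(1024)]
print(transfers_without_cache(sequential))  # 1024
print(transfers_with_cache(sequential))     # 128
```

For sequential reads the cache issues one transfer per eight words, so every transferred bit is used. Workloads with poor locality would benefit less.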

trabucayre commented 2 months ago

With @ultraembedded's l2_cache: [image: lpddr4_l2_cache]

Dolu1990 commented 2 months ago

There is really something else that isn't right. If I remember correctly, the logic analyzer on the AXI port going to the DDR was showing very high latencies, right? Overall, if that interface shows more than 200 ns of latency (on average), I think there is a misconfiguration somewhere.
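For a sense of scale on the 200 ns figure, the sketch below computes the throughput ceiling imposed by latency alone. It assumes only one access is in flight at a time, which is an assumption for illustration and not something measured in this thread; the function name is hypothetical.

```python
def latency_bound_throughput(bytes_per_access: int, latency_ns: float) -> float:
    """Bytes per second when a single access is outstanding at a time."""
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    return bytes_per_access * 1e9 / latency_ns

# 4 useful bytes per access (32-bit) versus a full 32-byte (256-bit) line.
for nbytes in (4, 32):
    mb_per_s = latency_bound_throughput(nbytes, 200) / 1e6
    print(f"{nbytes:2d} bytes per 200 ns access: {mb_per_s:.0f} MB/s")
```

With these assumptions, 200 ns per access limits a blocking 32-bit master to 20 MB/s, so high latency and narrow accesses compound each other.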

trabucayre commented 2 months ago

With L2_cache, Linux is booting on VexRiscv-SMP with a boot time of about 19s

-> No need to test/implement the second option, and the axi_interface module may be removed

trabucayre commented 2 months ago

axi_adapter is removed: core_axi_cache is now used by default.

enjoy-digital commented 2 months ago

This is now done and working, we can close.