enjoy-digital / litex_agilex5_test

Initial Test/Support of LiteX on Intel Agilex5 FPGAs.

LPDDR4 status #6

Closed trabucayre closed 2 months ago

trabucayre commented 4 months ago

IDLE (arvalid always low)

lpddr_idle

lpddr_idle.vcd.zip

Start read (arvalid high during one clock cycle)

lpddr_start_read lpddr_start_read.vcd.zip

Read end (rvalid goes low, rlast goes high)

lpddr_last_valid_8192 lpddr_last_valid_8192.vcd.zip

Dolu1990 commented 3 months ago

Hi ^^ @trabucayre

Out of curiosity, what was the frequency of the scope_clk?

trabucayre commented 3 months ago

Hi, around 100MHz (gtkwave time is wrong :) )

enjoy-digital commented 3 months ago

@Dolu1990: This behaviour was in fact related to an issue in Quartus 24.1. Quartus 24.2 fixes this issue. @trabucayre is now looking at bridging the LiteX 32-bit AXI interface to the 256-bit AXI interface of the DRAM controller and will share updates here.

trabucayre commented 3 months ago

litex> mem_write 0x80000000 0xcafebabe

litex> mem_write 0x80000008 0xAABBCCDD

litex> mem_read 0x80000008
Memory dump:
0x80000008  be ba fe ca                                      ....
litex> mem_read 0x80000010
Memory dump:
0x80000010  be ba fe ca                                      ....

read_sequence_20240729

The write sequence looks fine, but the read operation seems to access the first word.

trabucayre commented 3 months ago

Same bitstream but with:

litex> mem_read 0x80000040
Memory dump:
0x80000040  62 5b 5b db                                      b[[.

read_sequence_20240729_2

The address seems fine, but the issue is related to the 32b <-> 256b select.

enjoy-digital commented 3 months ago

Thanks @trabucayre, the accesses on the 256-bit bus seem fine and the issue indeed seems related to the 32-bit data selection from the returned 256-bit word. It would be worth seeing if this behaves similarly in simulation (just return a dummy 256-bit value and see if the selection is correct).
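As a reference for that kind of check, here is a minimal Python sketch of the selection one would expect on the 32-bit side. It is only a software model, not the adapter code: the names `make_dummy_beat` and `select_word` and the dummy pattern are hypothetical, and it assumes the usual AXI byte-lane ordering (lowest address in the least significant bits).

```python
LANES = 8               # 256-bit bus carries eight 32-bit words
WORD_MASK = 0xFFFFFFFF


def make_dummy_beat():
    """Build a 256-bit value whose lane i holds 0xA5A50000 + i (hypothetical pattern)."""
    beat = 0
    for lane in range(LANES):
        beat |= (0xA5A50000 + lane) << (32 * lane)
    return beat


def select_word(beat, addr):
    """Return the 32-bit word that byte address `addr` is expected to map to."""
    lane = (addr >> 2) & (LANES - 1)    # address bits [4:2] pick the lane
    return (beat >> (32 * lane)) & WORD_MASK


if __name__ == "__main__":
    beat = make_dummy_beat()
    for addr in range(0x80000000, 0x80000020, 4):
        print(hex(addr), hex(select_word(beat, addr)))
```

Because every lane carries a distinct value, a wrong selection in simulation shows up immediately as a repeated or out-of-order lane number.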

enjoy-digital commented 3 months ago

@trabucayre: The default values of the adapter will probably have to be adjusted: https://github.com/enjoy-digital/litex_verilog_axi_test/blob/master/verilog_axi/axi/axi_adapter.py#L21-L23, especially: convert_narrow_burst

It could be useful to study the code here: https://github.com/alexforencich/verilog-axi/blob/25912d48fec2abbf3565bbefe402c1cff99fe470/rtl/axi_adapter_rd.v

trabucayre commented 3 months ago

Same behavior. Here is more detailed information:

litex> mem_write 0x80000000 0xdeadbeef

litex> mem_write 0x80000004 0xaabbccdd

litex> mem_write 0x80000008 0xcafebabe

litex> mem_write 0x8000000c 0x12345678

litex> mem_read 0x80000000
Memory dump:
0x80000000  ef be ad de                                      ....            

litex> mem_read 0x80000004
Memory dump:
0x80000004  dd cc bb aa                                      ....            

litex> mem_read 0x80000008
Memory dump:
0x80000008  ef be ad de                                      ....            

litex> mem_read 0x8000000c
Memory dump:
0x8000000c  dd cc bb aa                                      ....            

litex> mem_read 0x80000008
Memory dump:
0x80000008  ef be ad de                                      ....            

litex> mem_read 0x8000000c
Memory dump:
0x8000000c  dd cc bb aa                                      ....            

litex> mem_read 0x8000000c 32
Memory dump:
0x8000000c  dd cc bb aa ef be ad de dd cc bb aa ef be ad de  ................
0x8000001c  dd cc bb aa 01 60 9b 6d 03 b0 ed b6 01 60 9b 6d  .....`.m.....`.m

It looks like a mask/shift issue when addressing data beyond the first 64 bits of a 256b area: 0x0 and 0x8 return the same value, and 0x4 and 0xc return the same value (but r.data is correctly filled).

I have already started studying the code, and with this behavior it may be easier to focus on a few specific parts.
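The pattern in the log above can be reproduced with a small Python model. This is only a sketch of one hypothetical fault (a shift amount that wraps at 64 bits), not a confirmed root cause; the function names are made up for illustration, and the upper four lanes are set to zero as an assumption since their real contents do not matter here.

```python
WORD_MASK = 0xFFFFFFFF


def pack_beat(words):
    """Pack eight 32-bit words into one 256-bit value, lane 0 in the low bits."""
    beat = 0
    for lane, word in enumerate(words):
        beat |= (word & WORD_MASK) << (32 * lane)
    return beat


def select_expected(beat, addr):
    """Correct selection: address bits [4:2] choose one of eight lanes."""
    lane = (addr >> 2) & 0x7
    return (beat >> (32 * lane)) & WORD_MASK


def select_wrapped(beat, addr):
    """Hypothetical fault: the shift amount wraps at 64 bits."""
    lane = (addr >> 2) & 0x7
    shift = (32 * lane) % 64
    return (beat >> shift) & WORD_MASK


# Values written in the log; upper lanes are an assumption.
beat = pack_beat([0xDEADBEEF, 0xAABBCCDD, 0xCAFEBABE, 0x12345678, 0, 0, 0, 0])

# Values read back in the log.
observed = {
    0x80000000: 0xDEADBEEF,
    0x80000004: 0xAABBCCDD,
    0x80000008: 0xDEADBEEF,
    0x8000000C: 0xAABBCCDD,
}

for addr, value in observed.items():
    print(hex(addr),
          "expected", hex(select_expected(beat, addr)),
          "wrapped", hex(select_wrapped(beat, addr)),
          "observed", hex(value))
```

In this model the wrapped selection matches all four observed reads, while the expected selection differs at 0x8 and 0xc.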

Dolu1990 commented 3 months ago

Note: looking at the LPDDR latency in the read traces, it seems to be ~60 cycles, which at 100 MHz means 600 ns. That is a lot. For reference, on an Arty A7 @ 100 MHz (with the LiteDRAM controller @ 100 MHz + 800 MT/s DDR3), latency is between 15 and 25 cycles.

At which frequency is the LPDDR controller running? Maybe you just got unlucky with your trace and captured it right when it was doing a DDR refresh XD
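For reference, the latency figures above convert as follows. This is a small Python sketch; `cycles_to_ns` is a hypothetical helper, and 100 MHz is the approximate scope clock mentioned earlier in the thread, so the results are estimates.

```python
def cycles_to_ns(cycles, clk_hz):
    """Convert a cycle count at clock frequency `clk_hz` into nanoseconds."""
    return cycles * 1e9 / clk_hz


CLK_HZ = 100e6  # approximate, per the scope_clk answer above

lpddr_ns = cycles_to_ns(60, CLK_HZ)                       # ~60-cycle read latency
arty_ns = [cycles_to_ns(c, CLK_HZ) for c in (15, 25)]     # Arty A7 reference range

print("LPDDR4 read latency (ns):", lpddr_ns)
print("Arty A7 DDR3 latency range (ns):", arty_ns)
```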

enjoy-digital commented 3 months ago

@trabucayre: If you think this could be an issue in the Verilog AXI Adapter, it could also be worth doing 32 <-> 64 <-> 128 <-> 256 adaptations and see if it behaves differently. (So with 3 adapters).
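To illustrate why the cascaded path should be equivalent, here is a minimal Python model of the read-data selection only. It ignores bursts and handshakes, and `narrow_once`, `cascade_256_to_32` and `direct_256_to_32` are hypothetical names, not functions of the Verilog AXI adapter.

```python
def narrow_once(data, width, addr):
    """Halve a `width`-bit value, picking the half with one address bit."""
    half = width // 2
    half_bytes = half // 8
    upper = (addr // half_bytes) & 1        # 1 selects the upper half
    return (data >> (half * upper)) & ((1 << half) - 1)


def cascade_256_to_32(beat, addr):
    """Three stages: 256 -> 128 -> 64 -> 32 bits."""
    data, width = beat, 256
    while width > 32:
        data = narrow_once(data, width, addr)
        width //= 2
    return data


def direct_256_to_32(beat, addr):
    """Single-stage selection using address bits [4:2]."""
    return (beat >> (32 * ((addr >> 2) & 0x7))) & 0xFFFFFFFF


if __name__ == "__main__":
    # Dummy beat: lane i holds 0x1000 + i (hypothetical pattern).
    beat = sum((0x1000 + i) << (32 * i) for i in range(8))
    for addr in range(0, 32, 4):
        print(hex(addr),
              hex(cascade_256_to_32(beat, addr)),
              hex(direct_256_to_32(beat, addr)))
```

Each stage only needs a single address bit, so if the cascade behaves correctly on hardware while the direct 32 <-> 256 adapter does not, that points at the wide shift in the single-stage path.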

trabucayre commented 3 months ago

@enjoy-digital I will test. This screenshot shows a simulation with hardcoded data, along with the signals used to decode and shift the data:

sim_read

By reading axi_adapter_rd.v:

trabucayre commented 3 months ago

With a case statement that manually selects a word according to the addr_reg slice, the simulation shows the correct sequence. Maybe something is wrong with the >> operator.

Edit: This fix works with agilex5 too.

trabucayre commented 3 months ago

With manual decoding or by cascading AXIInterface:

litex_bios_main_ram_test

enjoy-digital commented 3 months ago

@trabucayre: Great, thanks! Now that we have a first version working, refining will be easier. It's possible that the verilog_axi modules are less tested on Intel FPGAs than on Xilinx FPGAs.

enjoy-digital commented 3 months ago

@trabucayre: Please also commit the modified adapter in case it could be useful later.

enjoy-digital commented 3 months ago

Thanks. BTW, if this works, latency will probably be reduced with it, so we can maybe switch to it for the Linux bitstreams.

trabucayre commented 3 months ago

verilog_axi_rd_decode_32b_256b.patch does not seem to have any effect on the memspeed results.

enjoy-digital commented 2 months ago

This is now working, we can close.