enjoy-digital / litedram

Small footprint and configurable DRAM core
Other
375 stars 120 forks source link

Leveling fails for DFI databits < 32 #256

Closed jedrzejboczar closed 3 years ago

jedrzejboczar commented 3 years ago

I noticed this issue while working on DFI rate converter (converts DFI from faster clock to slower clock with more phases). I created a wrapper around A7DDRPHY that moves the PHY to work in sys2x->sys8x (instead of sys->sys4x) and adds DFI conversion from sys to sys2x (so that the wrapper PHY is seen in sys by the controller). A7DDRPHY on ArtyA7 by default uses DFI with 4 phases and dfi_databits=32. After conversion the new DFI has to use 8 phases with dfi_databits=16 (to maintain burst size).

But I think the problem is not directly related to this DFI rate converter. I set up a fresh Litex env and applied the followin changes: arty-8-bit-dfi-16-bit-failure.zip (modified platform file to use only 8 DQ lines). This results in dfi_databits=16 and leveling failure when building the default target with:

python litex-boards/litex_boards/targets/digilent_arty.py --load --build

The problematic part of code is in sdram_write_read_check_test_pattern in BIOS where it reads rddata into tst using csr_rd_buf_uint8 and then compares the values: https://github.com/enjoy-digital/litex/blob/6b8a35a2f8e3e90f505357c73f659774ade9a6c8/litex/soc/software/liblitedram/sdram.c#L347-L350. The code assumes #define DFII_PIX_DATA_BYTES DFII_PIX_DATA_SIZE*CONFIG_CSR_DATA_WIDTH/8 which means it will be 4 here (DFII_PIX_DATA_SIZE=1, CONFIG_CSR_DATA_WIDTH=32), but there actually are only 2 bytes per DFI phase.

However using DFII_PIX_DATA_BYTES=2 also fails because csr_rd_buf_uint8 works incorrectly for a CSR with length 2, because it copies only the higher part of the word (0x0000), without the data. You can use the following simple C file that demonstrates this main.zip (run with gcc main.c && ./a.out) which prints:

  buf[0..8] = { ff ff ff ff ff ff ff ff }
Read csr_rd_buf_uint8 #4:
 * csr_read_simple from @0x82000010 => 0x00009988
  buf[0..8] = { 00 00 99 88 ff ff ff ff }
Resetting buf
  buf[0..8] = { ff ff ff ff ff ff ff ff }
Read csr_rd_buf_uint8 #2:
 * csr_read_simple from @0x82000010 => 0x00009988
  buf[0..8] = { 00 00 ff ff ff ff ff ff }

Is the behavior of csr_rd_buf_uint8 correct?

enjoy-digital commented 3 years ago

Thanks @jedrzejboczar, I just also hit this issue while using ECC on some designs. The issue in fact was here as soon as the DFI data registers were using partial CSRs (ex with ECC, the DFII_PIX_DATA_BYTES in my case was 10, which is not a multiple of 4 with csr_data_width=32). Your case with DFII_PIX_DATA_BYTES=2 is similar.

The DFI_PIX_DATA_BYTES computation is fixed with https://github.com/enjoy-digital/litex/commit/b29a99cd0b5f322ae76bf1b91c60e73d21792271.

The CSR accessors are fixed with: https://github.com/enjoy-digital/litex/commit/bc77aa37f0bf27c026ff68fe1dc2d477aad7d34a

To investigate it, I reused your minimal repro, modified it a bit and also tested things in simulation with litex_sim. It attaching this just in case we need it later. Thanks a lot for your minimal repro, this has been a very convenient basis to investigate this issue.

I think we case close this issue. Please re-open if you test it and still see the issue.

main.zip litedram_issue_256.diff.txt