enjoy-digital / litedram

Small footprint and configurable DRAM core

Memtest fails on ZCU104 board #200

Closed kowalewskijan closed 3 years ago

kowalewskijan commented 4 years ago

I used the latest code to build LiteX for the ZCU104 board, but memtest fails with the following error:

--========== Initialization ============--
Initializing SDRAM...
SDRAM now under software control
Write leveling:
Command/Clk scan:
|00000000|0000|0000|0000| best: -1
Data scan:
m0: |1111111111111111111111| delay: -1
m1: |1110000000000000000001| delay: -1
m2: |1111111111111111111111| delay: -1
m3: |1111100000000000000000| delay: -1
m4: |1111111111111111111111| delay: -1
m5: |1111111111111111111111| delay: -1
m6: |1111111111000000000000| delay: -1
m7: |1111111111111111111111| delay: -1
Read leveling:
m0, b00: |00000000000000000000000000000000| delays: -
m0, b01: |00000000000000000000000000000000| delays: -
m0, b02: |00000000000000000000000000000000| delays: -
m0, b03: |00000000000000000000000000000000| delays: -
m0, b04: |00000000000000000000000000000000| delays: -
m0, b05: |00000000000000000000000000000000| delays: -
m0, b06: |00000000000000000000000000000000| delays: -
m0, b07: |00000000000000000000000000000000| delays: -
m0, b08: |00000000000000000000000000000000| delays: -
m0, b09: |00000000000000000000000000000000| delays: -
m0, b10: |00000000000000000000000000000000| delays: -
m0, b11: |00000000000000000000000000000000| delays: -
m0, b12: |00000000000000000000000000000000| delays: -
m0, b13: |00000000000000000000000000000000| delays: -
m0, b14: |00000000000000000000000000000000| delays: -
m0, b15: |00000000000000000000000000000000| delays: -
best: m0, b00 delays: -
m1, b00: |00000000000000000000000000000000| delays: -
m1, b01: |00000000000000000000000000000000| delays: -
m1, b02: |00000000000000000000000000000000| delays: -
m1, b03: |00000000000000000000000000000000| delays: -
m1, b04: |00000000000000000000000000000000| delays: -
m1, b05: |00000000000000000000000000000000| delays: -
m1, b06: |00000000000000000000000000000000| delays: -
m1, b07: |00000000000000000000000000000000| delays: -
m1, b08: |00000000000000000000000000000000| delays: -
m1, b09: |00000000000000000000000000000000| delays: -
m1, b10: |00000000000000000000000000000000| delays: -
m1, b11: |00000000000000000000000000000000| delays: -
m1, b12: |00000000000000000000000000000000| delays: -
m1, b13: |00000000000000000000000000000000| delays: -
m1, b14: |00000000000000000000000000000000| delays: -
m1, b15: |00000000000000000000000000000000| delays: -
best: m1, b00 delays: -
m2, b00: |00000000000000000000000000000000| delays: -
m2, b01: |00000000000000000000000000000000| delays: -
m2, b02: |00000000000000000000000000000000| delays: -
m2, b03: |00000000000000000000000000000000| delays: -
m2, b04: |00000000000000000000000000000000| delays: -
m2, b05: |00000000000000000000000000000000| delays: -
m2, b06: |00000000000000000000000000000000| delays: -
m2, b07: |00000000000000000000000000000000| delays: -
m2, b08: |00000000000000000000000000000000| delays: -
m2, b09: |00000000000000000000000000000000| delays: -
m2, b10: |00000000000000000000000000000000| delays: -
m2, b11: |00000000000000000000000000000000| delays: -
m2, b12: |00000000000000000000000000000000| delays: -
m2, b13: |00000000000000000000000000000000| delays: -
m2, b14: |00000000000000000000000000000000| delays: -
m2, b15: |00000000000000000000000000000000| delays: -
best: m2, b00 delays: -
m3, b00: |00000000000000000000000000000000| delays: -
m3, b01: |00000000000000000000000000000000| delays: -
m3, b02: |00000000000000000000000000000000| delays: -
m3, b03: |00000000000000000000000000000000| delays: -
m3, b04: |00000000000000000000000000000000| delays: -
m3, b05: |00000000000000000000000000000000| delays: -
m3, b06: |00000000000000000000000000000000| delays: -
m3, b07: |00000000000000000000000000000000| delays: -
m3, b08: |00000000000000000000000000000000| delays: -
m3, b09: |00000000000000000000000000000000| delays: -
m3, b10: |00000000000000000000000000000000| delays: -
m3, b11: |00000000000000000000000000000000| delays: -
m3, b12: |00000000000000000000000000000000| delays: -
m3, b13: |00000000000000000000000000000000| delays: -
m3, b14: |00000000000000000000000000000000| delays: -
m3, b15: |00000000000000000000000000000000| delays: -
best: m3, b00 delays: -
m4, b00: |00000000000000000000000000000000| delays: -
m4, b01: |00000000000000000000000000000000| delays: -
m4, b02: |00000000000000000000000000000000| delays: -
m4, b03: |00000000000000000000000000000000| delays: -
m4, b04: |00000000000000000000000000000000| delays: -
m4, b05: |00000000000000000000000000000000| delays: -
m4, b06: |00000000000000000000000000000000| delays: -
m4, b07: |00000000000000000000000000000000| delays: -
m4, b08: |00000000000000000000000000000000| delays: -
m4, b09: |00000000000000000000000000000000| delays: -
m4, b10: |00000000000000000000000000000000| delays: -
m4, b11: |00000000000000000000000000000000| delays: -
m4, b12: |00000000000000000000000000000000| delays: -
m4, b13: |00000000000000000000000000000000| delays: -
m4, b14: |00000000000000000000000000000000| delays: -
m4, b15: |00000000000000000000000000000000| delays: -
best: m4, b00 delays: -
m5, b00: |00000000000000000000000000000000| delays: -
m5, b01: |00000000000000000000000000000000| delays: -
m5, b02: |00000000000000000000000000000000| delays: -
m5, b03: |00000000000000000000000000000000| delays: -
m5, b04: |00000000000000000000000000000000| delays: -
m5, b05: |00000000000000000000000000000000| delays: -
m5, b06: |00000000000000000000000000000000| delays: -
m5, b07: |00000000000000000000000000000000| delays: -
m5, b08: |00000000000000000000000000000000| delays: -
m5, b09: |00000000000000000000000000000000| delays: -
m5, b10: |00000000000000000000000000000000| delays: -
m5, b11: |00000000000000000000000000000000| delays: -
m5, b12: |00000000000000000000000000000000| delays: -
m5, b13: |00000000000000000000000000000000| delays: -
m5, b14: |00000000000000000000000000000000| delays: -
m5, b15: |00000000000000000000000000000000| delays: -
best: m5, b00 delays: -
m6, b00: |00000000000000000000000000000000| delays: -
m6, b01: |00000000000000000000000000000000| delays: -
m6, b02: |00000000000000000000000000000000| delays: -
m6, b03: |00000000000000000000000000000000| delays: -
m6, b04: |00000000000000000000000000000000| delays: -
m6, b05: |00000000000000000000000000000000| delays: -
m6, b06: |00000000000000000000000000000000| delays: -
m6, b07: |00000000000000000000000000000000| delays: -
m6, b08: |00000000000000000000000000000000| delays: -
m6, b09: |00000000000000000000000000000000| delays: -
m6, b10: |00000000000000000000000000000000| delays: -
m6, b11: |00000000000000000000000000000000| delays: -
m6, b12: |00000000000000000000000000000000| delays: -
m6, b13: |00000000000000000000000000000000| delays: -
m6, b14: |00000000000000000000000000000000| delays: -
m6, b15: |00000000000000000000000000000000| delays: -
best: m6, b00 delays: -
m7, b00: |00000000000000000000000000000000| delays: -
m7, b01: |00000000000000000000000000000000| delays: -
m7, b02: |00000000000000000000000000000000| delays: -
m7, b03: |00000000000000000000000000000000| delays: -
m7, b04: |00000000000000000000000000000000| delays: -
m7, b05: |00000000000000000000000000000000| delays: -
m7, b06: |00000000000000000000000000000000| delays: -
m7, b07: |00000000000000000000000000000000| delays: -
m7, b08: |00000000000000000000000000000000| delays: -
m7, b09: |00000000000000000000000000000000| delays: -
m7, b10: |00000000000000000000000000000000| delays: -
m7, b11: |00000000000000000000000000000000| delays: -
m7, b12: |00000000000000000000000000000000| delays: -
m7, b13: |00000000000000000000000000000000| delays: -
m7, b14: |00000000000000000000000000000000| delays: -
m7, b15: |00000000000000000000000000000000| delays: -
best: m7, b00 delays: -
SDRAM now under hardware control
Memtest bus failed: 256/256 errors
Memtest data failed: 524288/524288 errors
Memtest addr failed: 8192/8192 errors
Memory initialization failed

Versions:

Build cmd: python litex-boards/litex_boards/targets/zcu104.py --cpu-type vexriscv --build

I did some research and found that memtest started to work when I reverted just this commit.
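The read-leveling section of the log above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming that a `1` in a scan line marks a delay setting that read back correctly; the function name `modules_without_window` is hypothetical and is not part of the LiteX BIOS.

```python
import re
from collections import defaultdict

# Matches scan lines such as "m0, b03: |0000111100| delays: 180+-128".
# Summary lines ("best: m0, b03 delays: ...") and write-leveling lines
# ("m0: |1110| delay: -1") do not match this pattern.
SCAN_RE = re.compile(r"m(\d+), b(\d+): \|([01]+)\|")


def modules_without_window(log_text):
    """Return the module indices for which no bitslip scan contains a '1'."""
    has_window = defaultdict(bool)
    for line in log_text.splitlines():
        match = SCAN_RE.search(line)
        if match is None:
            continue
        module = int(match.group(1))
        has_window[module] = has_window[module] or "1" in match.group(3)
    return sorted(m for m, ok in has_window.items() if not ok)


failing = """\
m0, b00: |00000000000000000000000000000000| delays: -
m0, b01: |00000000000000000000000000000000| delays: -
best: m0, b00 delays: -
"""
print(modules_without_window(failing))
```

Run against the full log above, a check like this would list every module, since none of the scans contains a passing delay.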

enjoy-digital commented 4 years ago

Thanks for the report and investigation. This was introduced to avoid errors with upstream Yosys. I will have a closer look at the generated code with or without this commit.

kowalewskijan commented 4 years ago

I ran some more tests with the latest LiteX, and it looks like this commit may be just one element of a more complex bug. When I built the latest LiteX with and without the mentioned commit, memtest failed either way. Only when I rolled back to the revisions mentioned in the issue and then reverted the commit did memtest work again. So I think the problem is much more complex, and reverting the mentioned commit is not the ultimate fix for it.

enjoy-digital commented 4 years ago

Thanks @kowalewskijan for the feedback. I also did some tests on the KCU105 and came to the same conclusion. I'm going to investigate more.

enjoy-digital commented 4 years ago

@kowalewskijan: I'm no longer able to reproduce the issue on the KCU105. With upstream LiteX/LiteDRAM on the KCU105, do you still have a Command/Clk scan reporting only zeroes? Can you also try lowering sys_clk_freq to 100MHz to see if it works?

enjoy-digital commented 4 years ago

OK, thanks for the test results. When testing on the KCU105, memtest passed at 125MHz, but the read leveling scan was not good on one module. I'll investigate this and will probably ask you to run some tests on the ZCU104 once I have improved things on the KCU105.

For now you can use a 100MHz clock on the ZCU104.

kowalewskijan commented 4 years ago

I am sorry, but I deleted the original post because I doubted that I had changed the clock correctly. I still get zeros with the latest code, but I think I tricked Vivado about the clock value instead of actually changing it. What I did:

Correct me if I'm wrong, but am I actually telling Vivado that the clock has a different value than it really does?

enjoy-digital commented 4 years ago

@kowalewskijan: in fact, to change the frequency you just need to set sys_clk_freq to 100e6 here: https://github.com/litex-hub/litex-boards/blob/master/litex_boards/targets/zcu104.py#L54

kowalewskijan commented 4 years ago

Thanks, I changed sys_clk_freq to 100e6, but I also had to change pll.create_clkout(self.cd_clk500, 400e6, with_reset=False) from 500e6 to 400e6 to avoid this error:

File "litex/litex/soc/cores/clock.py", line 153, in compute_config
    raise ValueError("No PLL config found")
ValueError: No PLL config found

This is my clock summary after making the changes mentioned above:

------------------------------------------------------------------------------------------------
| Clock Summary                                 
| -------------                                 
------------------------------------------------------------------------------------------------

Clock                   Waveform(ns)         Period(ns)      Frequency(MHz)
-----                   ------------         ----------      --------------
clk125_p                {0.000 4.000}        8.000           125.000         
  main_clkout1          {0.000 1.241}        2.483           402.778         
  pll4x_clk             {0.000 1.241}        2.483           402.778         
    pll4x_clk_DIV4_INV  {4.966 9.931}        9.931           100.694         
    sys_clk             {0.000 4.966}        9.931           100.694         

Leveling and memtest still fail. Only when I changed the platform constraints that I mentioned in my previous post did I get a success. I will investigate more.
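One plausible reading of the "No PLL config found" error above is that all requested outputs must be derived from a single VCO frequency, so some combinations of output frequencies cannot be satisfied together. The sketch below illustrates that kind of search. It is not the code from litex/soc/cores/clock.py: the function name `find_pll_config`, the VCO range, the divider limits, and the 1% margin are all placeholder assumptions chosen for illustration.

```python
def find_pll_config(f_in, targets, vco_min=800e6, vco_max=1600e6,
                    max_div_in=10, max_mult=64, max_div_out=128, margin=1e-2):
    """Search for (input divider, multiplier, output dividers), or return None.

    All limits are placeholder values for illustration, not the figures
    of any particular FPGA primitive.
    """
    for div_in in range(1, max_div_in + 1):
        for mult in range(1, max_mult + 1):
            vco = f_in * mult / div_in
            if not vco_min <= vco <= vco_max:
                continue
            div_outs = []
            for f_target in targets:
                div_out = round(vco / f_target)
                if not 1 <= div_out <= max_div_out:
                    break
                if abs(vco / div_out - f_target) > f_target * margin:
                    break
                div_outs.append(div_out)
            else:
                # Every requested output fits this VCO frequency.
                return div_in, mult, div_outs
    return None


# 400 MHz and 500 MHz share no VCO frequency inside the assumed range.
print(find_pll_config(125e6, [400e6, 500e6, 100e6]))
# Requesting 400 MHz twice instead leaves a solution available.
print(find_pll_config(125e6, [400e6, 400e6, 100e6]))
```

With these placeholder limits the second call settles on a VCO of about 1208.33 MHz, which gives outputs near 402.78 MHz and 100.69 MHz, similar to the clock summary above. That agreement depends on the assumed limits and should not be read as confirming how LiteX performs its search.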

jedrzejboczar commented 4 years ago

@enjoy-digital As noted in https://github.com/enjoy-digital/litedram/pull/204#issue-426572558, I encountered some problems while testing the DDR4 SPD parser.

I did some more tests today with @kowalewskijan and the results are a bit confusing. The first tests were built with the commit reverted. With nothing changed, or with only tCCD or only tFAW changed, leveling failed, but when both timings were changed to the values from the SPD data, leveling succeeded (zcu104_logs.zip). Without reverting that commit, tfaw_tccd still failed.

We've also tested full module parameter generation from SPD for both MTA4ATF51264HZ and KVR21SE15S84, and for the first one leveling worked. However, when the MTA4ATF51264HZ parameters were modified directly in modules.py to match those from the SPD data, leveling failed. The only real difference in the Verilog was that one gateware had the SPD data stored in ROM and the other one didn't.

So all these results seem a bit random to me and I am not sure if this will help much. I've pushed the changes to our repos, maybe they will be helpful:

https://github.com/antmicro/litex-boards/tree/zcu104-spd - ZCU104 command line arguments
https://github.com/antmicro/litedram/tree/zcu104-spd - module parameters

kowalewskijan commented 4 years ago

I did some more extensive tests. Here are the results:

| Test case No. | RAM module | Description | Result |
|---|---|---|---|
| 1 | MTA4ATF51264HZ | increased tRC by 1 cycle, rate=1:4 | Passed |
| 2 | MTA4ATF51264HZ | increased tRC by 1 cycle, rate=1:2 | Failed (1 module failed - m3) |
| 3 | MTA4ATF51264HZ | increased tRC by 1 cycle, rate=1:1 | Failed (all modules failed) |
| 4 | MTA4ATF51264HZ | rate=1:4 | Failed (all modules failed) |
| 5 | MTA4ATF51264HZ | rate=1:2 | Failed (modules m2 and m3 failed) |
| 6 | MTA4ATF51264HZ | rate=1:1 | Passed |
| 7 | KVR21SE15S84 | increased tRC by 1 cycle, rate=1:4 | Failed (only m5 module passed) |
| 8 | KVR21SE15S84 | increased tRC by 1 cycle, rate=1:2 | Failed (only m5 module passed) |
| 9 | KVR21SE15S84 | increased tRC by 1 cycle, rate=1:1 | Failed (only m5 module passed) |
| 10 | KVR21SE15S84 | rate=1:4 | Failed (only m5 module passed) |
| 11 | KVR21SE15S84 | rate=1:2 | Failed (all modules failed) |
| 12 | KVR21SE15S84 | rate=1:1 | Failed (only m5 module passed) |
| 13 | KVR21SE15S84 | increased tRC by 5 cycles, rate=1:4 | Failed (only m5 module passed) |
| 14 | KVR21SE15S84 | fine_refresh_rate=2x, rate=1:4 | Failed (only m5 module passed) |
| 15 | KVR21SE15S84 | speedgrade=-1, rate=1:4 | Failed (only m5 module passed) |

I investigated the cycle calculation functions, the code around the timing controllers, and the BIOS software for leveling and memtest, and I ran some simulations, but found no hint of where the bug hides. I still believe it is related to timings somehow. I got a success with MTA4ATF51264HZ, as mentioned in the table, with this configuration
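For readers unfamiliar with what "increased tRC by 1 cycle" means in the table above: a timing given in nanoseconds is rounded up to a whole number of controller clock cycles, and the extra cycle is added on top of that. The following is a minimal sketch of that arithmetic. The helper `ns_to_cycles` is hypothetical and is not LiteDRAM's own function, and the 45 ns figure is an illustrative value, not one taken from either module's datasheet or SPD data. The 125 MHz clock is the system clock reported in the BIOS output in this thread.

```python
import math


def ns_to_cycles(t_ns, clk_freq_hz, extra_cycles=0):
    """Round a timing in nanoseconds up to whole clock cycles.

    extra_cycles models a manual margin such as "tRC + 1 cycle".
    """
    period_ns = 1e9 / clk_freq_hz
    return math.ceil(t_ns / period_ns) + extra_cycles


# Illustrative timing value (assumption), with a 125 MHz clock (8 ns period).
base = ns_to_cycles(45.0, 125e6)
padded = ns_to_cycles(45.0, 125e6, extra_cycles=1)
print(base, padded)
```

Because of the rounding, one extra cycle at 125 MHz adds a full 8 ns of margin, which is large compared with typical differences between speedgrades.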

enjoy-digital commented 4 years ago

Thanks @kowalewskijan for the results. What's the actual speedgrade of the MTA4ATF51264HZ that is used, 2666? Have you also checked the speedgrade of the KVR21SE15S8?

For the rate, I don't think it's worth iterating on it, since it should be 1:4 for DDR4.

During the calibration, we generate simple access patterns and do not really stress the controller, so even with small errors in the timings the calibration could pass (but the memtest would fail). So there is probably something else. I'll also do more tests on an Ultrascale board when I have more time.

Just in case, could you try commenting out this: https://github.com/enjoy-digital/litedram/blob/master/litedram/phy/usddrphy.py#L534-L535 and see if you have the same behavior?

kowalewskijan commented 4 years ago

@enjoy-digital I can confirm that in my setup the MTA4ATF51264HZ is speedgrade 2666 and the KVR21SE15S8 is speedgrade 2133. I generated bitstreams with the lines you mentioned commented out. For both the Kingston and Micron RAMs, leveling and memtest failed for all modules.

enjoy-digital commented 3 years ago

As tested recently, and with the recent improvements, the ZCU104 is now calibrating correctly:

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2020 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Dec 14 2020 11:10:14
 BIOS CRC passed (b1311357)

 Migen git sha1: 11a297f
 LiteX git sha1: 649edd18

--=============== SoC ==================--
CPU:            VexRiscv @ 125MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            32KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          1048576KiB 64-bit @ 1000MT/s (CL-9 CWL-9)

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
  Cmd/Clk scan (0-334)
  |000000  |0000  |0000  |0000| best: -1
  Setting Cmd/Clk delay to -1 taps.
  Data scan:
  m0: |1110000000000000000011| delay: -
  m1: |1110000000000000000011| delay: -
  m2: |1111100000000000000000| delay: -
  m3: |1111100000000000000000| delay: -
  m4: |1111111000000000000000| delay: -
  m5: |1111111100000000000000| delay: -
  m6: |1111111111000000000000| delay: -
  m7: |1111111100000000000000| delay: -
Write latency calibration:
m0:6 m1:6 m2:6 m3:6 m4:6 m5:6 m6:6 m7:6
Read leveling:
  m0, b0: |00000000000000000000000000000000| delays: -
  m0, b1: |00000000000000000000000000000000| delays: -
  m0, b2: |11000000000000000000000000000000| delays: 09+-09
  m0, b3: |00001111111111111111000000000000| delays: 180+-128
  m0, b4: |00000000000000000000001111111111| delays: 428+-83
  m0, b5: |00000000000000000000000000000000| delays: -
  m0, b6: |00000000000000000000000000000000| delays: -
  m0, b7: |00000000000000000000000000000000| delays: -
  best: m0, b03 delays: 180+-127
  m1, b0: |00000000000000000000000000000000| delays: -
  m1, b1: |00000000000000000000000000000000| delays: -
  m1, b2: |11000000000000000000000000000000| delays: 09+-09
  m1, b3: |00001111111111111111000000000000| delays: 181+-127
  m1, b4: |00000000000000000000001111111111| delays: 428+-84
  m1, b5: |00000000000000000000000000000000| delays: -
  m1, b6: |00000000000000000000000000000000| delays: -
  m1, b7: |00000000000000000000000000000000| delays: -
  best: m1, b03 delays: 182+-126
  m2, b0: |00000000000000000000000000000000| delays: -
  m2, b1: |00000000000000000000000000000000| delays: -
  m2, b2: |00000000000000000000000000000000| delays: -
  m2, b3: |01111111111111111000000000000000| delays: 132+-127
  m2, b4: |00000000000000000001111111111111| delays: 403+-109
  m2, b5: |00000000000000000000000000000000| delays: -
  m2, b6: |00000000000000000000000000000000| delays: -
  m2, b7: |00000000000000000000000000000000| delays: -
  best: m2, b03 delays: 130+-127
  m3, b0: |00000000000000000000000000000000| delays: -
  m3, b1: |00000000000000000000000000000000| delays: -
  m3, b2: |00000000000000000000000000000000| delays: -
  m3, b3: |01111111111111111000000000000000| delays: 138+-129
  m3, b4: |00000000000000000001111111111111| delays: 407+-105
  m3, b5: |00000000000000000000000000000000| delays: -
  m3, b6: |00000000000000000000000000000000| delays: -
  m3, b7: |00000000000000000000000000000000| delays: -
  best: m3, b03 delays: 139+-130
  m4, b0: |00000000000000000000000000000000| delays: -
  m4, b1: |00000000000000000000000000000000| delays: -
  m4, b2: |00000000000000000000000000000000| delays: -
  m4, b3: |11111111111110000000000000000000| delays: 101+-101
  m4, b4: |00000000000000001111111111111111| delays: 369+-126
  m4, b5: |00000000000000000000000000000000| delays: -
  m4, b6: |00000000000000000000000000000000| delays: -
  m4, b7: |00000000000000000000000000000000| delays: -
  best: m4, b04 delays: 368+-126
  m5, b0: |00000000000000000000000000000000| delays: -
  m5, b1: |00000000000000000000000000000000| delays: -
  m5, b2: |00000000000000000000000000000000| delays: -
  m5, b3: |11111111111100000000000000000000| delays: 92+-92
  m5, b4: |00000000000000011111111111111100| delays: 353+-125
  m5, b5: |00000000000000000000000000000000| delays: -
  m5, b6: |00000000000000000000000000000000| delays: -
  m5, b7: |00000000000000000000000000000000| delays: -
  best: m5, b04 delays: 351+-125
  m6, b0: |00000000000000000000000000000000| delays: -
  m6, b1: |00000000000000000000000000000000| delays: -
  m6, b2: |00000000000000000000000000000000| delays: -
  m6, b3: |11111111110000000000000000000000| delays: 78+-78
  m6, b4: |00000000000011111111111111110000| delays: 317+-127
  m6, b5: |00000000000000000000000000000001| delays: 495+-16
  m6, b6: |00000000000000000000000000000000| delays: -
  m6, b7: |00000000000000000000000000000000| delays: -
  best: m6, b04 delays: 319+-128
  m7, b0: |00000000000000000000000000000000| delays: -
  m7, b1: |00000000000000000000000000000000| delays: -
  m7, b2: |00000000000000000000000000000000| delays: -
  m7, b3: |11111111100000000000000000000000| delays: 65+-65
  m7, b4: |00000000000111111111111111100000| delays: 290+-133
  m7, b5: |00000000000000000000000000000111| delays: 483+-28
  m7, b6: |00000000000000000000000000000000| delays: -
  m7, b7: |00000000000000000000000000000000| delays: -
  best: m7, b04 delays: 292+-131
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2MiB)...
  Write: 0x40000000-0x40200000 2MiB
   Read: 0x40000000-0x40200000 2MiB
Memtest OK
Memspeed at 0x40000000 (2MiB)...
  Write speed: 38MiB/s
   Read speed: 33MiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
             Timeout
No boot medium found

--============= Console ================--

litex> sdram_test
Memtest at 0x40000000 (32MiB)...
  Write: 0x40000000-0x42000000 32MiB
   Read: 0x40000000-0x42000000 32MiB
Memtest OK

litex>

Some improvements can still be made to the Cmd/Clk scan, but this will be addressed separately.