enjoy-digital / litedram

Small footprint and configurable DRAM core

LiteDRAM USPDDRPHY unable to meet DDR4 timing requirements? #303

Open jaccharrison opened 2 years ago

jaccharrison commented 2 years ago

Hi there,

I've been working with AntMicro's Rowhammer tester on a ZCU104 development board. This framework instantiates a DDR4 PHY with a clock frequency of 500 MHz. I was not seeing the outputs I expected from this framework, so I've done some troubleshooting. I have not ruled out the possibility that my unexpected behavior comes from the DDR clock being too slow. I've tried synthesizing with a higher clock frequency but it fails timing. One of the major reasons for the timing failure appears to be the use of the ISERDESE3 block in the LiteDRAM core.

According to the UltraScale Plus User Guide, ISERDESE3 blocks have a minimum clock period of 1.6 ns, i.e. a maximum clock frequency of 625 MHz. Thus, a DDR4 PHY that uses these blocks cannot even operate at the 666 MHz that is nominally required for the slowest DDR4 speed mode (1333 MT/s). The DDR4 ICs I'm using specify a maximum clock period of 1.8 ns for the 1333 MT/s mode, but I'd really like them to run at 1600 MT/s, where tCK(max) is 1.5 ns (1.25 ns/800 MHz would be nominal).
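To make the constraint concrete, here is a minimal sketch that intersects the PHY's clock-period floor with the period range each DRAM speed mode accepts. All figures are the ones quoted above. Treating the nominal period as the lower bound of each mode is an assumption, and the name `usable_tck_window` is made up for illustration.

```python
# Figures quoted in this post. Assumption: the nominal tCK of a mode is its lower bound.
PHY_MIN_PERIOD_NS = 1.6  # ISERDESE3 minimum clock period cited above

DRAM_MODES_NS = {
    # speed mode (MT/s): (nominal tCK, tCK(max)), both in ns
    1333: (1.5, 1.8),
    1600: (1.25, 1.5),
}

def usable_tck_window(tck_min_ns, tck_max_ns, phy_min_ns=PHY_MIN_PERIOD_NS):
    """Return the (low, high) clock-period range that suits both PHY and DRAM, or None."""
    low = max(tck_min_ns, phy_min_ns)  # the PHY cannot go faster than its own floor
    return (low, tck_max_ns) if low <= tck_max_ns else None

for mode, (tck_nom, tck_max) in DRAM_MODES_NS.items():
    print(mode, "MT/s mode:", usable_tck_window(tck_nom, tck_max))
```

With these numbers the 1333 MT/s mode leaves a window of 1.6 ns to 1.8 ns, the 1600 MT/s mode leaves none, and the 2.0 ns period of the current 500 MHz design is longer than either tCK(max).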

The UltraScale Plus does have different I/O SERDES blocks that seem to be capable of running at higher frequencies. I could potentially try to help port the USP DDR4 PHY to use these higher-frequency I/O SERDES components, but before I start to seriously investigate this, I was wondering if others have feedback on this idea? Am I crazy? Am I missing something (I can't find anybody else talking about this, so I'm suspicious that I might be)? I suspect that the reason for a max clock period comes from a DLL/PLL on the DRAM -- can anybody confirm or refute this?

Any help or expertise would be greatly appreciated!

enjoy-digital commented 2 years ago

Hi @jaccharrison,

LiteDRAM will indeed not allow you to achieve maximum DDR4 speed but can do up to 1400MT/s with the right FPGA speedgrade (The current limitation is related to the max BUFG freq). The IOs are used in component mode and going higher in frequency would probably require switching to native mode which could require quite a bit of work and hasn't been planned yet.

I'm not directly involved in AntMicro's Rowhammer tester, but I think it would be interesting to get a prebuilt bitstream from Antmicro first and try it on your system. This would rule out a potential LiteX change or Vivado version issue. @kgugala: Do you think it would be possible to share a validated ZCU104 bitstream?

It seems you also have memtest issues. I would be interested to see the LiteX DDR4 calibration log to check how it behaves.

jaccharrison commented 2 years ago

Hi @enjoy-digital,

Thanks for the reply!

Could you explain how the design is able to operate at up to 1400 MT/s? I may be missing something here, but it seems to me that if the maximum clock period on the IOSERDESE3 components is 1.6 ns (at least for the UltraScale Plus), the maximum clock frequency should be 1/1.6e-9 = 625 MHz. Wouldn't 1400 MT/s operation require a 700 MHz clock?
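The arithmetic behind this question, as a small sketch. It only assumes that DDR moves two transfers per clock cycle; the function names are mine and are not part of LiteDRAM.

```python
def clock_mhz_from_period_ns(period_ns):
    """Clock frequency in MHz for a clock period given in ns."""
    return 1e3 / period_ns

def data_rate_mts(clock_mhz):
    """DDR transfers data on both clock edges, so two transfers per cycle."""
    return 2 * clock_mhz

def required_clock_mhz(rate_mts):
    """Clock frequency in MHz needed to reach a given data rate in MT/s."""
    return rate_mts / 2

max_clk = clock_mhz_from_period_ns(1.6)  # 1.6 ns floor quoted for the SERDES
print(f"max clock:        {max_clk:.0f} MHz")
print(f"max data rate:    {data_rate_mts(max_clk):.0f} MT/s")
print(f"clock for 1400:   {required_clock_mhz(1400):.0f} MHz")
```

So a 1.6 ns floor caps the interface at roughly 1250 MT/s, while 1400 MT/s would need a 700 MHz clock.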

A validated bitstream is certainly something we could try, though the bitstreams are coupled to the timing parameters of a particular DRAM module and it would be difficult to produce a validated bitstream unless I have the same DIMM as @kgugala. If this is something we'd like to try, I could supply my module geometry/timings class, which I believe is correct as values from SPD and the data sheet agree.

Yes, I do have memtest issues. They're intermittent, though, which is one of the reasons that I was wondering whether I'm violating timing parameters. When you say DDR4 calibration log, do you just mean the output that is printed when I run calibration in the BIOS? Or is there a different log that you're referring to?

Thanks again! I'll look into native-mode primitives.

Jacob

enjoy-digital commented 2 years ago

When I was testing this on a 2018.2 version of Vivado and on Virtex Ultrascale(+), the main limitation reported by the tools was that the max BUFG frequency was reached. Looking at the current datasheet, it indeed seems that ISERDESE3 blocks are software-limited to a 625 MHz clock, which would mean 1250 MT/s operation. I would need to do a test on a recent version of Vivado to compare.

For the log, the calibration that is done by the BIOS would be useful yes.

jaccharrison commented 2 years ago

Sorry for the delay. Below are two representative terminal transcripts on one of my DDR4 DIMMs.

What you can see from output 1 is the following:

  1. I load up the bios console and allow it to complete calibration and training. The test KO's.
  2. I immediately run two other sdram_tests. Both check the same memory range, which includes the range checked after the calibration routine. The first test passes -- no errors are detected. I run the same test again and find a single error.

In output 2:

  1. The post-calibration test passes
  2. I run several other tests in a row. All pass, all seems good. Then, I obtain two failing tests in a row. No part of my setup was changed -- I ran the tests one after another.

Usually I see only one error, if I see any. Rarely, I'll see two errors. Sometimes, I run the tests many times in a row without seeing any errors. So my error is intermittent.
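To put numbers on "intermittent", a saved console transcript can be tallied with a short script. This is only a sketch: it assumes the memtest output format shown in the transcripts in this comment, and `tally_memtests` is a hypothetical helper, not something LiteX provides.

```python
import re

MEMTEST_START = re.compile(r"Memtest at (0x[0-9a-fA-F]+) \(([\d.]+)MiB\)")
DATA_ERRORS = re.compile(r"data errors:\s*(\d+)/(\d+)")
MEMTEST_RESULT = re.compile(r"Memtest (OK|KO)")

def tally_memtests(transcript):
    """Return one dict per memtest run found in a BIOS console transcript."""
    runs = []
    current = None
    for line in transcript.splitlines():
        start = MEMTEST_START.search(line)
        if start:
            current = {"base": int(start.group(1), 16),
                       "size_mib": float(start.group(2)),
                       "data_errors": 0,
                       "passed": None}
            runs.append(current)
            continue
        if current is None:
            continue  # ignore anything outside a memtest run
        errors = DATA_ERRORS.search(line)
        if errors:
            current["data_errors"] = int(errors.group(1))
        result = MEMTEST_RESULT.search(line)
        if result:
            current["passed"] = result.group(1) == "OK"
            current = None  # run finished
    return runs

sample = """\
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Memtest OK

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 1/8388608
Memtest KO
"""
runs = tally_memtests(sample)
failed = [r for r in runs if r["passed"] is False]
print(f"{len(failed)} of {len(runs)} runs failed")
```

Feeding it the full capture of a long test session gives a failure rate per DIMM that is easier to compare than eyeballing the console.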

I'd add that while I initially thought I was getting similar behavior on several different DIMMs, as I've gone back and documented my results more rigorously, I've only been able to reproduce the errors on one DIMM. I only have one copy of this particular DIMM unfortunately, and I don't have any other DIMMs with the same die as this DIMM.

I'm performing more tests and trying to isolate any cause for these failures. In particular, I'm about to write some scripts that repeatedly run memtests on my other DIMMs to try to see if I can get any memory test failures on other DIMMs. I'll update here if I find anything interesting from those other tests. In the meantime, please let me know if you see anything notable in the following BIOS outputs!

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2021 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Mar 29 2022 16:24:07
 BIOS CRC passed (84183beb)

 Migen git sha1: 9a0be7a
 LiteX git sha1: 3012d7d6

--=============== SoC ==================--
CPU:        VexRiscv_Min @ 125MHz
BUS:        WISHBONE 32-bit @ 4GiB
CSR:        32-bit data
ROM:        64KiB
SRAM:       8KiB
L2:     0KiB
SDRAM:      1048576KiB 64-bit @ 1000MT/s (CL-9 CWL-9)

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
  tCK equivalent taps: 568
  Cmd/Clk scan (0-284)
  |00001  |000000111  |000000000  |000000000| best: 198
  Setting Cmd/Clk delay to 198 taps.
  Data scan:
  m0: |111111111100000000000000| delay: 00
  m1: |111111111111110000000000| delay: 00
  m2: |111111110000000000000000| delay: -
  m3: |111111111111111110000000| delay: 00
  m4: |011111111111111111100000| delay: 11
  m5: |000000000011111111111111| delay: 151
  m6: |000111111111111111111000| delay: 44
  m7: |000000111111111111111111| delay: 96
Write latency calibration:
m0:6 m1:6 m2:6 m3:6 m4:6 m5:6 m6:6 m7:6 
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |11111111111100000000000000000000| delays: 91+-91
  m0, b04: |00000000000000111111111111111100| delays: 344+-125
  m0, b05: |00000000000000000000000000000000| delays: 05+-08
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b04 delays: 346+-125
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |11111111000000000000000000000000| delays: 61+-61
  m1, b04: |00000000000111111111111110000000| delays: 285+-119
  m1, b05: |00000000000000000000000000000111| delays: 480+-31
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b04 delays: 281+-119
  m2, b00: |00000000000000000000000000000000| delays: -
  m2, b01: |00000000000000000000000000000000| delays: -
  m2, b02: |00000000000000000000000000000000| delays: -
  m2, b03: |11111111111111100000000000000000| delays: 113+-113
  m2, b04: |00000000000000000111111111111111| delays: 388+-123
  m2, b05: |00000000000000000000000000000000| delays: -
  m2, b06: |00000000000000000000000000000000| delays: -
  m2, b07: |00000000000000000000000000000000| delays: -
  best: m2, b04 delays: 386+-120
  m3, b00: |00000000000000000000000000000000| delays: -
  m3, b01: |00000000000000000000000000000000| delays: -
  m3, b02: |00000000000000000000000000000000| delays: -
  m3, b03: |11111000000000000000000000000000| delays: 36+-36
  m3, b04: |00000000111111111111111000000000| delays: 239+-121
  m3, b05: |00000000000000000000000001111111| delays: 454+-58
  m3, b06: |00000000000000000000000000000000| delays: -
  m3, b07: |00000000000000000000000000000000| delays: -
  best: m3, b04 delays: 237+-121
  m4, b00: |00000000000000000000000000000000| delays: -
  m4, b01: |00000000000000000000000000000000| delays: -
  m4, b02: |00000000000000000000000000000000| delays: -
  m4, b03: |10000000000000000000000000000000| delays: 08+-08
  m4, b04: |00011111111111111100000000000000| delays: 164+-121
  m4, b05: |00000000000000000000011111111111| delays: 418+-94
  m4, b06: |00000000000000000000000000000000| delays: -
  m4, b07: |00000000000000000000000000000000| delays: -
  best: m4, b04 delays: 165+-124
  m5, b00: |00000000000000000000000000000000| delays: -
  m5, b01: |00000000000000000000000000000000| delays: -
  m5, b02: |00000000000000000000000000000000| delays: -
  m5, b03: |00000000000000000000000000000000| delays: -
  m5, b04: |11111111100000000000000000000000| delays: 66+-66
  m5, b05: |00000000000111111111111111000000| delays: 295+-118
  m5, b06: |00000000000000000000000000000111| delays: 488+-24
  m5, b07: |00000000000000000000000000000000| delays: -
  best: m5, b05 delays: 298+-120
  m6, b00: |00000000000000000000000000000000| delays: -
  m6, b01: |00000000000000000000000000000000| delays: -
  m6, b02: |00000000000000000000000000000000| delays: -
  m6, b03: |00000000000000000000000000000000| delays: -
  m6, b04: |01111111111111111000000000000000| delays: 138+-128
  m6, b05: |00000000000000000001111111111111| delays: 402+-109
  m6, b06: |00000000000000000000000000000000| delays: -
  m6, b07: |00000000000000000000000000000000| delays: -
  best: m6, b04 delays: 138+-127
  m7, b00: |00000000000000000000000000000000| delays: -
  m7, b01: |00000000000000000000000000000000| delays: -
  m7, b02: |00000000000000000000000000000000| delays: -
  m7, b03: |00000000000000000000000000000000| delays: -
  m7, b04: |11111111110000000000000000000000| delays: 77+-77
  m7, b05: |00000000000001111111111111110000| delays: 319+-126
  m7, b06: |00000000000000000000000000000011| delays: 496+-15
  m7, b07: |00000000000000000000000000000000| delays: -
  best: m7, b05 delays: 320+-122
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB     
   Read: 0x40000000-0x40200000 2.0MiB     
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 1/524288
Memtest KO
Memory initialization failed

--============= Console ================--

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
Memtest OK

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 1/8388608
Memtest KO

Here's a second run, captured after rebooting the board for the same module as above:

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2021 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Mar 29 2022 16:24:07
 BIOS CRC passed (84183beb)

 Migen git sha1: 9a0be7a
 LiteX git sha1: 3012d7d6

--=============== SoC ==================--
CPU:        VexRiscv_Min @ 125MHz
BUS:        WISHBONE 32-bit @ 4GiB
CSR:        32-bit data
ROM:        64KiB
SRAM:       8KiB
L2:     0KiB
SDRAM:      1048576KiB 64-bit @ 1000MT/s (CL-9 CWL-9)

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
  tCK equivalent taps: 568
  Cmd/Clk scan (0-284)
  |00001  |000000111  |000000000| best: 200
  Setting Cmd/Clk delay to 200 taps.
  Data scan:
  m0: |111111111100000000000000| delay: 00
  m1: |111111111111110000000000| delay: 00
  m2: |111111110000000000000000| delay: -
  m3: |111111111111111110000000| delay: 00
  m4: |011111111111111111110000| delay: 12
  m5: |000000000011111111111111| delay: 152
  m6: |000111111111111111111000| delay: 45
  m7: |000000011111111111111111| delay: 97
Write latency calibration:
m0:6 m1:6 m2:6 m3:6 m4:6 m5:6 m6:6 m7:6 
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |11111111111100000000000000000000| delays: 88+-88
  m0, b04: |00000000000000111111111111111000| delays: 340+-125
  m0, b05: |00000000000000000000000000000000| delays: 509+-08
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b04 delays: 341+-125
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |11111111000000000000000000000000| delays: 60+-60
  m1, b04: |00000000000111111111111110000000| delays: 281+-122
  m1, b05: |00000000000000000000000000001111| delays: 476+-36
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b04 delays: 280+-121
  m2, b00: |00000000000000000000000000000000| delays: -
  m2, b01: |00000000000000000000000000000000| delays: -
  m2, b02: |00000000000000000000000000000000| delays: -
  m2, b03: |11111111111111100000000000000000| delays: 113+-113
  m2, b04: |00000000000000000111111111111111| delays: 385+-122
  m2, b05: |00000000000000000000000000000000| delays: -
  m2, b06: |00000000000000000000000000000000| delays: -
  m2, b07: |00000000000000000000000000000000| delays: -
  best: m2, b04 delays: 384+-122
  m3, b00: |00000000000000000000000000000000| delays: -
  m3, b01: |00000000000000000000000000000000| delays: -
  m3, b02: |00000000000000000000000000000000| delays: -
  m3, b03: |11111000000000000000000000000000| delays: 37+-37
  m3, b04: |00000000111111111111111000000000| delays: 240+-122
  m3, b05: |00000000000000000000000001111111| delays: 452+-60
  m3, b06: |00000000000000000000000000000000| delays: -
  m3, b07: |00000000000000000000000000000000| delays: -
  best: m3, b04 delays: 239+-123
  m4, b00: |00000000000000000000000000000000| delays: -
  m4, b01: |00000000000000000000000000000000| delays: -
  m4, b02: |00000000000000000000000000000000| delays: -
  m4, b03: |10000000000000000000000000000000| delays: 08+-08
  m4, b04: |00011111111111111100000000000000| delays: 165+-123
  m4, b05: |00000000000000000000011111111111| delays: 416+-96
  m4, b06: |00000000000000000000000000000000| delays: -
  m4, b07: |00000000000000000000000000000000| delays: -
  best: m4, b04 delays: 163+-124
  m5, b00: |00000000000000000000000000000000| delays: -
  m5, b01: |00000000000000000000000000000000| delays: -
  m5, b02: |00000000000000000000000000000000| delays: -
  m5, b03: |00000000000000000000000000000000| delays: -
  m5, b04: |11111111100000000000000000000000| delays: 64+-64
  m5, b05: |00000000000111111111111111000000| delays: 295+-120
  m5, b06: |00000000000000000000000000000111| delays: 487+-24
  m5, b07: |00000000000000000000000000000000| delays: -
  best: m5, b05 delays: 294+-117
  m6, b00: |00000000000000000000000000000000| delays: -
  m6, b01: |00000000000000000000000000000000| delays: -
  m6, b02: |00000000000000000000000000000000| delays: -
  m6, b03: |00000000000000000000000000000000| delays: -
  m6, b04: |01111111111111111000000000000000| delays: 135+-127
  m6, b05: |00000000000000000001111111111111| delays: 400+-111
  m6, b06: |00000000000000000000000000000000| delays: -
  m6, b07: |00000000000000000000000000000000| delays: -
  best: m6, b04 delays: 135+-128
  m7, b00: |00000000000000000000000000000000| delays: -
  m7, b01: |00000000000000000000000000000000| delays: -
  m7, b02: |00000000000000000000000000000000| delays: -
  m7, b03: |00000000000000000000000000000000| delays: -
  m7, b04: |11111111110000000000000000000000| delays: 78+-78
  m7, b05: |00000000000001111111111111110000| delays: 318+-121
  m7, b06: |00000000000000000000000000000001| delays: 495+-16
  m7, b07: |00000000000000000000000000000000| delays: -
  best: m7, b05 delays: 320+-122
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB     
   Read: 0x40000000-0x40200000 2.0MiB     
Memtest OK
Memspeed at 0x40000000 (Sequential, 2.0MiB)...
  Write speed: 16.0MiB/s
   Read speed: 13.1MiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
             Timeout
No boot medium found

--============= Console ================--

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
Memtest OK

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
Memtest OK

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
Memtest OK

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
Memtest OK

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 1/8388608
Memtest KO

litex> 
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 1/8388608
Memtest KO

litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
  Write: 0x40000000-0x42000000 32.0MiB    
   Read: 0x40000000-0x42000000 32.0MiB    
Memtest OK

enjoy-digital commented 2 years ago

Thanks @jaccharrison. The calibration logs look fine. The issue could be related to electrical settings that would need to be adjusted. You can use the ddr4_mr_gen tool for this. The tool will generate commands that you can copy/paste into the BIOS to adjust settings, and you can then do a sdram_cal + sdram_test. Be sure to use --cl=9 --cwl=9, as that is how your SoC is configured.
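For anyone wanting to cross-check the read-leveling part of such a log, here is a rough sketch. It rests on assumptions that are not confirmed in this thread: that the `delays: C+-W` field of each `best:` line gives the window centre and half-width in delay taps, that `bNN` is the bitslip, and that one bit period is half of the reported "tCK equivalent taps". The helper name `read_windows` is made up.

```python
import re

BEST_LINE = re.compile(r"best: m(\d+), b(\d+) delays: (\d+)\+-(\d+)")

def read_windows(log, tck_taps):
    """Map module index -> read window width as a fraction of one bit period."""
    ui_taps = tck_taps / 2  # assumption: DDR, so two bit periods per clock
    windows = {}
    for match in BEST_LINE.finditer(log):
        module, _bitslip, _centre, half_width = map(int, match.groups())
        windows[module] = 2 * half_width / ui_taps
    return windows

# Two "best" lines copied from the first transcript in this issue.
log = """\
  best: m0, b04 delays: 346+-125
  best: m1, b04 delays: 281+-119
"""
for module, fraction in read_windows(log, tck_taps=568).items():
    print(f"m{module}: window is {fraction:.0%} of a bit period")
```

Under those assumptions, a window that covers most of a bit period on every module is what "looks fine" would mean here; a much narrower window on one module would be worth a closer look.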

tmichalak commented 2 years ago

@jaccharrison which exact DIMM module are you seeing these intermittent failures with? We have done our tests with an MTA4ATF51264HZ module.

jaccharrison commented 2 years ago

Thanks @enjoy-digital for the tip -- I'll try that out and let you know what I find.

@tmichalak the DIMM I am using is from ATP, model number A4F04QD8BLPSE. The die are Samsung K4A4G085WD. If you'd like, I can provide data sheets for either or both. Let me know if there's anything you'd like me to try. I'm happy to help run tests, and, if you have tips for troubleshooting the PHY or DIMM, I'm eager to learn.

jaccharrison commented 2 years ago

Wanted to give a quick update -- tinkering with different termination resistances did not solve the problem. Apparently others may be experiencing this same issue. I'm continuing to troubleshoot and see if I can identify a cause and solution. I'll post here if I do.