Open jaccharrison opened 2 years ago
Hi @jaccharrison,
LiteDRAM will indeed not allow you to achieve the maximum DDR4 speed, but it can reach up to 1400 MT/s with the right FPGA speed grade (the current limitation is the maximum BUFG frequency). The IOs are used in component mode; going higher in frequency would probably require switching to native mode, which could be quite a bit of work and hasn't been planned yet.
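As a rough sketch of the clock relationships involved, assuming a 4:1 PHY (a fast I/O clock at four times the system clock, with data on both edges of the I/O clock), the clocks needed for a given data rate can be computed as below. The 4:1 ratio is an assumption inferred from the BIOS output later in this thread (CPU @ 125MHz with SDRAM @ 1000MT/s), and phy_clocks_mhz is a hypothetical helper, not part of LiteDRAM.

```python
# Sketch: clocks a DDR PHY needs for a target data rate.
# Assumption: 4:1 PHY, i.e. the I/O clock runs at 4x the system clock and
# data is transferred on both edges of the I/O clock.

def phy_clocks_mhz(data_rate_mts: int, phy_ratio: int = 4) -> tuple[float, float]:
    """Return (io_clock_mhz, sys_clock_mhz) for a DDR interface."""
    io_clock = data_rate_mts / 2       # two transfers per I/O clock cycle
    sys_clock = io_clock / phy_ratio   # system clock = I/O clock / PHY ratio
    return io_clock, sys_clock

for rate in (1000, 1250, 1400):
    io_clk, sys_clk = phy_clocks_mhz(rate)
    print(f"{rate} MT/s -> I/O clock {io_clk:.0f} MHz, system clock {sys_clk:.2f} MHz")
```

Under these assumptions 1000 MT/s maps to a 500 MHz I/O clock and a 125 MHz system clock, which matches the BIOS banner, while 1400 MT/s needs a 700 MHz I/O clock distributed through the clock buffers.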
I'm not directly involved in AntMicro's Rowhammer tester, but I think it would be interesting to first get prebuilt bitstreams from Antmicro and try them on your system. This would rule out potential issues caused by LiteX changes or the Vivado version. @kgugala: Do you think it would be possible to share a validated ZCU104 bitstream?
It also seems you have memtest issues. I would be interested in the LiteX DDR4 calibration log to see how it behaves.
Hi @enjoy-digital,
Thanks for the reply!
Could you explain how the design is able to operate at up to 1400 MT/s? I may be missing something here, but it seems to me that if the minimum clock period of the ISERDESE3/OSERDESE3 components is 1.6 ns (at least on UltraScale Plus), the maximum clock frequency should be 1/1.6e-9 s = 625 MHz. Wouldn't 1400 MT/s operation require a 700 MHz clock?
A validated bitstream is certainly something we could try, though the bitstreams are coupled to the timing parameters of a particular DRAM module and it would be difficult to produce a validated bitstream unless I have the same DIMM as @kgugala. If this is something we'd like to try, I could supply my module geometry/timings class, which I believe is correct as values from SPD and the data sheet agree.
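For reference when comparing SPD and data sheet values, converting a timing parameter from nanoseconds into whole clock cycles is a ceiling division by tCK; working in integer picoseconds keeps the arithmetic exact. This is a generic sketch, not LiteDRAM's own conversion code, and the 13.75 ns value is illustrative rather than taken from this DIMM's data sheet.

```python
# Sketch: convert between DRAM timings in time and in clock cycles.
# Values are integer picoseconds to avoid floating-point rounding.

def ps_to_cycles(t_ps: int, tck_ps: int) -> int:
    """Smallest number of tCK periods that covers t_ps (ceiling division)."""
    return -(-t_ps // tck_ps)

def cycles_to_ps(cycles: int, tck_ps: int) -> int:
    """Absolute time spanned by a latency given in clock cycles."""
    return cycles * tck_ps

tck_1000 = 2000  # tCK at 1000 MT/s (500 MHz clock), in ps
print(ps_to_cycles(13750, tck_1000))  # 7 cycles for an illustrative 13.75 ns timing
print(cycles_to_ps(9, tck_1000))      # 18000 ps: CL-9 expressed as time at 1000 MT/s
```

The second call shows that a latency fixed in cycles, such as CL-9, stretches in absolute time as tCK grows, so cycle counts chosen for one speed are worth rechecking against the data sheet at another.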
Yes, I do have memtest issues. They're intermittent, though, which is one of the reasons that I was wondering whether I'm violating timing parameters. When you say DDR4 calibration log, do you just mean the output that is printed when I run calibration in the BIOS? Or is there a different log that you're referring to?
Thanks again! I'll look into native-mode primitives.
Jacob
When I was testing this with Vivado 2018.2 on Virtex UltraScale(+), the main limitation reported by the tools was that the maximum BUFG frequency was reached. Looking at the current data sheet, it indeed seems that the ISERDESE3 is software-limited to a 625 MHz clock, which would mean 1250 MT/s operation. I would need to run a test with a recent version of Vivado to compare.
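The arithmetic behind the 625 MHz / 1250 MT/s figure can be written out as a short sketch, using the 1.6 ns minimum period quoted in this thread. Whether a given Vivado version or speed grade enforces exactly that limit would still need to be checked against the current data sheet; the helper names are hypothetical.

```python
# Sketch: relate a minimum I/O clock period to the highest DDR data rate.

def max_data_rate_mts(min_period_ps: int) -> float:
    """Highest data rate when the clock period cannot drop below min_period_ps."""
    clock_mhz = 1_000_000 / min_period_ps  # 1 / period, in MHz
    return 2 * clock_mhz                   # DDR: two transfers per clock cycle

def min_period_ps_for(data_rate_mts: int) -> float:
    """Clock period needed to reach a given DDR data rate."""
    clock_mhz = data_rate_mts / 2
    return 1_000_000 / clock_mhz

print(max_data_rate_mts(1600))  # 1250.0 MT/s for a 1.6 ns minimum period
print(min_period_ps_for(1400))  # roughly 1428.6 ps, shorter than 1600 ps
```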
For the log, yes: the calibration output printed by the BIOS would be useful.
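When reading such a calibration log, one quick check is how wide the passing window is in each read-leveling scan bitmap (the strings of 0s and 1s between bars). The sketch below finds the longest contiguous run of 1s. It treats each character as one scan step with 1 meaning pass, which is an assumption about how the BIOS prints the bitmap, and it is not the BIOS's own algorithm; widest_window is a hypothetical helper.

```python
# Sketch: locate the widest passing window in a calibration scan bitmap.
# Assumption: one character per scan step, '1' = pass, '0' = fail.

def widest_window(bitmap: str) -> tuple[int, int]:
    """Return (start_index, length) of the longest run of '1's, or (-1, 0) if none."""
    best_start, best_len = -1, 0
    run_start, run_len = 0, 0
    for i, ch in enumerate(bitmap):
        if ch == "1":
            if run_len == 0:
                run_start = i          # a new passing run begins here
            run_len += 1
            if run_len > best_len:     # keep the first of equally wide runs
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

# A bitmap shaped like the "m0, b04" line of the first transcript.
scan = "0" * 14 + "1" * 16 + "00"
start, length = widest_window(scan)
print(start, length, start + length / 2)  # 14 16 22.0 (start, width, centre)
```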
Sorry for the delay. Below are two representative terminal transcripts on one of my DDR4 DIMMs.
What you can see from output 1 is the following:
In output 2:
Usually I see only one error, if I see any. Rarely, I'll see two errors. Sometimes, I run the tests many times in a row without seeing any errors. So my error is intermittent.
I'd add that while I initially thought I was getting similar behavior on several different DIMMs, as I've gone back and documented my results more rigorously, I've only been able to reproduce the errors on one DIMM. I only have one copy of this particular DIMM unfortunately, and I don't have any other DIMMs with the same die as this DIMM.
I'm performing more tests and trying to isolate any cause for these failures. In particular, I'm about to write some scripts that repeatedly run memtests on my other DIMMs to try to see if I can get any memory test failures on other DIMMs. I'll update here if I find anything interesting from those other tests. In the meantime, please let me know if you see anything notable in the following BIOS outputs!
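A minimal sketch of the tallying half of such a script, assuming the BIOS console output has been captured to text. The regular expressions match the "Memtest OK" / "Memtest KO" and "data errors: N/M" lines in the transcripts below; tally_memtests is a hypothetical name, and driving the serial port is left out.

```python
import re

# Sketch: count passing/failing sdram_test runs in captured LiteX BIOS console text.
# Assumption: each run ends with "Memtest OK" or "Memtest KO", and failing runs
# print "data errors: N/M" before that line, as in the transcripts in this thread.
RESULT_RE = re.compile(r"Memtest (OK|KO)")
DATA_ERR_RE = re.compile(r"data errors:\s*(\d+)/(\d+)")

def tally_memtests(log_text: str) -> dict:
    results = RESULT_RE.findall(log_text)
    data_errors = [int(n) for n, _total in DATA_ERR_RE.findall(log_text)]
    return {
        "runs": len(results),
        "failed": results.count("KO"),
        "total_data_errors": sum(data_errors),
    }

sample = """litex> sdram_test
Memtest OK
litex> sdram_test
bus errors: 0/256
addr errors: 0/8192
data errors: 1/8388608
Memtest KO
"""
print(tally_memtests(sample))  # {'runs': 2, 'failed': 1, 'total_data_errors': 1}
```

Since the failures are intermittent, accumulating these counts over many runs per DIMM gives a more useful comparison than any single pass/fail result.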
__ _ __ _ __
/ / (_) /____ | |/_/
/ /__/ / __/ -_)> <
/____/_/\__/\__/_/|_|
Build your hardware, easily!
(c) Copyright 2012-2021 Enjoy-Digital
(c) Copyright 2007-2015 M-Labs
BIOS built on Mar 29 2022 16:24:07
BIOS CRC passed (84183beb)
Migen git sha1: 9a0be7a
LiteX git sha1: 3012d7d6
--=============== SoC ==================--
CPU: VexRiscv_Min @ 125MHz
BUS: WISHBONE 32-bit @ 4GiB
CSR: 32-bit data
ROM: 64KiB
SRAM: 8KiB
L2: 0KiB
SDRAM: 1048576KiB 64-bit @ 1000MT/s (CL-9 CWL-9)
--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
tCK equivalent taps: 568
Cmd/Clk scan (0-284)
|00001 |000000111 |000000000 |000000000| best: 198
Setting Cmd/Clk delay to 198 taps.
Data scan:
m0: |111111111100000000000000| delay: 00
m1: |111111111111110000000000| delay: 00
m2: |111111110000000000000000| delay: -
m3: |111111111111111110000000| delay: 00
m4: |011111111111111111100000| delay: 11
m5: |000000000011111111111111| delay: 151
m6: |000111111111111111111000| delay: 44
m7: |000000111111111111111111| delay: 96
Write latency calibration:
m0:6 m1:6 m2:6 m3:6 m4:6 m5:6 m6:6 m7:6
Read leveling:
m0, b00: |00000000000000000000000000000000| delays: -
m0, b01: |00000000000000000000000000000000| delays: -
m0, b02: |00000000000000000000000000000000| delays: -
m0, b03: |11111111111100000000000000000000| delays: 91+-91
m0, b04: |00000000000000111111111111111100| delays: 344+-125
m0, b05: |00000000000000000000000000000000| delays: 05+-08
m0, b06: |00000000000000000000000000000000| delays: -
m0, b07: |00000000000000000000000000000000| delays: -
best: m0, b04 delays: 346+-125
m1, b00: |00000000000000000000000000000000| delays: -
m1, b01: |00000000000000000000000000000000| delays: -
m1, b02: |00000000000000000000000000000000| delays: -
m1, b03: |11111111000000000000000000000000| delays: 61+-61
m1, b04: |00000000000111111111111110000000| delays: 285+-119
m1, b05: |00000000000000000000000000000111| delays: 480+-31
m1, b06: |00000000000000000000000000000000| delays: -
m1, b07: |00000000000000000000000000000000| delays: -
best: m1, b04 delays: 281+-119
m2, b00: |00000000000000000000000000000000| delays: -
m2, b01: |00000000000000000000000000000000| delays: -
m2, b02: |00000000000000000000000000000000| delays: -
m2, b03: |11111111111111100000000000000000| delays: 113+-113
m2, b04: |00000000000000000111111111111111| delays: 388+-123
m2, b05: |00000000000000000000000000000000| delays: -
m2, b06: |00000000000000000000000000000000| delays: -
m2, b07: |00000000000000000000000000000000| delays: -
best: m2, b04 delays: 386+-120
m3, b00: |00000000000000000000000000000000| delays: -
m3, b01: |00000000000000000000000000000000| delays: -
m3, b02: |00000000000000000000000000000000| delays: -
m3, b03: |11111000000000000000000000000000| delays: 36+-36
m3, b04: |00000000111111111111111000000000| delays: 239+-121
m3, b05: |00000000000000000000000001111111| delays: 454+-58
m3, b06: |00000000000000000000000000000000| delays: -
m3, b07: |00000000000000000000000000000000| delays: -
best: m3, b04 delays: 237+-121
m4, b00: |00000000000000000000000000000000| delays: -
m4, b01: |00000000000000000000000000000000| delays: -
m4, b02: |00000000000000000000000000000000| delays: -
m4, b03: |10000000000000000000000000000000| delays: 08+-08
m4, b04: |00011111111111111100000000000000| delays: 164+-121
m4, b05: |00000000000000000000011111111111| delays: 418+-94
m4, b06: |00000000000000000000000000000000| delays: -
m4, b07: |00000000000000000000000000000000| delays: -
best: m4, b04 delays: 165+-124
m5, b00: |00000000000000000000000000000000| delays: -
m5, b01: |00000000000000000000000000000000| delays: -
m5, b02: |00000000000000000000000000000000| delays: -
m5, b03: |00000000000000000000000000000000| delays: -
m5, b04: |11111111100000000000000000000000| delays: 66+-66
m5, b05: |00000000000111111111111111000000| delays: 295+-118
m5, b06: |00000000000000000000000000000111| delays: 488+-24
m5, b07: |00000000000000000000000000000000| delays: -
best: m5, b05 delays: 298+-120
m6, b00: |00000000000000000000000000000000| delays: -
m6, b01: |00000000000000000000000000000000| delays: -
m6, b02: |00000000000000000000000000000000| delays: -
m6, b03: |00000000000000000000000000000000| delays: -
m6, b04: |01111111111111111000000000000000| delays: 138+-128
m6, b05: |00000000000000000001111111111111| delays: 402+-109
m6, b06: |00000000000000000000000000000000| delays: -
m6, b07: |00000000000000000000000000000000| delays: -
best: m6, b04 delays: 138+-127
m7, b00: |00000000000000000000000000000000| delays: -
m7, b01: |00000000000000000000000000000000| delays: -
m7, b02: |00000000000000000000000000000000| delays: -
m7, b03: |00000000000000000000000000000000| delays: -
m7, b04: |11111111110000000000000000000000| delays: 77+-77
m7, b05: |00000000000001111111111111110000| delays: 319+-126
m7, b06: |00000000000000000000000000000011| delays: 496+-15
m7, b07: |00000000000000000000000000000000| delays: -
best: m7, b05 delays: 320+-122
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
Write: 0x40000000-0x40200000 2.0MiB
Read: 0x40000000-0x40200000 2.0MiB
bus errors: 0/256
addr errors: 0/8192
data errors: 1/524288
Memtest KO
Memory initialization failed
--============= Console ================--
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
Memtest OK
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
bus errors: 0/256
addr errors: 0/8192
data errors: 1/8388608
Memtest KO
Here's a second run, captured after rebooting the board for the same module as above:
__ _ __ _ __
/ / (_) /____ | |/_/
/ /__/ / __/ -_)> <
/____/_/\__/\__/_/|_|
Build your hardware, easily!
(c) Copyright 2012-2021 Enjoy-Digital
(c) Copyright 2007-2015 M-Labs
BIOS built on Mar 29 2022 16:24:07
BIOS CRC passed (84183beb)
Migen git sha1: 9a0be7a
LiteX git sha1: 3012d7d6
--=============== SoC ==================--
CPU: VexRiscv_Min @ 125MHz
BUS: WISHBONE 32-bit @ 4GiB
CSR: 32-bit data
ROM: 64KiB
SRAM: 8KiB
L2: 0KiB
SDRAM: 1048576KiB 64-bit @ 1000MT/s (CL-9 CWL-9)
--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
tCK equivalent taps: 568
Cmd/Clk scan (0-284)
|00001 |000000111 |000000000| best: 200
Setting Cmd/Clk delay to 200 taps.
Data scan:
m0: |111111111100000000000000| delay: 00
m1: |111111111111110000000000| delay: 00
m2: |111111110000000000000000| delay: -
m3: |111111111111111110000000| delay: 00
m4: |011111111111111111110000| delay: 12
m5: |000000000011111111111111| delay: 152
m6: |000111111111111111111000| delay: 45
m7: |000000011111111111111111| delay: 97
Write latency calibration:
m0:6 m1:6 m2:6 m3:6 m4:6 m5:6 m6:6 m7:6
Read leveling:
m0, b00: |00000000000000000000000000000000| delays: -
m0, b01: |00000000000000000000000000000000| delays: -
m0, b02: |00000000000000000000000000000000| delays: -
m0, b03: |11111111111100000000000000000000| delays: 88+-88
m0, b04: |00000000000000111111111111111000| delays: 340+-125
m0, b05: |00000000000000000000000000000000| delays: 509+-08
m0, b06: |00000000000000000000000000000000| delays: -
m0, b07: |00000000000000000000000000000000| delays: -
best: m0, b04 delays: 341+-125
m1, b00: |00000000000000000000000000000000| delays: -
m1, b01: |00000000000000000000000000000000| delays: -
m1, b02: |00000000000000000000000000000000| delays: -
m1, b03: |11111111000000000000000000000000| delays: 60+-60
m1, b04: |00000000000111111111111110000000| delays: 281+-122
m1, b05: |00000000000000000000000000001111| delays: 476+-36
m1, b06: |00000000000000000000000000000000| delays: -
m1, b07: |00000000000000000000000000000000| delays: -
best: m1, b04 delays: 280+-121
m2, b00: |00000000000000000000000000000000| delays: -
m2, b01: |00000000000000000000000000000000| delays: -
m2, b02: |00000000000000000000000000000000| delays: -
m2, b03: |11111111111111100000000000000000| delays: 113+-113
m2, b04: |00000000000000000111111111111111| delays: 385+-122
m2, b05: |00000000000000000000000000000000| delays: -
m2, b06: |00000000000000000000000000000000| delays: -
m2, b07: |00000000000000000000000000000000| delays: -
best: m2, b04 delays: 384+-122
m3, b00: |00000000000000000000000000000000| delays: -
m3, b01: |00000000000000000000000000000000| delays: -
m3, b02: |00000000000000000000000000000000| delays: -
m3, b03: |11111000000000000000000000000000| delays: 37+-37
m3, b04: |00000000111111111111111000000000| delays: 240+-122
m3, b05: |00000000000000000000000001111111| delays: 452+-60
m3, b06: |00000000000000000000000000000000| delays: -
m3, b07: |00000000000000000000000000000000| delays: -
best: m3, b04 delays: 239+-123
m4, b00: |00000000000000000000000000000000| delays: -
m4, b01: |00000000000000000000000000000000| delays: -
m4, b02: |00000000000000000000000000000000| delays: -
m4, b03: |10000000000000000000000000000000| delays: 08+-08
m4, b04: |00011111111111111100000000000000| delays: 165+-123
m4, b05: |00000000000000000000011111111111| delays: 416+-96
m4, b06: |00000000000000000000000000000000| delays: -
m4, b07: |00000000000000000000000000000000| delays: -
best: m4, b04 delays: 163+-124
m5, b00: |00000000000000000000000000000000| delays: -
m5, b01: |00000000000000000000000000000000| delays: -
m5, b02: |00000000000000000000000000000000| delays: -
m5, b03: |00000000000000000000000000000000| delays: -
m5, b04: |11111111100000000000000000000000| delays: 64+-64
m5, b05: |00000000000111111111111111000000| delays: 295+-120
m5, b06: |00000000000000000000000000000111| delays: 487+-24
m5, b07: |00000000000000000000000000000000| delays: -
best: m5, b05 delays: 294+-117
m6, b00: |00000000000000000000000000000000| delays: -
m6, b01: |00000000000000000000000000000000| delays: -
m6, b02: |00000000000000000000000000000000| delays: -
m6, b03: |00000000000000000000000000000000| delays: -
m6, b04: |01111111111111111000000000000000| delays: 135+-127
m6, b05: |00000000000000000001111111111111| delays: 400+-111
m6, b06: |00000000000000000000000000000000| delays: -
m6, b07: |00000000000000000000000000000000| delays: -
best: m6, b04 delays: 135+-128
m7, b00: |00000000000000000000000000000000| delays: -
m7, b01: |00000000000000000000000000000000| delays: -
m7, b02: |00000000000000000000000000000000| delays: -
m7, b03: |00000000000000000000000000000000| delays: -
m7, b04: |11111111110000000000000000000000| delays: 78+-78
m7, b05: |00000000000001111111111111110000| delays: 318+-121
m7, b06: |00000000000000000000000000000001| delays: 495+-16
m7, b07: |00000000000000000000000000000000| delays: -
best: m7, b05 delays: 320+-122
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
Write: 0x40000000-0x40200000 2.0MiB
Read: 0x40000000-0x40200000 2.0MiB
Memtest OK
Memspeed at 0x40000000 (Sequential, 2.0MiB)...
Write speed: 16.0MiB/s
Read speed: 13.1MiB/s
--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
No boot medium found
--============= Console ================--
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
Memtest OK
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
Memtest OK
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
Memtest OK
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
Memtest OK
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
bus errors: 0/256
addr errors: 0/8192
data errors: 1/8388608
Memtest KO
litex>
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
bus errors: 0/256
addr errors: 0/8192
data errors: 1/8388608
Memtest KO
litex> sdram_test
Memtest at 0x40000000 (32.0MiB)...
Write: 0x40000000-0x42000000 32.0MiB
Read: 0x40000000-0x42000000 32.0MiB
Memtest OK
Thanks @jaccharrison. The calibration logs look fine. The issue could be related to electrical settings that would need to be adjusted. You can use the ddr4_mr_gen tool for this: it generates commands that you can copy/paste into the BIOS to adjust the settings, and you can then run sdram_cal + sdram_test. Be sure to use --cl=9 --cwl=9, as that is how your SoC is configured.
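If the electrical settings are swept systematically, the candidate values can be enumerated up front and each combination applied in turn (via ddr4_mr_gen, then sdram_cal + sdram_test). The sketch below only builds the list of combinations. It assumes the DDR4 conventions RZQ = 240 ohms, RTT_NOM options of RZQ/1 through RZQ/7 (or disabled), and output driver options of RZQ/7 and RZQ/5; it does not produce ddr4_mr_gen arguments, whose flags are not covered here.

```python
from itertools import product

# Sketch: enumerate DDR4 termination / drive-strength combinations to try one by one.
# Assumptions: RZQ = 240 ohms; RTT_NOM is disabled or RZQ/1..RZQ/7;
# output driver impedance is RZQ/7 (~34 ohms) or RZQ/5 (48 ohms).
RZQ_OHMS = 240

rtt_nom_options = [None] + [round(RZQ_OHMS / n) for n in range(1, 8)]  # None = disabled
driver_options = [round(RZQ_OHMS / 7), round(RZQ_OHMS / 5)]

def sweep_plan() -> list[tuple]:
    """All (rtt_nom_ohms, driver_ohms) pairs, one calibration + memtest run each."""
    return list(product(rtt_nom_options, driver_options))

plan = sweep_plan()
print(len(plan))  # 16 combinations
print(plan[:3])   # [(None, 34), (None, 48), (240, 34)]
```

Recording a memtest tally for each pair makes the settings easier to compare, given that the failures appear only occasionally.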
@jaccharrison, what is the exact DIMM module you are experiencing these intermittent failures with? We have done our tests with an MTA4ATF51264HZ module.
Thanks @enjoy-digital for the tip -- I'll try that out and let you know what I find.
@tmichalak, the DIMM I am using is from ATP, model number A4F04QD8BLPSE. The die are Samsung K4A4G085WD. If you'd like, I can provide data sheets for either or both. Let me know if there's anything you'd like me to try. I'm happy to help run tests, and if you have tips for troubleshooting the PHY or DIMM, I'm eager to learn.
Wanted to give a quick update -- tinkering with different termination resistances did not solve the problem. Apparently others may be experiencing this same issue. I'm continuing to troubleshoot and see if I can identify a cause and solution. I'll post here if I do.
Hi there,
I've been working with AntMicro's Rowhammer tester on a ZCU104 development board. This framework instantiates a DDR4 PHY with a clock frequency of 500 MHz. I was not seeing the outputs I expected from this framework, so I've done some troubleshooting. I have not ruled out the possibility that my unexpected behavior comes from the DDR clock being too slow. I've tried synthesizing with a higher clock frequency but it fails timing. One of the major reasons for the timing failure appears to be the use of the ISERDESE3 block in the LiteDRAM core.
According to the UltraScale Plus User Guide, ISERDESE3 blocks have a minimum clock period of 1.6 ns, i.e. a maximum clock frequency of 625 MHz. Thus, a DDR4 PHY that uses these blocks cannot even operate at the 666 MHz that is nominally required for the slowest DDR4 speed mode (1333 MT/s). The DDR4 ICs I'm using specify a maximum clock period of 1.8 ns for the 1333 MT/s mode, but I'd really like them to run at 1600 MT/s, where tCK(max) is 1.5 ns (1.25 ns, i.e. 800 MHz, would be nominal).
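The comparison in this paragraph comes down to checking whether the clock periods the FPGA I/O accepts overlap the tCK range a DRAM speed mode accepts. A sketch using the numbers quoted here (1.6 ns FPGA minimum period; 1.8 ns maximum for the 1333 MT/s mode and 1.5 ns for the 1600 MT/s mode), taking each mode's nominal period as its lower bound, which is an assumption:

```python
# Sketch: does the FPGA's allowed clock-period range overlap a DRAM mode's tCK range?
# Numbers (in ps) are the ones quoted in this thread; the lower DRAM bounds are
# assumed to be each mode's nominal period.

FPGA_MIN_PERIOD_PS = 1600  # ISERDESE3 minimum clock period quoted above

DRAM_TCK_RANGES_PS = {
    "1333 MT/s mode": (1500, 1800),  # (tCK nominal, tCK max)
    "1600 MT/s mode": (1250, 1500),
}

def usable_period_range(tck_min: int, tck_max: int, fpga_min: int = FPGA_MIN_PERIOD_PS):
    """Return the (low, high) period window both sides accept, or None if it is empty."""
    low = max(tck_min, fpga_min)
    return (low, tck_max) if low <= tck_max else None

for mode, (tck_lo, tck_hi) in DRAM_TCK_RANGES_PS.items():
    print(mode, usable_period_range(tck_lo, tck_hi))
# 1333 MT/s mode (1600, 1800)
# 1600 MT/s mode None
```

With these inputs, only a 1.6 to 1.8 ns window (roughly 1111 to 1250 MT/s) satisfies both sides, the 1600 MT/s mode has no overlap at all, and by the same arithmetic the 2.0 ns period of the 1000 MT/s build in the transcripts lies above the quoted 1.8 ns maximum.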
The UltraScale Plus does have different I/O SERDES blocks that seem to be capable of running at higher frequencies. I could potentially try to help port the USP DDR4 PHY to use these higher-frequency I/O SERDES components, but before I start to seriously investigate this, I was wondering if others have feedback on this idea? Am I crazy? Am I missing something (I can't find anybody else talking about this, so I'm suspicious that I might be)? I suspect that the reason for a max clock period comes from a DLL/PLL on the DRAM -- can anybody confirm or refute this?
Any help or expertise would be greatly appreciated!