enjoy-digital / litedram

Small footprint and configurable DRAM core

litedram with vexriscv DDR4 SODIMM fails memtest (Xilinx VU9P + spd) #349

Open jersey99 opened 1 year ago

jersey99 commented 1 year ago

Hi All,

I am trying to bring up DDR4 on the HTG-940 board using litedram (I managed to get the SPD dump over I2C). I feel I have made some progress but seem to have hit a dead end for now. The memtest fails on about half the data (50% or 75% data errors). I would appreciate it if someone with more DDR4 experience could look at the memtest log or pin settings and give some feedback.

litex> reboot                                                                                                                                                   

        __   _ __      _  __                                                                                                                                                     
       / /  (_) /____ | |/_/ 
      / /__/ / __/ -_)>  <                                                                                                                                                                                          
     /____/_/\__/\__/_/|_|                                                                                                                                                                                          
   Build your hardware, easily!  

 (c) Copyright 2012-2023 Enjoy-Digital         
 (c) Copyright 2007-2015 M-Labs
 BIOS built on Oct  5 2023 21:40:13
 BIOS CRC passed (b723a448)                                                                                                                                                                                             

 LiteX git sha1: 98eb27df                                                                                                                                                                                       

--=============== SoC ==================--
CPU:            VexRiscv @ 125MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data                                                                                                                                                                                             
ROM:            128.0KiB                                                                                                                                                                                    
SRAM:           8.0KiB                                                                                                                                                                              
L2:             8.0KiB                                                                                                                                                                              
SDRAM:          8.0GiB 64-bit @ 1000MT/s (CL-9 CWL-9)
MAIN-RAM:       1.0GiB                                                                                                                                                                              

--========== Initialization ============-- 
Initializing SDRAM @0x40000000... 
Switching SDRAM to software control. 
Write leveling:                                                                                                                                                         
  tCK equivalent taps: 604                                                                                                                                                                                          
  Cmd/Clk scan (0-302)                                                                                                                                                                              
  |00011  |011111111  |111111111  |111111111| best: 188
  Setting Cmd/Clk delay to 188 taps.
  Data scan:                                                                                                                                                
  m0: |11111111111110000000000| delay: 00                     
  m1: |11111111111111100000000| delay: 00                               
  m2: |11111111111111110000000| delay: 00                               
  m3: |11111111111111110000000| delay: 00                               
  m4: |00001111111111111111111| delay: 57                                 
  m5: |00001111111111111111111| delay: 58                                 
  m6: |00000111111111111111111| delay: 66                                 
  m7: |00001111111111111111111| delay: 54
Write latency calibration:                                                                                                                                                                                          
m0:6 m1:6 m2:6 m3:6 m4:6 m5:6 m6:6 m7:6
Read leveling:                                                                                                                                                      
  m0, b00: |00000000000000000000000000000000| delays: -                                                                                                                                                                  
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |11111111111110000000000000000000| delays: 100+-100
  m0, b04: |00000000000000001111111111111111| delays: 371+-130
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b04 delays: 371+-130
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |11111111111111000000000000000000| delays: 111+-111
  m1, b04: |00000000000000000111111111111111| delays: 387+-124
  m1, b05: |00000000000000000000000000000000| delays: -
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b04 delays: 387+-124
  m2, b00: |00000000000000000000000000000000| delays: -
  m2, b01: |00000000000000000000000000000000| delays: -
  m2, b02: |00000000000000000000000000000000| delays: -
  m2, b03: |11111111100000000000000000000000| delays: 69+-69
  m2, b04: |00000000000011111111111111111000| delays: 319+-126
  m2, b05: |00000000000000000000000000000001| delays: 503+-08
  m2, b06: |00000000000000000000000000000000| delays: -
  m2, b07: |00000000000000000000000000000000| delays: -
  best: m2, b04 delays: 318+-127
  m3, b00: |00000000000000000000000000000000| delays: -
  m3, b01: |00000000000000000000000000000000| delays: -
  m3, b02: |00000000000000000000000000000000| delays: -
  m3, b03: |11111111000000000000000000000000| delays: 59+-59
  m3, b04: |00000000000111111111111111100000| delays: 294+-128
  m3, b05: |00000000000000000000000000000011| delays: 492+-18
  m3, b06: |00000000000000000000000000000000| delays: -
  m3, b07: |00000000000000000000000000000000| delays: -
  best: m3, b04 delays: 293+-127
  m4, b00: |00000000000000000000000000000000| delays: -
  m4, b01: |00000000000000000000000000000000| delays: -
  m4, b02: |00000000000000000000000000000000| delays: -
  m4, b03: |11111110000000000000000000000000| delays: 50+-50
  m4, b04: |00000000001111111111111111000000| delays: 274+-126
  m4, b05: |00000000000000000000000000000111| delays: 481+-30
  m4, b06: |00000000000000000000000000000000| delays: -
  m4, b07: |00000000000000000000000000000000| delays: -
  best: m4, b04 delays: 274+-126
  m5, b00: |00000000000000000000000000000000| delays: -
  m5, b01: |00000000000000000000000000000000| delays: -
  m5, b02: |00000000000000000000000000000000| delays: -
  m5, b03: |11111110000000000000000000000000| delays: 52+-52
  m5, b04: |00000000001111111111111111000000| delays: 273+-129
  m5, b05: |00000000000000000000000000001111| delays: 480+-30
  m5, b06: |00000000000000000000000000000000| delays: -
  m5, b07: |00000000000000000000000000000000| delays: -
  best: m5, b04 delays: 274+-129
  m6, b00: |00000000000000000000000000000000| delays: -
  m6, b01: |00000000000000000000000000000000| delays: -
  m6, b02: |00000000000000000000000000000000| delays: -
  m6, b03: |11000000000000000000000000000000| delays: 14+-14
  m6, b04: |00000111111111111111100000000000| delays: 202+-128
  m6, b05: |00000000000000000000000011111111| delays: 445+-65
  m6, b06: |00000000000000000000000000000000| delays: -
  m6, b07: |00000000000000000000000000000000| delays: -
  best: m6, b04 delays: 202+-127
  m7, b00: |00000000000000000000000000000000| delays: -
  m7, b01: |00000000000000000000000000000000| delays: -
  m7, b02: |00000000000000000000000000000000| delays: -
  m7, b03: |10000000000000000000000000000000| delays: 06+-06
  m7, b04: |00001111111111111111000000000000| delays: 184+-129
  m7, b05: |00000000000000000000000111111111| delays: 433+-77
  m7, b06: |00000000000000000000000000000000| delays: -
  m7, b07: |00000000000000000000000000000000| delays: -
  best: m7, b04 delays: 185+-129
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB     
   Read: 0x40000000-0x40200000 2.0MiB     
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 262144/524288
Memtest KO
Memory initialization failed

--============= Console ================--
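The failure rate can be read straight off the memtest counters in the log above. The following is a minimal sketch, not part of LiteX: the function name `memtest_error_rates` and the regular expression are illustrative assumptions, and only the three counter lines are copied from the log.

```python
import re

def memtest_error_rates(log: str) -> dict:
    """Return {kind: (errors, total, fraction)} for each 'X errors: a/b' line."""
    # Hypothetical helper: matches the 'bus/addr/data errors: a/b' lines only.
    pattern = re.compile(r"^\s*(bus|addr|data) errors:\s*(\d+)/(\d+)", re.MULTILINE)
    rates = {}
    for kind, errors, total in pattern.findall(log):
        errors, total = int(errors), int(total)
        rates[kind] = (errors, total, errors / total)
    return rates

# Counter lines copied from the failing memtest run.
sample = """\
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 262144/524288
"""

for kind, (errors, total, frac) in memtest_error_rates(sample).items():
    print(f"{kind}: {errors}/{total} = {frac:.0%}")
```

For this run the data error fraction is exactly one half, while the bus and address checks are clean.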
kevinsu20 commented 1 year ago

Hello, I happen to have this board as well. Can you tell me the commands you used and some configuration parameters (the more detailed the better)? I may be able to answer your question.

jersey99 commented 1 year ago

Thanks @sususjysjy. I have a 4GB Micron DDR4 SODIMM installed. My sys_clk_freq is 125MHz; I may need to try a different sys_clk_freq or adjust the speedgrade timings accordingly, but for now I just use the spd.dump. How did you instantiate the core?

    ("ddram", 0,
     Subsignal("a", Pins(
         "BD40 BB35 BE40 BD34 BF40 BC39 BC34 BD39",
         "BD35 BE35 BA33 BF39 BD36 AV34"),  # AW33 AY33 AW36"),
         IOStandard("SSTL12_DCI")),
     Subsignal("ba", Pins("BA35 AY36"), IOStandard("SSTL12_DCI")),
     Subsignal("bg", Pins("BE36 BF37"), IOStandard("SSTL12_DCI")),
     Subsignal("we_n", Pins("AW33"), IOStandard("SSTL12_DCI")),   # A14
     Subsignal("cas_n", Pins("AY33"), IOStandard("SSTL12_DCI")),  # A15
     Subsignal("ras_n", Pins("AW36"), IOStandard("SSTL12_DCI")),  # A16
     Subsignal("act_n", Pins("BB38"), IOStandard("SSTL12_DCI")),
     # Subsignal("alert_n", Pins("BE38"), IOStandard("SSTL12_DCI")),
     Subsignal("cs_n", Pins("AY35 AV33"), IOStandard("SSTL12_DCI")), #  AW34 AU34
     #Subsignal("par", Pins("BF35"), IOStandard("SSTL12_DCI")),
     Subsignal("reset_n", Pins("BC38"), IOStandard("LVCMOS12")),
     Subsignal("cke", Pins("BE37 BF38"), IOStandard("SSTL12_DCI")),
     Subsignal("clk_p", Pins("BB36 BB37"), IOStandard("DIFF_SSTL12_DCI")),
     Subsignal("clk_n", Pins("BC36 BC37"), IOStandard("DIFF_SSTL12_DCI")),
     # Subsignal("cke", Pins("BE37"), IOStandard("SSTL12_DCI")),
     # Subsignal("clk_p", Pins("BB36"), IOStandard("DIFF_SSTL12")),
     # Subsignal("clk_n", Pins("BC36"), IOStandard("DIFF_SSTL12")),
     Subsignal("odt", Pins("AW35 AT34"), IOStandard("SSTL12_DCI")),
     Subsignal("dm", Pins("AH34 AJ27 AA32 AE31 BC31 AW29 BF32 AP31"), #  AT33
               IOStandard("POD12_DCI")),
     Subsignal("dq", Pins(
         "AF33 AG34 AH33 AJ33 AF34 AF32 AG32 AG31",
         "AK31 AG30 AJ29 AK28 AJ31 AJ30 AJ28 AG29",
         "Y33 W33 W30 AA34 Y32 W34 Y30 AB34",
         "AD34 AF30 AD33 AC32 AE30 AE33 AC34 AC33",
         "AY32 BA30 BB29 BB30 AY30 AY31 BA29 BB31",
         "AV31 AW31 AU30 AT29 AU32 AV32 AU31 AT30",
         "BD33 BE31 BD29 BF30 BE32 BE33 BC29 BE30",
         "AN29 AP29 AN31 AL30 AR30 AP30 AL29 AM31"),
         #"AM34 AP34 AM32 AP33 AL34 AN34 AL32 AR33"),
        IOStandard("POD12_DCI")),
     Subsignal("dqs_p", Pins("AH31 AH28 W31 AC31 BA32 AU29 BD30 AM29"), # AN32
               IOStandard("DIFF_POD12")),
     Subsignal("dqs_n", Pins("AH32 AH29 Y31 AD31 BB32 AV29 BD31 AM30"), # AN33
               IOStandard("DIFF_POD12")),
     Misc("SLEW=FAST"),
     ),

class MTA8ATF51264HZ(DDR4Module):
    # geometry
    ngroupbanks = 4
    ngroups     = 4
    nbanks      = ngroups * ngroupbanks
    nrows       = 32768
    ncols       = 1024
    # timings
    trefi = {"1x": 64e6/8192,   "2x": (64e6/8192)/2, "4x": (64e6/8192)/4}
    trfc  = {"1x": (None, 350), "2x": (None, 260),   "4x": (None, 160)}
    technology_timings = _TechnologyTimings(tREFI=trefi, tWTR=(4, 7.5), tCCD=(4, None), tRRD=(4, 4.9), tZQCS=(128, 80))
    speedgrade_timings = {
        "2133": _SpeedgradeTimings(tRP=15, tRCD=15, tWR=15, tRFC=trfc, tFAW=(20, 25), tRAS=33),
    }
    speedgrade_timings["default"] = speedgrade_timings["2133"]

        # sys_clk_freq = 125e6
        if not self.integrated_main_ram_size:
            self.ddrphy = usddrphy.USPDDRPHY(platform.request("ddram"),
                memtype          = "DDR4",
                sys_clk_freq     = sys_clk_freq,
                iodelay_clk_freq = sys_clk_freq * 4)
            if spd_dump is not None:
                ram_spd = parse_spd_hexdump(spd_dump)
                ram_module = SDRAMModule.from_spd_data(ram_spd, sys_clk_freq)
                print(f"configuring DDR4 from file: {spd_dump}")
            else:
                ram_module = MTA8ATF51264HZ(sys_clk_freq, "1:4")
            self.add_sdram("sdram",
                phy           = self.ddrphy,
                module        = ram_module,
                size          = 0x40000000,
                l2_cache_size = kwargs.get("l2_size", 8192)
            )
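As a sanity check on the geometry in the `MTA8ATF51264HZ` class above, the addressable size can be worked out from banks, rows, columns and bus width. This is a rough sketch: `sdram_bytes` is a hypothetical helper, the 64-bit width is taken from the BIOS banner, and the `nranks` multiplier is an assumption about how a multi-rank configuration would be counted, not a confirmed description of what LiteX does.

```python
def sdram_bytes(nbanks, nrows, ncols, data_width_bits, nranks=1):
    """Addressable bytes: every (rank, bank, row, column) holds one bus word."""
    return nranks * nbanks * nrows * ncols * data_width_bits // 8

GIB = 1024**3

# Geometry copied from the MTA8ATF51264HZ class in this post.
ngroupbanks = 4
ngroups     = 4
nbanks      = ngroups * ngroupbanks
nrows       = 32768
ncols       = 1024

one_rank  = sdram_bytes(nbanks, nrows, ncols, 64)            # 64-bit data bus
two_ranks = sdram_bytes(nbanks, nrows, ncols, 64, nranks=2)  # assumed rank scaling

print(one_rank // GIB, two_ranks // GIB)  # prints "4 8"
```

One rank of this geometry comes to 4 GiB, which matches the installed module. Two ranks would come to 8 GiB, the figure the first boot log reports. Whether that 8.0GiB is actually caused by the two `cs_n` pins being treated as two ranks is only a guess.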
kevinsu20 commented 1 year ago

I have solved this problem: you need to change the module to MT40A512M16.

--=============== SoC ==================--
CPU:            VexRiscv SMP-LINUX @ 125MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            64.0KiB
SRAM:           6.0KiB
L2:             2.0KiB
SDRAM:          4.0GiB 64-bit @ 1000MT/s (CL-9 CWL-9)
MAIN-RAM:       1.0GiB

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write leveling:
  tCK equivalent taps: 420
  Cmd/Clk scan (0-210)
  |0111  |011111111  |100111011  |001111111| best: 74
  Setting Cmd/Clk delay to 74 taps.
  Data scan:
  m0: |11110000000000000111111111| delay: 271
  m1: |11111000000000000011111111| delay: 287
  m2: |11111100000000000001111111| delay: 300
  m3: |11111110000000000000111111| delay: 00
  m4: |11111111100000000000001111| delay: 00
  m5: |11111111110000000000000111| delay: 00
  m6: |11111111111100000000000001| delay: 00
  m7: |11111111111100000000000001| delay: 00
Write latency calibration:
m0:0 m1:0 m2:0 m3:6 m4:6 m5:6 m6:6 m7:6
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00000000000000000000000000000000| delays: -
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |11111110000000000000000000000000| delays: 48+-48
  m0, b04: |00000000011111111111000000000000| delays: 223+-80
  m0, b05: |00000000000000000000000111111111| delays: 435+-73
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b04 delays: 222+-79
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |11111110000000000000000000000000| delays: 51+-51
  m1, b04: |00000000001111111111000000000000| delays: 233+-82
  m1, b05: |00000000000000000000000111111111| delays: 437+-73
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b04 delays: 233+-82
  m2, b00: |00000000000000000000000000000000| delays: -
  m2, b01: |00000000000000000000000000000000| delays: -
  m2, b02: |00000000000000000000000000000000| delays: -
  m2, b03: |11110000000000000000000000000000| delays: 25+-25
  m2, b04: |00000001111111111000000000000000| delays: 184+-83
  m2, b05: |00000000000000000000111111111100| delays: 389+-81
  m2, b06: |00000000000000000000000000000000| delays: -
  m2, b07: |00000000000000000000000000000000| delays: -
  best: m2, b04 delays: 185+-84
  m3, b00: |00000000000000000000000000000000| delays: -
  m3, b01: |00000000000000000000000000000000| delays: -
  m3, b02: |00000000000000000000000000000000| delays: -
  m3, b03: |11111100000000000000000000000000| delays: 40+-40
  m3, b04: |00000000111111111110000000000000| delays: 210+-83
  m3, b05: |00000000000000000000011111111111| delays: 418+-84
  m3, b06: |00000000000000000000000000000000| delays: -
  m3, b07: |00000000000000000000000000000000| delays: -
  best: m3, b04 delays: 211+-83
  m4, b00: |00000000000000000000000000000000| delays: -
  m4, b01: |00000000000000000000000000000000| delays: -
  m4, b02: |00000000000000000000000000000000| delays: -
  m4, b03: |11000000000000000000000000000000| delays: 11+-11
  m4, b04: |00000111111111100000000000000000| delays: 153+-86
  m4, b05: |00000000000000000011111111110000| delays: 360+-82
  m4, b06: |00000000000000000000000000000001| delays: 501+-09
  m4, b07: |00000000000000000000000000000000| delays: -
  best: m4, b04 delays: 152+-84
  m5, b00: |00000000000000000000000000000000| delays: -
  m5, b01: |00000000000000000000000000000000| delays: -
  m5, b02: |00000000000000000000000000000000| delays: -
  m5, b03: |10000000000000000000000000000000| delays: 08+-08
  m5, b04: |00000111111111100000000000000000| delays: 152+-80
  m5, b05: |00000000000000000111111111110000| delays: 355+-81
  m5, b06: |00000000000000000000000000000001| delays: 503+-08
  m5, b07: |00000000000000000000000000000000| delays: -
  best: m5, b05 delays: 354+-83
  m6, b00: |00000000000000000000000000000000| delays: -
  m6, b01: |00000000000000000000000000000000| delays: -
  m6, b02: |00000000000000000000000000000000| delays: -
  m6, b03: |00000000000000000000000000000000| delays: -
  m6, b04: |00011111111111000000000000000000| delays: 132+-85
  m6, b05: |00000000000000000111111111100000| delays: 339+-82
  m6, b06: |00000000000000000000000000000011| delays: 491+-20
  m6, b07: |00000000000000000000000000000000| delays: -
  best: m6, b04 delays: 132+-86
  m7, b00: |00000000000000000000000000000000| delays: -
  m7, b01: |00000000000000000000000000000000| delays: -
  m7, b02: |00000000000000000000000000000000| delays: -
  m7, b03: |00000000000000000000000000000000| delays: -
  m7, b04: |00111111111110000000000000000000| delays: 118+-89
  m7, b05: |00000000000000001111111111000000| delays: 334+-81
  m7, b06: |00000000000000000000000000000111| delays: 486+-25
  m7, b07: |00000000000000000000000000000000| delays: -
  best: m7, b04 delays: 118+-88
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB
   Read: 0x40000000-0x40200000 2.0MiB
Memtest OK
Memspeed at 0x40000000 (Sequential, 2.0MiB)...
  Write speed: 93.7MiB/s
   Read speed: 78.1MiB/s
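To compare read leveling results between two runs, the scan lines can be parsed into numbers. The sketch below picks, per module, the bitslip whose window has the largest half-width. That selection rule is an assumption made for illustration and may not be the criterion the LiteX BIOS applies. The name `widest_windows` and the regular expression are hypothetical, and the sample lines are the `m5` entries from the log above.

```python
import re

# Matches scan lines such as "m5, b04: |0101...| delays: 152+-80".
# Lines ending in "delays: -" and the "best:" summary lines do not match.
LINE = re.compile(r"m(\d+), b(\d+): \|([01]+)\| delays: (\d+)\+-(\d+)")

def widest_windows(log: str) -> dict:
    """Map module -> (bitslip, center, half_width) with the largest half-width."""
    best = {}
    for m, b, _bits, center, half in LINE.findall(log):
        m, b, center, half = int(m), int(b), int(center), int(half)
        if m not in best or half > best[m][2]:
            best[m] = (b, center, half)
    return best

sample = """\
  m5, b03: |10000000000000000000000000000000| delays: 08+-08
  m5, b04: |00000111111111100000000000000000| delays: 152+-80
  m5, b05: |00000000000000000111111111110000| delays: 355+-81
  m5, b06: |00000000000000000000000000000001| delays: 503+-08
  best: m5, b05 delays: 354+-83
"""

print(widest_windows(sample))
```

On these lines the widest window is at bitslip 5, the same bitslip the BIOS reports as best for `m5`.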
jersey99 commented 1 year ago

Hi @sususjysjy, thanks a lot for your help. For some reason, I still get the same error with the MT40A512M16 instantiated: basically half the memory fails at this point. I think it has something to do with the A17 / BA / BG configuration. When I look at the datasheet for the MT40A512M16, A17 seems to be used for some settings (and it clearly isn't connected on our board). Do you mind sharing your pin settings when you get a chance? Thanks in advance!

jersey99 commented 1 year ago

@sususjysjy I managed to get it to work with MT40A512M16 by messing around with the CS_N settings. I still don't quite understand why this works: MT40A512M16 is an 8GB part, and the memory I have on board is 4GB. There is some settings magic I still need to figure out. Another thing that stumps me is why this doesn't work with the SPD dump. I would love to discuss this further with anyone who has something to say here. Meanwhile, I will keep poking.