enjoy-digital / litedram


memtest fails on arty depending on read leveling outcome #296

Closed acomodi closed 2 years ago

acomodi commented 2 years ago

While verifying that https://github.com/enjoy-digital/litedram/pull/295 did not cause any regression on hardware for the currently available platforms, I tested some of them and ran into a memory test error on the digilent_arty platform.

To check whether the changes caused this behavior, I switched to the latest upstream HEAD.

What I see is that, depending on the outcome of the read leveling step, which always succeeds, the memory test can fail:

Non-working situation:

--=============== SoC ==================--                                                                                                                                                                                                                                                
CPU:            VexRiscv @ 50MHz                                                                                                                                                                                                                                                          
BUS:            WISHBONE 32-bit @ 4GiB                                                                                                                                                                                                                                                    
CSR:            32-bit data                                                                                                                                                                                                                                                               
ROM:            128KiB                                                                                                                                                                                                                                                                    
SRAM:           8KiB                                                                                                                                                                                                                                                                      
L2:             8KiB                                                                                                                                                                                                                                                                      
SDRAM:          262144KiB 16-bit @ 400MT/s (CL-7 CWL-5)                                                                                                                                                                                                                                   

--========== Initialization ============--                                                                                                                                                                                                                                                
Initializing SDRAM @0x40000000...                                                                                                                                                                                                                                                         
Switching SDRAM to software control.                                                                                                                                                                                                                                                      
Write latency calibration:                             
m0:0 m1:4                                              
Read leveling:                                         
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |11111111111111111111111111110000| delays: 14+-14
  m0, b02: |00000000000000000000000000000011| delays: 00+-02
  m0, b03: |00000000000000000000000000000000| delays: -
  m0, b04: |00000000000000000000000000000000| delays: -
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b01 delays: 14+-14            
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |00000000000000000000000000000000| delays: -
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |00000000000000000000000000000000| delays: -
  m1, b04: |00000000000000000000000000000000| delays: -
  m1, b05: |11111111111111111111111111110000| delays: 14+-14          
  m1, b06: |00000000000000000000000000000001| delays: 01+-02
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b05 delays: 14+-14
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB     
   Read: 0x40000000-0x40200000 2.0MiB     
  bus errors:  128/256                
  addr errors: 3968/8192                                              
  data errors: 262142/524288          
Memtest KO                     
Memory initialization failed  

Working situation:

--=============== SoC ==================--
CPU:            VexRiscv @ 50MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          262144KiB 16-bit @ 400MT/s (CL-7 CWL-5)

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Write latency calibration:
m0:0 m1:0 
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |11111111111111111111111111110000| delays: 14+-14
  m0, b02: |00000000000000000000000000000011| delays: 00+-02
  m0, b03: |00000000000000000000000000000000| delays: -
  m0, b04: |00000000000000000000000000000000| delays: -
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b01 delays: 14+-14
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |11111111111111111111111111100000| delays: 13+-13
  m1, b02: |00000000000000000000000000000011| delays: 31+-02
  m1, b03: |00000000000000000000000000000000| delays: -
  m1, b04: |00000000000000000000000000000000| delays: -
  m1, b05: |00000000000000000000000000000000| delays: -
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b01 delays: 13+-13
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB     
   Read: 0x40000000-0x40200000 2.0MiB     
Memtest OK
Memspeed at 0x40000000 (Sequential, 2.0MiB)...
  Write speed: 17.5MiB/s
   Read speed: 23.4MiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
             Timeout
No boot medium found

--============= Console ================--

litex> 

Both logs above were obtained with the exact same bitstream, only by rebooting the BIOS several times.
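To make the difference between the two runs easier to spot, the relevant fields can be pulled out of the BIOS output and compared. The following is only a minimal sketch: the helper names `summarize` and `diff` are hypothetical, the regular expressions assume the exact line formats shown in the logs above, and the two sample strings contain only the write latency and `best:` lines copied from those logs.

```python
import re

def summarize(log: str) -> dict:
    """Collect write latency and best read-leveling result per module."""
    summary = {}
    # Write latency line looks like "m0:0 m1:4".
    for module, latency in re.findall(r"\bm(\d+):(\d+)", log):
        summary.setdefault(int(module), {})["write_latency"] = int(latency)
    # Best line looks like "best: m1, b05 delays: 14+-14".
    for module, bitslip, delay, spread in re.findall(
        r"best: m(\d+), b(\d+) delays: (\d+)\+-(\d+)", log
    ):
        summary.setdefault(int(module), {}).update(
            bitslip=int(bitslip), delay=int(delay), spread=int(spread)
        )
    return summary

def diff(a: dict, b: dict) -> dict:
    """Return only the modules whose settings differ between two runs."""
    return {m: (a[m], b.get(m)) for m in a if a[m] != b.get(m)}

# Excerpts copied from the two logs in this report.
failing = """Write latency calibration:
m0:0 m1:4
  best: m0, b01 delays: 14+-14
  best: m1, b05 delays: 14+-14
"""
working = """Write latency calibration:
m0:0 m1:0
  best: m0, b01 delays: 14+-14
  best: m1, b01 delays: 13+-13
"""

for module, (bad, good) in diff(summarize(failing), summarize(working)).items():
    print(f"m{module}: failing={bad} working={good}")
```

On these excerpts only module m1 differs: the failing run ends up with write latency 4 and bitslip 5, while the working run has write latency 0 and bitslip 1.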

It is possible that the read leveling step is not robust enough to find the correct delays.
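For readers unfamiliar with the log format, the sketch below shows one way the `delays: 14+-14` figures can be derived from a scan bitmap. It is an illustration under stated assumptions, not the BIOS implementation: it assumes the leftmost character is delay tap 0, that `1` marks a passing tap, and that the reported value is the center and half-width of the first passing run. The function name `window` is hypothetical. This simple rule reproduces the values for the wide windows in the logs (14+-14 and 13+-13), but not the figures reported for the two-tap windows at the end of the range (00+-02, 31+-02), so the actual procedure evidently differs there.

```python
def window(bitmap: str):
    """Return (center, half_width) of the first run of passing taps, or None."""
    start = bitmap.find("1")
    if start < 0:
        return None  # no passing tap for this bitslip (logged as "delays: -")
    end = start
    while end < len(bitmap) and bitmap[end] == "1":
        end += 1  # after the loop, 'end' is the first failing tap past the run
    return (start + end) // 2, (end - start) // 2

# Bitmaps with the same shape as the wide windows in the logs above.
print(window("1" * 28 + "0" * 4))  # 28 passing taps starting at tap 0
print(window("1" * 27 + "0" * 5))  # 27 passing taps starting at tap 0
print(window("0" * 32))            # no passing tap
```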

Versions:

How to reproduce

The same behavior can be seen at 100 MHz as well.

enjoy-digital commented 2 years ago

Thanks @acomodi. From the log, this seems related to the fact that write_latency_calibration was enabled by default on Artix7 after looking at https://github.com/enjoy-digital/litedram/issues/293. We should probably do the opposite and disable it by default (while still allowing it to be enabled on boards where it is required). I'll have a look at this.

enjoy-digital commented 2 years ago

@acomodi: I've been able to reproduce the issue and, as I suspected, I no longer see it with https://github.com/enjoy-digital/litedram/commit/4c1ce026e93b6aaf07a4b2734bcf61c68e0160d8. Can you also confirm on your hardware?

acomodi commented 2 years ago

@enjoy-digital Confirmed, thanks. I can no longer see the faulty behavior, so the issue is fixed. Closing the issue.