enjoy-digital / litex

Build your hardware, easily!

LiteDRAM fails to initialize on specific cpu variant #1432

Closed gsomlo closed 1 year ago

gsomlo commented 1 year ago

I'm running into a strange issue where LiteDRAM fails to initialize the memory, but only when I pick a specific variant of my (rocket) CPU:

litex-boards/litex_boards/targets/digilent_nexys_video.py --build \
    --cpu-type rocket --cpu-variant fulld --sys-clk-freq 50e6 \
    --with-ethernet --with-sdcard \
    --with-sata --sata-gen 1 --with-sata-pll-refclk

I need to run an experiment with a single FPU-enabled core and a 128-bit memory bus (the fulld variant) on a digilent_nexys_video board. When I send the bitstream to the board, I get the following:

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2022 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Sep 17 2022 12:01:30
 BIOS CRC passed (5c6d0b13)

 LiteX git sha1: c77129cf

--=============== SoC ==================--
CPU:            RocketRV64[imac] @ 50MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128KiB
SRAM:           8KiB
SDRAM:          524288KiB 16-bit @ 400MT/s (CL-7 CWL-5)

--========== Initialization ============--
Ethernet init...
Initializing SDRAM @0x80000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |01111111111111111111111111111110| delays: 16+-15
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |00000000000000000000000000000000| delays: -
  m0, b04: |00000000000000000000000000000000| delays: -
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b01 delays: 16+-15
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |01111111111111111111111111111111| delays: 16+-15
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |00000000000000000000000000000000| delays: -
  m1, b04: |00000000000000000000000000000000| delays: -
  m1, b05: |00000000000000000000000000000000| delays: -
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b01 delays: 16+-15
Switching SDRAM to hardware control.
Memtest at 0x80000000 (2.0MiB)...
  Write: 0x80000000-0x80200000 2.0MiB     
   Read: 0x80000000-0x80200000 2.0MiB     
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 262144/524288
Memtest KO
Memory initialization failed

When I build using a 4-core variant (full4d):

litex-boards/litex_boards/targets/digilent_nexys_video.py --build \
    --cpu-type rocket --cpu-variant full4d --sys-clk-freq 50e6 \
    --with-ethernet --with-sdcard \
    --with-sata --sata-gen 1 --with-sata-pll-refclk

everything works fine:

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2022 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Sep 17 2022 08:10:02
 BIOS CRC passed (5f412ba0)

 LiteX git sha1: c77129cf

--=============== SoC ==================--
CPU:            RocketRV64[imac] @ 50MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128KiB
SRAM:           8KiB
SDRAM:          524288KiB 16-bit @ 400MT/s (CL-7 CWL-5)

--========== Initialization ============--
Ethernet init...
Initializing SDRAM @0x80000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000000000000000000000000000| delays: -
  m0, b01: |00111111111111111111111111111110| delays: 16+-14
  m0, b02: |00000000000000000000000000000000| delays: -
  m0, b03: |00000000000000000000000000000000| delays: -
  m0, b04: |00000000000000000000000000000000| delays: -
  m0, b05: |00000000000000000000000000000000| delays: -
  m0, b06: |00000000000000000000000000000000| delays: -
  m0, b07: |00000000000000000000000000000000| delays: -
  best: m0, b01 delays: 16+-14
  m1, b00: |00000000000000000000000000000000| delays: -
  m1, b01: |01111111111111111111111111111111| delays: 16+-15
  m1, b02: |00000000000000000000000000000000| delays: -
  m1, b03: |00000000000000000000000000000000| delays: -
  m1, b04: |00000000000000000000000000000000| delays: -
  m1, b05: |00000000000000000000000000000000| delays: -
  m1, b06: |00000000000000000000000000000000| delays: -
  m1, b07: |00000000000000000000000000000000| delays: -
  best: m1, b01 delays: 16+-15
Switching SDRAM to hardware control.
Memtest at 0x80000000 (2.0MiB)...
  Write: 0x80000000-0x80200000 2.0MiB
   Read: 0x80000000-0x80200000 2.0MiB
Memtest OK
Memspeed at 0x80000000 (Sequential, 2.0MiB)...
  Write speed: 37.4MiB/s
   Read speed: 54.0MiB/s
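The read-leveling output is nearly identical in the failing and working builds (m0 is 16+-15 versus 16+-14, and m1 is 16+-15 in both), which suggests the difference lies elsewhere. The sketch below reproduces the printed "delays: mid+-range" values from the bitmaps. The rule it uses (first passing tap, first failing tap after the run, integer midpoint and half-width) is inferred from the numbers in these logs and is an assumption, not a transcription of the BIOS code; `leveling_window` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def leveling_window(bitmap: str) -> Optional[Tuple[int, int]]:
    """Return (mid, half_range) for the first run of '1's, or None if no tap passes.

    Assumption: this mirrors how the 'delays: mid+-range' figures in the
    logs above relate to the bitmaps; it is inferred from the output only.
    """
    start = bitmap.find("1")
    if start < 0:
        return None
    end = bitmap.find("0", start)  # first failing tap after the passing run
    if end < 0:
        end = len(bitmap) - 1      # run extends to the last tap
    return (start + end) // 2, (end - start) // 2


# 32-tap bitmaps as printed for the nexys_video builds
fulld_m0 = "0" + "1" * 30 + "0"
full4d_m0 = "00" + "1" * 29 + "0"
both_m1 = "0" + "1" * 31

for label, bitmap in [("fulld m0", fulld_m0), ("full4d m0", full4d_m0), ("m1", both_m1)]:
    print(label, leveling_window(bitmap))
```

The same rule also matches the 8-tap bitmap `11100000` reported as `01+-01` on the ecpix5 further down in this thread.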

Memory also initializes fine if I use e.g., linuxd (although that one, not having an FPU, is not useful for the test I'm trying to run).

All these variants (full4d, fulld, and linux*d) have the exact same pinout (128-bit memory bus, 64-bit MMIO bus), and differ only on the inside (number of cores, presence or absence of a hardware FPU). Therefore the LiteDRAM block should be the same.

Obviously, the variants differ in size, but I'm reliably getting a working LiteDRAM on any other variant except fulld, and a reliably failing one on fulld.

So I'm wondering if there's a plausible explanation that so far escapes me...

Thanks in advance for any clue!

gsomlo commented 1 year ago

Interestingly, I have a similar problem using fulld on the lambdaconcept_ecpix5 board (built with the yosys/trellis/nextpnr toolchain):

  litex-boards/litex_boards/targets/lambdaconcept_ecpix5.py --build \
    --cpu-type rocket --cpu-variant fulld --sys-clk-freq 50e6 \
    --with-ethernet --with-sdcard \
    --yosys-flow3 --nextpnr-timingstrict

After ensuring that timing is satisfied, the result is:

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
   Build your hardware, easily!

 (c) Copyright 2012-2022 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Sep 17 2022 16:44:59
 BIOS CRC passed (3e48a3c6)

 LiteX git sha1: c77129cf

--=============== SoC ==================--
CPU:            RocketRV64[imac] @ 50MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            32-bit data
ROM:            128KiB
SRAM:           8KiB
SDRAM:          524288KiB 16-bit @ 200MT/s (CL-6 CWL-5)

--========== Initialization ============--
Ethernet init...
Initializing SDRAM @0x80000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |11100000| delays: 01+-01
  m0, b01: |00000000| delays: -
  m0, b02: |00000000| delays: -
  m0, b03: |00000000| delays: -
  best: m0, b00 delays: 01+-01
  m1, b00: |11100000| delays: 01+-01
  m1, b01: |00000000| delays: -
  m1, b02: |00000000| delays: -
  m1, b03: |00000000| delays: -
  best: m1, b00 delays: 01+-01
Switching SDRAM to hardware control.
Memtest at 0x80000000 (2.0MiB)...
  Write: 0x80000000-0x80200000 2.0MiB     
   Read: 0x80000000-0x80200000 2.0MiB     
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 262144/524288
Memtest KO
Memory initialization failed
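Both failing logs report the same memtest counts, and a quick check shows that exactly half of the tested words are wrong. The snippet below is only arithmetic on the numbers printed above; the 4-byte word size is an assumption, chosen because it reproduces the 524288 denominator for a 2.0MiB test region.

```python
MEMTEST_BYTES = 2 * 1024 * 1024  # "Memtest at 0x80000000 (2.0MiB)"
WORD_BYTES = 4                   # assumption: errors are counted per 32-bit word

total_words = MEMTEST_BYTES // WORD_BYTES
data_errors = 262144             # "data errors: 262144/524288"

print(total_words)               # 524288, matching the log's denominator
print(data_errors / total_words) # 0.5
```

An exact 50% failure rate on two different boards and toolchains looks more like a systematic data-path problem than marginal timing or calibration, although the counts alone do not identify the cause.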

It also works fine if I build using linuxd or linux2d (I can't try full4d, since there's not enough room on the ecp5-85k chip for that).

So it's only fulld that's common across all of these failures: the boards and toolchains are different (except for the 128-bit width of the LiteDRAM port).

Also, different port widths (e.g., full on the 64-bit wide nexys4ddr) are unaffected.

I'd be very surprised if this was actually a bug in the rocket chip: the only difference between instantiating different variants is:

        class LitexFullDConfig extends Config(
          new WithNBigCores(1) ++
          new WithMemoryDataBits(128) ++
          new BaseLitexConfig
        )

for the fulld variant that fails, as opposed to, e.g.:

        class LitexLinuxDConfig extends Config(
          new WithNMedCores(1) ++
          new WithMemoryDataBits(128) ++
          new BaseLitexConfig
        )
...
        class LitexFull4DConfig extends Config(
          new WithNBigCores(4) ++
          new WithMemoryDataBits(128) ++
          new BaseLitexConfig
        )

for linuxd or full4d, respectively. I'm strongly hoping there's something we can tweak in either LiteDRAM's gateware or in the read/write leveling "training" code before I'd have to dive into debugging Chisel source code :)

@enjoy-digital, what do you think?

gsomlo commented 1 year ago

on the nexys_video I also tried building with

Dolu1990 commented 1 year ago

Could it be the Wishbone down-sizer changes done recently?

enjoy-digital commented 1 year ago

@Dolu1990: Thanks, that's also what I think, I'm going to have a look...

enjoy-digital commented 1 year ago

@gsomlo: Just reproduced with:

litex_sim --cpu-type=rocket --cpu-variant=full --with-sdram --opt-level=O0
-> memtest succeeds.

litex_sim --cpu-type=rocket --cpu-variant=fulld --with-sdram --opt-level=O0
-> memtest fails.

enjoy-digital commented 1 year ago

@gsomlo: When reverting LiteX to https://github.com/enjoy-digital/litex/commit/e451a87617e54d63ba51694e25af4065a01b5790, I still see the issue, so I no longer think it's related to the recent DownConverter changes. Do you remember whether this used to work?

roryt12 commented 1 year ago

If the issue is only with fulld and fullq, take a look at lines 54 and 56 of litex/litex/soc/cores/cpu/rocket/core.py: they should read "freechips.rocketchip.system.LitexFullDConfig" and "freechips.rocketchip.system.LitexFullQConfig" (the D and Q are missing). With that change, at least, I can now boot fulld on qmtech_wukong.
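A typo like this is easy to miss by eye, so a small consistency check can help. The sketch below flags variants whose d/q width suffix is not mirrored in the Chisel config class name. The dictionaries are illustrative excerpts reconstructed from this thread (the "before" values assume the entries simply lacked the D and Q, as described above), not a copy of core.py, and `width_suffix_mismatches` is a hypothetical helper.

```python
def width_suffix_mismatches(variants):
    """Return variant names whose d/q suffix is not mirrored in the config class name."""
    bad = []
    for name, cls in variants.items():
        wanted = name[-1].upper() if name[-1] in "dq" else ""
        stem = cls[:-len("Config")] if cls.endswith("Config") else cls
        found = stem[-1] if stem[-1] in "DQ" else ""
        if wanted != found:
            bad.append(name)
    return bad


PREFIX = "freechips.rocketchip.system."

# Assumed state before the fix: the D and Q are missing for fulld and fullq.
before = {
    "linuxd": PREFIX + "LitexLinuxDConfig",
    "fulld":  PREFIX + "LitexFullConfig",
    "fullq":  PREFIX + "LitexFullConfig",
    "full4d": PREFIX + "LitexFull4DConfig",
}

# State after the correction suggested above.
after = dict(before, fulld=PREFIX + "LitexFullDConfig", fullq=PREFIX + "LitexFullQConfig")

print(width_suffix_mismatches(before))  # ['fulld', 'fullq']
print(width_suffix_mismatches(after))   # []
```

The check relies only on the naming convention visible in this thread, so it would need adjusting if a variant name ever ended in "d" or "q" for another reason.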

gsomlo commented 1 year ago

@roryt12 -- thanks so much for spotting this, I suspected something of this nature might have happened, but staring at the CPU_VARIANTS array I couldn't see it :) This should now be fixed with commit 162a0a4c.

@enjoy-digital: I was not intending to test the width conversion between a rocket variant and LiteDRAM -- the whole point of having so many (e.g., d(ouble), q(uad)) variants is to have a straight-through 1:1 linkage between a specific board's LiteDRAM width and the CPU. fulld and full4d (and linux*d) were supposed to have the same width, and the bug spotted by @roryt12 was why I was seeing the original weirdness I was complaining about.

At this point, width conversion might still exhibit buggy behavior, but that's an orthogonal problem to my own typo :)

We can keep this issue open to address the remaining width conversion problems, or we can close it and open a new, dedicated one -- fine by me either way.

enjoy-digital commented 1 year ago

@roryt12: Good catch! @gsomlo: We have tested width conversion with NaxRiscv and it was working correctly, but this was a recent change I was thinking of. We can probably close this issue and open another if there are width conversion issues with Rocket.

gsomlo commented 1 year ago

OK, I shall endeavour to run some cross-width tests (where width conversion is added on purpose, e.g. 256-bit linuxq/fullq or 64-bit linux/full on the 128-bit LiteDRAM port of the nexys_video board). I'll see if there are any remaining problems now that the CPU mem-port width is actually correct. It's not really something I expect to get a lot of usage in practice, but it would be good to root out any bugs in the source. If it turns out there's still anything wrong with that, I'll open a new issue. Thanks!
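As a rough way to plan such cross-width tests, the sketch below lists which variants would go through width conversion on a given LiteDRAM port. The width mapping (no suffix = 64-bit, d = 128-bit, q = 256-bit) is taken from the variants named in this thread; the helper names are hypothetical and nothing here calls LiteX itself.

```python
def mem_width(variant):
    """Memory-port width in bits implied by the variant's suffix (per this thread)."""
    return {"d": 128, "q": 256}.get(variant[-1], 64)


def needs_conversion(variant, port_bits):
    """True if the CPU memory port and the LiteDRAM port differ in width."""
    return mem_width(variant) != port_bits


NEXYS_VIDEO_PORT_BITS = 128  # LiteDRAM port width on the nexys_video, per the thread

for variant in ["linux", "full", "linuxd", "fulld", "linuxq", "fullq"]:
    print(variant, mem_width(variant), needs_conversion(variant, NEXYS_VIDEO_PORT_BITS))
```

On a 128-bit port, only the d variants are straight-through; the plain and q variants exercise the conversion path.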