enjoy-digital / litex

Build your hardware, easily!

Anything I can do to help to get ddr3 working on tang primer 20k? #1649

Closed bentwire closed 10 months ago

bentwire commented 1 year ago

I'm interested in getting LiteX to work fully with the Tang Primer 20K, so I can try some RISC-V Rust programming on it. Without the DRAM, things are kinda limited (the DRAM is way overkill, but I want to put a framebuffer in there as well...), so I would like to get that working first...

I'm really new to Migen and all this, and to FPGAs in general, but I would really like to help, even if it's just testing various configurations.

Litex is pretty amazing!

enjoy-digital commented 1 year ago

Hi @bentwire,

Thanks for the help. @trabucayre has been working on this and created a simulation environment for it. He provided me a patch that already improves things (without fully fixing the issue): litedram_gowin.patch.txt

I've not been able to test it yet, but if you want, you can provide feedback with and without this patch applied; it should reduce the number of reported errors.

The next steps are probably to continue doing test with the hardware and with the simulation environment.

bentwire commented 1 year ago

@enjoy-digital I should be able to test this weekend. I will let you know! Thanks!

bentwire commented 1 year ago

Wow, it's much better!

    __   _ __      _  __
   / /  (_) /____ | |/_/
  / /__/ / __/ -_)>  <
 /____/_/\__/\__/_/|_|

Build your hardware, easily!

(c) Copyright 2012-2023 Enjoy-Digital
(c) Copyright 2007-2015 M-Labs

BIOS built on Mar 17 2023 23:06:13
BIOS CRC passed (41cdeebb)

LiteX git sha1: 6ee39b47

--=============== SoC ==================--
CPU:        VexRiscv @ 50MHz
BUS:        WISHBONE 32-bit @ 4GiB
CSR:        32-bit data
ROM:        128.0KiB
SRAM:       8.0KiB
SDRAM:      256.0MiB 16-bit @ 200MT/s (CL-6 CWL-5)
MAIN-RAM:   256.0MiB

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000| delays: -
  m0, b01: |00000000| delays: -
  m0, b02: |00000000| delays: -
  m0, b03: |00000000| delays: -
  best: m0, b02 delays: -
  m1, b00: |00000000| delays: -
  m1, b01: |00000000| delays: -
  m1, b02: |00000000| delays: -
  m1, b03: |00000000| delays: -
  best: m1, b02 delays: -
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB
   Read: 0x40000000-0x40200000 2.0MiB
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 131136/524288
Memtest KO
Memory initialization failed

--============= Console ================--

litex>
litex>
litex>
litex>
litex> sdram_test
Memtest at 0x40000000 (8.0MiB)...
  Write: 0x40000000-0x40800000 8.0MiB
   Read: 0x40000000-0x40800000 8.0MiB
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 524672/2097152
Memtest KO

litex> sdram_test
Memtest at 0x40000000 (8.0MiB)...
  Write: 0x40000000-0x40800000 8.0MiB
   Read: 0x40000000-0x40800000 8.0MiB
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 524445/2097152
Memtest KO

litex>

I will try a 48MHz clock (I've been using 50MHz) and see if it's any different.

bentwire commented 1 year ago

It seems a little less stable at 48MHz maybe? 2-3 runs probably aren't really enough to tell...

    __   _ __      _  __
   / /  (_) /____ | |/_/
  / /__/ / __/ -_)>  <
 /____/_/\__/\__/_/|_|

Build your hardware, easily!

(c) Copyright 2012-2023 Enjoy-Digital
(c) Copyright 2007-2015 M-Labs

BIOS built on Mar 17 2023 23:12:20
BIOS CRC passed (04cd2d2a)

LiteX git sha1: 6ee39b47

--=============== SoC ==================--
CPU:        VexRiscv @ 48MHz
BUS:        WISHBONE 32-bit @ 4GiB
CSR:        32-bit data
ROM:        128.0KiB
SRAM:       8.0KiB
SDRAM:      256.0MiB 16-bit @ 192MT/s (CL-6 CWL-5)
MAIN-RAM:   256.0MiB

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000| delays: -
  m0, b01: |00000000| delays: -
  m0, b02: |00000000| delays: -
  m0, b03: |00000000| delays: -
  best: m0, b02 delays: -
  m1, b00: |00000000| delays: -
  m1, b01: |00000000| delays: -
  m1, b02: |00000000| delays: -
  m1, b03: |00000000| delays: -
  best: m1, b02 delays: -
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB
   Read: 0x40000000-0x40200000 2.0MiB
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 131072/524288
Memtest KO
Memory initialization failed

--============= Console ================--

litex> sdram_test
Memtest at 0x40000000 (8.0MiB)...
  Write: 0x40000000-0x40800000 8.0MiB
   Read: 0x40000000-0x40800000 8.0MiB
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 524289/2097152
Memtest KO

litex> sdram_test
Memtest at 0x40000000 (8.0MiB)...
  Write: 0x40000000-0x40800000 8.0MiB
   Read: 0x40000000-0x40800000 8.0MiB
  bus errors:  2/256
  addr errors: 0/8192
  data errors: 524292/2097152
Memtest KO

litex> sdram_test
Memtest at 0x40000000 (8.0MiB)...
  Write: 0x40000000-0x40800000 8.0MiB
   Read: 0x40000000-0x40800000 8.0MiB
  bus errors:  0/256
  addr errors: 0/8192
  data errors: 524289/2097152
Memtest KO

litex>

It does not work at all at 60MHz. (100% bus errors)

bentwire commented 1 year ago

Another update... Which CPU core I use seems to matter. It works better with VexRiscv than with PicoRV32:

--=============== SoC ==================--
CPU:        PicoRV32 @ 48MHz
BUS:        WISHBONE 32-bit @ 4GiB
CSR:        32-bit data
ROM:        128.0KiB
SRAM:       8.0KiB
SDRAM:      256.0MiB 16-bit @ 192MT/s (CL-6 CWL-5)
MAIN-RAM:   256.0MiB

--========== Initialization ============--
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Read leveling:
  m0, b00: |00000000| delays: -
  m0, b01: |00000000| delays: -
  m0, b02: |00000000| delays: -
  m0, b03: |00000000| delays: -
  best: m0, b02 delays: -
  m1, b00: |00000000| delays: -
  m1, b01: |00000000| delays: -
  m1, b02: |00000000| delays: -
  m1, b03: |00000000| delays: -
  best: m1, b00 delays: -
Switching SDRAM to hardware control.
Memtest at 0x40000000 (2.0MiB)...
  Write: 0x40000000-0x40200000 2.0MiB
   Read: 0x40000000-0x40200000 2.0MiB
  bus errors:  64/256
  addr errors: 0/8192
  data errors: 524282/524288
Memtest KO
Memory initialization failed

--============= Console ================--

litex>
litex>
litex>
litex> sdram_test
Memtest at 0x40000000 (8.0MiB)...
  Write: 0x40000000-0x40800000 8.0MiB
   Read: 0x40000000-0x40800000 8.0MiB
  bus errors:  64/256
  addr errors: 0/8192
  data errors: 2097129/2097152
Memtest KO

litex> sdram_test
Memtest at 0x40000000 (8.0MiB)...
  Write: 0x40000000-0x40800000 8.0MiB
   Read: 0x40000000-0x40800000 8.0MiB
  bus errors:  64/256
  addr errors: 0/8192
  data errors: 2097129/2097152
Memtest KO
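To compare these runs at a glance, here is a small hypothetical helper (`memtest_error_rates` is not part of LiteX) that pulls the bus/addr/data counters out of pasted BIOS memtest output and turns them into failing fractions. It only assumes the `X errors: N/M` format visible in the logs above.

```python
import re

# Matches the summary counters printed by the LiteX BIOS memtest, e.g.
# "bus errors:  0/256", "addr errors: 0/8192", "data errors: 131136/524288".
COUNTER_RE = re.compile(r"(bus|addr|data) errors:\s*(\d+)/(\d+)")


def memtest_error_rates(log: str) -> list[dict[str, float]]:
    """Return one {counter: failing fraction} dict per memtest run found in `log`."""
    runs: list[dict[str, float]] = []
    current: dict[str, float] = {}
    for kind, errors, total in COUNTER_RE.findall(log):
        if kind in current:  # counter seen again -> a new run has started
            runs.append(current)
            current = {}
        current[kind] = int(errors) / int(total)
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    # Counters copied from the 2.0MiB init memtests (VexRiscv @ 50MHz, PicoRV32 @ 48MHz).
    logs = {
        "VexRiscv": "bus errors:  0/256\naddr errors: 0/8192\ndata errors: 131136/524288",
        "PicoRV32": "bus errors:  64/256\naddr errors: 0/8192\ndata errors: 524282/524288",
    }
    for cpu, log in logs.items():
        for run in memtest_error_rates(log):
            print(cpu, {k: f"{v:.1%}" for k, v in run.items()})
```

On the numbers posted here, VexRiscv fails about a quarter of the data words with no bus errors, while PicoRV32 fails almost every data word and a quarter of the bus checks.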

trabucayre commented 1 year ago

Sorry for missing this issue, and thanks! Are your tests based on the patch I sent to @enjoy-digital?

The difference in results between PicoRV32 and VexRiscv may be related to the placement of some resources. In ddr3-tang-primer the PLL is constrained. I tried to inject this constraint, but unfortunately without significant effect.

In simulation (which took a really long time due to PLL and DQS calibration/lock), the write path seems OK: only the preamble needs to be disabled (and maybe the delay reduced, but I have to recheck).

One thing I'm not clear about is DQS in read mode: the RBURST behaviour is not very clear (again, a simulation is required to see which delays to apply...).

trabucayre commented 1 year ago

I have also done:

bentwire commented 1 year ago

@trabucayre Thank you for the response! I did notice that the examples from Gowin use the 1:4 clock ratio and the MEM_{I,O}SER8 primitives. I wonder whether the DDR logic doesn't work right at 1:2 for these particular RAM chips (the Micron and others)?

Just a guess, not sure why they chose that ratio.

Again thanks for all the hard work. This is an awesome project!

trabucayre commented 1 year ago

Moving to a 1:4 ratio (or at least trying to) is something I have in mind. I don't know why all the examples use this ratio (Gowin's IP generator uses 1:2 by default, and the reference design archive on Gowin's website provides both 1:2 and 1:4 solutions). The current implementation is almost entirely based on Lattice's application note (and module), because the primitives seem really similar, but maybe this assumption does not hold for all of them...

bentwire commented 1 year ago

@trabucayre

OK, cool. Let me know if there are any patches you want me to test. I've been reading the Gowin I/O docs, but they aren't very detailed... They kind of explain how to connect the DQS block to the SERDES but don't fully explain all the settings...

trabucayre commented 1 year ago

I really like the "they aren't very detailed" ;-) It's an understatement. This is why the current implementation is based on Lattice's similar application note.

I have started adding 1:4, but all timings have to be checked before it is usable/publishable. My idea is to adapt the code according to nphases: the existing behaviour keeps "working", but it becomes possible to choose between 1:2 and 1:4.
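The BIOS logs earlier in this thread are consistent with a simple relation between the system clock and the reported SDRAM rate: 50MHz at 1:2 gives 200MT/s and 48MHz at 1:2 gives 192MT/s, i.e. 2 × nphases × sys_clk. A minimal sketch, assuming that relation holds (the function names are hypothetical, not LiteDRAM API), of what selecting 1:2 or 1:4 by nphases means on the memory side:

```python
def nphases_from_ratio(ratio: str) -> int:
    """Map a controller:memory clock ratio ("1:2" or "1:4") to a DFI phase count."""
    num, den = ratio.split(":")
    if num != "1" or den not in ("2", "4"):
        raise ValueError(f"unsupported ratio {ratio!r}")
    return int(den)


def transfer_rate_mts(sys_clk_hz: float, ratio: str) -> float:
    """Peak DDR transfer rate in MT/s.

    Assumption: the memory clock runs at sys_clk * nphases, and DDR moves
    two transfers per memory clock.
    """
    mem_clk_hz = sys_clk_hz * nphases_from_ratio(ratio)
    return 2 * mem_clk_hz / 1e6


if __name__ == "__main__":
    for sys_clk in (50e6, 48e6):
        for ratio in ("1:2", "1:4"):
            print(f"{sys_clk / 1e6:.0f}MHz {ratio}: {transfer_rate_mts(sys_clk, ratio):.0f}MT/s")
```

Under that assumption, switching from 1:2 to 1:4 at the same 48MHz system clock doubles the memory clock (96MHz to 192MHz) and the transfer rate (192MT/s to 384MT/s), so every cycle-based delay in the PHY shifts.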

bentwire commented 1 year ago

Yeah, pretty big understatement! Thanks for the updates!

Can't wait to test your changes!

bentwire commented 1 year ago

@trabucayre Have anything for me to test yet? Just checking in to see how things are going. :)

trabucayre commented 1 year ago

Nothing stable to test yet. I have pushed 1:4 support:

read: TBD

e-yes commented 1 year ago

Hi @trabucayre, it's a bit better, but sdram_init still fails. I don't know if it will be useful; here is my observation:

litex> mem_test 0x40000000 1
Memtest at 0x40000000 (1B)...
  Write: 0x40000000-0x40000000 0B   
   Read: 0x40000000-0x40000000 0B   
Memtest OK

litex> mem_test 0x40000000 2
Memtest at 0x40000000 (2B)...
  Write: 0x40000000-0x40000000 0B   
   Read: 0x40000000-0x40000000 0B   
Memtest OK

litex> mem_test 0x40000000 3
Memtest at 0x40000000 (3B)...
  Write: 0x40000000-0x40000000 0B   
   Read: 0x40000000-0x40000000 0B   
Memtest OK

litex> mem_test 0x40000000 4
Memtest at 0x40000000 (4B)...
  Write: 0x40000000-0x40000004 4B   
   Read: 0x40000000-0x40000004 4B   
  bus errors:  2/2
  addr errors: 0/1
  data errors: 1/1
Memtest KO

Maybe it will give an idea.

e-yes commented 1 year ago

Check your DDR3 part numbers, surprise!

enjoy-digital commented 1 year ago

Interesting @e-yes, do you have more info?

bentwire commented 1 year ago

It isn't the part on the schematic... It's an SK Hynix chip, the H5TQ1G63EFR, and I don't think the timings are exactly the same as the Micron part's.

enjoy-digital commented 1 year ago

@e-yes: Is that what you wanted to say? Or have you been able to get it working with a specific part number (if so, can you share it)? The initial prototype from Sipeed and the production version do not use the same DDR chip, so the part number currently used in LiteX-Boards is probably the one from the prototype (or one with similar geometry/timings).

e-yes commented 1 year ago

@enjoy-digital Absolutely. My board has the Hynix part too. I have tried different timings and learned a lot about DDR3 along the way (I'm a rookie). No luck yet.

bentwire commented 1 year ago

@e-yes What timings have you tried so far? I may join you :)

nedos commented 1 year ago

It isn't the part on the schematic... It's an SK Hynix chip, H5TQ1G63EFR, and I don't think the timings are exactly the same as the Micron part.

I can confirm this is also the part on the production board I just got from AliExpress. I took a look at the geometry, and at first glance I think it was fine. Also, AFAIK the timings are specified for 800MHz, so they should be very conservative. Maybe it just needs to be refreshed more often at those timings?

Would be great to get this running. I think price/performance wise this is hands-down the best FPGA board on the market at the moment.

nedos commented 1 year ago

@trabucayre I just tried your 4phases branch and also rebasing your commit onto master and when I run python3 -m litex_boards.targets.sipeed_tang_primer_20k --sys-clk-freq 48e6 --build I get:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/student/litex/litex-boards/litex_boards/targets/sipeed_tang_primer_20k.py", line 257, in <module>
    main()
  File "/home/student/litex/litex-boards/litex_boards/targets/sipeed_tang_primer_20k.py", line 246, in main
    builder.build(**parser.toolchain_argdict)
  File "/home/student/litex/litex/litex/soc/integration/builder.py", line 367, in build
    vns = self.soc.build(build_dir=self.gateware_dir, **kwargs)
  File "/home/student/litex/litex/litex/soc/integration/soc.py", line 1322, in build
    return self.platform.build(self, *args, **kwargs)
  File "/home/student/litex/litex/litex/build/gowin/platform.py", line 43, in build
    return self.toolchain.build(self, *args, **kwargs)
  File "/home/student/litex/litex/litex/build/generic_toolchain.py", line 81, in build
    v_output = platform.get_verilog(self.fragment, name=build_name, **kwargs)
  File "/home/student/litex/litex/litex/build/gowin/platform.py", line 37, in get_verilog
    return GenericPlatform.get_verilog(self, *args,
  File "/home/student/litex/litex/litex/build/generic_platform.py", line 463, in get_verilog
    return verilog.convert(fragment, platform=self, **kwargs)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 630, in convert
    verilog += _print_combinatorial_logic_synth(f, ns)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 485, in _print_combinatorial_logic_synth
    r += _print_node(ns, _AT_NONBLOCKING, 1, g[1])
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 300, in _print_node
    return "".join(_print_node(ns, at, level, n, target_filter) for n in node)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 300, in <genexpr>
    return "".join(_print_node(ns, at, level, n, target_filter) for n in node)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 296, in _print_node
    return _tab*level + _print_expression(ns, node.l)[0] + assignment + _print_expression(ns, node.r)[0] + ";\n"
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 262, in _print_expression
    return _print_slice(ns, node)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 225, in _print_slice
    assert (node.stop - node.start) >= 1
AssertionError

I think there's a chance that the other Hynix H5XXX part plus your patch may work, and I wanted to try them together.

trabucayre commented 1 year ago

Thanks for pointing out this issue. I have updated my branch/fork with a fix I simply forgot to push. To use 1:4, you have to modify this line to replace 1:2 with 1:4. The delays still have to be updated, though.

nedos commented 1 year ago

I'm still getting the exact same issue with your litedram branch. It's not building:

INFO:SoC:Auto-Resizing ROM rom from 0x20000 to 0x6d24.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/student/litex/litex-boards/litex_boards/targets/sipeed_tang_primer_20k.py", line 258, in <module>
    main()
  File "/home/student/litex/litex-boards/litex_boards/targets/sipeed_tang_primer_20k.py", line 247, in main
    builder.build(**parser.toolchain_argdict)
  File "/home/student/litex/litex/litex/soc/integration/builder.py", line 367, in build
    vns = self.soc.build(build_dir=self.gateware_dir, **kwargs)
  File "/home/student/litex/litex/litex/soc/integration/soc.py", line 1322, in build
    return self.platform.build(self, *args, **kwargs)
  File "/home/student/litex/litex/litex/build/gowin/platform.py", line 43, in build
    return self.toolchain.build(self, *args, **kwargs)
  File "/home/student/litex/litex/litex/build/generic_toolchain.py", line 81, in build
    v_output = platform.get_verilog(self.fragment, name=build_name, **kwargs)
  File "/home/student/litex/litex/litex/build/gowin/platform.py", line 37, in get_verilog
    return GenericPlatform.get_verilog(self, *args,
  File "/home/student/litex/litex/litex/build/generic_platform.py", line 463, in get_verilog
    return verilog.convert(fragment, platform=self, **kwargs)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 630, in convert
    verilog += _print_combinatorial_logic_synth(f, ns)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 485, in _print_combinatorial_logic_synth
    r += _print_node(ns, _AT_NONBLOCKING, 1, g[1])
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 300, in _print_node
    return "".join(_print_node(ns, at, level, n, target_filter) for n in node)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 300, in <genexpr>
    return "".join(_print_node(ns, at, level, n, target_filter) for n in node)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 296, in _print_node
    return _tab*level + _print_expression(ns, node.l)[0] + assignment + _print_expression(ns, node.r)[0] + ";\n"
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 262, in _print_expression
    return _print_slice(ns, node)
  File "/home/student/litex/litex/litex/gen/fhdl/verilog.py", line 225, in _print_slice
    assert (node.stop - node.start) >= 1
AssertionError

nedos commented 1 year ago

I tried a bunch of permutations, to no avail. AFAIK the memory is 1Gb, not 4Gb, so the geometry will need adapting for sure. Also, I'm not sure why the assertion happens; I commented it out for now.

trabucayre commented 1 year ago

It's weird: I retried my branch and no longer have this issue. I have also pushed a commit to litex-boards that allows the user to select between 1:2 and 1:4 using the --ddr_rate argument.

bentwire commented 1 year ago

It's weird: I have retried my branch and I have no more this issue. I have also pushed a commit in litex-boards to allows user to select between 1:2 or 1:4 using --ddr_rate argument.

It compiles for me. Timing is still off somehow but I got it to compile. Need to check the docs for the actual chip on the board to get the timings.

verilogzhou commented 1 year ago

The actual chip is the H5TQ1G63EFR, so the SDRAM size reported by the BIOS should be 128.0MiB, not 256.0MiB.
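The expected size follows directly from the geometry. A quick check, using the geometry from the H5TQ1G63EFR class posted further down this thread (8 banks, 8192 rows, 1024 columns) and the board's 16-bit data bus; `ddr_geometry` is a hypothetical helper, not a LiteDRAM function:

```python
def ddr_geometry(nbanks: int, nrows: int, ncols: int, dq_width: int) -> dict[str, int]:
    """Capacity (bytes) and address widths implied by a DDR geometry."""
    for name, n in (("nbanks", nbanks), ("nrows", nrows), ("ncols", ncols)):
        if n <= 0 or n & (n - 1):
            raise ValueError(f"{name} must be a power of two, got {n}")
    return {
        "bytes": nbanks * nrows * ncols * dq_width // 8,
        "bank_bits": (nbanks - 1).bit_length(),
        "row_bits": (nrows - 1).bit_length(),  # row address pins A0..A(row_bits - 1)
        "col_bits": (ncols - 1).bit_length(),
    }


if __name__ == "__main__":
    MiB = 1024 * 1024
    hynix = ddr_geometry(nbanks=8, nrows=8192, ncols=1024, dq_width=16)
    print(hynix["bytes"] // MiB, "MiB, row bits:", hynix["row_bits"])
    doubled_rows = ddr_geometry(nbanks=8, nrows=16384, ncols=1024, dq_width=16)
    print(doubled_rows["bytes"] // MiB, "MiB, row bits:", doubled_rows["row_bits"])
```

With the same banks, columns and width, the 256.0MiB in the BIOS banner would correspond, for example, to twice as many rows (14 row-address bits, i.e. using A13), while the 1Gb Hynix part needs only 13 (A0 to A12).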

nedos commented 1 year ago

@verilogzhou did changing the size fix it for you?

DatanoiseTV commented 1 year ago

Received my board as well, same issue. With 1:4 I get bus errors: 192/256, addr errors: 0/8192, data errors: 393216/524288.

e-yes commented 1 year ago

What I have tried so far:

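from litedram.modules import DDR3Module, _TechnologyTimings, _SpeedgradeTimings

class H5TQ1G63EFR(DDR3Module):
    # geometry (1Gb x16: 8 banks, 8K rows, 1K columns)
    nbanks = 8
    nrows  = 8192
    ncols  = 1024
    # timings: plain values in ns, tuples as (memory clocks, ns)
    # tREFI = 64 ms / 8192 refresh commands = 7812.5 ns
    technology_timings = _TechnologyTimings(tREFI=64e6/8192, tWTR=(4, 7.5), tCCD=(4, None), tRRD=(4, 6), tZQCS=(64, 80))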
    speedgrade_timings = {
        "1600":  _SpeedgradeTimings(tRP=13.75, tRCD=13.75, tWR=15, tRFC=(160, None), tFAW=(None, 40), tRAS=35),
    }
    speedgrade_timings["default"] = speedgrade_timings["1600"]

Don't pay too much attention to the timings; this is an intermediate version from a backup. I've tried the 800 timings as well, but no luck :/

This was together with @trabucayre's changes.
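To see what those numbers become in the controller, each timing has to be turned into system-clock cycles. As I read LiteDRAM's convention, the tuples in the class above are (memory clocks, nanoseconds) and the larger of the two wins. A simplified sketch of that conversion (hypothetical helpers; I believe LiteDRAM's own conversion also adds a per-rate margin, which is omitted here), evaluated at the 48MHz system clock used in this thread:

```python
from math import ceil
from typing import Optional


def ns_to_cycles(t_ns: float, sys_clk_hz: float) -> int:
    """System-clock cycles needed to cover t_ns, rounded up (no extra margin)."""
    return ceil(t_ns * sys_clk_hz / 1e9)


def ck_to_cycles(n_ck: int, nphases: int) -> int:
    """System-clock cycles covering n_ck memory clocks (nphases memory clocks per cycle)."""
    return ceil(n_ck / nphases)


def timing_cycles(sys_clk_hz: float, nphases: int,
                  ck: Optional[int] = None, ns: Optional[float] = None) -> int:
    """Cycles for a (ck, ns) constraint: the larger of the two, ignoring a missing one."""
    cycles = 0
    if ck is not None:
        cycles = max(cycles, ck_to_cycles(ck, nphases))
    if ns is not None:
        cycles = max(cycles, ns_to_cycles(ns, sys_clk_hz))
    return cycles


if __name__ == "__main__":
    sys_clk = 48e6
    for nphases in (2, 4):
        print(f"1:{nphases}",
              "tREFI", timing_cycles(sys_clk, nphases, ns=64e6 / 8192),
              "tRFC", timing_cycles(sys_clk, nphases, ck=160),
              "tWTR", timing_cycles(sys_clk, nphases, ck=4, ns=7.5),
              "tFAW", timing_cycles(sys_clk, nphases, ns=40))
```

At 48MHz most nanosecond-based constraints collapse to one or two cycles, so the results are dominated by the clock-count terms (tRFC, tWTR) and by tREFI, which works out to 375 cycles (7.8125µs).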

DatanoiseTV commented 1 year ago

Here is the schematic. Maybe the pinout is incorrect?

Update: Why are the address lines A14 and A15 disconnected?

nedos commented 1 year ago

I went through the pinout. It looks correct. Did anyone try the sipeed examples just to make sure it actually works in the gowin toolchain?

sftwninja commented 1 year ago

I have tried the DDR3 tests against the Gowin toolchain within the Sipeed example repo, and I can confirm it does work.

nedos commented 1 year ago

I may have time to try this out this weekend. But this pure Gowin example exists: LicheeTang20K_DDR_Test. At the very least, we can cross-check the resulting constraint file against it and check whether it's instantiating the macro correctly.

nedos commented 1 year ago

I have tried the DDR3 tests against the Gowin toolchain within the Sipeed example repo, and I can confirm it does work.

Ah perfect. Good to know.

nedos commented 1 year ago

I tried @trabucayre's litedram branch and litex-boards. It compiled. I tried various timings, but to no avail. I also tried the snippet from @e-yes. The memory now shows up as 128MiB, as it should. As a side note, I think the Gowin example uses the following timings. I tried these roughly. Anyway, sadly still nothing.

nedos commented 1 year ago

I think someone mentioned constraints. I was checking it out today, here's the resulting code versus the sipeed example:

Litex:

IO_LOC "ddram_a[0]" F7;
IO_PORT "ddram_a[0]" IO_TYPE=SSTL15;

Sipeed/Gowin Toolchain:

IO_LOC "ddr_addr[0]" F7;
IO_PORT "ddr_addr[0]" IO_TYPE=SSTL15 PULL_MODE=NONE DRIVE=8 BANK_VCCIO=1.5;

I'd check whether the drive strength in particular makes a difference. I may have some time to try again this weekend.
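For a quick A/B test, it can help to generate both variants of these lines from one place. Below is a small hypothetical helper (not part of LiteX's Gowin backend) that formats an `IO_LOC`/`IO_PORT` pair and reproduces the two variants quoted above. How to get the extra attributes into the LiteX-generated CST is left open here.

```python
def cst_constraints(name: str, pin: str, io_type: str = "SSTL15", **port_attrs) -> list[str]:
    """Format Gowin CST lines for one I/O: its location plus IO_PORT attributes."""
    attrs = [f"IO_TYPE={io_type}"] + [f"{key}={value}" for key, value in port_attrs.items()]
    return [f'IO_LOC "{name}" {pin};', f'IO_PORT "{name}" {" ".join(attrs)};']


if __name__ == "__main__":
    litex_style = cst_constraints("ddram_a[0]", "F7")
    sipeed_style = cst_constraints("ddram_a[0]", "F7",
                                   PULL_MODE="NONE", DRIVE=8, BANK_VCCIO=1.5)
    print("\n".join(litex_style + sipeed_style))
```

Keyword arguments keep their order, so the attributes come out in the same order as in the Sipeed file, which makes a plain text diff of the two constraint files easier to read.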

enjoy-digital commented 1 year ago

Hi,

Thanks for the tests @nedos. Just for info, we've planned to work on this together with @trabucayre in late August/early September if it is not solved before then.

trabucayre commented 1 year ago

As mentioned previously: DQS must be correctly calibrated for read mode (RCLKSEL and READ in particular), and maybe RBURST should not be used as an enable; instead, a correct delay (number of cycles) should be applied. I have tried to analyze DQS + IDES_MEM in simulation to see their behavior (since Gowin doesn't provide (correct) documentation). As soon as I'm free to focus on this topic, I'll redo/clean up the simulations and share them.
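For context, the read-leveling lines in the BIOS logs above appear to print one character per tested delay setting (`|00000000|` being eight failures), and the BIOS still reports a "best" bitslip even when nothing passed. A generic sketch of picking a read setting from such a scan, using the common center-of-window heuristic. The helper names are hypothetical; this is not the LiteX BIOS algorithm and does not model RBURST or the Gowin DQS primitive.

```python
from typing import Optional


def longest_pass_run(scan: str) -> tuple[int, int]:
    """Return (start, length) of the longest run of '1' in scan, or (0, 0) if none."""
    best_start, best_len = 0, 0
    start = None
    for i, ch in enumerate(scan + "0"):  # trailing sentinel closes a final run
        if ch == "1":
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def pick_read_setting(scans: dict[int, str]) -> Optional[tuple[int, int]]:
    """scans maps bitslip -> pass/fail string; return (bitslip, delay) or None."""
    best = None  # (bitslip, center delay, window length)
    for bitslip, scan in scans.items():
        start, length = longest_pass_run(scan)
        if length and (best is None or length > best[2]):
            best = (bitslip, start + (length - 1) // 2, length)
    return None if best is None else best[:2]


if __name__ == "__main__":
    print(pick_read_setting({0: "00000000", 1: "00000000"}))  # nothing passed
    print(pick_read_setting({0: "00011000", 1: "01111100"}))
```

Returning `None` when no tap passes, rather than a nominal "best", makes an all-zero scan like the ones above an explicit calibration failure.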

nickoe commented 1 year ago

Here is the schematic. Maybe the pinout is incorrect?

Update: Why are the address lines A14 and A15 disconnected?

Those appear to refer to balls T7 and M7, which are no-connect according to the 96-ball diagram in the H5TQ1G63EFR datasheet. So I guess that is OK as well.

I am also looking forward to seeing the DDR3 work, although I am probably of no help at this point. I did not find an answer to @DatanoiseTV's question in the comments, so I decided to state it here.

EDIT: Also, note that the schematic connects A13 (DDR3 ball T3, C8 on the FPGA), which is not used by the H5TQ1G63EFR since it is a 64M x 16 part, so I think that is a possible error in the configuration. I have no idea whether that could cause the behavior we are seeing, though.

trabucayre commented 1 year ago

We have pushed a fix to compensate for the extra cycle introduced by the Gowin primitives. Memtest now passes successfully. We now have to find out whether the incorrect memory size is related to model.py or due to an erroneous computation.

bentwire commented 1 year ago

This seems to be working great! Thanks for all the hard work!

kevinsu20 commented 1 year ago

I'm confused that the same issue occurs on the tang_mega_138k. Is it only Gowin's DDR that has this problem?

kevinsu20 commented 1 year ago

Can you tell me how to solve it? I am currently bringing up Gowin's new 138K board and have also run into a memory initialization failure.

enjoy-digital commented 10 months ago

We can probably close this issue now that it's implemented. There still seem to be some variations between boards, but we can discuss this in https://github.com/enjoy-digital/litex/issues/1848. For the Tang Mega 138K, development/support is not finished yet, and things can be discussed in https://github.com/litex-hub/litex-boards/issues/549. Any help/support for this work is welcome.