enjoy-digital / litedram

Small footprint and configurable DRAM core

frontend/bist: replicate LFSR output to fill the DRAM port #321

Closed michalsieron closed 1 year ago

michalsieron commented 1 year ago

Rework the LFSR module so it is an actual LFSR and can work with larger states. The old LFSR module, when initialized with an n_state of 1024, produced an equation so large that Vivado complained about too many tokens on one line. By using an LFSR with an n_state of 1024, we can also fill an entire transfer to the SDRAM.

I took LFSR coefficients from here: https://datacipy.cz/lfsr_table.pdf
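For reference, a minimal sketch (assuming Migen, and simplified relative to the actual patch) of what such a width-parameterized Fibonacci LFSR looks like; the default taps [27, 30] correspond to the PRBS31 polynomial x^31 + x^28 + 1, and other widths would take their taps from the table linked above. Note that it shifts a single bit per cycle, which is discussed further down in the review:

```python
from functools import reduce
from operator import xor

from migen import Cat, Module, Signal


class LFSR(Module):
    """Fibonacci LFSR with a configurable state width (illustrative sketch)."""
    def __init__(self, n_state=31, taps=[27, 30]):
        self.o = Signal(n_state)

        # # #

        state = Signal(n_state, reset=1)  # non-zero reset: the all-zero state locks up
        feedback = reduce(xor, [state[tap] for tap in taps])
        self.sync += state.eq(Cat(feedback, state[:-1]))  # shift one bit per cycle
        self.comb += self.o.eq(state)
```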

Signed-off-by: Michal Sieron msieron@antmicro.com

michalsieron commented 1 year ago

I rebased it onto master so the CI can run again with the 69b401dea1730b29a3a567c43d430021931e3fc9 fix.

michalsieron commented 1 year ago

https://github.com/enjoy-digital/litedram/actions/runs/3883939233/jobs/6625838369#step:9:3882

/usr/lib/gcc-cross/riscv64-linux-gnu/11/../../../../riscv64-linux-gnu/bin/ld: bios.elf section `.data' will not fit in region `rom'
/usr/lib/gcc-cross/riscv64-linux-gnu/11/../../../../riscv64-linux-gnu/bin/ld: region `rom' overflowed by 100 bytes
collect2: error: ld returned 1 exit status
make: *** [/home/runner/work/litedram/litex/litex/soc/software/bios/Makefile:72: bios.elf] Error 1

@enjoy-digital It looks like the CI fails because the ROM size is too small (32 KiB). Can we increase it, like you did here? Enabling LTO should also help.
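A hedged sketch of what such a fix could look like (not the actual CI change; the SoC class name is hypothetical): in LiteX, the `rom` region the linker overflowed is sized by SoCCore's `integrated_rom_size` parameter, so doubling it from 32 KiB to 64 KiB would give the BIOS room to fit.

```python
from litex.soc.integration.soc_core import SoCCore


class BenchSoC(SoCCore):  # hypothetical SoC name, for illustration only
    def __init__(self, platform, sys_clk_freq=int(100e6), **kwargs):
        SoCCore.__init__(self, platform, sys_clk_freq,
            integrated_rom_size=0x10000,  # 64 KiB instead of the 32 KiB that overflowed
            **kwargs)
```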

enjoy-digital commented 1 year ago

@michalsieron: Sure, you can increase it to a larger value.

enjoy-digital commented 1 year ago

@michalsieron: We could eventually improve/increase the LFSR size for boards with a high controller data_width, but this should be configurable, with the default kept low for small devices. So maybe we could define some power-of-2 values (32/64/128/256/512/1024), allow selection, and default to 32. It's also not useful to allow a configuration larger than data_width or address_width.
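A tiny sketch of that policy in plain Python (the helper name and signature are hypothetical):

```python
def bist_lfsr_width(data_width, requested=32):
    """Validate/clamp a requested BIST LFSR width (illustrative sketch)."""
    allowed = (32, 64, 128, 256, 512, 1024)
    assert requested in allowed, f"LFSR width must be one of {allowed}"
    # Widths beyond the controller's data_width buy nothing, so clamp.
    return min(requested, data_width)
```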

enjoy-digital commented 1 year ago

I just had a closer look at the code and there is an issue with the implementation: a proper LFSR should shift n_out bits per cycle, while the new implementation only shifts 1 bit per cycle. The previous implementation is correct and used in various use-cases (e.g. to generate PRBS in LiteX) and has been tested against other PRBS generators/checkers (e.g. against Xilinx SerDes PRBS checker/generator primitives or with various JESD204B DACs/ADCs).
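For clarity, a sketch (assuming Migen; not litedram's exact code) of the behavior being asked for, an LFSR that advances n_out bits per cycle by unrolling the 1-bit step combinatorially:

```python
from functools import reduce
from operator import xor

from migen import Cat, Module, Signal


class MultiBitLFSR(Module):
    """LFSR advancing n_out bits per cycle (illustrative sketch)."""
    def __init__(self, n_out, n_state=31, taps=[27, 30]):
        self.o = Signal(n_out)

        # # #

        state = Signal(n_state, reset=1)
        cur   = state
        bits  = []
        for _ in range(n_out):
            feedback = reduce(xor, [cur[tap] for tap in taps])
            bits.append(feedback)
            cur = Cat(feedback, cur[:-1])  # one unrolled 1-bit LFSR step
        self.sync += state.eq(cur)
        self.comb += self.o.eq(Cat(*bits))
```

The XOR tree depth grows with n_out, which is what makes timing closure hard at large widths, as noted below.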

You'll have trouble getting timing closure with high data widths, so for an LFSR > 32, I would recommend looking at some existing implementations for 40Gb/100Gb Ethernet and seeing how to run multiple generators/checkers in parallel, for example:

http://fpgasrus.com/prbs.html "Well that's great, but what if we want to go even faster, like in a PRBS for a 40G link with an SFI5 interface, and still be able to make timing. It all comes down to one sentence at the heart of the explanation. A sub-sampled PRBS has the same pattern, but with a phase shift. So if a PRBS stream was going by and you only grabbed every other bit and checked the pattern, you would find you have the same pattern, but you would be at a different point in the pattern. If you take every 3rd bit, or every 57th bit, still the same thing. So what that means is we can run several smaller PRBS generators in parallel, and as long as we get the alignment right, they will mux to a higher bit rate PRBS of the same pattern."
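The every-other-bit case from the quote can be checked in a few lines of plain Python (decimation by 2 provably yields the same m-sequence at a shifted phase; PRBS7 is used here only to keep the period short):

```python
def prbs7(n, state=0x7F):
    """Return n bits of the PRBS7 (x^7 + x^6 + 1) sequence."""
    out = []
    for _ in range(n):
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(newbit)
        state = ((state << 1) | newbit) & 0x7F
    return out

period = 2**7 - 1
seq  = prbs7(2 * period)
half = seq[::2][:period]  # every other bit, one full period
# The subsampled stream is the same sequence at some phase offset:
assert any(half == seq[s:s + period] for s in range(period))
```

In hardware this is exploited by running N copies of the generator, each stepping its state N positions per clock with a staggered start, so generator i produces bits N*t + i and a simple mux reconstructs the full-rate stream.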

michalsieron commented 1 year ago

I reverted my previous changes to the LFSR. Now it just replicates the output of the 31-bit PRBS to fill the DRAM port.

Additionally, I changed the LFSR's output to be comb instead of sync, so that the output right after reset is not 0. Otherwise, the first transfer after every generator reset consists of only zeros. This is especially bad for sdram_hw_test with a burst_length of 1, as it then just fills memory with zeros.
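A sketch of that final approach (assuming Migen; the module name is hypothetical and this is not the exact patch): the proven 31-bit PRBS output is replicated to cover the DRAM port, and the output is driven combinatorially so the first word after reset is already non-zero.

```python
from migen import Module, Replicate, Signal


class PortFiller(Module):  # hypothetical name, for illustration only
    """Replicate a narrow PRBS output across a wide DRAM port (sketch)."""
    def __init__(self, lfsr_out, port_data_width):
        self.o = Signal(port_data_width)

        # # #

        n    = len(lfsr_out)                   # e.g. 31 for the PRBS31 core
        reps = (port_data_width + n - 1) // n  # enough copies to cover the port
        # comb, not sync: the first word after reset is already non-zero.
        self.comb += self.o.eq(Replicate(lfsr_out, reps)[:port_data_width])
```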

enjoy-digital commented 1 year ago

Thanks, this should be better and good enough for now, yes. This is merged.