Implement LiteDRAMNativePortUpConverter with mode="both"

jedrzejboczar commented 4 years ago

Closes https://github.com/enjoy-digital/litedram/issues/191

This PR implements the up-converter as described in https://github.com/enjoy-digital/litedram/issues/191. It combines subsequent port_from commands to lower the number of port_to commands. It allows incomplete transfers and automatically advances to next port_to command when command address or type (r/w) changes. It is also possible to have gaps, e.g. for 1:4 conversion we can write/read [0x00, 0x02, 0x03], .... As you suggested, I used port_from.cmd.last instead of port_from.flush to indicate the end of a burst. cmd.last must be 1 if the last command does not form a "complete" port_to command.

I added unit tests for the mentioned cases and they all pass, but I seem to have some problem during memtest when running in litex_sim. I will further investigate if this is caused by the up-converter or there is some incompatibility with wishbone2native, so we may yet wait with merging this PR.

If you have any suggestions I can modify the implementation. To be able to handle all the cases I had to insert FIFOs on write and read data paths not to loose data during transfers with gaps.

jedrzejboczar commented 4 years ago

I was testing the changes in litex_sim and I had to change the up-converter logic a bit.

There was a problem during writes because write commands were being queued and sent to the controller faster than the write converter could generate new data. This led to situations where the controller was sending 2 wdata.ready in subsequent cycles, but I could not find a way to match that in every situation. To fix that, now the up-converter sends write command only after next_cmd. I also needed to remove any buffering on cmd_buffer. I wonder if there is a better way to do that.

I fixed a problem with LiteDRAMWishbone2Native which was holding old data valid after it has been accepted, leading to the up-converter sending wrong data. I also forced wishbone2native to send all commands with cmd.last as it has to get the data before ack anyway. We should probably rewrite wishbone2native to make use of the up-converter capability to group commands.

enjoy-digital commented 4 years ago

@jedrzejboczar: thanks, i'll try to have a closer look tomorrow. Is the current implementation working? This would indeed be a good idea to simplify LiteDRAMWishbone2Native with the new capabilities of the up/down converters and just make it only do the bridge between a Wishbone and Native interfaces of the same data width.

jedrzejboczar commented 4 years ago

It should be working now. I added unit tests for the special cases and I was able to use the up-converter in litex_sim (by forcing crossbar to use different port data width and disabling L2 cache).

I've just tested it with L2 cache and there was a minor problem, but https://github.com/enjoy-digital/litedram/pull/192/commits/b0bde294c0a440116292a4267e6a33cf73edc7ad fixes it. We should probably remove the data width conversion from LiteDRAMWishbone2Native as you say. I'll work on this now, but the current change set should be working as it is.

enjoy-digital commented 4 years ago

Great, thanks.

jedrzejboczar commented 4 years ago

I've rewritten LiteDRAMWishbone2Native to use LiteDRAMNativePortConverter for the data width conversion.

Because wishbone won't present new address/data before it receives ack, LiteDRAMWishbone2Native sends all read commands with cmd.last=1, effectively always sending only one chunk.

I wanted to at least avoid that for writes, so when writing LiteDRAMWishbone2Native uses cmd.last=0. But because we don't know whether the command is the last one or not at the moment of sending, I made use of port.flush which is simply set high when wishbone transfer ends. If this is not ok, then we can just use cmd.last=1 for all commands, or maybe we could buffer incoming write commands to send them 1 cycle later and insert cmd.last depending on wishbone.cyc, but this complicates the converter a bit.

jedrzejboczar commented 4 years ago

I have tested the changes on the Mercury XU5 board and I'm getting ~50% errors during memtest. I need to further investigate this problem, so we should wait with merging this PR now.

enjoy-digital commented 4 years ago

@jedrzejboczar: ok thanks for the feedback. Maybe you could try to use a similar configuration with litex_sim than with the Mercury XU5 board and see if you are able to reproduce the issue?

jedrzejboczar commented 4 years ago

I tried to run litex_sim with exactly the same PHY settings as for Mercury board, but unfortunatelly there are no errors and memtest succeeds. I also ran with DFITimingsChecker, but it reports no issues.

enjoy-digital commented 4 years ago

@jedrzejboczar: strange, have you also tested with upstream LiteDRAM to be sure it's still working ? (IIRC you had issue with 125MHz clock freq on the Mercury XU5).

jedrzejboczar commented 4 years ago

Yes, I had to lower the frequency to make the leveling work. I haven't tested with all the changes from master, I can try rebasing onto master to test it, but I don't see any changes that could influence this problem.

enjoy-digital commented 4 years ago

OK, i just wanted to be sure the issue is related to the changes and not something else. I'm going to do a test on Arty.

jedrzejboczar commented 4 years ago

Thanks, I'll try to do some more test tomorrow.

enjoy-digital commented 4 years ago

It's working fine on Arty.

jedrzejboczar commented 4 years ago

Hmm, strange. Are you sure that the LiteDRAMNativePortUpConverter gets instantiated? Because by default on Arty there will be L2 cache of width 128, the same as the native port will have. You could for example force crossbar to create port of different width on line https://github.com/enjoy-digital/litex/blob/3391398a5fccd5fc6cae36231a11d34f032dc3c5/litex/soc/integration/soc.py#L1050.

enjoy-digital commented 4 years ago

Indeed, i haven't checked if the adapter was used or not. I'll do more tests tomorrow.

jedrzejboczar commented 4 years ago

@enjoy-digital I still haven't been able to reproduce this problem in the simulation. I've been able to test this code on Arty, forcing memory port to 32 bits (so with a 32:128 up-converter), and everything worked without any errors.

For some reason the case with up-converter fails on XU5. Only every 4th write is being read correctly. Reading is probably correct, but there is a problem when writing data:

litex> mr 0x40000000 0x30
Memory dump:
0x40000000  00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
0x40000010  00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00  ................
0x40000020  00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00  ................

litex> mw 0x40000000 0xaabbccdd 1

litex> mr 0x40000000 0x30
Memory dump:
0x40000000  dd cc bb aa 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x40000010  00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00  ................
0x40000020  00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00  ................

litex> mw 0x40000000 0xaabbccdd 4

litex> mr 0x40000000 0x30
Memory dump:
0x40000000  00 00 00 00 00 00 00 00 00 00 00 00 dd cc bb aa  ................
0x40000010  00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00  ................
0x40000020  00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00  ................

litex> mw 0x40000000 0xaabbccdd 3

litex> mr 0x40000000 0x30
Memory dump:
0x40000000  00 00 00 00 00 00 00 00 dd cc bb aa 00 00 00 00  ................
0x40000010  00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00  ................
0x40000020  00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00  ................

litex> mw 0x40000000 0xaabbccdd 7

litex> mr 0x40000000 0x30
Memory dump:
0x40000000  00 00 00 00 00 00 00 00 00 00 00 00 dd cc bb aa  ................
0x40000010  00 00 00 00 00 00 00 00 dd cc bb aa 00 00 00 00  ................
0x40000020  00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00  ................

A single write overwrites all the 128 bits, and when doing multiple writes, the last one overwrites the previous ones. It seems like there is a problem with data masking and the whole burst gets written to the memory.

enjoy-digital commented 4 years ago

@jedrzejboczar: thanks for the update. I'll have a closer look at it soon.

jedrzejboczar commented 4 years ago

One more thing, reading through the datasheet of MT40A256M16 it is written that the data mask feature is controlled by MR5[10], so in our case it is disabled. It also seems that data mask is inverted, i.e. high DM means that given byte is being written and low DM means that given byte gets masked and ignored ("WRITE Burst Operation", p. 225). I tried enabling data mask in MR5 and tried running with both inverted and not inverted DM, but with no success.

enjoy-digital commented 3 years ago

Sorry for the delay to merge to review and merge this PR. I've been able to do some tests on hardware on various boards so we can merge it. It will also be useful for VexRiscv SMP that has just been merged in LiteX. I created #212 to review the potential DM issue you saw.

enjoy-digital / litedram

Implement LiteDRAMNativePortUpConverter with mode="both" #192