Closed: jedrzejboczar closed this pull request 3 years ago.
I was testing the changes in `litex_sim` and I had to change the up-converter logic a bit. There was a problem during writes: write commands were being queued and sent to the controller faster than the write converter could generate new data. This led to situations where the controller asserted `wdata.ready` in two consecutive cycles, and I could not find a way to match that in every situation. To fix this, the up-converter now sends a write command only after `next_cmd`. I also needed to remove any buffering on `cmd_buffer`. I wonder if there is a better way to do that.
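To illustrate the fix described above, here is a simplified software model (an illustration only, not the actual Migen implementation; `RATIO`, the 32-bit width, and the function name are assumptions): the write path gathers all narrow data beats first, and only then issues the wide command, so the controller can never consume write data faster than the converter produces it.

```python
RATIO = 4  # assumed 1:4 conversion, e.g. 32-bit user port -> 128-bit native port

def upconvert_writes(narrow_writes):
    """narrow_writes: list of (addr, data) narrow beats with sequential addresses.
    Returns a list of (wide_addr, wide_data) commands; each command is emitted
    only after all of its data has been gathered."""
    commands = []
    buffer = []
    for addr, data in narrow_writes:
        buffer.append(data)
        if len(buffer) == RATIO:  # data complete -> only now send the command
            wide_addr = addr // RATIO
            wide_data = 0
            for i, d in enumerate(buffer):
                wide_data |= d << (32 * i)  # pack narrow beats little-endian
            commands.append((wide_addr, wide_data))
            buffer = []
    return commands
```

In this model the command and its data become available to the "controller" atomically, which is the property the fix enforces in hardware.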
I fixed a problem with `LiteDRAMWishbone2Native`, which kept old data valid after it had been accepted, leading to the up-converter sending wrong data. I also forced wishbone2native to send all commands with `cmd.last`, as it has to get the data before `ack` anyway. We should probably rewrite wishbone2native to make use of the up-converter's ability to group commands.
@jedrzejboczar: thanks, I'll try to have a closer look tomorrow. Is the current implementation working? It would indeed be a good idea to simplify `LiteDRAMWishbone2Native` with the new capabilities of the up/down converters and have it only bridge between Wishbone and Native interfaces of the same data width.
It should be working now. I added unit tests for the special cases and I was able to use the up-converter in `litex_sim` (by forcing the crossbar to use a different port data width and disabling the L2 cache).
I've just tested it with the L2 cache and there was a minor problem, but https://github.com/enjoy-digital/litedram/pull/192/commits/b0bde294c0a440116292a4267e6a33cf73edc7ad fixes it. We should probably remove the data width conversion from `LiteDRAMWishbone2Native`, as you say. I'll work on this now, but the current change set should work as it is.
Great, thanks.
I've rewritten `LiteDRAMWishbone2Native` to use `LiteDRAMNativePortConverter` for the data width conversion. Because Wishbone won't present a new address/data before it receives `ack`, `LiteDRAMWishbone2Native` sends all read commands with `cmd.last=1`, effectively always sending only one chunk. I wanted to at least avoid that for writes, so when writing, `LiteDRAMWishbone2Native` uses `cmd.last=0`. But because we don't know whether a command is the last one at the moment of sending it, I made use of `port.flush`, which is simply set high when the Wishbone transfer ends. If this is not OK, we can just use `cmd.last=1` for all commands, or maybe we could buffer incoming write commands to send them one cycle later and insert `cmd.last` depending on `wishbone.cyc`, but this complicates the converter a bit.
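The burst-closing behaviour described above can be modelled in a few lines (a hypothetical sketch, not the LiteDRAM source; the function name is made up): write commands with `last=0` accumulate into one burst on the `port_to` side, and a command with `last=1` closes it, while `port.flush` would close any dangling burst when the Wishbone cycle ends.

```python
def group_write_commands(cmds):
    """cmds: list of (addr, last) write commands as emitted by the bridge.
    Returns the bursts as seen by the up-converter: commands accumulate
    until last=1 closes the burst."""
    bursts, current = [], []
    for addr, last in cmds:
        current.append(addr)
        if last:
            bursts.append(current)
            current = []
    if current:
        # in hardware, port.flush would close this dangling burst
        bursts.append(current)
    return bursts
```

This shows why `cmd.last=1` on every command degenerates into one-command bursts, while `last=0` plus an end-of-cycle flush lets consecutive writes merge.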
I have tested the changes on the Mercury XU5 board and I'm getting ~50% errors during memtest. I need to investigate this problem further, so we should hold off on merging this PR for now.
@jedrzejboczar: ok, thanks for the feedback. Maybe you could try a configuration in `litex_sim` similar to the one on the Mercury XU5 board and see if you are able to reproduce the issue?
I tried to run `litex_sim` with exactly the same PHY settings as for the Mercury board, but unfortunately there are no errors and memtest succeeds. I also ran with `DFITimingsChecker`, but it reports no issues.
@jedrzejboczar: strange, have you also tested with upstream LiteDRAM to be sure it's still working? (IIRC you had issues with the 125MHz clock frequency on the Mercury XU5.)
Yes, I had to lower the frequency to make the leveling work. I haven't tested with all the changes from master; I can try rebasing onto master to test it, but I don't see any changes that could influence this problem.
OK, i just wanted to be sure the issue is related to the changes and not something else. I'm going to do a test on Arty.
Thanks, I'll try to do some more tests tomorrow.
It's working fine on Arty.
Hmm, strange. Are you sure that the `LiteDRAMNativePortUpConverter` gets instantiated? By default on Arty there will be an L2 cache of width 128, the same width as the native port. You could, for example, force the crossbar to create a port of a different width on line https://github.com/enjoy-digital/litex/blob/3391398a5fccd5fc6cae36231a11d34f032dc3c5/litex/soc/integration/soc.py#L1050.
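As a rough illustration of the experiment suggested above (a hedged sketch only: the exact call site and the `data_width` keyword of `LiteDRAMCrossbar.get_port` are assumed from the LiteDRAM API, so check them against the actual source):

```python
# Hypothetical modification around the linked line in soc.py: request a
# crossbar port narrower than the native data width (128 bits on Arty),
# which forces LiteDRAMNativePortUpConverter to be instantiated.
port = self.sdram.crossbar.get_port(data_width=32)
```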
Indeed, i haven't checked if the adapter was used or not. I'll do more tests tomorrow.
@enjoy-digital I still haven't been able to reproduce this problem in simulation. I've been able to test this code on Arty, forcing the memory port to 32 bits (so with a 32:128 up-converter), and everything worked without any errors.
For some reason the up-converter case fails on the XU5: only every 4th write is read back correctly. Reads are probably fine, but there is a problem when writing data:
```
litex> mr 0x40000000 0x30
Memory dump:
0x40000000 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
0x40000010 00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 ................
0x40000020 00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00 ................
litex> mw 0x40000000 0xaabbccdd 1
litex> mr 0x40000000 0x30
Memory dump:
0x40000000 dd cc bb aa 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x40000010 00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 ................
0x40000020 00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00 ................
litex> mw 0x40000000 0xaabbccdd 4
litex> mr 0x40000000 0x30
Memory dump:
0x40000000 00 00 00 00 00 00 00 00 00 00 00 00 dd cc bb aa ................
0x40000010 00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 ................
0x40000020 00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00 ................
litex> mw 0x40000000 0xaabbccdd 3
litex> mr 0x40000000 0x30
Memory dump:
0x40000000 00 00 00 00 00 00 00 00 dd cc bb aa 00 00 00 00 ................
0x40000010 00 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 ................
0x40000020 00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00 ................
litex> mw 0x40000000 0xaabbccdd 7
litex> mr 0x40000000 0x30
Memory dump:
0x40000000 00 00 00 00 00 00 00 00 00 00 00 00 dd cc bb aa ................
0x40000010 00 00 00 00 00 00 00 00 dd cc bb aa 00 00 00 00 ................
0x40000020 00 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00 ................
```
A single write overwrites all 128 bits, and when doing multiple writes, the last one overwrites the previous ones. It seems there is a problem with data masking and the whole burst gets written to memory.
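The suspected failure mode above can be modelled in a few lines (an illustration of the hypothesis only, not the controller code; the function name and lane layout are assumptions): a 32-bit write into a 128-bit word should only touch the 4 byte lanes selected by the write mask, and if the mask is ignored the whole 128-bit word is clobbered, matching the memtest symptom.

```python
def write_128(old, new32, subword, honor_mask=True):
    """Model one 32-bit write into a 128-bit word.
    subword: which 32-bit lane (0..3) of the wide word is written."""
    shift = 32 * subword
    if not honor_mask:
        # mask ignored: the rest of the 128-bit word is clobbered with zeros
        return new32 << shift
    lane_mask = 0xFFFFFFFF << shift
    return (old & ~lane_mask) | (new32 << shift)
```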
@jedrzejboczar: thanks for the update. I'll have a closer look at it soon.
One more thing: reading through the datasheet of the MT40A256M16, it says the data mask feature is controlled by MR5[10], so in our case it is disabled. It also seems that the data mask is inverted, i.e. DM high means the given byte is written and DM low means the given byte is masked and ignored ("WRITE Burst Operation", p. 225). I tried enabling the data mask in MR5 and ran with both inverted and non-inverted DM, but with no success.
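The polarity question above can be stated as a toy model (an illustration of the two conventions being compared, not verified hardware behaviour): per byte lane, one convention treats a set DM bit as "mask this byte", the other as "write this byte".

```python
def apply_dm(old_byte, new_byte, dm, inverted):
    """Model one byte lane under the two candidate DM polarities.
    inverted=True:  dm=1 means the byte IS written (the datasheet reading above).
    inverted=False: dm=1 means the byte is masked and ignored."""
    write = bool(dm) if inverted else not dm
    return new_byte if write else old_byte
```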
Sorry for the delay in reviewing and merging this PR. I've been able to do some tests on hardware on various boards, so we can merge it. It will also be useful for VexRiscv SMP, which has just been merged in LiteX. I created #212 to review the potential DM issue you saw.
Closes https://github.com/enjoy-digital/litedram/issues/191
This PR implements the up-converter as described in https://github.com/enjoy-digital/litedram/issues/191. It combines subsequent `port_from` commands to lower the number of `port_to` commands. It allows incomplete transfers and automatically advances to the next `port_to` command when the command address or type (r/w) changes. It is also possible to have gaps, e.g. for 1:4 conversion we can write/read `[0x00, 0x02, 0x03], ...`. As you suggested, I used `port_from.cmd.last` instead of `port_from.flush` to indicate the end of a burst. `cmd.last` must be 1 if the last command does not form a "complete" `port_to` command.

I added unit tests for the mentioned cases and they all pass, but I seem to have some problem during `memtest` when running in `litex_sim`. I will further investigate whether this is caused by the up-converter or by some incompatibility with wishbone2native, so we may yet wait with merging this PR.

If you have any suggestions, I can modify the implementation. To be able to handle all the cases, I had to insert FIFOs on the write and read data paths so as not to lose data during transfers with gaps.
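The "transfers with gaps" behaviour described above can be sketched as a small software model (an illustration, not the Migen implementation; `RATIO` and the function name are assumptions): for 1:4 conversion, narrow accesses within one wide word merge into a single `port_to` command whose write-enable mask has holes for the missing subwords.

```python
RATIO = 4  # assumed 1:4 up-conversion

def merge_with_gaps(addrs):
    """addrs: narrow addresses belonging to the same wide word, possibly
    with gaps, e.g. [0x00, 0x02, 0x03].
    Returns (wide_addr, we_mask) with one mask bit per narrow subword."""
    wide_addr = addrs[0] // RATIO
    assert all(a // RATIO == wide_addr for a in addrs), "one wide word only"
    we_mask = 0
    for a in addrs:
        we_mask |= 1 << (a % RATIO)  # set the subword's enable bit
    return wide_addr, we_mask
```

In this model the example from the description, `[0x00, 0x02, 0x03]`, produces a single wide command with the bit for subword 1 left clear, which is why buffering is needed on the data paths to ride out the gap.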