bespoke-silicon-group / basejump_stl

BaseJump STL: A Standard Template Library for SystemVerilog
http://bjump.org/

Dramsim3 multiple reads proper fix #620

Open dpetrisko opened 1 year ago

dpetrisko commented 1 year ago

Fixes a bug introduced by the bugfix in #619.

This PR adds a more general solution for multiple reads returning in the same cycle. For the dramsim3 controller, reads to different addresses return in different cycles (I think? I couldn't find any guarantee that this is true), but multiple requests to the same address complete in the same cycle. The requesters are independent, so they each expect a response. This solution therefore enqueues all read returns into an in-order queue, which is drained at the verilog's convenience.

Changes are restricted to bsg_dramsim3.cpp to avoid modifications to the dramsim3 code or the bsg verilog.
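
For concreteness, the approach looks roughly like the sketch below. The class and method names are illustrative only, not the actual bsg_dramsim3.cpp symbols:

```cpp
#include <cstdint>
#include <queue>

// Sketch of the queueing approach described above: every read completion
// reported by dramsim3 is pushed onto an in-order queue, and the verilog
// side pops at most one entry per cycle through the DPI interface.
class read_response_queue {
public:
  // Called from the dramsim3 read-done callback. Duplicate addresses in the
  // same cycle simply become two queue entries, one per requester.
  void enqueue(uint64_t addr) { pending_.push(addr); }

  // Polled by the verilog wrapper each cycle; returns true and fills 'addr'
  // if a response is available to hand back this cycle.
  bool dequeue(uint64_t &addr) {
    if (pending_.empty()) return false;
    addr = pending_.front();
    pending_.pop();
    return true;
  }

private:
  std::queue<uint64_t> pending_;
};
```

The key property is that an entry is pushed for every read-done callback, so two requesters of the same address each get their own response, just on different cycles.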

tommydcjung commented 1 year ago

This change might be functionally correct, but it creates some discrepancies in timing. In dramsim3, when there are N reads to the same address in the queue, it collapses them into one read (but it calls the read_done() callback N times, and I don't know if that's a feature or a bug). With this change, the verilog will return those N requests over N cycles, while the dramsim3 cpp will internally process the next read or write requests at the same time, which creates conflicts on the data pins. The timing stats recorded internally in dramsim3 would not match the external behavior of the dramsim3 verilog.

Also, in hbm2 you can't send requests to the same bank group every cycle, because of tCCD_L = 2. But with this change, it would appear that those read requests to the same address are being issued and completed every cycle, which adds further discrepancy.

I don't think it's that difficult to understand and fix the dramsim3 source code, although it might require some effort to verify. I have fixed some bugs and added new features in the past myself. I think I understand the bug in dramsim3, and I might offer some help to fix it. But if we choose to go with this route, I would add a parameter to turn it off.
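
For example, the parameter I'm suggesting could just gate the serialization in the return path, roughly like this sketch (the flag and names are hypothetical, not existing bsg_dramsim3.cpp code):

```cpp
#include <cstdint>
#include <queue>

// Hypothetical opt-out: when 'serialize_reads' is false, all completions
// from a cycle are handed back together (the previous behavior); when true,
// they drain one per cycle through the in-order queue.
class read_return_path {
public:
  explicit read_return_path(bool serialize_reads)
      : serialize_reads_(serialize_reads) {}

  void on_read_done(uint64_t addr) { pending_.push(addr); }

  // Drain up to 'max' responses this cycle: one when serializing,
  // otherwise everything that has completed.
  size_t drain(uint64_t *out, size_t max) {
    size_t limit = serialize_reads_ ? 1 : max;
    size_t n = 0;
    while (n < limit && !pending_.empty()) {
      out[n++] = pending_.front();
      pending_.pop();
    }
    return n;
  }

private:
  bool serialize_reads_;
  std::queue<uint64_t> pending_;
};
```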

dpetrisko commented 1 year ago

Thanks for the feedback. It is a bug because a DRAM controller, emulated or otherwise, should not silently drop requests under any condition. Notably, it is our interface code that is dropping requests, not DRAMSim3; DRAMSim3 simply has a particular behavior that is incompatible with our C++ adapter.

Can you clarify what behavior you're expecting from a fix? It is unclear to me that the dramsim3 behavior is incorrect. Since requests are queued in the controller and drain serially, any modification to the queue in dramsim3 will disturb accurate timing. The reason this bug has not shown up before is that dramsim3 does not return data for different addresses in the same cycle, even though that could happen for an arbitrary access pattern. Presumably this emulates some internal buffering of returns so that they do not come out at the same time, or alternatively some scheduling of the control signals so that they do not align on output. In this case, dramsim3 is telling us that same-address coalescing is being done by sending multiple read-done callbacks.

In the HBM case, the particular timing parameters are handled by the controller. So the semantics are that the controller recognizes the repeated address and does not send the second request, but instead returns the data twice. This avoids the bank conflict and seems to be what dramsim3 is telling us is the modeled behavior in this circumstance.
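
To illustrate the semantics I mean (this is not actual dramsim3 code, just a sketch of the described behavior):

```cpp
#include <cstdint>
#include <unordered_map>

// Illustration of same-address coalescing: a second read to an address that
// is already in flight does not issue another DRAM command, but the
// controller remembers how many requesters are waiting and fires the
// read-done callback once for each of them.
class coalescing_model {
public:
  // Returns true if a new DRAM command must actually be issued.
  bool accept_read(uint64_t addr) {
    auto it = waiting_.find(addr);
    if (it != waiting_.end()) {
      it->second++;        // coalesce onto the in-flight access
      return false;
    }
    waiting_[addr] = 1;    // first reader: issue the command
    return true;
  }

  // When the data comes back, every waiting requester gets a callback.
  template <typename Callback>
  void complete_read(uint64_t addr, Callback read_done) {
    auto it = waiting_.find(addr);
    if (it == waiting_.end()) return;
    for (int i = 0; i < it->second; i++) read_done(addr);
    waiting_.erase(it);
  }

private:
  std::unordered_map<uint64_t, int> waiting_;
};
```

Under this model the N read-dones in one cycle are exactly what we should expect, and the bank conflict never happens because only one command was issued.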

If this is the case, then I'm not sure what else can be done to provide closer correspondence to what would happen in a real controller. We need a solution that is globally enabled, preserves the current behavior when there are no same-address reads, and fixes the bug that currently exists when there are.

tommydcjung commented 1 year ago

I agree that dramsim3's current handling of multiple reads to the same address is more efficient than issuing those commands individually, and we should keep that feature. If we can say that the maximum number of reads to the same address is bounded by some finite number (which is probably the case), so that this additional internal buffering for the returned data doesn't grow arbitrarily large, it's more realistic.

dpetrisko commented 1 year ago

There does exist a limit for total queue capacity: https://github.com/umd-memsys/DRAMsim3/blob/29817593b3389f1337235d63cac515024ab8fd6e/src/controller.cc#L151

Each read occupies a slot in that queue. So presumably the same return space could be used for either type of read? The controller could just do an address lookup over the buffered results and match the return data however many times it is required? Presumably, if there is internal reordering, something along those lines is happening already.
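
Something like this sketch is what I have in mind for the lookup (names are hypothetical, not dramsim3 symbols):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch of the buffered-result lookup: returned data is held in one slot
// per address with a count of waiting readers, so same-address reads are
// served by a lookup rather than by occupying additional return space.
struct return_slot {
  std::vector<uint8_t> data;
  int readers_waiting = 0;
};

class return_buffer {
public:
  // Record that another requester wants this address's data.
  void add_reader(uint64_t addr) { slots_[addr].readers_waiting++; }

  // Store the data when the DRAM access completes; one slot regardless of
  // how many readers are waiting on the address.
  void fill(uint64_t addr, std::vector<uint8_t> data) {
    slots_[addr].data = std::move(data);
  }

  // Serve one reader; the slot is freed only after the last reader drains.
  // (Assumes the caller only serves an address after fill() has run for it.)
  bool serve(uint64_t addr, std::vector<uint8_t> &out) {
    auto it = slots_.find(addr);
    if (it == slots_.end() || it->second.readers_waiting == 0) return false;
    out = it->second.data;
    if (--it->second.readers_waiting == 0) slots_.erase(it);
    return true;
  }

private:
  std::unordered_map<uint64_t, return_slot> slots_;
};
```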