Closed ptpan closed 3 years ago
Thanks for your pull request Peitian. Reducing throughput across all users by 2X of the endpoint clearly would not be desirable. Perhaps instead, if your module cannot meet the SLA of responding within a cycle, that you not assert the yumi signal to accept the next request? Longer term, the better answer for your module is to have a parameter that can insert a FIFO for the endpoint standard to hold excess reply metadatas and feeding them into the existing register, allowing more outstanding transactions. We would have liked to provide this feature for you, but we are very short on verification bandwidth and under RTL freeze (postsynth freeze as well). If you have spare bandwidth you might be able to create a version for the CGRA.
Thanks for your pull request Peitian. Reducing throughput across all users by 2X of the endpoint clearly would not be desirable. Perhaps instead, if your module cannot meet the SLA of responding within a cycle, that you not assert the yumi signal to accept the next request? Longer term, the better answer for your module is to have a parameter that can insert a FIFO for the endpoint standard to hold excess reply metadatas and feeding them into the existing register, allowing more outstanding transactions. We would have liked to provide this feature for you, but we are very short on verification bandwidth and under RTL freeze (postsynth freeze as well). If you have spare bandwidth you might be able to create a version for the CGRA.
Got it. I will add this support to the CGRA RTL.
This PR fixes an issue where back-to-back incoming requests at an endpoint standard could lead to responses having corrupted
pkt_type
field. It becomes noticeable when the design talking to the endpoint has a delay of at least two cycles between an incoming request and its response.There is already an assertion on this issue in the non-synthesizable part of the endpoint (https://github.com/bespoke-silicon-group/bsg_manycore/blob/4a1edf5eb1f7dc99e64cede3d12a793f13efaa42/v/bsg_manycore_endpoint_standard.v#L318-L336), but it might not be sufficient since the delay-annotated simulation can have different cycle-level behaviors from the RTL simulation. Therefore, this PR adds extra logic to the endpoint so that no incoming requests can appear valid until the response to the current in-flight request has been sent out.
However, the specific implementation in this PR means the endpoint cannot achieve full throughput (one incoming request, one outgoing response every cycle). To implement the ideal pipelined behavior,
in_v_o
(i.e., whether the incoming request is valid) has to combinationally depend onreturning_v_i
(i.e., whether the outgoing response is valid). This violates the assumption of the valid-yumi protocol wherevalid
must be asserted towards the beginning of a cycle.Let me know if I messed anything up in the implementation or the description above. I don't how exactly how to run your manycore regression test suite, so a quick regression to make sure nothing breaks would be super useful...