bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality
Other
228 stars 58 forks source link

Fixing endpoint standard behavior under back-to-back incoming requests #534

Closed ptpan closed 3 years ago

ptpan commented 3 years ago

This PR fixes an issue where back-to-back incoming requests at an endpoint standard could lead to responses having corrupted pkt_type field. It becomes noticeable when the design talking to the endpoint has a delay of at least two cycles between an incoming request and its response.

There is already an assertion on this issue in the non-synthesizable part of the endpoint (https://github.com/bespoke-silicon-group/bsg_manycore/blob/4a1edf5eb1f7dc99e64cede3d12a793f13efaa42/v/bsg_manycore_endpoint_standard.v#L318-L336), but it might not be sufficient since the delay-annotated simulation can have different cycle-level behaviors from the RTL simulation. Therefore, this PR adds extra logic to the endpoint so that no incoming requests can appear valid until the response to the current in-flight request has been sent out.

However, the specific implementation in this PR means the endpoint cannot achieve full throughput (one incoming request, one outgoing response every cycle). To implement the ideal pipelined behavior, in_v_o (i.e., whether the incoming request is valid) has to combinationally depend on returning_v_i (i.e., whether the outgoing response is valid). This violates the assumption of the valid-yumi protocol where valid must be asserted towards the beginning of a cycle.

Let me know if I messed anything up in the implementation or the description above. I don't how exactly how to run your manycore regression test suite, so a quick regression to make sure nothing breaks would be super useful...

taylor-bsg commented 3 years ago

Thanks for your pull request Peitian. Reducing throughput across all users by 2X of the endpoint clearly would not be desirable. Perhaps instead, if your module cannot meet the SLA of responding within a cycle, that you not assert the yumi signal to accept the next request? Longer term, the better answer for your module is to have a parameter that can insert a FIFO for the endpoint standard to hold excess reply metadatas and feeding them into the existing register, allowing more outstanding transactions. We would have liked to provide this feature for you, but we are very short on verification bandwidth and under RTL freeze (postsynth freeze as well). If you have spare bandwidth you might be able to create a version for the CGRA.

ptpan commented 3 years ago

Thanks for your pull request Peitian. Reducing throughput across all users by 2X of the endpoint clearly would not be desirable. Perhaps instead, if your module cannot meet the SLA of responding within a cycle, that you not assert the yumi signal to accept the next request? Longer term, the better answer for your module is to have a parameter that can insert a FIFO for the endpoint standard to hold excess reply metadatas and feeding them into the existing register, allowing more outstanding transactions. We would have liked to provide this feature for you, but we are very short on verification bandwidth and under RTL freeze (postsynth freeze as well). If you have spare bandwidth you might be able to create a version for the CGRA.

Got it. I will add this support to the CGRA RTL.