Open patschu11 opened 3 months ago
@patschu11 Hi Patrick thank you so much for the contribution!
Before digging into the detailed stuff (i.e., coding style, naming, etc), I have a very high-level comment on the PR:
placeUsingMILP
)? The buffer pass seems to have most of the functionalities that you have re-implemented in the PR (e.g., units and channels inside a CFDFC, etc.).
https://github.com/EPFL-LAP/dynamatic/blob/709f10362d2d5887042f2a8a48bc0cd8d064a8e5/lib/Transforms/BufferPlacement/HandshakePlaceBuffers.cpp#L194-L198@patschu11 Hi Patrick thank you so much for the contribution!
Before digging into the detailed stuff (i.e., coding style, naming, etc), I have a very high-level comment on the PR:
- Why is the LSQ sizing a separate pass instead of inside the buffer placement pass (e.g., after instantiating all buffers in
placeUsingMILP
)? The buffer pass seems to have most of the functionalities that you have re-implemented in the PR (e.g., units and channels inside a CFDFC, etc.). https://github.com/EPFL-LAP/dynamatic/blob/709f10362d2d5887042f2a8a48bc0cd8d064a8e5/lib/Transforms/BufferPlacement/HandshakePlaceBuffers.cpp#L194-L198
Hi Jiahui,
Although on a high level some functionalities that the buffer placement implements are similar, the data structures (e.g. CFDFC class) and corresponding algorithms are not sufficient for LSQ sizing. Only knowing the units and channels of a CFDFC would not have been enough, since there are some special features needed (e.g. inserting pseudo nodes into the graph that do not correspond to any Operation). Therefore it was easier and cleaner to implement the LSQ sizing creating a dedicated data structure for the graph instead of the CFDFC stuff from buffer placement. Implementing these features in the existing buffer placement structures would have necessitated a lot of changes, which could have impacted other features and the people working on them. We discussed this in the beginning with Lucas and decided to make it a dedicated pass since barely any functionality or information could be re-used from buffer placement. Then it is also possible for the user to decide if they want to use the LSQ sizing pass or tune the size manually instead. Also, the list of BBs and II for every CFDFC will be needed by other passes in the future as well, as far as I know, therefore it made sense to us, to encode this information as attributes, instead of building a wrapper around buffer placement.
Thanks for the contribution and for tagging me on this. I will try to take a look at this early this week, probably focusing more on the parts of the "regular flow" that are affected, as opposed to the new pass itself.
This is good to merge for me (modulo the two cosmetic changes to make following my answers above) assuming someone else looked at the algorithm being implemented. @Jiahui17 did you or someone else had a look? If so feel free to approve and merge, and thanks for the contribution @patschu11!
Hi, Patrick! Thanks again for the contribution
I added a few more comments/questions
This PR adds a pass, which sizes the Load and Store Queues, to the minimal depth, which still allows optimal throughput. This is done by analyzing the timing of LSQ loads/stores for each CFDFC separately and using the maximum queue sizes of any CFDFC. The pass is run between the handshake-canonicalization and the handshake-hw conversion.
Important side notes
Overview
The following things were implemented:
lsqDepth = #handshake<lsqDepth[3, 5]>
LSQGenerationInfo
class has been modified to read thelsqDepth
attribute. If the attribute does not exist, the default size (16) will be used for both the load and store queue.Arguments
The pass needs the compontents.json file as a timingDB, which is passed similarly to buffer placement. Furthermore, it is possible to specify different collision scenarios to the pass, by the
collisions=none/half/full
argument. These assume one of the following cases.Details
Algorithm data structure
The sizing algorithm works on the different CFDFCs of the kernel, determined in buffer placement. The CFDFCs are represented, as an adjacency list (AdjListGraph made of AdjListNodes), since most of the algorithm steps are variations of path-finding between different nodes.
Detailed steps of the algorithm
The sizing algorithm inside the LSQ Sizing pass has the following steps:
The implementation details are described in the semester thesis report (Automated Load-Store Queue Sizing in Dynamatic based on MLIR)
Results
The results match the one of the paper, namely the algorithm can find the optimal LSQ size for the steady-state, given that the correct sequence of IIs is provided, either by buffer placement or manually. The detailed results can be found in the semester thesis report: (Automated Load-Store Queue Sizing in Dynamatic based on MLIR)