Open rpirayadi opened 3 days ago
I very strongly support this modification because I believe CfToHandshake
mixes up in the same pass completely orthogonal concerns: converting a CDFG into a circuit (what we often refer informally as FPGA'18 or FPL'22+FPGA'23) and connecting the appropriate memory system to achieve correct execution. As @rpirayadi suggests, there are many ways to achieve the second: many depend on the actual elastic circuit (e.g., the so-called ICFPT'19) and some may modify the elastic circuit (e.g., techniques @rpirayadi plans to work on that would add elastic circuitry to remove LSQs in favour of normal memory connections). I understand that there are some technical reasons for the current implementation but I think we need to find a workaround for such technical constraints because they force us into a fundamentally wrong software organization where orthogonal concerns are combined.
It would be extremely useful to hear from @lucas-rami about (1) whether the nonconnected annotated handshake::GeneralLoadOp
memory operation is the right way to go and (2) how many things will possibly break doing this (@rpirayadi suggests not that many--is that optimistic?)
While I appreciate the effort in explaining this proposal in some details, I am deeply against this change. I think it is based on multiple misconceptions of how one works with memory interfaces in Dynamatic. Not only that, but it fails to actually explain in a meaningful way how this would work were it to be implemented.
Before I address the actual proposal, I want to put forward one design decision that I made in the past and for which I think there is a good argument to be made against (perhaps this will also clear up some things about how memory ports work). I decided at some point that load/store ports to MCs and LSQs should be different MLIR operations (for loads, handshake::MCLoadPort
and handshake::LSQLoadPort
) even though they have exactly the same operands/results and, from an abstract circuit perspective, behave in the same way. This choice was motivated by the fact that the RTL components we have for load and store ports are not the same depending on whether they connect to an MC or LSQ (why that is still escapes my understanding, but that it is not the topic). I now think that this discrimination at the Handshake level does not bring anything useful to the IR (it is anyway possible to determine which type of interface any port connects to by looking at adjacent operations in the IR) but it does create some light code duplication in places and is generally confusing. I would therefore be in favor of seeing it removed i.e., having a single handshake::LoadOp
port for all loads and the same for stores without changing how memory interfaces are represented or when they are placed. If this is something you agree with, I can dedicate some time to it in the near future.
Now onto the actual proposal. I will first address the misconceptions in the problem statement one-by-one, then comment on the proposed solution, and finally give my conclusion as to why this is not a desirable change.
As a result, the decision to choose between the memory controller and LSQ must be made in this pass.
To make this clear, the decision is made here with the amount of information available at this stage of the compilation because I want to produce a circuit with a valid memory network straight away. There is no loss of information in the placement process, on the contrary most of that information is now more clearly laid out in the CDFG as opposed to attributes. Absolutely nothing prevents you from going back on this decision later on once more information is available; it is in fact fairly easy to do so thanks to memory interface creation/replacement helpers I have introduced over time which are already used both from CfToHandshake
and from HandshakeReplaceMemoryInterfaces
.
Moreover, the
handshake::LSQLoadOp
has a port namedLSQLoadPort
which needs to be connected to an LSQ, which forces us to instantiate the LSQ at this point.
Incorrect,LSQLoadPort
has nothing to do with the IR itself. It is a view into an operand/result range of an handshake::LSQOp
that corresponds to a load port (to help client code parse the many ports attached to an interface). It is therefore semantically attached to a memory interface, not a port, which is why it is queried from the former and not the latter. You would still need this even with your proposal.
This transformation is implemented in the Handshake MLIR because it requires the handshake structure.
Not really. We made the design decision to do it here but the analysis part is perfectly doable pre-Handshake. In legacy Dynamatic, this analysis was done at what we would call the CF level today.
This process combines two logically distinct actions: (1) translating from CF to the Handshake representation and (2) connecting the memory operations to the chosen memory structure.
I think this distinction is very artificial and, in the end, arbitrary. Memory interfaces are part of Handshake and are required in a functional dataflow circuit, so instantiating functional ones can (and should imo) be considered part of the translation. These interfaces also change what your control network looks like; by not placing them, you will also find yourself with a half-baked control network at the end of (1). One may put forth that we also do not place buffers or forks/sinks in CfToHandshake
as a contradiction to what I am arguing for. However, these are very different beasts in my view because they do not non-trivially impact the CDFG; forks/sinks are trivially deducible at any point in the flow (and their temporary non-existence facilitates a number of downstream transformations) and buffers, in addition to not modifying the CDFG beyond a simple channel cut, are not placeable in any performance-oriented way before basically all circuit transformations have been applied (due to latency/throughput changes).
However, this pass also modifies memory connectivity (namely by removing LSQs or splitting a single LSQ into multiple LSQs). This highlights that these connections should ideally not have been established in the earlier transformation.
Thus, it makes more sense to postpone the connection to memory controller or LSQ until our knowledge is complete.
This is a straight up non sequitur. Having to rewrite the IR is not indicative of a bad design, it is the entire reason we have an IR in the first place and a toolchain to transform it (MLIR). Modern compilers are based on the principle of iterative refinement of the IR, where each refinement (i.e., compiler pass) is targeted at a specific aspect of the IR to analyze/optimize/transform. Pushing your argument to its logical limit, we should just go from C to RTL without an IR because we do not want to do something that we will have to go back on later. (As a sidenote, we never split LSQs into multiple LSQs. We may however split one into a master MC and a slave LSQ.)
A possible solution is to introduce a new type of load (and also store) operation possibly named
handshake::GeneralLoadOp
, which would eventually be translated into eitherhandshake::MCLoadOp
orhandshake::LSQLoadOp
based on the annotated information. Another approach could be to introduce an entirely new dialect between CF and Handshake which holds the general operation. This would solve the inconsistency of having two types of Handshake representation (ie. one before the final translation and one after).
This is much too vague to have a meaningful discussion about. What are the operands and results of those new "abstract ports"? How do they connect to the rest of the circuit before you actually place memory interfaces (where do load/store addresses go, where does loaded data come from)? Who produces the Handshake function's memory end control output before memory interfaces exist? MLIR will not let you exit a pass with an IR that has bits and pieces hanging around. If it looks like these new operations do not fit in Handshake and also do not fit in CF, as you seem to suggest, then it is perhaps because they do not actually represent anything meaningful at any IR level. In addition, dialects are meant to group operations/types/attributes which live at roughly the same abstraction level, so creating a new dialect for this couple of operations seems undesirable given that there is no clear definition of the kind of abstraction this dialect would represent.
As it is abundantly clear I am very much against this proposal and will remain as such until someone can actually showcase a use case where this redesign would allow someone to do something significantly more easily than with the current design. As I have mentioned, the system is made so that one can replace memory interfaces whenever one wants, and I am sure this could be made even easier with some lightweight code refactoring.
Furthermore, I believe the proposed change does not increase modularity in any useful way. The IR coming out of an hypothetical "CFToHandhsakeNoMemInterface
" pass would be missing part of its control network in addition to the memory network (no memory interfaces -> no control signals to/from them -> incomplete set of control signals in the circuit) i.e., memory interfaces are not exactly orthogonal to the rest of the circuit creation process. Many downstream passes will therefore prefer to see a circuit with memory interfaces present, even if they do not really care for the memory network's topology itself. Your "ConnectMemoryInterfaces
" pass would need to run before these passes, which themselves might desire to run before any "AdditionalMemoryAnalysis
". However, following your proposal the latter would require that memory interfaces have not already been placed, so now you need to write and run another pass that removes memory interfaces and goes back to the still unclear GeneralLoadOp
representation of memory ports. Let me put this graphically because I can see this being hard to track from text. From top to bottom, problematic chronological order of passes.
<passes from source to CF>
CFToHandhsakeNoMemInterface (your point 1 in the first paragraph of the problem def.)
ConnectMemoryInterfaces (your point 2, needed to legalize the IR for the next passes. Together with the previous pass, this is now exactly CfToHandhsake)
PassThatWantsAFullControlNetwork1
<...>
PassThatWantsAFullControlNetworkN
RemoveMemoryInterfaces (needed to legalize the IR for the next pass)
AdditionalMemoryAnalysis (e.g., existing `HandshakeAnalyzeLSQUsage`)
ConnectMemoryInterfaces (once one is done with all memory analysis stuff)
<passes down to RTL>
Note that any argument of the kind "nobody should want to run a pass between CFToHandhsakeNoMemInterface
and my last AdditionalMemoryAnalysis
pass so that I don't have to do this back and forth on port representation" would directly decrease modularity, and as such go against your stated goal.
I hope that this makes clear the implementation effort that would be associated with this change and puts its usefulness in perspective considering that you can basically do whatever you want today with the existing system. Also note the following.
AdditionalMemoryAnalysis
" pass understands that there are already unoptimized but functional memory interfaces in the circuit, like the HandshakeAnalyzeLSQUsage
pass easily does, as well as potentially previously placed annotations on memory ports that carry extra information not already encoded in the CDFG.AdditionalMemoryAnalysis
" kind need to run sequentially, you do not have to actually replace memory interfaces at the end of each pass. Much like HandshakeAnalyzeLSQUsage
only re-annotates existing memory ports and relies on the HandshakeReplaceMemoryInterfaces
to actually change the interfaces, you can run all your analysises sequentially---iteratively refining port annotations---and then call the replacement pass once at the end.tl;dr. Modulo much clearer evidence of actual limitations in the current design and a more concrete implementation plan for its replacement I am not going to approve of such contribution if given the power to.
I will try to end on a more positive note and point out that, if you want to go ahead regardless of my insights, you can do so in a non-destructive way for the existing flow. CfToHandshake
is made so that you can reuse parts of the transformation and ignore/modify others. Without much code duplication, you can therefore create an alternative pass that uses most parts of the original and do whatever you want with memories in it. Then you are free to implement any downstream pass that will gladly accept whatever IR you spit out of your alternative and finally connect memory interfaces once you are done so that you can run the rest of the Dynamatic flow unchanged. I am much more likely to approve of such experimental contributions, as they allow to try out a different design without hurting the regular flow and simply at the cost of person-hours if the alternative turns out to be sub-optimal.
@lucas-rami: thanks for the extensive and timely answer. Most certainly, we do not understand much of the entrails of Dynamatic and we appreciate the insights--in fact, this issue was entirely designed to elicit your reaction.... :-)
Before I move to a more concrete part of this comment let me cite this:
The IR coming out of an hypothetical "CFToHandhsakeNoMemInterface" pass would be missing part of its control network in addition to the memory network (no memory interfaces -> no control signals to/from them -> incomplete set of control signals in the circuit) i.e., memory interfaces are not exactly orthogonal to the rest of the circuit creation process.
Indeed, we briefly discussed this among ourselves but I think ultimately pushed this under the carpet. I think you are somehow right there: whoever messes up with the memory connectivity (LSQs and MCs) most likely needs to mess up also with the so-called control network....
Now, let me try to summarize the issue as succinctly as I can and please be patient with the many stupidities I may say along the way:
The distinction handshake::MCLoadPort
and handshake::LSQLoadPort
is unwise because the operation is the same--who and how the request is served is irrelevant to the operation (at least until much later), much as we have an 'Add' and we do not distinguish on what is the right implementation, RippleAdd
or CLAdd
(the comparison is not perfect, but somehow not fundamentally wrong either). It seems that you agree on this and you would even be available to help to remove this.
The fact that everything from *LoadPort
components to actual memory is connected very early (at CFToHandshake
time) is perhaps significant at semantic/MLIR level (I feel like I would challenge this, but it is actually pointless here) but it is certainly insignificant at a practical level if one could easily (i) rip off every existing connection and (ii) rebuild that connectivity (semi-)automatically. I take that this is what you signify when pointing to the helpers in MemoryInterfaces. Let me tentatively say this: we are qualitatively in agreement if (a) what you connect "too early" (for our taste, but who cares...) in CFToHandshake
is simply the result of a call to (some) function(s) and (b) the reconnection later is simply the result of a call to some other function ripping off the previous connection and a call to the same function(s) to reconnect. Is this, with some imagination to compensate for my potentially inept description, the qualitative case we are in or we agree we should be in? If we are, clearly we have nothing to change (except perhaps for the handshake::MCLoadPort
and handshake::LSQLoadPort
unification, perhaps).
I think the simplest use-case of what @rpirayadi wants to do may help here: he does not care who has created a circuit and how (FPGA'18 or FPL'22+FPGA'23 or anything) but, instead of inserting an LSQ, he wants to create a virtual 0-bit data dependency between the result of a predecessor (in program order) access and the next one, so that the accesses are kept in order and the LSQ unnecessary. Of course, this comes to a great damage to performance--and ultimately @rpirayadi's goal will be to do better--but it is a perfect example of a later pass (later than CFToHandshake
and later than ISFPT'19) that he might want to write--I hope it is clear what I mean here and if it is not, please ask for more details. I now think this may be (largely) compatible with the existing code base if one is to ignore what has been implemented in terms of memory interfaces and could simply get the existing one ripped off and recreated. (1) Would the pass implementing ISFPT'19 be a right example for this where @rpirayadi could see this "rip off and recreate automatically" in action? (2) If so, how would one "communicate" to the helper function constructing the memory connectivity that the handshake::LSQLoadPort
should now be an handshake::MCLoadPort
? (I hope you see what I mean, for even if there is only a universal handshake::LoadPort
someone has to make a decision on whether it is an LSQ port or an MC port, after all. Would this an annotation to leave on the handshake::LoadPort
node or what?
If we sort of agree, is ISFPT'19 an example of a pass @rpirayadi should get direct inspiration from for his pass? If we do not agree, what would be the best way for @rpirayadi to go about the kind of (many different) passes he would like to develop?
Thanks for the quick reply and and for clarifying your use-case, it helps me see what was the intended goal of the change :) I think we generally agree!
The distinction handshake::MCLoadPort and handshake::LSQLoadPort is unwise because the operation is the same [...] It seems that you agree on this and you would even be available to help to remove this.
Exactly. This is a largely mechanical change to implement, and it will especially have a positive impact on anyone wanting to change memory interfaces after some have already been placed. It would most likely take me 3/4 hours to implement, and I can do my best to schedule that at some point in the near-future before much of that implementation effort has started.
but it is certainly insignificant at a practical level if one could easily (i) rip off every existing connection and (ii) rebuild that connectivity (semi-)automatically
Yes, that is basically what I was trying to point out. I think the existing system is very workable for the needs you have (if I understand correctly) and changing it carries yet-undefined risks that I am not sure outweigh the benefits, which as I have pointed out I am not convinced of either.
Is this, with some imagination to compensate for my potentially inept description, the qualitative case we are in or we agree we should be in? If we are, clearly we have nothing to change (except perhaps for the handshake::MCLoadPort and handshake::LSQLoadPort unification, perhaps).
Yes, replacing the memory interfaces is a matter of calling a few functions and there are clear examples of this in both the FPGA'18 and ISFPT'19 passes. I am sure there are opportunities for lightly refactoring the existing code that would make this even easier (e.g., some reusable parts of the interface replacement may still be private to a pass and so would need to be exposed publicly), and I can provide some guidance if needed at some point.
(1) Would the pass implementing ISFPT'19 be a right example for this where @rpirayadi could see this "rip off and recreate automatically" in action?
Yes, when taken together with the HandshakeReplaceMemoryInterfaces
pass. The HandshakeAnalyzeLSQUsage
pass (ISFPT'19) simply re-annotates memory ports to encode the dependence analysis's result (without changing the CDFG yet). HandshakeReplaceMemoryInterfaces
then reads these annotations (without caring where they came from) and replaces existing memory interfaces with new ones according to those. The IR you get back has memory interfaces which now match memory port annotations. If no annotation encoded the need for an LSQ then your output IR will not have LSQs, even if the input IR contained LSQs.
(2) If so, how would one "communicate" to the helper function constructing the memory connectivity that the handshake::LSQLoadPort should now be an handshake::MCLoadPort? (I hope you see what I mean, for even if there is only a universal handshake::LoadPort someone has to make a decision on whether it is an LSQ port or an MC port, after all. Would this an annotation to leave on the handshake::LoadPort node or what?
These *Port
types are somewhat unrelated to all those passes; they are neither annotations or operations, just helpers when writing code that deals with memory interfaces. The actual annotation (attribute in MLIR jargon) that the ISFPT'19 pass sets is called MemInterfaceAttr
(documentation somewhat outdated) and allows to encode whether the port should ultimately go to an MC or LSQ. The *Port
types (which will get simpler once the handshake::MCLoadOp
/handshake::LSQLoadOp
nonsense is gone) are just data structures that can be queried by the user from a memory interface to better understand what it connects to (the nature and number of ports, their groupings, etc.). They won't be a problem in this effort.
To conclude, the ISFPT'19 pass and the memory replacement pass are the best places to look for inspiration on this. These are relatively short passes as well, so they should not be too hard to parse. I hope that clarified a bit more my thoughts on all of this. Don't hesitate to let me know in case of questions.
@lucas-rami, thanks for the comments and the confirmation that we are essentially on the right track to understand things.
Exactly. This is a largely mechanical change to implement, and it will especially have a positive impact on anyone wanting to change memory interfaces after some have already been placed. It would most likely take me 3/4 hours to implement, and I can do my best to schedule that at some point in the near-future before much of that implementation effort has started.
Maybe we should wait for other stakeholders (@lana555 in particular) to approve, but from my side I would be grateful if you could enact this change as early as possible. I am sure that @rpirayadi would be happy to help, but I guess it is more efficient if you do it yourself--your choice.
In hindsight, I have the feeling that we have been induced in a misunderstanding by the handshake::MCLoadOp
/handshake::LSQLoadOp
"nonsense", for your description "[...] re-annotates memory ports to encode the dependence analysis's result (without changing the CDFG yet). HandshakeReplaceMemoryInterfaces
then reads these annotations (without caring where they came from) and replaces existing memory interfaces with new ones according to those" matches completely what we (or at least I) had in mind except for what looks to me an irrelevant detail.
It seems to me that the only difference from the suggestions by @rpirayadi and what you say is there already is minor: we had in mind a flow
keep changing the annotations (attributes) and then add the interface only once at the end
whereas what is implemented already is a
keep changing the annotations (attributes) and keep generating a tentative interface until it won't change anymore
The moment there is a HandshakeReplaceMemoryInterfaces
that is the functional equivalent of what we would have called a HandshakeCreateMemoryInterfaces
, I see any difference as irrelevant to our needs and I agree we should change nothing to the overall architecture (I understand what you say about the light refactoring and I think @rpirayadi would not have any problem to do that).
@lucas-rami: Please do tell me if I am missing the point somehow or if my representation of the situation is too simplistic in any way.
@rpirayadi: Do you agree? Can you start looking into HandshakeAnalyzeLSQUsage
and HandshakeReplaceMemoryInterfaces
to see how similar they are to what you plan to do? My sense is that you want to write some replacement of HandshakeAnalyzeLSQUsage
that also changes somehow the circuit besides modifying the annotations on the memory ports, and then HandshakeReplaceMemoryInterfaces
should work for you (almost?) as is.
@lana555: Do you agree with all this? Do you agree on the removal of the "the handshake::MCLoadOp
/handshake::LSQLoadOp
nonsense"? Any other concern?
Yes, it makes sense to remove handshake::MCLoadOp/handshake::LSQLoadOp, as long as it is changed consistently across the flow (I suppose there are some minor changes to do in several passes down to RTL generation...).
Sorry for the pedantic question, but I am assuming @lucas-rami has offered to take care of this change in all of the released code base (I am not sure how to say this properly--I guess I mean whatever is in main
), right? If this is not the case, we should be more careful before we launch ourselves into this.
I read most of the thread and I want to give another motivation for why we should unite LSQLoadOp and MCLoadOp:
Currently, MCLoad has a transparent buffer slot between memory -> circuit but LSQLoad doesn't have it; this implementation has a deadlock risk (documented in #136)
Thanks @Jiahui17!
Could we agree that future memory operations will have the absolute minimum number of embedded buffer slots that makes sense functionally (perhaps zero) and that the responsibility of avoid deadlocks is somewhere else (e.g., automatically happening in the buffering pass or in an appropriate pass right before implementation or in any other place where this buffer is made explicit)? I think it would make for a more clean design....
@paolo-ienne I agree. In particular here, moving the buffer into the memory controller may make sense.
It is up to the memory controller to figure out how to safely share the load port (here I mean memory controller -> BRAM), and it is not the circuit's job to put buffers here and there to avoid deadlocks.
First, I want to thank @lucas-rami for his thoughtful responses. I also want to let you know I am ready to help for conducting changes in unifying LSQLoadOp
and MCLoadOp
, if you want.
Second, in answer to @paolo-ienne, I looked again at HandshakeAnalyzeLSQUsage
and HandshakeReplaceMemoryInterfaces
. The thing I want to do, does not have much analysis more than what is already done, but instead it consists of changing the circuit and adding some components. However, beside this somehow irrelevant point, I think I can use HandshakeReplaceMemoryInterfaces
as is.
There is still something unclear for me which I think may be irrelevant to the rest of the discussion, but I'll just mention it briefly. That is what raised this discussion in the first place. I need to run the Fast Token Delivery (FPGA'22) after doing my changes. However, the current version @pcineverdies is implementing is implemented in the CFToHandshake
pass (actually he is writing his own version of CFToHandshake
). That is what caused us to think it is better to break CFToHandshake
into smaller passes so we can somehow reuse the functionality. And the first thing that came to our mind was to separate these two parts.
(1) translating from CF to the Handshake representation and (2) connecting the memory operations to the chosen memory structure.
But, as @lucas-rami suggest breaking this would result in getting some meaningless intermediate result. It's still unclear to me how we can reuse the functionality of Fast Token Delivery. I would appreciate any comments regarding this concern.
I would say that #177 is the good location to have a discussion around Fast Token Delivery! (in terms of requirements, necessities, issues...)
@paolo-ienne
Please do tell me if I am missing the point somehow or if my representation of the situation is too simplistic in any way.
That sounds good to me!
but I am assuming @lucas-rami has offered to take care of this change in all of the released code base (I am not sure how to say this properly--I guess I mean whatever is in main), right?
Yes I will merge this on main
. I will actually make a regular PR for the sake of notifying relevant stakeholders. I will try to land this during the week, otherwise next week.
@Jiahui17
Currently, MCLoad has a transparent buffer slot between memory -> circuit but LSQLoad doesn't have it; this implementation has a deadlock risk (documented in https://github.com/EPFL-LAP/dynamatic/issues/136)
Thanks for bringing this up, this is super relevant and I had semi-forgotten about it. When I make this refactoring we will end up with a single timing model for load ports and a single one for store ports, at least temporarily, which obviously doesn't align with our two different RTL implementations for each port type. Is it a good time to also unify these ports? Perhaps by making the implementation with a buffer inside the unique one for both MCs and LSQs (at least until someone figures out a cleaner solution, like putting them in the memory interfaces themselves)?
@rpirayadi
But, as @lucas-rami suggest breaking this would result in getting some meaningless intermediate result. It's still unclear to me how we can reuse the functionality of Fast Token Delivery. I would appreciate any comments regarding this concern.
Thanks for providing more context, it helps me understand what the end goal is. I agree with @pcineverdies that the bulk of this conversation should probably go on #177, but just to provide some insight now that I have a clearer idea of what you want to do. You don't have to break down CfToHandshake
(or any future alternative version) into multiple actual passes---which I am not convinced is doable in any meaningful way---to be able to reuse code. With some smart design choices, you could potentially factor part of what the pass does as an independent piece of code that transforms the IR in a specific way and "call" that piece of code from whatever pass you are implementing. Note that while that may look easy to do from a conceptual perspective the reality of implementation often makes factoring out such a processing step much more difficult than it may look at first sight, so I invite everyone involved in these inter-operating pieces of code to discuss this among yourselves and try to determine what is realistically doable.
@lucas-rami: Thanks!
We believe there is an issue in how the memory operations are currently handled in Dynamatic. This issue is discussed in the following section. A possible solution is proposed afterward, and finally, the impact this change has on Dynamatic is discussed.
Problem definition
Firstly, keep in mind that the issue we would like to highlight exists for both types of memory operations. However, for simplicity, we focus on load operations in the description. Currently, in the pass
CfToHandshake
memory operations are converted frommemref::LoadOp
to eitherhandshake::MCLoadOp
orhandshake::LSQLoadOp
. As a result, the decision to choose between the memory controller and LSQ must be made in this pass. Moreover, thehandshake::LSQLoadOp
has a port namedLSQLoadPort
which needs to be connected to an LSQ, which forces us to instantiate the LSQ at this point. This process combines two logically distinct actions: (1) translating from CF to the Handshake representation and (2) connecting the memory operations to the chosen memory structure. We believe the two actions (ie. translation and connection) should ideally be separated into two different passes to maintain a clean and modular design.One of the manifestations of this issue is the current implementation of Shrink it or Shred it!(ICFPT'19). This transformation is implemented in the Handshake MLIR because it requires the handshake structure. However, this pass also modifies memory connectivity (namely by removing LSQs or splitting a single LSQ into multiple LSQs). This highlights that these connections should ideally not have been established in the earlier transformation.
In conclusion, it appears that the decision on how to connect memory operations to memory controller or LSQ is premature. Thus, it makes more sense to postpone the connection to memory controller or LSQ until our knowledge is complete.
Possible solution
A possible solution is to introduce a new type of load (and also store) operation possibly named
handshake::GeneralLoadOp
, which would eventually be translated into eitherhandshake::MCLoadOp
orhandshake::LSQLoadOp
based on the annotated information. Another approach could be to introduce an entirely new dialect between CF and Handshake which holds the general operation. This would solve the inconsistency of having two types of Handshake representation (ie. one before the final translation and one after).Impact
Implementing the proposed solution would have some impacts. Every pass that modifies the connectivity of memory operations should now annotate all the required information in
handshake::GeneralLoadOp
which would eventually be translated intohandshake::MCLoadOp
andhandshake::LSQLoadOp
. The advantage is that this transformation is done uniformly, regardless of the changes done by the previous passes. Previous passes can alter the result simply by modifying the annotation on the operation.One of the passes that are affected, is
--lower-cf-to-handshake
, since it needs to convert CF operations intohandshake::GeneralLoadOp
. To instantiate the LSQ we need the dominance information which in turn requires the control flow information, so one of the things that should be preserved and carried through is the control flow information. Other required modifications are in--handshake-analyze-lsq-usage
and--handshake-replace-memory-interfaces
which implement Shrink it or Shred it!. The modification in these two passes mainly consists of not connecting the load operation into different ports, but instead annotating where it should be connected inside the operation. Finally, we believe that the proposed solution leaves all the other passes unchanged.Given this proposed approach and its potential impacts, we would appreciate your thoughts and opinions on this solution.