Open calebmkim opened 1 year ago
Thank you for the super awesome writeup @calebmkim!
One thing to note is that sometimes, we might sometimes want different "labels" for different tokens.
This is something I've been pondering as well–we use the controlID stuff in slightly, but importantly, different ways in various passes like TDCC (which, AFAIK, doesn't even use ControlID). I think a good question is what would be a good representation/set of node ID that works for every pass?
ControlIDshould take in a control block, and internally build a full CFG.
We can even do one step further and change ControlID::new
to take full ownership of the ir::Control
struct. This means that instead of indirectly referencing to nodes in a modified ir::Control
, the user must "detach" the control program and give it to the Analysis. This way, we can make queries like "give me an iterator over the control program in the order of the control flow edges".
Once a pass is done with it's CFG-based computation, we can use a ControlID::take
function to return the original control program.
The big thing to note about this design point is that it doesn't let you add anything to the CFG, just query it and modify the groups underneath it. However, it is still powerful for analyses. Maybe we can cleverly design a Visitor
-like interface for CFG-based passes that gives us control-flow order of things as well as the ability to modify it
One thing to add to this proposal is some notion of data flow/dependency: this could be helpful if we want to do some sort of "unsharing" in order to increase parallelization.
Yes! In fact, the current unsharing pass does something similar in computing a CFG but does so in a hacky way. @EclecticGriffin might have more insight on how exactly it works.
CDFGs, from what I understand, have basic blocks (i.e., blocks of instructions that you know they are just going to always just execute sequentially
I think we can use groups in place of basic blocks. Also, it is not necessary for us to build a CDFG for the kinds of passes we perform today but maybe useful to think about in the future.
Yes indeed; thanks for the really useful writeup, @calebmkim!
I think you did a good job mapping out what it would take to approach either of these two ideas. Perhaps it goes without saying, but this discussion also implicates the idea of "flattening" the entire IR so that integer IDs are already available (they are the way you identify control nodes), as in #1183 and friends.
I think it would be good to direct the discussion toward deciding what we will do: one of these, both, or some kind of intermediate point? Maybe a way to make that decision would be to resolve this question from @rachitnigam:
I think a good question is what would be a good representation/set of node ID that works for every pass?
Could we do an audit of the way in which these IDs are use so we can confidently describe what all these passes want from a set of IDs?
Overview Currently, the
ControlID pass
provides methods that can label the control stmts with u64's, and we can query for the label of a given control stmt. We want to improve this analysis to provide a better interface for the user.There are two main ideas discussed in #1358. The first is a more minor change to improve the interface of ControlID, the second is a more major change to get
ControlID
to actually build a full CDFG.Improve ControlID Interface to Give Tokens instead of u64's
Par(u64)
, see: New Type Idiom) instead of u64. This way, we can build an interface in which the users of the ControlID Analysis never actually look at the u64's inside the tokens.ParThread(u64)
to signify the control stmt is a direct child of a par block, but other times, we might want the same control statement to be wrapped as aSeq(u64)
(or whatever that control statement actually is).ParThread(u64)
token for a control statement that isn't actually the direct child/thread of a par block, then we will probably want to produce an errorControlID should produce a CFG
ControlID
should take in a control block, and internally build a full CFG. Then, users should just be able to make queries about the CFG, whichControlID
should also support by internally analyzing the CFG that it has built.1225 has already discussed this same idea. For each possible control stmt (invoke, enable, par, seq, while, if), the corresponding graph will have exactly one entry node and one exit node. Also, syncs are given their own nodes, which makes sense. It also include labels for edges, which the program should take based on whether the conditional is true or not.