flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0
1.6k stars 219 forks source link

Per-external-function docstrings for `lib/substitutions` #1294

Open lockshaw opened 5 months ago

lockshaw commented 5 months ago

Add doxygen docstrings to external facing functions and structs in lib/substitutions.

lockshaw commented 5 months ago

@Bob-Chen222 If you have questions about how substitutions works (which you probably should :slightly_smiling_face:), feel free to ask @wmdi or @lockshaw either in this issue or in slack.

As much as possible try to keep the documentation high-level: try to focus not on what the function does concretely (which anyone can figure out by reading the implementation), but what it does in the context of the larger library/system. For example, instead of documenting that get_attribute gets an attribute from an op-attrs object, explain how this is used to perform more fine-grained matching and to perform basic computations of tensor sizes, etc. in the context of the substitution matching functionality provided by lib/substitutions.

Bob-Chen222 commented 5 months ago

Hi Colin @lockshaw, thank you for your comment. I do have some questions to ask:

I will spend more time checking the repo in the following week after several midterms. Thanks for your help!

lockshaw commented 5 months ago

what is the difference between an open graph and a graph? Or what is the definition of the open graph?

Open graphs allow edges from and to nodes not themselves within the graph. For example,

is an open graph that is not also a closed graph. Of course every closed graph is an open graph, just ones where the number of input and output edges is zero.

My understanding of GraphPattern is that we should find the same Pattern in a sub-PCG. If this is the case, I am unsure why we only require node_assginment and edge_assginment as injections instead of bijections.

I'll leave this to @wmdi as she's worked on this code more recently.

not sure what does outputlabelled mean in OutputLabelledOpenMultiDiGraph or how the labeling works in general

Labelling is how we assign meanings to the graph structures. In FlexFlow's graph library nodes in graphs are just "nodes", and edges are just "edges" (unlike graph libraries like NetworkX where nodes at least can be arbitrary objects). A "labelling" is thus just a mapping from nodes, edges, etc. to other objects.

For example, intuitively a standard machine learning computation graph has a labelling of nodes to operators and edges to tensors. However, this is actually not quite the case: for example, in the following graph

a labelling of edges as tensors would mean that "B" and "C" would be different tensors, which is not the case as they're the same tensor being passed to two different operators. Thus, the labelling we want is actually a labelling of node outputs, not edges, as demonstrated in image below, which shows the same graph as the previous one but with output labels instead of edges labels:

when is the deadline to finish open issues? Is the deadline rolling every week, or should we finish it ASAP?

For the documentation issues there's not quite as hard of a deadline since they usually aren't blocking, but I will check in on them each week at the Wednesday meeting. In general I'd recommend (especially as you're getting started) starting by filing a PR that has documentation for a small subset of the issue so you can start getting feedback asap, rather than waiting until you've documented everything and then needing to do a huge amount of revising to address style changes and other comments.

wmdi commented 5 months ago

My understanding of GraphPattern is that we should find the same Pattern in a sub-PCG. If this is the case, I am unsure why we only require node_assginment and edge_assginment as injections instead of bijections.

I am not sure which node_assignment or edge_assignment you are mentioning. For those in MultiDiGraphPatternMatch, they are bidict which are actually bijections.

Bob-Chen222 commented 5 months ago

Oh, sorry for not providing enough context. I am referring to lib/substitutions/readme.md in which there is an injection between node from GraphPattern and PCG. But now I see why it is an injection. Thanks!

Screenshot 2024-02-12 at 6 20 22 PM
Bob-Chen222 commented 4 months ago

Hi Colin @lockshaw, I have updated four pieces of documentation for lib/substitutions. I tried to explain the function in a bigger picture, but I am unsure whether this is enough. I will modify them based on your reviews and continue adding documentation for the rest of the functions. At the same time, I am confused about AttributeConstraints. From my understanding, they are just a bunch of conditions attributes an operator needs to satisfy. However, I failed to understand its role in the library. Could you also explain a bit about that? Thanks!

lockshaw commented 4 months ago

Attribute constraints exist because just matching graph topology is not sufficient for what we need to do. For example, let's considering fusing two Dense layers into a single Dense layer: we need to (1) check that they're Dense layers, (2) that their weight dimensions line up properly, and (3) use that information to compute the resulting operator. Attribute constraints provide the mechanism for us to actually reason about what the nodes in our substitution are: without it the substitution library would just treat all nodes as identical black boxes.