Open skeqiqevian opened 1 year ago
Below is the discussion we (I mean, 100% written by @gilbo) had, which previously existed as a comment in dataflow.py
. Putting it here for a better reference in the future.
Big Question: How do we represent the result of dataflow analysis?
Option 1: Take LoopIR and transform into a similar but different IR which includes dataflow annotations and "desugars"/eliminates constructs we don't care about for analysis purposes (e.g. just inline all windowing)
Option 2: Leave the input LoopIR as is, and create auxiliary data structures which allows us to "lookup" dataflow results for different variables at different points in the code. For instance, use Python dictionaries to hold the annotations
We are taking Option1.
New compilation flow is as follows:
pyAST --> UAST
--[type checking, inference, bounds checking]--> [LoopIR]
--[backend checks]--> C-code as strings
[LoopIR] --[Primitive Transformation]--> [LoopIR]
As part of Primitive Transformation:
[LoopIR] --[Dataflow analysis]--> "AST annotated w/dataflow results"
--[modified new_eff_analysis]--> AExpr/Query (Discharged to SMT)
Below are Gilbert's notes to explain Abstract Interpretation to me.
A Lattice is a set X
equipped with
(X, <, ^, v)
Satisfying the following axioms:
Consequences
Examples:
Def. Lattice Homomorphism:
Propagate abstract values through statements in any order. This will compute a fixed-point assignment of variables to abstract values at every program point. This algorithm will terminate if the abstract lattice has a finite height.
P(V_1 x V_2 x ...)
I have a concrete domain of values V.
Define the "Concrete" lattice for V to be Powerset(V)
(note every powerset is a lattice, with subset as partial order,
union as join, and intersection as meet)
An abstraction of V (eqv. Powerset(V)) is a lattice A s.t. we have
two lattice homomorphisms:
abstraction (abs : P(V) -> A)
concretization (conc : A -> P(V))
satisfying a property: namely that they are "adjoint" in the following sense
abs(conc(x)) = x
conc(abs(x)) <= x
A product lattice of lattices A and B is given as
top = (top,top)
bot = (bot,bot)
(a0,b0) <= (a1,b1) iff. a0 <= a1 and b0 <= b1
(a0,b0) ^ (a1,b1) = (a0^a1, b0^b1)
(a0,b0) v (a1,b1) = (a0 v a1, b0 v b1)
Can we form a product abstraction? Well, we have some V which A abstracts and some W which B abstracts? No, we're actually interested in the case where A and B both abstract V. So, what we are actually interested in isn't a totally arbitrary product...
Product Abstraction (on a common concrete domain V) Given abstractions (A,absA,concA) and (B,absB,concB) both of V, Construct a product abstraction:
absAB(X) = ???
concAB(a,b) = concA(a) ^ concB(b)
conclusion 1: any (bot,_)
or (_,bot)
must be bot in A x B!
i.e. we cannot simply use a product lattice
conjecture: this occurs for any (singleton) set, not just bottom
Major Conclusion:
Btw, suppose I have P(X) the power set of X.
Using "type theory" notation, I can also talk about the set of
all functions X -> Bool, or all predicates on X
(e.g. in Coq one would write X -> Prop)
All of these are (ignoring stupid logical foundation issues)
essentially the same. Why? Well, consider some S in P(X)
S is a subset of X; define f_S(x) = "x in S"
.
Similarly, using set comprehension and given f : X -> Bool, define S_f = { x | f(x) }
(one more thing for fun) X -> P(Y) ~== X -> Y -> Bool ~== X x Y -> Bool ~== P(X x Y) ~== Rel X Y
A program is a control-flow-graph, and btw, let's fission each basic block into invidividual SSA statements (including unique variable names, and phi nodes)
A program point is basically an edge between statements in this CFG In some sense, program points are the real "states" and the statements are transitions.
More precisely the state of my machine is (PC (i.e. program point), stack (i.e. variable environment mapping))
Abstract States (PC, stack but values from A instead of from V)
How do I abstract a statement (i.e. function) y = f(x)
Well, first x is now a abstract value, so concretize it to a set of values
Then, we know how to map each individual value with f
This produces a set of values for y
Then re-abstract this set
In other words...
_y = _f(_x) = abs({ f(x) | x in conc(_x) })
That's a definition; we need to work it out for any given
choice of language (i.e. statements/functions f_i) and
choice of abstract lattice (i.e. A, abs, conc)
How do we abstract multiple incoming edges to some program point? (i.e. how do we abstract phi nodes) Answer: phi is join. done.
Abs = V U {top, bot} -- only decision we've made Conc = P(V) abs : Conc -> Abs conc : Abs -> Conc s.t. abs(conc(x)) = x conc(abs(X)) >= X
What is the specific definition of conc
?
conc(v) = {v} for v in V
conc(bot) = {}
conc(top) = V
Also, we have a limited version of dataflow analysis in LoopIR_dataflow.py
, which usage probably should be replaced by DataflowIR once it's done.
Discussion topic
InlinedCall
as part of DataflowIR. If it is just for error message it should be part of srcinfo
.x : R[n+1]
for i in seq(0, n):
x[i] = sin(i)
for i in seq(0, n):
x[i+1] = sin(i+1)
To fuse these two loops together, we need to know that value written to x[1]
in the first loop and the second loop are exactly the same.
@yamaguchi1024 These are more of your meeting notes and not really a description of a project plan for how to move forward on "new analysis" at a high level.
Someone else should elaborate on this... for now I'll just tag a bunch of relevant issues that might affect decision making: #102, #127, #174, #287, #294, #314, #377, #390