New Analysis - Githubissues

skeqiqevian commented 1 year ago

Someone else should elaborate on this... for now I'll just tag a bunch of relevant issues that might affect decision making: #102, #127, #174, #287, #294, #314, #377, #390

yamaguchi1024 commented 12 months ago

Below is the discussion we (I mean, 100% written by @gilbo) had, which previously existed as a comment in dataflow.py. Putting it here for a better reference in the future.

Big Question: How do we represent the result of dataflow analysis?

Option 1: Take LoopIR and transform into a similar but different IR which includes dataflow annotations and "desugars"/eliminates constructs we don't care about for analysis purposes (e.g. just inline all windowing)
Option 2: Leave the input LoopIR as is, and create auxiliary data structures which allows us to "lookup" dataflow results for different variables at different points in the code. For instance, use Python dictionaries to hold the annotations

We are taking Option1.

New compilation flow is as follows:

pyAST  -->  UAST
        --[type checking, inference, bounds checking]-->  [LoopIR]
        --[backend checks]-->  C-code as strings

[LoopIR]  --[Primitive Transformation]-->  [LoopIR]
 As part of Primitive Transformation:
      [LoopIR]  --[Dataflow analysis]-->  "AST annotated w/dataflow results"
                       --[modified new_eff_analysis]-->  AExpr/Query (Discharged to SMT)

Below are Gilbert's notes to explain Abstract Interpretation to me.

What is Lattice?

A Lattice is a set X equipped with

a partial order < relation and
two binary operators, join (^) and meet (v)
two constants, top and bottom i.e. a Lattice is a quadruple (X, <, ^, v)

Satisfying the following axioms:

join and meet are both commutative and associative
absorption: a v (a ^ b) = a, a ^ (a v b) = a

Consequences

idempotency: a v a = a, a ^ a = a
meet is the greatest lower bound of the two elements (aka, min)
join is the least upper bound of the two elements (aka, max)
note meet and join are not well defined in arbitrary partial orders

Examples:

powerset of a set forms a lattice
logical formulas form a lattice (meet is and, join is or)

Def. Lattice Homomorphism:

A function f : X -> Y from one lattice to another is a homomorphism if
- f(a v b) = f(a) v f(b)
- f(a ^ b) = f(a) ^ f(b)
- Every lattice homomorphism is monotonic: i.e. if a <= b, then f(a) <= f(b)

The Abstract Interpretation Algorithm

Propagate abstract values through statements in any order. This will compute a fixed-point assignment of variables to abstract values at every program point. This algorithm will terminate if the abstract lattice has a finite height.

P(V_1 x V_2 x ...) I have a concrete domain of values V. Define the "Concrete" lattice for V to be Powerset(V) (note every powerset is a lattice, with subset as partial order, union as join, and intersection as meet)

An abstraction of V (eqv. Powerset(V)) is a lattice A s.t. we have two lattice homomorphisms: abstraction (abs : P(V) -> A) concretization (conc : A -> P(V)) satisfying a property: namely that they are "adjoint" in the following sense abs(conc(x)) = x conc(abs(x)) <= x

A product lattice of lattices A and B is given as

set of values A x B
top = (top,top)
bot = (bot,bot)
(a0,b0) <= (a1,b1) iff. a0 <= a1 and b0 <= b1
(a0,b0) ^ (a1,b1) = (a0^a1, b0^b1)
(a0,b0) v (a1,b1) = (a0 v a1, b0 v b1)

Can we form a product abstraction? Well, we have some V which A abstracts and some W which B abstracts? No, we're actually interested in the case where A and B both abstract V. So, what we are actually interested in isn't a totally arbitrary product...

Product Abstraction (on a common concrete domain V) Given abstractions (A,absA,concA) and (B,absB,concB) both of V, Construct a product abstraction:

product lattice A x B
absAB(X) = ???
concAB(a,b) = concA(a) ^ concB(b)

conclusion 1: any (bot,_) or (_,bot) must be bot in A x B! i.e. we cannot simply use a product lattice conjecture: this occurs for any (singleton) set, not just bottom

Major Conclusion:

Trying to form product abstractions is a bad idea!

Btw, suppose I have P(X) the power set of X. Using "type theory" notation, I can also talk about the set of all functions X -> Bool, or all predicates on X (e.g. in Coq one would write X -> Prop) All of these are (ignoring stupid logical foundation issues) essentially the same. Why? Well, consider some S in P(X) S is a subset of X; define f_S(x) = "x in S". Similarly, using set comprehension and given f : X -> Bool, define S_f = { x | f(x) }

(one more thing for fun) X -> P(Y) ~== X -> Y -> Bool ~== X x Y -> Bool ~== P(X x Y) ~== Rel X Y

A program is a control-flow-graph, and btw, let's fission each basic block into invidividual SSA statements (including unique variable names, and phi nodes)

A program point is basically an edge between statements in this CFG In some sense, program points are the real "states" and the statements are transitions.

More precisely the state of my machine is (PC (i.e. program point), stack (i.e. variable environment mapping))

Abstract States (PC, stack but values from A instead of from V)

How do I abstract a statement (i.e. function) y = f(x) Well, first x is now a abstract value, so concretize it to a set of values Then, we know how to map each individual value with f This produces a set of values for y Then re-abstract this set In other words... _y = _f(_x) = abs({ f(x) | x in conc(_x) }) That's a definition; we need to work it out for any given choice of language (i.e. statements/functions f_i) and choice of abstract lattice (i.e. A, abs, conc)

How do we abstract multiple incoming edges to some program point? (i.e. how do we abstract phi nodes) Answer: phi is join. done.

Abs = V U {top, bot} -- only decision we've made Conc = P(V) abs : Conc -> Abs conc : Abs -> Conc s.t. abs(conc(x)) = x conc(abs(X)) >= X

What is the specific definition of conc? conc(v) = {v} for v in V conc(bot) = {} conc(top) = V

yamaguchi1024 commented 12 months ago

Also, we have a limited version of dataflow analysis in LoopIR_dataflow.py, which usage probably should be replaced by DataflowIR once it's done.

yamaguchi1024 commented 12 months ago

Discussion topic

I think we don't need InlinedCall as part of DataflowIR. If it is just for error message it should be part of srcinfo.
Is is possible to do a value dependent analysis with Abstract interpretation? In particular we want to analyze sth like:
```
x : R[n+1]
for i in seq(0, n):
x[i] = sin(i)
for i in seq(0, n):
x[i+1] = sin(i+1)
```
To fuse these two loops together, we need to know that value written to x[1] in the first loop and the second loop are exactly the same.

gilbo commented 11 months ago

@yamaguchi1024 These are more of your meeting notes and not really a description of a project plan for how to move forward on "new analysis" at a high level.

exo-lang / exo

New Analysis #486

What is Lattice?

The Abstract Interpretation Algorithm