decomp / decomp

Components of a decompilation pipeline.
The Unlicense
459 stars 30 forks source link

type: Solve type equations using constraint programming #166

Open mewmew opened 9 years ago

mewmew commented 9 years ago

The type analysis will be required to solve type equations based on type constraints derived from the LLVM IR instructions. Identify a programming language with good support for constraint programming (e.g. Prolog, ...), and use this language when developing the type analysis component.

Separate concerns related to LLVM IR from the type analysis component, such that it may operate directly on type constraints specified in JSON format (or whichever encoding is well supported by the selected constraint programming language). Use Go, C++, Haskell or some other language with mature LLVM IR libraries to produce JSON files containing the type constraints. The type analysis component would output the solution (or partial solution) to the type equations in JSON format.

Type Analysis Example

  1. Assign the universe type set (any type) to all variables. T ⊂ U
  2. Recursively subset each type set based on the given type constraints; e.g.
    1. The variable foo is only used in integer addition statements, thus its type set is a subset of the set of signed and unsigned integers. T_foo ⊂ (Int ∪ Uint)
    2. The variable bar is assigned to the value of foo without truncation or extension, thus the type of bar is identical to that of foo. T_bar = T_foo
    3. The statements which reference bar never operate on values smaller than 0 or larger than 255, thus bar is known to fit inside an unsigned 8-bit integer. T_bar ⊂ Uint8
    4. ...
    5. profit
mewmew commented 8 years ago

Look into using Gecode for this.

mewmew commented 5 years ago

This is further described in the A meta-study of type recovery methods used during binary lifting paper of decomp/doc.