bliutech / mbased

MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
https://github.com/bliutech/mbased/blob/main/.github/paper.pdf
MIT License
6 stars 0 forks source link

Parser: Design a Context-Free Grammar for Boolean Expressions #10

Closed bliutech closed 3 months ago

bliutech commented 3 months ago

Even after encoding the MLIL instruction, we want to assign a more sophisticated data structure to the underlying expression we will be using. Particularly, using an abstract syntax tree (AST) is a good data structure to use when trying to define a formal language. Before we can even begin writing a representation of our AST, we need to define a formal specification for our language. This will be in the form of a context-free grammar which will represent all possible "programs" of boolean statements that can be represented in obfuscated programs. The goal of this component is to design a CFG to represent boolean expressions and write some classes which will represent nodes of the AST which a parser can build. One quick note is that there are many different classes of possible CFGs that can represent this language, however, we will be using a particular type of CFG known as an LL(1) grammar. Using an LL(1) grammar will make the process a lot easier when writing our parser. Remember that boolean expressions are slightly simplified because of our encoder but here are some sample boolean programs.

A | !A
A & (BC | !D)
(ZZ | !(D & !A))

Action Items

Write the following in parser/ast.py.

Resources