bliutech / mbased

MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
https://github.com/bliutech/mbased/blob/main/.github/paper.pdf
MIT License

Utils: Write a Dictionary Encoder #9

Closed bliutech closed 3 months ago

bliutech commented 3 months ago

After lifting our binary to MLIL instructions, we need to do some preprocessing before we can pass them to our parser. MLIL instructions can contain many tokens that make parsing difficult and are not needed for analysis. The goal of this issue is to write a dictionary coder that takes a MediumLevelILOperation.MLIL_IF instruction and encodes it into a representation that our parser can more easily turn into an AST. An example encoding is shown below.

if ([ebp_1 + 0x14].b == 0 || [ebp_1 + 0x14].b != 0) then 387 @ 0x8040da8 else 388 @ 0x8040d8b -> A | !A

Notice three things about this example. First, each component of a boolean statement is encoded to a unique symbol (i.e. every generated symbol should correspond to an expression that evaluates to true or false). Second, duplicate expressions reuse the same symbol. This holds not only within a single statement but also across statements (i.e. the same expression may appear in multiple if statements within an obfuscated program). Third, the NOT is encoded outside of the symbol: in the example, the != comparison becomes ! applied to the symbol already assigned to the matching == comparison. This is most likely one of the trickier parts of the encoder, but the idea is to improve symbol reuse and make the encoded output more useful for the rest of our analysis.
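To make the three rules above concrete, here is a minimal sketch that works on the text of an instruction rather than on Binary Ninja objects. The class name `DictionaryEncoder`, its methods, and the use of `&` for AND are assumptions made for illustration; only the `A | !A` output format comes from the example above. It also assumes conditions are flat (no nested parentheses) and only handles `!=` as a negated form.

```python
import re
import string


class DictionaryEncoder:
    """Hypothetical sketch: maps boolean sub-expressions to one-character symbols."""

    def __init__(self):
        # Kept on the instance so the same expression reuses its symbol
        # across every if statement encoded with this object.
        self.symbols = {}

    def _symbol_for(self, expr):
        if expr not in self.symbols:
            if len(self.symbols) >= len(string.ascii_uppercase):
                raise ValueError("ran out of one-character symbols")
            self.symbols[expr] = string.ascii_uppercase[len(self.symbols)]
        return self.symbols[expr]

    def _encode_atom(self, atom):
        atom = " ".join(atom.split())  # normalise whitespace
        # Pull the NOT outside: "x != y" becomes "!" + symbol("x == y").
        if " != " in atom:
            return "!" + self._symbol_for(atom.replace(" != ", " == ", 1))
        return self._symbol_for(atom)

    def encode(self, instruction):
        # Keep only the condition between "if (" and ") then".
        match = re.match(r"if \((.*)\) then ", instruction)
        condition = match.group(1) if match else instruction
        encoded = []
        for part in re.split(r"(\|\||&&)", condition):
            if part == "||":
                encoded.append("|")
            elif part == "&&":
                encoded.append("&")  # assumed spelling for AND
            else:
                encoded.append(self._encode_atom(part))
        return " ".join(encoded)


encoder = DictionaryEncoder()
text = ("if ([ebp_1 + 0x14].b == 0 || [ebp_1 + 0x14].b != 0) "
        "then 387 @ 0x8040da8 else 388 @ 0x8040d8b")
print(encoder.encode(text))  # A | !A
```

Because the symbol table lives on the encoder instance, encoding a later if statement that contains `[ebp_1 + 0x14].b == 0` would reuse `A` rather than allocate a new symbol. A real implementation would likely walk the MLIL expression tree instead of splitting strings, which would also handle nested conditions.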

Another quick note: to simplify our parser design, all boolean symbols will be one character. The supported operations are shown below.

Action Items

Do all of the following inside utils/coding.py.

Resources

I can add more resources here as necessary, so ask questions about anything you need resources for to tackle this.