bliutech / mbased

MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
https://github.com/bliutech/mbased/blob/main/.github/paper.pdf
MIT License

Utils: Write a Dictionary Decoder #16

Closed (bliutech closed 2 months ago)

bliutech commented 3 months ago

After completing #9, we need to write the inverse operation: a dictionary decoder. The goal of this component is to take the simplified, encoded boolean statement and decode it back to the original MLIL values, now in simplified form. An example of the decoding process is shown below.

A | !A -> if ([ebp_1 + 0x14].b == 0 || [ebp_1 + 0x14].b != 0) then 387 @ 0x8040da8 else 388 @ 0x8040d8b

This can most likely be done in time linear in the number of tokens in the encoded expression.

Action Items

Do all of the following inside utils/coding.py.

Resources

See the resources of #9. This process should be similar.

Reference the current implementation of the encoder as well as the AST. The input to this component is the result of calling str(ast) on the root node, which gives a string representation of the tree (i.e. the encoded form).

One implementation note: the decoder may need the raw form of the encoded string to get more context, so you may need to hack on the encoder a bit so that it exposes the information needed to support decoding.

Most likely, this will be similar to https://github.com/bliutech/mba-deobfuscation/blob/main/utils/coding.py#L141. You can follow a similar code structure: break the expression into parts (either split the encoded string returned from the AST on spaces to get a list of tokens, or use the Lexer), then decode each part by looking up in the mapping how to replace that symbol.
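The token lookup described above could be sketched roughly as follows. This is only an illustration, not the code in utils/coding.py: the names `MAPPING`, `OPERATORS`, and `decode_tokens` are hypothetical, and it assumes the encoded string separates operators with spaces, writes them as `|` and `&`, and attaches `!` and parentheses directly to the symbol.

```python
import re

# Hypothetical mapping kept by the encoder: symbol -> original MLIL condition.
MAPPING = {"A": "[ebp_1 + 0x14].b == 0"}

# Assumed spelling of operators in the encoded form and in the decoded output.
OPERATORS = {"|": "||", "&": "&&"}


def decode_tokens(encoded: str, mapping: dict[str, str]) -> list[str]:
    """Decode an encoded expression into a list of output tokens."""
    decoded = []
    for token in encoded.split():
        if token in OPERATORS:
            decoded.append(OPERATORS[token])
            continue
        # Separate "!", "(" and ")" from the symbol they are attached to,
        # then replace each known symbol with its original condition.
        for part in re.findall(r"[!()]|[^!()]+", token):
            decoded.append(mapping.get(part, part))
    return decoded


print(decode_tokens("A | !A", MAPPING))
```

Each token is visited once and looked up in a dictionary, so the pass is linear in the number of tokens. The `!` is kept as its own output token so that a later pass can decide how to fold it into the condition that follows.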

One part that might be tricky is simplifying !A when A contains ==. You can do this by first writing everything to a list and then making an additional pass over the decoded list, looking for a token that is a ! followed by a token containing ==, and combining the two with re.sub. You can then join the final output list.
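A minimal sketch of that second pass, assuming the decoded list holds `!` as a separate token and that each decoded condition is a single comparison (so only the first `==` needs to be rewritten). The function name `merge_negations` is hypothetical.

```python
import re


def merge_negations(tokens: list[str]) -> list[str]:
    """Fold a "!" token into a following "==" condition, giving "!="."""
    merged = []
    i = 0
    while i < len(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] == "!" and following is not None and "==" in following:
            # Assumption: the condition is one comparison, so flipping the
            # first "==" is enough to express the negation.
            merged.append(re.sub(r"==", "!=", following, count=1))
            i += 2  # consumed both the "!" and the condition
        else:
            merged.append(tokens[i])
            i += 1
    return merged


decoded = ["[ebp_1 + 0x14].b == 0", "||", "!", "[ebp_1 + 0x14].b == 0"]
print(" ".join(merge_negations(decoded)))
```

A `!` that is followed by anything else (for example an opening parenthesis) is left untouched, since negating a compound expression would need more than a textual substitution.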