MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
After completing #9, we need to write the inverse operation: a dictionary decoder. This component takes the simplified, encoded boolean statement and decodes it back into MLIL, substituting the original values while keeping the simplified structure. An example of the decoding process is shown below.
A | !A -> if ([ebp_1 + 0x14].b == 0 || [ebp_1 + 0x14].b != 0) then 387 @ 0x8040da8 else 388 @ 0x8040d8b
This most likely can be done in linear time.
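As a rough illustration of the linear-time claim, here is a minimal sketch of a single left-to-right substitution pass over the example above. The `substitute` helper, the token grammar in `TOKEN_RE`, and the `OPERATORS` table (including `&` becoming `&&`) are assumptions made for illustration, not part of the existing encoder. The `!` token is left untouched here.

```python
import re

# Assumed token grammar: identifiers plus the operators ( ) ! & | ^.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[()!&|^]")

# Assumed rendering of the encoded operators in the decoded condition.
OPERATORS = {"|": "||", "&": "&&"}


def substitute(encoded: str, mapping: dict[str, str]) -> list[str]:
    """Replace each token in one left-to-right pass (hypothetical helper)."""
    decoded = []
    for tok in TOKEN_RE.findall(encoded):
        if tok in mapping:
            # Encoded symbol: swap in the original condition text.
            decoded.append(mapping[tok])
        else:
            # Operator or unknown token: translate if known, else keep as-is.
            decoded.append(OPERATORS.get(tok, tok))
    return decoded


tokens = substitute("A | !A", {"A": "[ebp_1 + 0x14].b == 0"})
```

Each token is visited once and each lookup is an average constant-time dictionary access, so the pass is linear in the length of the encoded string.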
Action Items
Do all of the following inside utils/coding.py.
[x] Write a DictionaryDecoder class with a constructor that takes the internal mapping of the encoded values as an argument, and a decode method that takes an encoded string and returns the decoded string form of a MediumLevelILOperation.MLIL_IF.
[x] Inside your decode function, replace every occurrence of each encoded value with its mapped value from the mapping provided by the encoder. Note the tricky case shown in the example above, where you have to distribute the ! operator.
[x] Write some test cases for the classes above using the built-in unittest Python module. This is good practice for test-driven development (TDD). Create a file called tests/test_coding.py that contains these unit tests.
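A possible shape for tests/test_coding.py is sketched below. The `DictionaryDecoder` here is a simplified stand-in defined inline so the example runs on its own; the real tests would import the class from utils/coding.py instead. The mapping shape (encoded symbol to condition text) and the sample conditions are assumptions.

```python
import unittest


class DictionaryDecoder:
    """Hypothetical stand-in; the real class would live in utils/coding.py."""

    def __init__(self, mapping: dict[str, str]) -> None:
        # Assumed shape: encoded symbol -> original MLIL condition text.
        self.mapping = mapping

    def decode(self, encoded: str) -> str:
        # Split on spaces and replace every known symbol; keep other tokens.
        return " ".join(self.mapping.get(tok, tok) for tok in encoded.split())


class TestDictionaryDecoder(unittest.TestCase):
    def setUp(self) -> None:
        self.decoder = DictionaryDecoder({"A": "x == 0", "B": "y == 1"})

    def test_single_symbol(self) -> None:
        self.assertEqual(self.decoder.decode("A"), "x == 0")

    def test_binary_expression(self) -> None:
        self.assertEqual(self.decoder.decode("A & B"), "x == 0 & y == 1")

    def test_unknown_token_is_kept(self) -> None:
        self.assertEqual(self.decoder.decode("C"), "C")
```

The tests can be run with `python -m unittest tests/test_coding.py` from the repository root.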
Resources
See the resources of #9. This process should be similar.
Reference the current implementation of the encoder as well as the AST. The input to this component is the result of calling str(ast) on the root node, which gives a string representation of the tree (i.e. the encoded form).
One implementation note: using the raw form of the encoded string may give you more context (you may need to hack on the encoder a bit to expose more information to support decoding).
Most likely, this will be similar to https://github.com/bliutech/mba-deobfuscation/blob/main/utils/coding.py#L141. You can follow a similar code structure, decoding the expression part by part (you can split the encoded string returned from the AST on spaces to get a list of tokens, or you could use the Lexer). From there you can decode each part by looking up in the mapping how to replace that symbol.
One part that might be tricky is simplifying !A when A contains ==. You can do this by first rewriting everything into a list and then making an additional pass over the decoded list to find any ! token followed by a token containing ==, combining the two with re.sub. You can then join the final output list.
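The second pass described above could look roughly like the following sketch. The `fold_negations` name is hypothetical, and the input is assumed to be the already-decoded token list with `!` as its own token. A `!` in front of anything that does not contain `==` (a parenthesis, or a condition using `!=`) is left alone, which is a limitation of this sketch.

```python
import re


def fold_negations(tokens: list[str]) -> list[str]:
    """Merge a "!" token into a following "==" condition (hypothetical helper)."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] == "!" and nxt is not None and "==" in nxt:
            # Distribute the negation: "! x == 0" becomes "x != 0".
            out.append(re.sub(r"==", "!=", nxt, count=1))
            i += 2  # consume both the "!" and the condition
        else:
            out.append(tokens[i])
            i += 1
    return out


decoded = ["[ebp_1 + 0x14].b == 0", "||", "!", "[ebp_1 + 0x14].b == 0"]
condition = " ".join(fold_negations(decoded))
```

For the example in this issue, `condition` comes out as `[ebp_1 + 0x14].b == 0 || [ebp_1 + 0x14].b != 0`, which matches the condition inside the decoded `if`.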