MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
From some basic experiments, it seems that sympy seems to struggle with recognizing simplifications for XOR. For example, see the following identity from Obfuscator LLVM's instruction substitution.
a = a ^ b => a = (~a & b) | (a & ~b)
Here is a POC script using sympy.
from sympy import *
a, b = Symbol("a"), Symbol("b")
expr = (~a & b) | (a & ~b)
print(simplify_logic(expr))
# (a & ~b) | (b & ~a)
Since identities for XOR are well known (all possible permutations of the above), as a final optimization pass after the sympy_pass, it would be ideal to write a pass which recognizes identities for XOR and simplifies this to make up for gaps in `sympy's simplification.
Note: The reason why simplify_logic struggles with this is because of the strategy it uses to minimize boolean expressions. It performs simplifications to both sum of products (SOP) and product of sums (POS) and tries to find the simplification which has the least number of operations and returns that. Because SOP and POS do not use XOR, this is a simplification it easily misses (Source).
From some basic experiments, it seems that
sympy
seems to struggle with recognizing simplifications forXOR
. For example, see the following identity from Obfuscator LLVM's instruction substitution.Here is a POC script using
sympy
.Since identities for XOR are well known (all possible permutations of the above), as a final optimization pass after the
sympy_pass
, it would be ideal to write a pass which recognizes identities for XOR and simplifies this to make up for gaps in `sympy's simplification.Note: The reason why
simplify_logic
struggles with this is because of the strategy it uses to minimize boolean expressions. It performs simplifications to both sum of products (SOP) and product of sums (POS) and tries to find the simplification which has the least number of operations and returns that. Because SOP and POS do not use XOR, this is a simplification it easily misses (Source).