bliutech / mbased

MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
https://github.com/bliutech/mbased/blob/main/.github/paper.pdf
MIT License
6 stars 0 forks source link

XOR Simplification Pass #39

Closed bliutech closed 2 months ago

bliutech commented 2 months ago

From some basic experiments, it seems that sympy seems to struggle with recognizing simplifications for XOR. For example, see the following identity from Obfuscator LLVM's instruction substitution.

a = a ^ b => a = (~a & b) | (a & ~b)

Here is a POC script using sympy.

from sympy import *

a, b = Symbol("a"), Symbol("b")
expr = (~a & b) | (a & ~b)

print(simplify_logic(expr))
# (a & ~b) | (b & ~a)

Since identities for XOR are well known (all possible permutations of the above), as a final optimization pass after the sympy_pass, it would be ideal to write a pass which recognizes identities for XOR and simplifies this to make up for gaps in `sympy's simplification.

Note: The reason why simplify_logic struggles with this is because of the strategy it uses to minimize boolean expressions. It performs simplifications to both sum of products (SOP) and product of sums (POS) and tries to find the simplification which has the least number of operations and returns that. Because SOP and POS do not use XOR, this is a simplification it easily misses (Source).

bliutech commented 2 months ago

Closing for now. Future work.