MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
Once the LL(1) grammar design is finished in #10, the first part of the parser involves writing a lexer. Note, this can be done in parallel to the last part of #10 (i.e. while the classes for the AST are being written). The goal of our lexer is to read the lexems (individual characters) of our program and extract the tokens which will be used by the parser. For our purposes, we want our lexer to produced a list of strings which will represent tokens that will be passed to our parser. An example output of this component of the parser is shown below.
l = Lexer()
tokens : list[str] = l.lex(encoded_instr)
p = Parser()
ast = p.parse(tokens)
...
# rest of the program
An example output of Lexer.lex is shown below.
A & ! A -> ["A", "&", "!", "A"]
Action Items
Do the following inside parse/lex.py.
[x] write a class called Lexer which contains a constructor and a function called lex.
[x] write an implementation for lex which lexes the input program and produces a list of tokens for parsing. There are two approaches that can be taken to implement this. One approach is to write a define a automata for our parser which is shown in some of the resources linked below. A simpler approach is to write a regular expression and use the builtin re Python module. You may find https://regex101.com/ helpful when writing your regular expression.
[x] Write some test cases for the above class you wrote using the the builtin unittest Python module to test a few different programs that can be lexed by this program. This is a good practice for test-driven development (TDD). Create a new file called tests/test_lex.py which contains these unit tests.
Resources
UCLA CS 132 Notes. Notes from UCLA's compilers class which contain some information on designing LL(1) grammars and writing parsers. Chapters 2 & 3 are the only relevant ones for this project.
LL(1) Academy: A website used for learning about LL(1) grammars. It contains a tutorial for how to build a parse table and verify that a grammar is LL(1) along with some practice problems. Updated Link:http://ao.bliu.tech/
To be added. An example of writing an LL(1) parser in Python from Benson. Remind me to add this once I get a chance to finish it.
Once the LL(1) grammar design is finished in #10, the first part of the parser involves writing a lexer. Note, this can be done in parallel to the last part of #10 (i.e. while the classes for the AST are being written). The goal of our lexer is to read the lexems (individual characters) of our program and extract the tokens which will be used by the parser. For our purposes, we want our lexer to produced a list of strings which will represent tokens that will be passed to our parser. An example output of this component of the parser is shown below.
An example output of
Lexer.lex
is shown below.Action Items
Do the following inside
parse/lex.py
.Lexer
which contains a constructor and a function calledlex
.lex
which lexes the input program and produces a list of tokens for parsing. There are two approaches that can be taken to implement this. One approach is to write a define a automata for our parser which is shown in some of the resources linked below. A simpler approach is to write a regular expression and use the builtinre
Python module. You may find https://regex101.com/ helpful when writing your regular expression.unittest
Python module to test a few different programs that can be lexed by this program. This is a good practice for test-driven development (TDD). Create a new file calledtests/test_lex.py
which contains these unit tests.Resources