bliutech / mbased

MIT IEEE URTC 2024. GSET 2024. Repository for the "MBASED: Practical Simplifications of Mixed Boolean-Arithmetic Obfuscation". A Binary Ninja decompiler plugin taking ideas from compiler construction to simplify obfuscated boolean expressions.
https://github.com/bliutech/mbased/blob/main/.github/paper.pdf
MIT License

Parser: Write a Lexer #11

Closed bliutech closed 3 months ago

bliutech commented 3 months ago

Once the LL(1) grammar design is finished in #10, the first part of the parser involves writing a lexer. Note that this can be done in parallel with the last part of #10 (i.e., while the classes for the AST are being written). The goal of our lexer is to read the lexemes (individual characters) of our program and extract the tokens that will be used by the parser. For our purposes, we want our lexer to produce a list of strings representing the tokens that will be passed to our parser. An example of how this component is used by the rest of the program is shown below.

l = Lexer()
tokens: list[str] = l.lex(encoded_instr)

p = Parser()
ast = p.parse(tokens)
...
# rest of the program

An example output of Lexer.lex is shown below.

A & ! A -> ["A", "&", "!", "A"]
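
To make the expected behavior concrete, here is a minimal sketch of what such a lexer could look like. It is not the repository's implementation. The operator set (`&`, `|`, `^`, `!`, `~`, and parentheses) and the rule that variable names are runs of letters, digits, and underscores are assumptions for illustration; the issue itself only shows `A`, `&`, and `!`.

```python
class Lexer:
    """Hypothetical sketch of a lexer that turns an expression into string tokens."""

    # Assumed single-character tokens; only "&" and "!" appear in the issue.
    SYMBOLS = frozenset("&|^!~()")

    def lex(self, text: str) -> list[str]:
        tokens: list[str] = []
        i = 0
        while i < len(text):
            ch = text[i]
            if ch.isspace():
                # Whitespace only separates tokens; it is never emitted.
                i += 1
            elif ch in self.SYMBOLS:
                tokens.append(ch)
                i += 1
            elif ch.isalpha() or ch == "_":
                # Consume a whole variable name (assumed identifier rule).
                start = i
                while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                    i += 1
                tokens.append(text[start:i])
            else:
                raise ValueError(f"unexpected character {ch!r} at position {i}")
        return tokens


print(Lexer().lex("A & ! A"))  # ['A', '&', '!', 'A']
```

Raising on an unknown character, rather than skipping it, means malformed input is reported before it reaches the parser.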

Action Items

Do the following inside parse/lex.py.

Resources