Parser: Write a Lexer - Githubissues

Once the LL(1) grammar design is finished in #10, the first part of the parser involves writing a lexer. Note, this can be done in parallel to the last part of #10 (i.e. while the classes for the AST are being written). The goal of our lexer is to read the lexems (individual characters) of our program and extract the tokens which will be used by the parser. For our purposes, we want our lexer to produced a list of strings which will represent tokens that will be passed to our parser. An example output of this component of the parser is shown below.

l = Lexer()
tokens : list[str] = l.lex(encoded_instr)

p = Parser()
ast = p.parse(tokens)
...
# rest of the program

An example output of Lexer.lex is shown below.

A & ! A -> ["A", "&", "!", "A"]

Action Items

Do the following inside parse/lex.py.

[x] write a class called Lexer which contains a constructor and a function called lex.
[x] write an implementation for lex which lexes the input program and produces a list of tokens for parsing. There are two approaches that can be taken to implement this. One approach is to write a define a automata for our parser which is shown in some of the resources linked below. A simpler approach is to write a regular expression and use the builtin re Python module. You may find https://regex101.com/ helpful when writing your regular expression.
[x] Write some test cases for the above class you wrote using the the builtin unittest Python module to test a few different programs that can be lexed by this program. This is a good practice for test-driven development (TDD). Create a new file called tests/test_lex.py which contains these unit tests.

Resources

UCLA CS 132 Notes. Notes from UCLA's compilers class which contain some information on designing LL(1) grammars and writing parsers. Chapters 2 & 3 are the only relevant ones for this project.
LL(1) Academy: A website used for learning about LL(1) grammars. It contains a tutorial for how to build a parse table and verify that a grammar is LL(1) along with some practice problems. Updated Link: http://ao.bliu.tech/
To be added. An example of writing an LL(1) parser in Python from Benson. Remind me to add this once I get a chance to finish it.

bliutech / mbased

Parser: Write a Lexer #11

Action Items

Resources