Closed JacobNickerson closed 1 week ago
The current lexer is a single function with the signature String -> ArrayList<String>. We can think of it as an intelligent string splitter. Here are some pieces I'd like to add before merging into main:
- ' or " characters
- ( or ) characters: $(xxx) is grabbed as a single string.
- { or } characters: ${xxx} is grabbed as a single string.
Before I begin the work here, I may have to modify the signature of the lexer: rather than returning an ArrayList<String>, it'd be nice to return an ArrayList<Token>. These tokens would likely be just a record like Token(String lexeme, TokenType type), so that we have more information once we parse the tokens into some kind of tree.
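
A minimal sketch of what the token-returning lexer could look like, under stated assumptions rather than as the actual implementation. Only the Token(String lexeme, TokenType type) shape comes from the description above. The TokenType constants, the LexerSketch class, and the lex and matchingClose helpers are hypothetical names chosen for illustration. The sketch keeps quoted strings, $(xxx), and ${xxx} as single tokens, and it deliberately ignores escapes, nesting of quotes inside substitutions, and words that contain a quote or substitution mid-word.

```java
import java.util.ArrayList;

class LexerSketch {
    // Hypothetical categories; the issue does not specify the TokenType values.
    enum TokenType { WORD, SINGLE_QUOTED, DOUBLE_QUOTED, COMMAND_SUBSTITUTION, PARAMETER_EXPANSION }

    // Record shape taken from the issue text.
    record Token(String lexeme, TokenType type) {}

    static ArrayList<Token> lex(String input) {
        ArrayList<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
                continue;
            }
            int start = i;
            TokenType type;
            if (c == '\'' || c == '"') {
                // Quoted string: consume up to the next matching quote character.
                int close = input.indexOf(c, i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated quote at index " + i);
                }
                i = close + 1;
                type = (c == '\'') ? TokenType.SINGLE_QUOTED : TokenType.DOUBLE_QUOTED;
            } else if (c == '$' && i + 1 < input.length()
                    && (input.charAt(i + 1) == '(' || input.charAt(i + 1) == '{')) {
                // $(xxx) or ${xxx}: consume through the balancing close character.
                char open = input.charAt(i + 1);
                char close = (open == '(') ? ')' : '}';
                i = matchingClose(input, i + 1, open, close) + 1;
                type = (open == '(') ? TokenType.COMMAND_SUBSTITUTION : TokenType.PARAMETER_EXPANSION;
            } else {
                // Plain word: runs until the next whitespace character.
                while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                type = TokenType.WORD;
            }
            tokens.add(new Token(input.substring(start, i), type));
        }
        return tokens;
    }

    // Returns the index of the close character that balances the open one at openIdx.
    private static int matchingClose(String s, int openIdx, char open, char close) {
        int depth = 0;
        for (int j = openIdx; j < s.length(); j++) {
            char ch = s.charAt(j);
            if (ch == open) {
                depth++;
            } else if (ch == close && --depth == 0) {
                return j;
            }
        }
        throw new IllegalArgumentException("Unbalanced '" + open + "' at index " + openIdx);
    }
}
```

With this sketch, LexerSketch.lex("echo \"hello world\" $(ls -la) ${HOME}") yields four tokens, and the quoted string and both $ forms each stay in one piece, quotes and delimiters included in the lexeme.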
This change would likely be committed directly to main, since it's a breaking change that I'd like to ensure we all have ASAP.
The parser is a lot more robust, but there are certain basic design decisions that are going to make POSIX compliance nearly impossible (i.e., order of operations on string evaluation). Read here for more information on this. I am going to begin writing tests that ensure we stick as close to the standard as possible (we may deviate later, but as a starting point, we should stay as compliant as possible).
Closing issue for now, too broad a scope. Will reopen with a narrower scope and more details.
Implement a more robust parser than just splitting on white space