AzizCode92 closed this 4 months ago
Sorry, I didn't see this. Currently, whitespace and comments are handled in `python_lex_wrapper.py`, in particular `calc_modified_hint` (https://github.com/amazon-science/incremental-parsing/blob/e2b9eabfe4274b916e6cf2cf5081b76370d20a61/incremental_parsing/lex_earley/python_lex_wrapper.py#L148). This is because the exact rules of when various whitespace tokens are allowed end up being surprisingly language-dependent.
If you want to add whitespace to a custom language, you'd probably want to create a new implementation of `AbstractLexer` that wraps an `IncrementalLexer` in a similar manner to `PythonLexWrapper`.
`lexer_hint` and `initialize` would need to add the appropriate whitespace tokens to the list of allowed tokens, and the other methods would need to turn the `LexResultSuccess` into `LexResultPartial` when a whitespace token is actually matched.
(For reference, the parser calls `lexer_hint` with the set of scannable terminals given by the grammar; the lexer will then fail fast if the in-progress symbol is not in that set. Because whitespace/comments are never part of the grammar itself, the wrapper needs to add these lexemes as allowed, and then handle the case when the lexer reads them.)
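To illustrate the two responsibilities described above (widening the hint, then swallowing whitespace tokens when they are matched), here is a minimal, self-contained toy model. It does not use the library's actual `AbstractLexer` or `IncrementalLexer` interfaces, whose signatures are not shown in this thread; `PATTERNS`, `lex_one`, `lex_with_whitespace`, and `recognize` are all hypothetical names, and the terminal regexes are made up for a `CLASS NAME DEF` sequence.

```python
import re
from typing import FrozenSet, Optional, Tuple

# Hypothetical terminal table; the real library derives terminals from the grammar.
PATTERNS = {
    "CLASS": re.compile(r"CLASS"),
    "DEF": re.compile(r"DEF"),
    "NAME": re.compile(r"[A-Z][a-z]+(?:[A-Z][a-z]+)*"),  # CamelCase words only
    "WS": re.compile(r"[ \t]+"),
}
# Terminals the wrapper allows everywhere even though the grammar never mentions them.
IGNORED = frozenset({"WS"})


def lex_one(text: str, pos: int, allowed: FrozenSet[str]) -> Optional[Tuple[str, int]]:
    """Stand-in for the inner lexer: longest match among the allowed terminals.

    Returns (terminal_name, end_position), or None to model failing fast when
    nothing in the allowed set matches at this position.
    """
    best = None
    for name in sorted(allowed):
        match = PATTERNS[name].match(text, pos)
        if match and match.end() > pos and (best is None or match.end() > best[1]):
            best = (name, match.end())
    return best


def lex_with_whitespace(text: str, pos: int, parser_hint: FrozenSet[str]) -> Optional[Tuple[str, int]]:
    """Stand-in for the wrapper: widen the hint and swallow ignored tokens."""
    while pos < len(text):
        result = lex_one(text, pos, parser_hint | IGNORED)  # add WS to the hint
        if result is None:
            return None
        name, end = result
        if name not in IGNORED:
            return name, end  # a real token: hand it to the parser
        pos = end  # whitespace matched: consume it, emit nothing, keep lexing
    return None


def recognize(text: str, use_wrapper: bool) -> bool:
    """Toy 'parser' that expects exactly the terminal sequence CLASS NAME DEF."""
    pos = 0
    for terminal in ("CLASS", "NAME", "DEF"):
        hint = frozenset({terminal})
        step = lex_with_whitespace(text, pos, hint) if use_wrapper else lex_one(text, pos, hint)
        if step is None:
            return False
        pos = step[1]
    return pos == len(text)
```

In this sketch, consuming the whitespace and continuing without emitting a token plays the role that turning `LexResultSuccess` into `LexResultPartial` plays in the real wrapper. Without the wrapper, the space after `CLASS` is rejected because `WS` is not in the parser's hint; with it, both the spaced and unspaced inputs are accepted.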
Hi, first of all, thank you very much for this great work. I have defined a very simple EBNF grammar and adapted lark_to_context to it, the same way it is done for the calculator. Now, when playing around with the interactive recognition script, I realised that if I put a space between the words of my grammar, the state becomes invalid. The state is Completable only if I enter the input text in a continuous form. My grammar is very simple and looks like this:
CLASSMyClassDEF -> Completable
CLASS MyClass DEF -> Invalid
Any help on how to do that without explicitly adding WS between the words of my grammar? Thank you.
Edit: I found this in the Lark documentation and adapted the grammar defined there, but it failed in the same way as the example above. https://lark-parser.readthedocs.io/en/latest/examples/indented_tree.html