Open cronburg opened 5 years ago
glrParsInc'
, including possible change to its type signature with what a Tokenizer
does)glrParseInc'
is a self-contained, small piece of code)To get started on this issue, check out the unit test in test/lr/GLRInc.hs
and run it with:
stack test antlr-haskell:lr
to see the error message:
ErrorNoAction: Current input = ' '
Current state = <2>
Rest of input = ' - foo'
Note that if you swap the Prim
and LowerID
lexeme declarations in GLRInc.hs
the test case should pass with the expected behavior (a successful parse), but that simple fix defeats the larger purpose of this issue.
The incremental GLR parsing algorithm doesn't take into account every possible alternative tokenization when looking for one that will result in a successful parse. Namely if two lexical rules can both match on the same terminal string, and both of them are feasible (based on a lookahead of just the one terminal symbol) then the incremental GLR parser currently makes the first one listed in the G4 grammar take precedence, never trying the second one and reporting a parse failure if the first one became infeasible due to subsequent attempted parsing.
Incremental GLR extension: either have the tokenizer return a (lazy) list of terminal tokens that successfully match on the current input (in order of precedence), and/or have the GLR parser recall the tokenizer with one fewer DFA name when
parseResults
does not contain any successful parses.