Open bluetech opened 1 year ago
That sounds really great! I wish we have it for our parser.
Will the return type be (ast, Vec<ParseError>)
in that case?
Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.
A possible approach is to introduce a BogusExpression
node that the parser uses in places where it can't make sense of the syntax (we may want to have more than that). Another issue is that it could be interesting to synthesize nodes. Let's say we have a +
where the right hand side is missing. The parser could synthesize a non-existing name expression with an empty name (it would need to mark the node as synthesized).
TypeScript uses synthesized nodes. I don't remember if they have BogusExpression
s or if they simply try to force the syntax to somehow fit into an existing node type. I do know, that some nodes have properties that only exist for error recovery. Meaning, it can never be a valid tree if the node has one of those child-attributes initialized.
@youknowone
Will the return type be (ast, Vec
) in that case?
I think that makes sense, yes. The lalrpop doc seems to suggest this.
I also saw that the person who added the error recovery support to lalrpop implemented a language in rust called gluon, and its grammar uses error recovery so we can have a real example: https://github.com/gluon-lang/gluon/blob/be67982154319dea5e5486dfd1c46f8ca587c82e/parser/src/grammar.lalrpop gluon returns a vec of parser errors (wrapped in some structs and stuff).
@MichaReiser
Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.
The errors will definitely need to be represented in the AST; just skipping over them (even if returning errors separately) is not an option IMO.
As for BogusExpression
vs. synthesized nodes, synthesized nodes are more useful, but will need to see how complex it gets.
BTW, I forgot to mention the lexer, but as matklad says, "Lexer itself should be resilient, but that’s easy — produce an Error token for anything which isn’t a valid token.". This can be the first step, though it's mostly orthogonal to parser error recovery.
Currently, all of the parse functions return
Result<ast, ParseError>
. This means that if the file has a syntax error, we get the error and nothing else. However, there are some use cases in which having access to parts which did parse successfully would be beneficial:On the other hand, interpreters usually don't have too much use for this AFAIK.
lalrpop seems to have some support for error recovery: https://lalrpop.github.io/lalrpop/tutorial/008_error_recovery.html Alex Kladov has written about error recovery, most recently here: https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html
Is RustPython/Parser interested in error recovery or is it out of scope?