RustPython / Parser

MIT License
77 stars 29 forks source link

Error recovery/resilience #68

Open bluetech opened 1 year ago

bluetech commented 1 year ago

Currently, all of the parse functions return Result<ast, ParseError>. This means that if the file has a syntax error, we get the error and nothing else. However, there are some use cases in which having access to parts which did parse successfully would be beneficial:

On the other hand, interpreters usually don't have too much use for this AFAIK.

lalrpop seems to have some support for error recovery: https://lalrpop.github.io/lalrpop/tutorial/008_error_recovery.html Alex Kladov has written about error recovery, most recently here: https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html

Is RustPython/Parser interested in error recovery or is it out of scope?

youknowone commented 1 year ago

That sounds really great! I wish we have it for our parser.

Will the return type be (ast, Vec<ParseError>) in that case?

MichaReiser commented 1 year ago

Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.

A possible approach is to introduce a BogusExpression node that the parser uses in places where it can't make sense of the syntax (we may want to have more than that). Another issue is that it could be interesting to synthesize nodes. Let's say we have a + where the right hand side is missing. The parser could synthesize a non-existing name expression with an empty name (it would need to mark the node as synthesized).

TypeScript uses synthesized nodes. I don't remember if they have BogusExpressions or if they simply try to force the syntax to somehow fit into an existing node type. I do know, that some nodes have properties that only exist for error recovery. Meaning, it can never be a valid tree if the node has one of those child-attributes initialized.

bluetech commented 1 year ago

@youknowone

Will the return type be (ast, Vec) in that case?

I think that makes sense, yes. The lalrpop doc seems to suggest this.

I also saw that the person who added the error recovery support to lalrpop implemented a language in rust called gluon, and its grammar uses error recovery so we can have a real example: https://github.com/gluon-lang/gluon/blob/be67982154319dea5e5486dfd1c46f8ca587c82e/parser/src/grammar.lalrpop gluon returns a vec of parser errors (wrapped in some structs and stuff).

@MichaReiser

Ruff is interested in having error recovery too but I believe that this will also require changes to the AST because errors should be represented in the tree in some way or another.

The errors will definitely need to be represented in the AST; just skipping over them (even if returning errors separately) is not an option IMO.

As for BogusExpression vs. synthesized nodes, synthesized nodes are more useful, but will need to see how complex it gets.


BTW, I forgot to mention the lexer, but as matklad says, "Lexer itself should be resilient, but that’s easy — produce an Error token for anything which isn’t a valid token.". This can be the first step, though it's mostly orthogonal to parser error recovery.