Proposal: panic-mode error recovery

shivansh commented 6 years ago

As far as I understand, gocc halts after reporting the first encountered error. Using panic-mode error recovery it will be able to report more than one error at a time, thus avoiding the need to do multiple compiles in case more than one error exists.

Implementation details

In case an invalid state is encountered, following steps are taken in order to recover -

The stack is scanned until a state s with goto on a particular nonterminal A is found.
Zero or more input symbols are discarded until a symbol is found which belongs to follow(A).
In case there is more than one choice of nonterminal A, one of them is chosen randomly. This might lead to poor error reports with each recovery.
The parser pushes the state GOTO(s, A) on the stack and resumes normal parsing.
The overall number of recoveries can be capped to some specific value (8?), as the error reports after a while might not be useful at all.

This mode can be provided as an option, so that the user can choose whether to use the currently implemented mode (exit on error) or the proposed recovery mode.

In case the above is agreeable, I'm happy to prepare a PR which implements panic-mode error recovery.

awalterschulze commented 6 years ago

Very interesting :) I am a bit afraid to give a yes at this point. I would want to see how it works, with examples. But I am very intrigued.

I am a bit worried about the random element. I would prefer if some deterministic choice was made, that would make it at least somewhat possible for the user to reason about.

I don't know how much work it is, but maybe it is something that you would want to attempt in a fork and then show us how a proof of concept is working, with examples?

@mewmew what do you think?

shivansh commented 6 years ago

I would want to see how it works, with examples.

@awalterschulze Sure. I'll try things first and demonstrate if it works out.

I am a bit worried about the random element. I would prefer if some deterministic choice was made, that would make it at least somewhat possible for the user to reason about.

Makes sense. So instead of choosing randomly, we'll lookup the goto table for state s, choose the first nonterminal from the list of valid ones and proceed. This will make things deterministic.

I don't know how much work it is, but maybe it is something that you would want to attempt in a fork and then show us how a proof of concept is working, with examples?

Sure!

shivansh commented 6 years ago

Computing follow sets

For proceeding with the error recovery, computation of follow sets for all the nonterminals will be required. I don't think this has been implemented in gocc yet, so I guess I'll go ahead and implement it unless there is an existing alternative approach.

Also, I did come across FollowingSymbol.

shivansh commented 6 years ago

Update

This commit describes a very simple demo using the calc example, and also the final problem [1] [2] (details in the commit message). Hopefully I'll figure this out soon.

awalterschulze commented 6 years ago

How is this different from the error recovery mode that is already a feature of gocc? In that error recovery mode you need to provide hints in the bnf about where errors can occur, whereas this is more automatic. Or am I totally missing the point, because I haven't done these things recently enough :(

shivansh commented 6 years ago

In that error recovery mode you need to provide hints in the bnf about where errors can occur, whereas this is more automatic.

Yep, the difference in how the two modes behave is that the former (in gocc) requires hints in form of modifications to production rules, notifying which ones can recover. In panic mode, the recovery is automatic. Also, it is highly probable that the recovery will succeed in most of the cases. In terms of implementation the two modes vary significantly, the latter depending on data from GOTO table and follow sets.

Concerning the error recovery mode implemented in gocc, I think I've figured out why the b in the input a b ; d e f in error recovery example went missing (it was marked TBD in the docs). I'll discuss that in a separate issue/PR, just need to formalize my arguments a bit.

awalterschulze commented 6 years ago

Ooo if you could figure out why the b went missing, that would be super valuable :)

goccmack / gocc