erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Naughty-or operator for errors #24

Open keleshev opened 11 years ago

keleshev commented 11 years ago

I think language.js presents a very promising approach to error handling.

mvcisback commented 8 years ago

@erikrose do you have a rough outline of code that might need to be changed to support this? E.g. low-hanging fruit that might enable it? I may have some time to look into it in the coming weeks.

mvcisback commented 8 years ago

Also, presumably this is the relevant paper: http://www.ialab.cs.tsukuba.ac.jp/%7Emizusima/publications/paste513-mizushima.pdf

erikrose commented 8 years ago

> Also, presumably this is the relevant paper: http://www.ialab.cs.tsukuba.ac.jp/%7Emizusima/publications/paste513-mizushima.pdf

I think you mispasted; that's the "cuts" paper. Now, I'd love to have cuts too (and later autogenerated ones), so I'd welcome that work as well. :-)

https://github.com/pegjs/pegjs/issues/145 is a start for naughty-or reading; it outlines some pluses and minuses. I haven't done a lot of thinking yet about whether they're a flexible or practical error-handling approach, but feel free to do that thinking and put up a proposed syntax and behavior as a PR!

> do you have a rough outline of code that might need to be changed to support this?

This is going to be vague because I don't have naughty ors loaded into my head, so take it as a bucket of possibilities, not a to-do list:

You'll probably end up modeling errors in one of two ways. If we want to catch only one, model them as exceptions, possibly caught in Expression.match_core or match. If you want to collect multiple errors for some reason, you'll want some kind of accumulator passed down the stack, a la error at https://github.com/erikrose/parsimonious/blob/0.6.2/parsimonious/expressions.py#L127 (which is intended to report parse failure positions, but it never worked very well, and you should feel free to replace it).
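To make the accumulator option concrete, here is a minimal, self-contained Python sketch. It is not parsimonious code: ErrorAccumulator, match_literal, match_one_of, and match_sequence are hypothetical names invented for illustration, and the matchers are toy stand-ins for real expressions. It only shows the shape of the idea, one object passed down the stack that remembers the furthest failures.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorAccumulator:
    """Hypothetical collector for the furthest parse failures seen so far."""
    pos: int = -1
    expected: list = field(default_factory=list)

    def record(self, pos, what):
        # Keep only the failures at the furthest position reached so far.
        if pos > self.pos:
            self.pos, self.expected = pos, [what]
        elif pos == self.pos and what not in self.expected:
            self.expected.append(what)


def match_literal(text, pos, literal, errors):
    """Return the new position on success, or None after recording a failure."""
    if text.startswith(literal, pos):
        return pos + len(literal)
    errors.record(pos, literal)
    return None


def match_one_of(text, pos, literals, errors):
    """Ordered choice: try each alternative, sharing the same accumulator."""
    for literal in literals:
        end = match_literal(text, pos, literal, errors)
        if end is not None:
            return end
    return None


def match_sequence(text, pos, parts, errors):
    """Match each group of alternatives in turn; None means the parse failed."""
    for alternatives in parts:
        pos = match_one_of(text, pos, alternatives, errors)
        if pos is None:
            return None
    return pos


# Toy grammar (an assumption of this sketch): "let " ("x" / "y") (";" / "=")
GRAMMAR = [["let "], ["x", "y"], [";", "="]]
```

When match_sequence returns None, the caller can build a message from errors.pos and errors.expected. The exception-based alternative would instead raise at the first failure and skip the shared object.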

seggen-ibuildings commented 7 years ago

I think the cut operator and naughty-or are similar. There's also a paper on a throw operator: http://www.inf.puc-rio.br/~roberto/docs/sblp2013-1.pdf.

I built a PEG parser generator in PHP, and I added the cut operator for improved error reporting. The idea is: once you pass the cut operator, you switch from warning mode to error mode. If the parser fails, it will report the expectations that failed (e.g. expected ';') with the highest position (e.g. line 10, character 3). In practice, this can mean that you have a lot of options to choose from (e.g. expected '(' or '[' or ';' or...).

So here's the trick: if you have expectations that failed in error mode, these take precedence over expectations that failed in warning mode.
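A minimal Python sketch of that precedence rule, written from the description above rather than taken from the PHP implementation. The function name report and the (position, expected, past_cut) tuple layout are assumptions made for illustration.

```python
def report(failures):
    """Pick the expectations to show the user.

    `failures` is a list of (position, expected, past_cut) tuples, where
    past_cut is True if the failure happened after a cut ("error mode").
    The tuple layout is an assumption made for this sketch.
    """
    if not failures:
        return None
    # Failures recorded in error mode take precedence over failures
    # recorded in warning mode, regardless of position.
    in_error_mode = [f for f in failures if f[2]]
    candidates = in_error_mode or failures
    # Among the remaining candidates, report only the furthest position.
    furthest = max(pos for pos, _, _ in candidates)
    expected = sorted({exp for pos, exp, _ in candidates if pos == furthest})
    return furthest, expected
```

With only warning-mode failures, this returns every expectation at the furthest position, which is the long "expected '(' or '[' or ..." list. As soon as one failure was recorded past a cut, the list narrows to the error-mode expectations.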

So naughty-or is good for error recovery. The cut operator is good for error reporting.

See also: https://github.com/scato/phpeg/blob/master/doc/error-reporting.md