erikrose / parsimonious

The fastest pure-Python PEG parser I can muster

Raise an exception from the grammar? #222

Open rowlesmr opened 1 year ago

rowlesmr commented 1 year ago

Hi all

Is there an ability to raise an exception straight from the grammar?

grammar = """
datablockheading  = DATA  blockframecode
DATA = "data_"
blockframecode = nonblankchar+ / RAISE_ERROR
nonblankchar = ~"[A-Za-z0-9]"
"""

If not, what is the best way to get the same behaviour? I'm coming from PEGTL.

erikrose commented 1 year ago

I never got around to adding fine-grained error reporting to Parsimonious, but the design in my head involved annotated PEG-style cuts. Semantically, they might have been similar to what you suggest here, if I guess the behavior right.

In the meantime, you could define a visitor method called visit_RAISE_ERROR (in this case) and raise an exception from there.

lucaswiman commented 1 year ago

> In the meantime, you could define a visitor method called visit_RAISE_ERROR (in this case) and raise an exception from there.

I don't think that would work: the parse itself would fail, or stop before consuming all the input, so visit_RAISE_ERROR would never be called. You could define RAISE_ERROR to consume either zero characters or the rest of the string, but neither works for some grammars:

from parsimonious.grammar import Grammar
g = Grammar(r"""
    parenthesized = "(" addition_expr ")"
    addition_expr = (number "+" number) / RAISE_ERROR
    number = ~"\d+"
    RAISE_ERROR = ~".+"m
""")
g.parse("(...)")  # parsimonious.exceptions.ParseError: Rule 'number' didn't match at '...)' (line 1, column 2).

There, RAISE_ERROR does match, but it greedily consumes the closing ) as well, so the ")" literal in parenthesized has nothing left to match and the parse as a whole fails.
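The two candidate definitions can be checked with the standard `re` module alone. This is only a sketch: it assumes Parsimonious hands the pattern to `re` and matches from the current offset, and the variable names are made up for illustration.

```python
import re

text = "(...)"
start = 1  # offset just after the opening "("

# Candidate 1: RAISE_ERROR = ~".+"m swallows everything left on the line,
# including the closing parenthesis.
greedy = re.compile(r".+", re.MULTILINE).match(text, start)
consumed = greedy.group()
left_after_greedy = text[greedy.end():]

# Candidate 2: a zero-width RAISE_ERROR consumes nothing at all,
# so the next character is still "." rather than ")".
empty = re.compile(r"").match(text, start)
left_after_empty = text[empty.end():]

print(repr(consumed), repr(left_after_greedy))
print(repr(empty.group()), repr(left_after_empty))
```

In both cases the `")"` literal that follows addition_expr cannot match: either nothing is left, or the remaining text starts with `.`.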

It's a bit clunky, but one option would be to define your own custom expression type that just raises an error:

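from parsimonious.expressions import Expression
from parsimonious.grammar import Grammar

class RAISE_ERROR(Expression):
    # Instead of reporting a match or a failure when the parser tries this
    # expression, abort the whole parse by raising.
    def _uncached_match(self, text, pos, cache, error):
        raise Exception(f"You messed up at {pos}, 🤦‍♂️.")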

g = Grammar(r"""
    parenthesized = "(" addition_expr ")"
    addition_expr = (number "+" number) / RAISE_ERROR
    number = ~"\d+"
""", RAISE_ERROR=RAISE_ERROR())
g.parse("(...)")  # Exception: You messed up at 1, 🤦‍♂️.

erikrose commented 1 year ago

Good point. So yes, I think your workaround is the best bet for the moment.
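
If you go with the custom expression, a dedicated exception type is easier to catch and report than a bare Exception. Here is a minimal sketch. `GrammarError` and its methods are hypothetical names, not part of Parsimonious, and the 1-based line/column numbering is an assumption chosen to match the "(line 1, column 2)" style of the ParseError message quoted earlier.

```python
class GrammarError(Exception):
    """Hypothetical error type: remembers the input and the failing offset."""

    def __init__(self, text, pos, message):
        super().__init__(message)
        self.text = text
        self.pos = pos
        self.message = message

    def line(self):
        # 1-based line number: count the newlines before the offset.
        return self.text.count("\n", 0, self.pos) + 1

    def column(self):
        # 1-based column: distance from the previous newline.
        # rfind returns -1 when there is none, which gives pos + 1.
        return self.pos - self.text.rfind("\n", 0, self.pos)

    def __str__(self):
        return f"{self.message} (line {self.line()}, column {self.column()})"


err = GrammarError("(...)", 1, "expected number + number")
print(err)  # expected number + number (line 1, column 2)
```

Inside `_uncached_match`, the `raise Exception(...)` line would then become something like `raise GrammarError(text, pos, "expected number + number")`. Keep in mind that raising from inside an expression aborts the parse immediately, so an enclosing ordered choice never gets to try its remaining alternatives.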