erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Struggling with what I thought should be a simple grammar #230

Open smontanaro opened 1 year ago

smontanaro commented 1 year ago

I'm trying to parse Gmail-like queries, stuff like

from:skip.montanaro@gmail.com OR subject:vintage bikes

Since I'm dealing entirely with strings and Parsimonious doesn't seem to have a separate tokenizer, I'm happy for now to use a syntax like this, where the operators use a character set completely distinct from the one allowed in words:

from:skip.montanaro@gmail.com || subject:vintage bikes

Still, I'm not getting that to work. My grammar looks like this:

from parsimonious.grammar import Grammar
grammar = Grammar(r"""
    term = factor add*
    factor = string mult*
    mult = and string
    add = or factor
    and = "&&"
    or = "||"
    word = ~"[-a-z0-9:.@]+"i
    string = word (spc word)*
    spc = ~r"\s*"
    """)

When I try to parse the second query string, I get:

parsimonious.exceptions.IncompleteParseError: Rule 'term' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with ' || faliero masi' (line 1, column 13).

If I tack optional whitespace around the and/or operators to gobble up the space before the || operator, it parses:

from parsimonious.grammar import Grammar
grammar = Grammar(r"""
    term = factor add*
    factor = string mult*
    mult = and string
    add = or factor
    and = spc* "&&" spc*
    or = spc* "||" spc*
    word = ~"[-a-z0-9:.@]+"i
    string = word (spc word)*
    spc = ~r"\s*"
    """)

but that seems like a crude hack. While I'm not totally averse to hacking my way to a solution, it still seems there should be a cleaner way to define the grammar. None of the Parsimonious examples I found dealt with anything like this. Am I missing something?

erikrose commented 1 year ago

The typical practice with Parsimonious grammars is to append a whitespace rule (e.g. spc) to the right-hand side of every "leaf" rule, that is, every rule that matches a token directly with a literal or a regex. Take a look at what I did in the grammar that describes Parsimonious grammars themselves: https://github.com/erikrose/parsimonious/blob/d5636a6ae4d7fe2ddb96f567e289ab3eeb454b49/parsimonious/grammar.py#L220-L256.