erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Can't get full parse tree without consolidations #180

Open JoelMahon opened 3 years ago

JoelMahon commented 3 years ago

I'd like a flag (or something of that nature) that makes parsing produce the FULL parse tree, with no consolidations.

By consolidation I mean what occurs in this example:

default_rule = foo
foo = bar
bar = "fizz"

If you parse the string "fizz" with a grammar built from this PEG, the node tree will not contain a single foo or default_rule node, as far as I can tell.

The output is this (if I got it right):

<Node matching "fizz">
    <Node called "bar" matching "fizz">

Other nodes may be going missing as well. I'm less desperate to access those, but I think there should be a flag for them too, either a separate one or the same flag mentioned above.

foo could carry important semantic meaning that is lost. Also, if a visitor defines visit_foo, it will never be called. That is the case in my program: I want to highlight every foo in a certain colour, but not bars, except indirectly when they sit inside a foo.
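To make the highlighting use case concrete, here is a minimal sketch of collecting the spans to colour for one rule name. StubNode is a hypothetical stand-in written for this example, not parsimonious's own class; it only mirrors the expr_name, start, end and children attributes that the walk relies on.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class StubNode:
    """Hypothetical stand-in for a parse-tree node (not the library's class)."""
    expr_name: str
    start: int
    end: int
    children: List["StubNode"] = field(default_factory=list)


def spans_for(node, rule_name: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for every node produced by rule_name, depth-first."""
    if node.expr_name == rule_name:
        yield (node.start, node.end)
    for child in node.children:
        yield from spans_for(child, rule_name)


# Tree shaped like the output above: only "bar" survives as a name.
collapsed = StubNode("", 0, 4, [StubNode("bar", 0, 4)])

# Tree the request is asking for: every rule in the chain keeps its node.
full = StubNode(
    "default_rule", 0, 4,
    [StubNode("foo", 0, 4, [StubNode("bar", 0, 4)])],
)

print(list(spans_for(collapsed, "foo")))  # [] -> nothing to highlight
print(list(spans_for(full, "foo")))       # [(0, 4)]
```

With the collapsed tree there is simply no foo node to find, so neither a walk like this nor a visit_foo method has anything to act on.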

I tried to find where the code does this consolidation, but the closest I could find was NodeVisitor.lift_child. Overriding it seemed to have no effect, and I couldn't see it being used anywhere.


A workaround is this:

default_rule = foo ""
foo = bar ""
bar = "fizz"

Parsing "fizz", we get:

<Node matching "fizz">
    <Node called "default_rule" matching "fizz">
        <Node called "foo" matching "fizz">
            <Node called "bar" matching "fizz">
            <Node matching "">
        <Node matching "">
        <Node matching "">

I get the nodes I want, but unfortunately get some useless ones as well.
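One way to live with the useless nodes is to skip them while walking the tree. The sketch below assumes that an anonymous node with empty text is always one of the "" placeholders. StubNode is again a hypothetical stand-in that only mirrors the expr_name, text and children attributes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class StubNode:
    """Hypothetical stand-in for a parse-tree node (not the library's class)."""
    expr_name: str
    text: str
    children: List["StubNode"] = field(default_factory=list)


def meaningful_nodes(node, depth: int = 0) -> Iterator[Tuple[int, "StubNode"]]:
    """Yield (depth, node) pairs, skipping anonymous nodes that matched nothing."""
    if not node.expr_name and not node.text:
        return  # assumed to be a placeholder produced by a "" literal
    yield depth, node
    for child in node.children:
        yield from meaningful_nodes(child, depth + 1)


# Tree shaped like the workaround output above.
empty = lambda: StubNode("", "")
tree = StubNode("", "fizz", [
    StubNode("default_rule", "fizz", [
        StubNode("foo", "fizz", [StubNode("bar", "fizz"), empty()]),
        empty(),
        empty(),
    ]),
])

for depth, node in meaningful_nodes(tree):
    print("    " * depth + (node.expr_name or "<anonymous>"))
```

This leaves the tree itself untouched and only filters what the caller sees, so it does not help if other code inspects children directly.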

createyourpersonalaccount commented 3 years ago

The docstring of Grammar mentions:

https://github.com/erikrose/parsimonious/blob/3da7e804c07d4e495873be208701b5c955247c58/parsimonious/grammar.py#L44-L46

I can't spot the exact place where that optimization takes place either. As noted in the docstring,

https://github.com/erikrose/parsimonious/blob/3da7e804c07d4e495873be208701b5c955247c58/parsimonious/grammar.py#L38-L40

which means you can write your own parser and solve this issue. However, there's a little hack to get exactly what you want with no work:

>>> from parsimonious.grammar import Grammar
>>> g = Grammar(
... r"""
... foo = bar / tag_this
... bar = "fizz"
... tag_this = !"" ""   # Never matches, useful for ensuring rule shows up in tree
... """
... )
>>> print(g.parse("fizz"))
<Node called "foo" matching "fizz">
    <Node called "bar" matching "fizz">
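If many rules need this treatment, the tagging could be applied to the grammar text before it is handed to Grammar. This is only a sketch: tag_aliases is a hypothetical helper, and it assumes one rule per line, no trailing comments on alias rules, and that no rule is already named tag_this.

```python
import re

# The never-matching rule from the example above.
TAG_RULE = 'tag_this = !"" ""'

# A rule whose whole right-hand side is a single reference, e.g. `foo = bar`.
ALIAS = re.compile(r'^(\s*)([A-Za-z_]\w*)(\s*=\s*)([A-Za-z_]\w*)\s*$')


def tag_aliases(grammar_text: str) -> str:
    """Append `/ tag_this` to every pure alias rule, then add the tag rule."""
    out = []
    for line in grammar_text.splitlines():
        match = ALIAS.match(line)
        if match:
            indent, name, equals, target = match.groups()
            line = f"{indent}{name}{equals}{target} / tag_this"
        out.append(line)
    out.append(TAG_RULE)
    return "\n".join(out)


source = 'default_rule = foo\nfoo = bar\nbar = "fizz"'
print(tag_aliases(source))
```

The result is plain grammar text that can be passed to Grammar in place of the original. Rules that span several lines would need a real parse of the grammar rather than a per-line regex.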