dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Keeping a node even it has zero matches #344

Open iK4tsu opened 7 months ago

iK4tsu commented 7 months ago

Given the grammar

Example:

Begin < AList
AList < A*
A <- 'a'

It would be nice to keep AList even for zero matches, given that 0 is a valid match. I expect something like

Example[0,0][]
  +-Example.Begin[0,0][]
    +-Example.AList[0,0][]

It is possible to achieve something like this by using '^AList', however, this also implies that all rules not defined in the grammar are kept, which creates unwanted noise. Moreover, since there is no way to halt the persistence, i.e. all child nodes share the same constraint, it causes even more noise when more than zero elements match.

Either all zero or more constraints would need to be kept in the final tree, which is probably undesired given the breaking change, or a new symbol that enforces the tagged rule, and only the tagged rule, to always be kept, could be added. Unless there is an intended and already implemented way of achieving this.

This affects constraints that can match with zero elements, meaning that the zero or one constraint would work similarly.

Edit: I didn't test this grammar. I think the description is simple enough to understand the desired outcome.

veelo commented 7 months ago

I agree that there is either too little or a little too much.

/+dub.sdl:
dependency "pegged" version="~>0.4.9"
+/
import std;

void main()
{
    import pegged.grammar;

    mixin(grammar(`
    Example:
        Begin <^ AList
        AList <- A*
        A <- 'a'
    `));

    writeln(Example(""));
}

Output:

Example[0, 0][]
 +-Example.Begin[0, 0][]
    +-Example.AList[0, 0][]
       +-zeroOrMore!(Example.A)[0, 0][]

I haven't studied the docs or the code in detail, but I would expect the last line not to be there. Relevant docs: https://github.com/PhilippeSigaud/Pegged/wiki/Tree-Decimation.

The question is, how big of a deal is this? Does the added noise in the parse tree cause problems big enough to invest in a solution? Perhaps, instead of a fix to Pegged itself, it is viable to change the tree using a semantic action? https://github.com/PhilippeSigaud/Pegged/wiki/Semantic-Actions