dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Implement lookahead operator #253

Closed WebFreak001 closed 6 years ago

WebFreak001 commented 6 years ago

Hi, it would be nice to have a lookahead operator, so that I can, for example, match:

VariableName < !?Keyword identifier

That example would match identifiers that are not keywords: if a Keyword parses successfully at the current position, the rule fails without continuing, and the Keyword text is left in the input that is yet to be parsed.

Analogous for a positive lookahead:

FirstName < Name ?LastName

This would match a Name only if it is followed by a LastName, without returning the LastName or advancing past it in the input.

I tried implementing the negative lookahead as Keyword {discard} / identifier, but that didn't compile and resulted in an internal compiler error.

veelo commented 6 years ago

I have sometimes wanted this too, but it is not straightforward to get the desired result:

Keyword <- "return"
VariableName < !?Keyword identifier

Here, VariableName would not be able to match valid identifiers such as returnValue. To properly prevent keywords from being matched as identifiers you'd have to use nested lookahead like

VariableName < !( ?( Keyword !?[a-zA-Z0-9_] ) ) identifier

or compare the whole match. I do the latter today using semantic actions:

VariableName <{not_a_keyword} identifier

given

PT not_a_keyword(PT)(PT p)
{
    import std.uni: sicmp; // I need case insensitive comparison.
    if (sicmp(p.input[p.begin .. p.end], "return") == 0)
        p.successful = false;
    return p;
}

So I'm not sure it would be worth the effort to add such an operator if the objective can arguably be achieved more easily using existing functionality.
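For more than one keyword, the same semantic-action idea could be sketched roughly as follows. The keyword list and the name notAKeyword are hypothetical and only for illustration; the action relies on nothing beyond the input, begin, end and successful fields already used above.

```d
import std.uni : sicmp;

// Hypothetical keyword list, for illustration only.
immutable string[] keywords = ["return", "if", "else", "while"];

/// Semantic action: mark the match as failed when the matched text,
/// compared case-insensitively, is exactly one of the keywords.
PT notAKeyword(PT)(PT p)
{
    const text = p.input[p.begin .. p.end];
    foreach (kw; keywords)
    {
        if (sicmp(text, kw) == 0)
        {
            p.successful = false;
            break;
        }
    }
    return p;
}
```

Because the whole match is compared, an identifier such as returnValue is left alone, while return (in any letter case) is rejected.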

veelo commented 6 years ago

Besides, it's already there: https://github.com/PhilippeSigaud/Pegged/wiki/PEG-Basics

FirstName < Name &LastName

:-)

WebFreak001 commented 6 years ago

Oh, you are right: !keyword identifier actually works well, and positive lookahead is implemented too. Closing this then.
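For reference, a minimal sketch of the approach that worked, written with Pegged's existing ! operator and assuming Pegged is available as a dependency. The grammar name Names, the single-keyword Keyword rule and the helper parsesAsVariableName are illustrative only. The trailing ![a-zA-Z0-9_] is the word-boundary check discussed above, so that returnValue is still accepted.

```d
import pegged.grammar;

// Illustrative grammar: a variable name is an identifier that is not a keyword.
mixin(grammar(`
Names:
    VariableName <- !Keyword identifier
    # A keyword only counts if no identifier character follows it.
    Keyword      <- "return" ![a-zA-Z0-9_]
`));

/// Hypothetical helper: true when the input starts with a valid VariableName.
bool parsesAsVariableName(string s)
{
    return Names(s).successful;
}
```

With this grammar, "return" should be rejected and "returnValue" accepted; the positive form &LastName works the same way but succeeds when the lookahead matches.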