dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Comments and Spacing #58

Closed novabyte closed 12 years ago

novabyte commented 12 years ago

I believe this is a silly question but at the moment, I'm just trying to understand what the best practices are for discarding comments across all rules. At the moment I have the following code and it appears to just pause during the CTFE pragma evaluation:

mixin(grammar(`
TEST:
    Spacing     <- (spacing / Comment)* 
    Comment     <- ';' (!eol .*) eol
`));

version (unittest) {
pragma(msg, TEST(`
    ; a comment
`));
}

I'm not sure what I'm doing wrong? :/

callumenator commented 12 years ago

I would change:

Comment     <- ';' (!eol .*) eol

to:

Comment     <- ';' (!eol .)*

But it is a little odd that for this input:

string input =  "; a comment";

this grammar crashes Pegged:

TEST:
    Spacing <- Comment
    Comment < ';' (!eol .)*

Whereas this one:

TEST:
    Comment < ';' (!eol .)*

matches the comment fine. This might be related to the hang-up you describe.

PhilippeSigaud commented 12 years ago

I was way, sorry for the late answer.

On Wed, Oct 3, 2012 at 6:34 PM, callumenator notifications@github.comwrote:

I would change:

Comment <- ';' (!eol .*) eol

to:

Comment <- ';' (!eol .)*

But it is a little odd that for this input:

string input = "; a comment";

this grammar crashes Pegged:

TEST: Spacing <- Comment Comment < ';' (!eol .)*

Whereas this one:

TEST: Comment < ';' (!eol .)*

matches the comment fine. This might be related to the hang-up you describe.

The Spacing rule is what it used by <: it does not add a space-consuming rule by itself. With

TEST
    Spacing <- Comment
    Comment < ';' (!eol.)*

Then Comment calls Spacing, which calls Comment. So we have a hidden left-recursion. Nice trap, there! I didn't thought of that.

Step 1 - I'll add in the docs 'do no use < as an arrow inside a comment sub-rule (more generally: do not call Spacing indirectly inside a Spacing subrule). Step 2 - I've a potential solution for left recursion that does not mean rewriting the grammar. I'll try to find the time to implement it.

callumenator commented 12 years ago

Then Comment calls Spacing, which calls Comment. So we have a hidden left-recursion. Nice trap, there! I didn't thought of that.

Ah sneaky. Could Pegged generate a compile-time warning when a user defines a rule with the same name as a Pegged built-in rule? You have defined some very useful built-in rules, it would be good if we could be warned when hijacking them. Sorry for hijacking this issue BTW

I guess the OP still needs explaining then. Is the built-in rule 'spacing' or 'Spacing'?

PhilippeSigaud commented 12 years ago

On Wed, Oct 3, 2012 at 6:17 PM, Chris Molozian notifications@github.comwrote:

I'm not sure what I'm doing wrong? :/

You did nothing wrong.It's just that spacing can match 0 space. So putting it into a ( )* means the entire expression can loop infinitely:

The fault is mine for having defined spacing to accept 0 space. That was to simplify space-munching rules. I just pushed a commit that makes your grammar function correctly, without destroying <'s functionnality at the same time.

I've updated the docs slightly too.

Going back to your example, I'd use:

mixin(grammar(`
TEST:
    Spacing     <- (spacing / Comment)*
    Comment     <- ';' ~(!eol .)* eol
`));

As was said in another post, there is a difference between (!eol .*) and (!eol .)*. The first one (your original version) will consume everything to the end, once it reaches .* (a dangerous pattern, that). There is no backtracking in PEGs, so the entire rule (!eol .*) eol will always fail: the eol at the end will always be presented an empty input, since .* consumed everything in sight.

PhilippeSigaud commented 12 years ago

On Thu, Oct 4, 2012 at 8:15 AM, callumenator notifications@github.comwrote:

Then Comment calls Spacing, which calls Comment. So we have a hidden

left-recursion. Nice trap, there! I didn't thought of that.

Btw, the solution here is to use a standard arrow.

Ah sneaky. Could Pegged generate a compile-time warning when a user defines a rule with the same name as a Pegged built-in rule? You have defined some very useful built-in rules, it would be good if we could be warned when hijacking them. Sorry for hijacking this issue BTW

This is not the same. In this case, it's hidden left-recursion. It does not hijack a predefined rule. And I think a user should be authorized to define its own rules with the same name as Pegged predefined ones.

If at one point, she really specifically wants to use a predefined rule, use a pegged.peg. prefix. Maybe I should create a pegged.predefined module?

novabyte commented 12 years ago

It's code like this that'll come in really handy on a Pegged Cheatsheet:

https://github.com/PhilippeSigaud/Pegged/pull/45

Thanks for the share Alex. I hope you don't mind, I took the liberty of adding you as a collaborator in case you were interested in contributing to the first version of the cheatsheet.

Unfortunately it's only a skeleton layout in LaTeX form at the moment because I'm traveling a lot right now and haven't had the chance to put the first draft together.

Any and all help is welcome =)

Alex Rønne Petersen wrote:

For what it's worth, here are my spacing/comment rules:

|Spacing< spacing / lineComment / blockComment

lineComment<: slash slash (&(eol / eoi) (eol / eoi) / .) blockComment<: slash "" (&("" slash) "" slash / .*) |

— Reply to this email directly or view it on GitHub https://github.com/PhilippeSigaud/Pegged/issues/58#issuecomment-9170851.