dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Comment syntax #306

Open Bolpat opened 2 years ago

Bolpat commented 2 years ago

Pegged uses # to start its line comments, which does not work well with D's q{} strings. This is not a matter of taste: the D compiler rejects # as a token. Literally anything else would be a better choice, as it would at least compile. As a stupid question, why not also accept regular old D comments starting with //? Technically, // is valid as part of a rule, but I'd guess no one uses empty choices that way.
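To make the two string kinds concrete, here is a minimal sketch that only builds the grammar text and does not call Pegged. The grammar and the names `wysiwygGrammar` and `tokenGrammar` are made up for illustration. Note the assumption in the second string: Pegged would only accept that text if // comments were added as proposed in this issue.

```d
import std.algorithm.searching : canFind;

// WYSIWYG (backtick) string: the contents are not lexed as D,
// so a PEG comment introduced by '#' is just ordinary text.
enum wysiwygGrammar = `
Arithmetic:
    Sum    <- Number ('+' Number)*   # a PEG line comment
    Number <- [0-9]+
`;

// Token string: everything between the braces must lex as D tokens.
// The '//' part is lexed as a D comment and stays in the string value.
// Assumption: Pegged does not accept '//' comments today; this shows
// what the proposal in this issue would allow.
enum tokenGrammar = q{
Arithmetic:
    Sum    <- Number ('+' Number)*   // proposed comment style
    Number <- [0-9]+
};
```

Both values are plain strings at compile time; the difference is only in what the D lexer lets you write between the delimiters.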

veelo commented 2 years ago

q{} strings make sense for D code. Does your grammar include snippets of D code or is there another reason to not use the simpler ` ` strings? What is the advantage of q{} over ` ` for a PEG?

I think // comments could be made to work, but I also think comments are easier to spot when they start with # with all the other / already there...

Bolpat commented 2 years ago

Please note that the main issue is that q{} strings are in conflict with PEG comments by the choice of # instead of virtually any other character. The question isn't why use q{} but why needlessly have features block each other. The only argument I could see is: most other ASCII symbols are taken and, of the rest, it's the best choice. But there's almost no reason why it has to be a single character.

> q{} strings make sense for D code. Does your grammar include snippets of D code or is there another reason to not use the simpler ` ` strings?

It does not, I disagree with the above assumption. q{} make sense for code that's reasonably similar to D code.

> What is the advantage of q{} over ` ` for a PEG?

String literals (among other things) get highlighted in q{} and inside ` `, one can easily miss a closing quote. That's actually quite big.

> I think // comments could be made to work, but I also think comments are easier to spot when they start with # with all the other / already there...

Comments are spotted easiest when displayed in a different color, don't you think? Also, / and // are easily distinguished. Wouldn't it suffice to replace pegged.peg.literal!("#") by pegged.peg.or!(pegged.peg.literal!("#"), pegged.peg.literal!("//")) here and here? D comments are already part of Pegged for semantic actions. Oh, and this answers the question above.
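The ordered choice suggested above ("#" first, then "//") can be sketched without Pegged, using only Phobos. This is not Pegged's generated parser code; `commentOpenerLength` is a hypothetical helper that only mirrors the idea of trying each comment opener in turn.

```d
import std.algorithm.searching : startsWith;

// Returns the length of the comment opener at the start of `input`:
// 1 for "#", 2 for "//", 0 if neither matches.
size_t commentOpenerLength(string input)
{
    // startsWith with several needles returns the 1-based index of
    // the needle that matched, or 0 if none did.
    switch (input.startsWith("#", "//"))
    {
        case 1: return 1;
        case 2: return 2;
        default: return 0;
    }
}
```

A single / (the PEG choice operator) does not match either opener, so it is left alone.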

Bolpat commented 2 years ago

I'd even like to do the PR myself, but when it has near-zero probability of getting merged, I'll not waste my time.

veelo commented 2 years ago

There is no need to debate over whether # was a good choice for comments or not, the choice was made by Bryan Ford, the inventor of Parsing Expression Grammars (https://bford.info/pub/lang/peg.pdf, 2004). Pegged implements a parser generator for these grammars, so naturally # is what it accepts for comments.

In the wording of your reply I sense some kind of aggression, which I hope I am misinterpreting. I am just trying to get an understanding of the advantages, as I see it as my duty to weigh them against the disadvantages. If I understand you correctly, editors do a good enough job of syntax highlighting PEG grammars with your proposed changes. That'll be a nice improvement, although I am not 100% sure that there are no other sequences of characters that are valid in a PEG but invalid inside q{}. Please keep an eye out for that once you start playing with this.

But note that parser.d is a generated file and should not be edited manually. See pegged/dev. I now see that examples have moved to a new directory structure, so it is possible that instructions and/or build scripts need to be updated.

The first step I think should be to see how // is currently handled when it occurs in a grammar at various places (at the beginning of a line, at the end of a line, in the middle of a multi-line rule). Because if I look at Pegged's grammar I think it would be rejected. If that is the case everywhere, the extension can be added without breaking any existing grammars. If not, this requires a new major release.
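As a starting point for that first step, the three placements mentioned above could be written down as probe inputs. This is only a sketch: the grammar text and the name `slashProbes` are hypothetical, and feeding the probes to Pegged and recording the outcome is deliberately left out.

```d
// Hypothetical probe grammars, one per placement of "//".
// Backtick strings are used so that any character sequence is allowed.
immutable string[] slashProbes = [
    // "//" at the beginning of a line
`Probe:
// at the beginning of a line
    A <- 'a'
`,
    // "//" at the end of a line
`Probe:
    A <- 'a' // at the end of a line
`,
    // "//" in the middle of a multi-line rule
`Probe:
    A <- 'a'
      // in the middle of a multi-line rule
      / 'b'
`,
];
```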

Second step would be to change the grammar and regenerate the parser, and test the examples. See if it works well for you.

Third step is to make the PR, do the release and update the documentation.

Sounds good? Thanks for helping to improve Pegged!

Bolpat commented 2 years ago

> There is no need to debate over whether # was a good choice for comments or not, the choice was made by Bryan Ford, the inventor of Parsing Expression Grammars (https://bford.info/pub/lang/peg.pdf, 2004). Pegged implements a parser generator for these grammars, so naturally # is what it accepts for comments.

To be honest, I didn't know that, but I suspected it.

> In the wording of your reply I sense some kind of aggression, which I hope I am misinterpreting.

You're a little right, but mostly misinterpreting. Annoyed fits far better. I actually tried to be constructive in my state of mind.

> I am just trying to get an understanding of the advantages, as I see it as my duty to weigh them against the disadvantages. If I understand you correctly, editors do a good enough job of syntax highlighting PEG grammars with your proposed changes. That'll be a nice improvement, although I am not 100% sure that there are no other sequences of characters that are valid in a PEG but invalid inside q{}. Please keep an eye out for that once you start playing with this.

I am 100% sure that there do exist sequences of characters that are valid in a PEG but invalid inside q{}: If I'm not mistaken, you can put unquoted braces in small choices [], like [{\[(] or [}\])]. Also, PEG allows multi-character strings in single quotes 'like this', which isn't a valid D token. As escape sequences in strings work almost the same in Pegged and in D, that's not a source of errors. The only difference I could find is that D allows HTML entities like '\&quot;' for '"'. I'll think about that.
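One of these cases, unbalanced braces, can be screened for with a rough check. The helper below is a hypothetical heuristic, not part of Pegged: it counts every { and } character, while the D lexer only counts brace tokens (braces inside D string or character literals do not count). It also does not detect multi-character single-quoted strings.

```d
// Heuristic: does the grammar text have balanced curly braces?
// Balanced brace tokens are a requirement for embedding text in q{},
// so a `false` result here signals a likely problem.
bool bracesLookBalanced(string grammarText)
{
    int depth = 0;
    foreach (char c; grammarText)
    {
        if (c == '{')
            ++depth;
        else if (c == '}')
        {
            if (depth == 0)
                return false; // closing brace without an opener
            --depth;
        }
    }
    return depth == 0;
}
```

For example, a rule containing the character class [{\[(] has an opening brace with no partner, so the check reports it as unbalanced.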

[...]

> Sounds good? Thanks for helping to improve Pegged!

That does sound good! Maybe all the negative emotion was there because I really like Pegged. So, I'll play around with the Pegged grammar a bit.

Bolpat commented 2 years ago

I closed this accidentally, sorry.

ArthaTi commented 2 years ago

Workaround: Put something after the hash. I do my comments with #:// and the compiler doesn't complain at all!