dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Character range in braces #151

Closed: robinwils closed this issue 8 years ago

robinwils commented 9 years ago

Hi,

I noticed strange behavior with '-' inside square brackets. The following rule gives an error: A <- [+-] whereas this one does not: A <- [-+]

Is this intended? Apparently, in the first example, it tries to recognize a character range like [0-9] but fails. Shouldn't both rules behave the same?

The rule A <- [+-*] fails in the same way, but with a different error message. That one does look more like a genuine error, though.

PhilippeSigaud commented 9 years ago

This is standard behavior for regex, so Pegged adopted the same rule: - is allowed as a literal only at the beginning of a char range. IIRC, [ and ] are also treated a bit specially. So yes, [+-] is parsed as a range that starts at + and has no defined end point. For alternates like this, I tend to use / directly: "=" / "+" / "-". IIRC, Pegged recognizes these and uses a special 'keyword' template to speed up their matching.
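To make the rule described above concrete, here is a minimal Python sketch of a bracket-body parser that treats '-' as a literal only when it comes first. This is not Pegged's actual code: the function name `parse_char_class` and both error messages are hypothetical, chosen only to show why [-+] is accepted while [+-] and [+-*] fail for different reasons.

```python
def parse_char_class(body):
    """Return the set of characters matched by a bracket body such as '-+' or '0-9'.

    Hypothetical strict rule (an assumption, not Pegged's implementation):
    '-' is a literal only when it is the first character of the body.
    """
    chars = set()
    i = 0
    while i < len(body):
        c = body[i]
        if i + 1 < len(body) and body[i + 1] == "-":
            # 'c-' opens a range, so an end point must follow.
            if i + 2 >= len(body):
                raise ValueError(f"range starting at {c!r} has no end point")
            end = body[i + 2]
            if ord(end) < ord(c):
                raise ValueError(f"reversed range {c!r}-{end!r}")
            # Add every character from c to end, inclusive.
            chars.update(chr(k) for k in range(ord(c), ord(end) + 1))
            i += 3
        else:
            # Plain literal, including a leading '-'.
            chars.add(c)
            i += 1
    return chars


if __name__ == "__main__":
    print(sorted(parse_char_class("-+")))
    for bad in ("+-", "+-*"):
        try:
            parse_char_class(bad)
        except ValueError as err:
            print(f"[{bad}] rejected: {err}")
```

Under this sketch, '+-' fails because the range has no end point, while '+-*' fails because '*' sorts before '+', which would explain two different error messages for the two rules.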

robinwils commented 9 years ago

Thanks for the answer! Out of curiosity I looked it up, and this is what I found in IEEE Std 1003.1:

'-' When found anywhere but first (after an initial '^', if any) or last in a bracket expression, or as the ending range point in a range expression

According to this, [+-] and [-+] should be equivalent, but it is indeed risky to use '-' inside brackets in this case.

For alternates like this, I tend to use / directly: "=" / "+" / "-". IIRC, Pegged recognizes these and uses a special 'keyword' template to speed up their matching.

Thanks for the tip! I'll keep that in mind.
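The bracket-expression rule quoted from IEEE Std 1003.1 above can be checked against one real regex engine. The sketch below uses Python's standard `re` module, which is not a POSIX implementation and says nothing about how Pegged behaves, but it follows the same convention for a leading or trailing '-'. The helper name `class_matches` is only illustrative.

```python
import re


def class_matches(pattern, text):
    """Return True if the whole of `text` matches the regex `pattern`."""
    return re.fullmatch(pattern, text) is not None


# A '-' placed first or last in the brackets is taken literally,
# so both spellings accept '+' and '-' and nothing else.
for cls in ("[+-]", "[-+]"):
    assert class_matches(cls, "+")
    assert class_matches(cls, "-")
    assert not class_matches(cls, "*")

# '[+-*]' is read as a range from '+' to '*'. Since '*' sorts before '+',
# the range is reversed and the pattern fails to compile.
try:
    re.compile("[+-*]")
    compiled = True
except re.error:
    compiled = False

assert compiled is False
```

In this engine the two spellings are equivalent, as the standard suggests, and only [+-*] is a real error.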

PhilippeSigaud commented 9 years ago

Oh, I didn't know about this standard. I'll read it, thanks for the info.

(Sent as an email reply on Wed, Mar 18, 2015, quoting the previous comment in full.)