Closed david-wahlstedt closed 5 months ago
I noticed that the a{,7} variant is taken as a literal string and just matches itself. But according to the man page, it should match from zero to seven a's.
Versions before 10.43 treated this as a literal; the behaviour was changed in 10.43 to match Perl (see #298).
The documentation includes pcre2pattern and pcre2syntax as guides, but ultimately the definition of what is to be expected (especially at the edges) comes from Perl.
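The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `brace_quantifier` and its `allow_omitted_min` flag are hypothetical names, not part of PCRE2, and the sketch ignores other details of the real syntax, such as limits on the repeat counts.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of PCRE2: decide whether the text starting at
# an opening brace is a quantifier or should be read as literal characters.
_QUANT = re.compile(r"\{(\d*)(?:(,)(\d*))?\}")

def brace_quantifier(pattern: str, pos: int, allow_omitted_min: bool = True
                     ) -> Optional[Tuple[int, Optional[int], int]]:
    """Return (min, max, end) if pattern[pos:] starts with a quantifier.

    Return None when the brace should be treated as a literal character.
    max is None for an open-ended quantifier such as {3,}.
    """
    m = _QUANT.match(pattern, pos)
    if m is None:
        return None
    lo, comma, hi = m.group(1), m.group(2), m.group(3)
    if lo == "":
        # {} and {,} are never quantifiers; {,n} is one only when the
        # omitted-minimum form is enabled (assumed to model 10.43 and later).
        if not (allow_omitted_min and comma and hi):
            return None
        return 0, int(hi), m.end()
    if comma is None:
        return int(lo), int(lo), m.end()      # exact count, e.g. {5}
    return int(lo), (int(hi) if hi else None), m.end()
```

With `allow_omitted_min=False` the call `brace_quantifier("a{,7}", 1)` returns `None`, so the braces fall back to literal text as in older versions; with the default it returns `(0, 7, 5)`, the zero-to-seven reading.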
Thanks! OK, I see. What I find most difficult now is telling what should be treated as a literal and what should be an error.
David
I have tried various examples of patterns containing braces and quantifiers, trying to figure out what is legal and what is not. I noticed that the a{,7} variant is taken as a literal string and just matches itself. But according to the man page, it should match from zero to seven a's.
Here are some examples I tried, and how pcre2test behaves with them (PCRE2 version 10.39 2021-10-29, Linux x64, Ubuntu 22.04):
Should fail (and does):
{|{5}
Should pass (no syntax error, but I can't match anything on it):
{\|{5}
Shouldn't this match literally? Is it a bug?
Now my question: on top of the fact that a{,7} should be supported according to the man page, why are patterns with braces that fail to parse as quantifiers sometimes treated as a literal string and sometimes rejected with an error?
I am trying to find an accurate description of the syntax, but I haven't found one. The reason is that I am trying to write a parser for (a decent fragment of) PCRE2 in Haskell, and I want to base the parser on a grammar description, preferably some form of BNF. I have found one grammar for ANTLRv4, namely https://github.com/bkiers/pcre-parser It is an approximation and a work in progress, but it supports most of the syntax.
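For the two brace examples {|{5} and {\|{5}, one reading that is consistent with the reported behaviour is that a brace which opens no valid quantifier becomes a literal, while a well-formed quantifier with nothing repeatable in front of it is an error. The Python sketch below illustrates that reading only. The `tokenize` function and its token names are hypothetical, it treats just single characters and escapes as repeatable, and it accepts the {,n} form, which assumes 10.43 or later.

```python
import re

# Hypothetical token classes for a small fragment of the syntax; this is an
# illustration, not how PCRE2 itself is implemented.
_QUANT = re.compile(r"\{\d+(?:,\d*)?\}|\{,\d+\}")

def tokenize(pattern):
    """Split a pattern into (kind, text) tokens.

    Raises ValueError when a quantifier has nothing to repeat.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        m = _QUANT.match(pattern, i) if ch == "{" else None
        if m:
            # A quantifier needs a repeatable item directly before it.
            if not tokens or tokens[-1][0] != "literal":
                raise ValueError(f"quantifier at offset {i} has nothing to repeat")
            tokens.append(("quantifier", m.group()))
            i = m.end()
        elif ch == "\\" and i + 1 < len(pattern):
            tokens.append(("literal", pattern[i:i + 2]))  # escaped character
            i += 2
        elif ch == "|":
            tokens.append(("alternation", ch))
            i += 1
        else:
            # Ordinary character, including a brace that opens no quantifier.
            tokens.append(("literal", ch))
            i += 1
    return tokens
```

Under this reading, `tokenize("{|{5}")` raises because {5} directly follows the alternation bar, whereas `tokenize("{\\|{5}")` yields a literal brace, an escaped bar, and a quantifier on that bar. The second pattern would then match an opening brace followed by five | characters rather than its own text.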
pcre2test should be the reference, I guess, since it is the implementation. But a formal grammar would be nice, and I am willing to contribute to one, in one way or another, if it is of interest.
Best regards, David