firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

quantifier after \Q \E block not allowed, but in pcre2test it is #2293

Open david-wahlstedt opened 2 weeks ago

david-wahlstedt commented 2 weeks ago

Bug Description

A quoting environment \Q ... \E can be quantified according to pcre2test, as I just found out. As far as I can tell, this is not documented in the man page, so it surprised me. Your tool (in PCRE2 mode) says it can't be quantified, which seems to agree with the man page. However, if pcre2test says otherwise, isn't that the true reference?

Reproduction steps

Entering ^\Qa\E+$ in regex101's input field in PCRE2 mode results in the quantifier being marked as an error, and the following text is shown:

+ The preceding token is not quantifiable

However, it works fine with pcre2test. A single character inside the quoting can match:

echo -n '/^\Qa\E+$/debug@aa@' | tr -s "@" "\n"|pcre2test
PCRE2 version 10.39 2021-10-29
/^\Qa\E+$/debug
------------------------------------------------------------------
  0   7 Bra
  3     ^
  4     a++
  6     $
  7   7 Ket
 10     End
------------------------------------------------------------------
Capture group count = 0
Compile options: <none>
Overall options: anchored
First code unit = 'a'
Subject length lower bound = 1
aa
 0: aa

But if I have more than one character in the quoted block, I can't make anything match, yet the tool still accepts the expression as syntactically correct. I can also quantify with {N,M}.

I also tried aa in the subject, and it matched as well, but aaa did not match. Here is another weird result:

echo -n '/^\Qaaa\E+$/debug@aaaa@' | tr -s "@" "\n"|pcre2test
PCRE2 version 10.39 2021-10-29
/^\Qaaa\E+$/debug
------------------------------------------------------------------
  0  11 Bra
  3     ^
  4     aa
  8     a++
 10     $
 11  11 Ket
 14     End
------------------------------------------------------------------
Capture group count = 0
Compile options: <none>
Overall options: anchored
First code unit = 'a'
Subject length lower bound = 3
aaaa
 0: aaaa

Shorter aa sequences didn't match.
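
For what it's worth, the two bytecode listings above suggest that the quantifier binds only to the last character of the quoted block (the second listing shows aa followed by a++), so ^\Qaaa\E+$ would behave like ^aaa+$. Here is a rough Python sketch of that reading. Python's re module has no \Q...\E, so expand_quoting and matches are hypothetical helpers I made up to emulate it. They are not part of PCRE2 or regex101, and I am assuming that each quoted character is treated as a separate literal item. The possessive a++ from the listing is replaced by a plain greedy a+, which gives the same result for these anchored patterns.

```python
import re


def expand_quoting(pattern: str) -> str:
    """Rewrite \\Q...\\E blocks as individually escaped literal characters.

    Hypothetical helper for illustration only (assumption: every quoted
    character becomes its own literal, so a following quantifier applies
    to the last one).
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(r"\Q", i):
            end = pattern.find(r"\E", i + 2)
            if end == -1:
                # An unterminated \Q quotes everything up to the end.
                end = len(pattern)
            out.append(re.escape(pattern[i + 2:end]))
            i = end + 2
        elif pattern.startswith(r"\E", i):
            # An isolated \E has nothing to close, so drop it.
            i += 2
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            # Copy any other escape sequence through unchanged.
            out.append(pattern[i:i + 2])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


def matches(pattern: str, subject: str) -> bool:
    """Return True if the expanded pattern matches the subject."""
    return re.search(expand_quoting(pattern), subject) is not None


print(expand_quoting(r"^\Qaaa\E+$"))  # ^aaa+$
for subject in ("aa", "aaa", "aaaa"):
    print(subject, matches(r"^\Qaaa\E+$", subject))
```

Under this assumption aa is rejected while aaa and aaaa are accepted, which agrees with "Subject length lower bound = 3" in the second listing.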

Expected Outcome

I expect regex101 to give the same result as pcre2test, when in PCRE2 mode, but maybe this should be seen as a flaw in pcre2test: I don't know.

Best regards, David

Browser

google-chrome Version 124.0.6367.201 (Official Build) (64-bit)

OS

Ubuntu 22.04