PCRE2Project / pcre2

PCRE2 development is now based here.

Potential syntax incompatibility with other engines (e.g. JavaScript, Go, .NET or POSIX) by flexible interpretation in curly brackets #298

Closed carenas closed 5 months ago

carenas commented 1 year ago

In addition to #297, the new syntax allowed since 0fa5367 makes some expressions valid that weren't previously.

Where the affected escape syntax was not valid, it was previously interpreted as a literal, which is also what the other engines do, so results will differ going forward.

Not sure how relevant the differences are, and I would argue it would have been more reasonable to return an error originally (as Rust, Java or even slightly older Perl do for x{,3}). I am also not sure there is a need for a fix, but it is at least worth reporting.

PhilipHazel commented 1 year ago

I always thought Perl's treatment of {,3} as a literal was an enormous gotcha, since it accepted {3,} as a quantifier. Now it treats {,3} as {0,3}, which seems more reasonable. I think we should now stick with Perl compatibility but yes, add something to the docs. Anybody who cares about compatibility between engines would probably never write {,3} anyway -- they would make it clear as either {0,3} or \{,3}.
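
As a rough illustration of the two unambiguous spellings (an explicit lower bound for the quantifier, an escaped brace for the literal), here is a minimal sketch using Python's standard `re` module. Python is only one of the engines involved, and the sample strings are made up for the example.

```python
import re

# Assumption: CPython's built-in `re` module; inputs are illustrative only.
explicit_quantifier = re.compile(r"x{0,3}")  # zero to three x's
escaped_literal = re.compile(r"x\{,3}")      # the literal text "x{,3}"

# The explicit form repeats x; the escaped form only matches the braces as text.
quantifier_ok = explicit_quantifier.fullmatch("xx") is not None
literal_ok = escaped_literal.fullmatch("x{,3}") is not None
literal_is_not_quantifier = escaped_literal.fullmatch("xx") is None

print(quantifier_ok, literal_ok, literal_is_not_quantifier)
```

Neither spelling relies on how an engine handles a missing lower bound, which is the part that differs.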

carenas commented 1 year ago

> I always thought Perl's treatment of {,3} as a literal was an enormous gotcha

FWIW, somewhat recent versions of Perl (e.g. 5.30) show an error instead:

% /usr/bin/perl -e 'shift =~ /x{,3}/' 'x'
Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/x{ <-- HERE ,3}/ at -e line 1.

> Anybody who cares about compatibility between engines would probably never write {,3} anyway

At least Python, Ruby and GNU RegEx seem to interpret that quantifier just like recent Perl, so I agree.
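
For the Python case, a minimal sketch of that reading, assuming CPython's standard `re` module (the test strings are invented for illustration):

```python
import re

# Assumption: CPython's built-in `re` module; sample inputs are illustrative.
pattern = re.compile(r"x{,3}")

# With the lower bound omitted, `re` reads {,3} as {0,3}.
quantifier_match = pattern.fullmatch("xxx") is not None

# The literal text "x{,3}" is therefore not matched in full.
literal_match = pattern.fullmatch("x{,3}") is not None

print(quantifier_match, literal_match)
```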

Still, the fact that this is possible AND seems to match the behaviour in other engines is worth recognizing on its own:

$ perl -e 'print "$]: $1\n" if shift =~ /(x{0, 3})/' 'x{0, 3}'
5.020002: x{0, 3}
% perl -e 'print "$]: $1\n" if shift =~ /(x{0, 3})/' 'x{0, 3}'
5.036001: x
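
For comparison, a small sketch of the same spaced pattern in Python, assuming CPython's standard `re` module. As far as I can tell it does not accept spaces inside the braces, so it should behave like the older Perl result here.

```python
import re

# Assumption: CPython's built-in `re`; it does not parse "{0, 3}" as a quantifier.
spaced = re.compile(r"(x{0, 3})")

# The braces are taken literally, so the whole input text is captured.
m = spaced.search("x{0, 3}")
captured = m.group(1) if m else None

# A plain run of x's is not matched, because no quantifier was parsed.
plain = spaced.search("xxx")

print(captured, plain)
```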

PhilipHazel commented 1 year ago

I have added a few words to the doc, pointing out that other engines may behave differently.