firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

Unrecognized `\N{U+1234}` #1967

Closed: danon closed this issue 1 year ago

danon commented 1 year ago

Bug Description

In the PHP flavor, the notation \N{U+1234} matches the character with Unicode code point U+1234 (hexadecimal), but regex101 does not recognize it.

Reproduction steps

$result = preg_match('/\N{U+1234}/u', 'ሴ', $matched);

or

$result = preg_match('/\N{U+1234}/u', chr(225) . chr(136) . chr(180), $matched); // 'ሴ' spelled out as its UTF-8 bytes E1 88 B4

It should only be recognized with the u modifier or with the (*UTF) verb.

As far as I understand, any hex digits dd/ddd/dddd that are accepted by \x{dd} should also be accepted by \N{U+dd}.
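To make that equivalence concrete, here is a minimal PHP sketch that checks \x{...} and \N{U+...} against the same characters for 2-, 4- and 5-digit code points, and shows that the U+ form does not compile without the u modifier. It assumes a PHP build whose PCRE library accepts \N{U+...} in UTF mode, as reported in this thread; the helper bothEscapesMatch() and the chosen code points are illustrative and not part of the original report.

```php
<?php
// Minimal sketch: does \N{U+hex} match the same character as \x{hex}?
// Assumes the PCRE library bundled with PHP accepts \N{U+...} in UTF mode.
// bothEscapesMatch() is a hypothetical helper written only for this example.

function bothEscapesMatch(string $hex, string $subject): bool
{
    $viaX = preg_match('/^\x{' . $hex . '}$/u', $subject);
    $viaN = preg_match('/^\N{U+' . $hex . '}$/u', $subject);

    // preg_match returns false on a compile error and 1 on a match,
    // so this is true only if both patterns compile and both match.
    return $viaX === 1 && $viaN === 1;
}

// Pairs of [hex digits, the character they encode]: 2, 4 and 5 digits.
$cases = [
    ['41', "\u{41}"],       // "A"
    ['E9', "\u{E9}"],       // "é"
    ['1234', "\u{1234}"],   // "ሴ", the character from the report
    ['1F600', "\u{1F600}"], // a code point outside the Basic Multilingual Plane
];

foreach ($cases as [$hex, $char]) {
    printf("%-5s %s\n", $hex, bothEscapesMatch($hex, $char) ? 'match' : 'no match');
}

// Without /u (or a leading (*UTF)) the \N{U+...} form is a compile error,
// so preg_match returns false; @ suppresses the accompanying warning.
var_dump(@preg_match('/\N{U+1234}/', 'x'));
```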

firasdib commented 1 year ago

I am unable to find documentation on this. Do you have a link to share?

danon commented 1 year ago

No, sorry, I don't :/

I'm writing a regular expression library (https://t-regx.com/), and I was trying to write a parser for regular expressions. I didn't know exactly which constructs PHP recognizes, so I wrote a script that tries different sets of characters by trial and error, and ran it for days.

Most returned false, but \N{U+1234} returned true, which means it was recognized. I later checked it, and it is in fact recognized by PHP on all versions, with the u modifier.

danon commented 1 year ago

@firasdib I found this reference in the PCRE2 source code (it's in C). Maybe it'll be helpful:

https://github.com/PCRE2Project/pcre2/blob/7c49b40e8aed10cc2667dd3c4b7bb692d13ade2a/src/pcre2_compile.c#L1531

It would appear that \N{2,3} is also supported.
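A minimal PHP sketch of what \N{2,3} does, assuming PCRE reads the {2,3} as an ordinary quantifier applied to \N (any character except a newline), so the pattern means "two or three non-newline characters". The helper twoOrThreeNonNewlines() and the sample subjects are hypothetical, chosen only to illustrate the behavior.

```php
<?php
// Minimal sketch: \N{2,3} is \N ("any character except a newline")
// followed by a {2,3} quantifier. No /u flag is needed for this form.
// twoOrThreeNonNewlines() is a hypothetical helper for this example.

function twoOrThreeNonNewlines(string $subject): bool
{
    return preg_match('/^\N{2,3}$/', $subject) === 1;
}

foreach (['ab', 'abc', 'abcd', "a\nb"] as $subject) {
    // addcslashes() makes the embedded newline visible as \n in the output.
    printf("%-6s %s\n", addcslashes($subject, "\n"),
        twoOrThreeNonNewlines($subject) ? 'match' : 'no match');
}
```

Here "abcd" fails because it has four characters, and "a\nb" fails because \N refuses the newline.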

firasdib commented 1 year ago

\N is supported, but I wasn't aware of \N{U+NNNN}... I'll see if this is something we want to support, as it seems very undocumented.

danon commented 1 year ago

> \N is supported, but I wasn't aware of \N{U+NNNN}...

Me neither, until recently. It's a very weird syntax. It's clearly a legacy from Perl; nonetheless, it's still supported even in the newest PHP.

And it doesn't even make sense, since \N on its own (any character except a newline) is something utterly different from \N{U+nnnn}.
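The contrast shows up clearly with a newline. This minimal PHP sketch, assuming the same UTF-mode support for \N{U+...} discussed above, runs both spellings against "\n": plain \N cannot match it, while \N{U+000A} names exactly that code point. The variable names are illustrative only.

```php
<?php
// Minimal sketch: the two spellings of \N treat a newline in opposite ways.
// Plain \N matches any character except a newline, so it cannot match "\n";
// \N{U+000A} (UTF mode only) names code point U+000A, which is "\n" itself.

$newline = "\n";

$plain     = preg_match('/\N/', $newline);          // 0: no non-newline character
$codePoint = preg_match('/\N{U+000A}/u', $newline); // 1: matches the newline

printf("\\N: %d, \\N{U+000A}: %d\n", $plain, $codePoint);
```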

> I'll see if this is something we want to support, as it seems very undocumented.

Yeah, sure. I'll leave this decision up to you.

firasdib commented 1 year ago

Thank you, I have implemented support for this and it will be included in the next release.