firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

Incomplete or missing support for Unicode properties in PCRE2 mode #2336

Closed: david-wahlstedt closed this issue 6 hours ago

david-wahlstedt commented 3 weeks ago

Bug Description

Many Unicode properties are not supported. There are too many to test and report here, and you are probably already aware of them, but here is a list of some of them:

Reproduction steps

Enter the examples above in the regex field; each one produces a syntax error.

Expected Outcome

No syntax errors.

Browser

Chrome and Firefox on Linux, latest versions

OS

Ubuntu 22.04

firasdib commented 2 weeks ago

This was more than I knew, I will make sure to adjust the parser to handle the cases you have described! Thank you for the great report.

david-wahlstedt commented 2 weeks ago


Thanks! I tried pasting in all the binary properties (as listed in the output of pcre2test -LP), and they are all accepted! I noticed that the explanation on the right says "\p{ahex} matches any characters in the ahex script", even though the full name of the property is Asciihexdigit, and it doesn't say what the property matches. The matching itself works correctly, though; I tested it. I haven't tried matching with the other properties, only parsing.
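For reference, ASCII_Hex_Digit (short alias ahex) is a binary property, not a script: it covers exactly the characters 0-9, A-F and a-f. The Python sketch here only illustrates what \p{ahex} matches under that definition. Python's standard re module does not support \p{...}, so the sketch spells the same set out as an explicit character class; the function name is hypothetical.

```python
import re

# \p{ahex} (ASCII_Hex_Digit) covers exactly U+0030..U+0039, U+0041..U+0046
# and U+0061..U+0066. Python's re module has no \p{...} support, so the same
# set is written as an explicit character class over those code point ranges.
AHEX_CLASS = re.compile(r"[0-9A-Fa-f]")


def is_ascii_hex_digit(ch: str) -> bool:
    """Return True if the single character ch has the ASCII_Hex_Digit property."""
    return len(ch) == 1 and AHEX_CLASS.fullmatch(ch) is not None


# The last character is ARABIC-INDIC DIGIT THREE (U+0663): a decimal digit,
# but not an ASCII hex digit.
print([c for c in "0aFgz9\u0663" if is_ascii_hex_digit(c)])  # ['0', 'a', 'F', '9']
```

Note that \d would not be a substitute here: in Python's Unicode mode \d also matches digits such as U+0663, which \p{ahex} does not.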

firasdib commented 2 days ago

I've added support for this in the new version. The explanation always reads "\p{...} matches any characters in the ... script". Is there a better way to explain this without too much manual labour?
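One low-effort approach, sketched here as a hypothetical in Python (not how regex101 actually builds its explanations), is to look the name up in a table of aliases and pick the wording by kind: binary properties get "with the ... property", scripts get "in the ... script". The tables and function names are illustrative assumptions and deliberately tiny; a real implementation would presumably load the full alias lists from the Unicode Character Database files PropertyAliases.txt and PropertyValueAliases.txt rather than typing them in.

```python
# Hypothetical, hand-picked alias tables for illustration only.
BINARY_PROPERTIES = {
    "ahex": "ASCII_Hex_Digit",
    "alpha": "Alphabetic",
    "wspace": "White_Space",
}
SCRIPTS = {
    "grek": "Greek",
    "latn": "Latin",
}


def normalize(name: str) -> str:
    """Simplified loose matching: ignore case, spaces, hyphens and underscores."""
    return name.lower().replace(" ", "").replace("-", "").replace("_", "")


def explain(prop: str) -> str:
    """Build a one-line explanation for \\p{prop}."""
    key = normalize(prop)
    # Accept either the short alias or the long name of a binary property.
    for alias, full in BINARY_PROPERTIES.items():
        if key in (alias, normalize(full)):
            return f"\\p{{{prop}}} matches any character with the {full} property"
    # Same lookup for script names.
    for alias, full in SCRIPTS.items():
        if key in (alias, normalize(full)):
            return f"\\p{{{prop}}} matches any character in the {full} script"
    # Unknown name: fall back to the current generic wording.
    return f"\\p{{{prop}}} matches any characters in the {prop} script"


print(explain("ahex"))
print(explain("Greek"))
```

Names missing from both tables fall back to the existing generic wording, so the tables could be filled in gradually without breaking anything.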

david-wahlstedt commented 1 day ago


Thanks! Sounds like a good explanation!