Closed: Ayesh closed this issue 3 months ago
This is a bug, caused by my misreading or misunderstanding one of the rules in Unicode Standard Annex #29, way back when I implemented \X. I'm a bit surprised it's taken so long for it to hit anybody. Furthermore, the documentation correctly describes what the code does, but it's not what it's supposed to do! (Somewhere I even noted a difference from Perl, but never investigated.) I hope to have this fixed in HEAD in the next day or two. This is a very timely issue because the 10.44 release will be forthcoming once this fix is done. Thanks for the report.
Thank you. I tested after commit 067c2f1f5851335d4b6feff8b5c5a566d6f9e669, and it works correctly!
Using PCRE2 10.43, the \X selector seems to capture more than one grapheme, as if it does not break before the start of a new grapheme cluster.
Regex: \X
Input: 🏳️‍🌈🏴‍☠️ (U+1F3F3 U+FE0F U+200D U+1F308 + U+1F3F4 U+200D U+2620 U+FE0F)
When run, \X matches both flag graphemes: Regex101 preview.
Could you kindly shed some light on whether I'm missing something?
Thank you.
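For reference, here is a minimal Python sketch of the segmentation I would expect for this input. It models only two rules from Unicode Standard Annex #29: GB9 (do not break before Extend or ZWJ) and GB11 (do not break between ZWJ and an Extended_Pictographic character when the ZWJ follows Extended_Pictographic Extend*). It is not PCRE2's implementation and not a full segmenter. The names split_clusters, EXTEND, EXT_PICT and REPORTED_INPUT are hypothetical, and the two property sets are hardcoded for just the code points in this report.

```python
# Simplified sketch of UAX #29 rules GB9 and GB11 only; not a full segmenter.
ZWJ = 0x200D
# Assumption: hardcoded stand-ins for the Extend and Extended_Pictographic
# properties, covering only the code points in the reported input.
EXTEND = {0xFE0F}
EXT_PICT = {0x1F3F3, 0x1F308, 0x1F3F4, 0x2620}

# The reported input, written with escapes so no code point gets lost.
REPORTED_INPUT = (
    "\U0001F3F3\uFE0F\u200D\U0001F308"  # first flag sequence
    "\U0001F3F4\u200D\u2620\uFE0F"      # second flag sequence
)


def split_clusters(text):
    """Split text into clusters using GB9 and GB11; break everywhere else."""
    clusters = []
    current = []
    after_pict = False      # current ends in ExtPict Extend*
    zwj_after_pict = False  # current ends in ExtPict Extend* ZWJ
    for ch in text:
        cp = ord(ch)
        if not current:
            join = False  # start of text
        elif cp == ZWJ or cp in EXTEND:
            join = True   # GB9: never break before Extend or ZWJ
        elif cp in EXT_PICT and zwj_after_pict:
            join = True   # GB11: ExtPict Extend* ZWJ x ExtPict
        else:
            join = False  # otherwise break (GB999)
        if not join and current:
            clusters.append("".join(current))
            current = []
        current.append(ch)
        # Track the left-hand context that GB11 needs.
        if cp in EXT_PICT:
            after_pict, zwj_after_pict = True, False
        elif cp in EXTEND:
            zwj_after_pict = False  # after_pict is left unchanged
        elif cp == ZWJ:
            zwj_after_pict = after_pict
            after_pict = False
        else:
            after_pict = zwj_after_pict = False
    if current:
        clusters.append("".join(current))
    return clusters


if __name__ == "__main__":
    for cluster in split_clusters(REPORTED_INPUT):
        print(" ".join(f"U+{ord(c):04X}" for c in cluster))
```

Under these two rules, no ZWJ sits between U+1F308 and U+1F3F4, so GB11 does not apply there and a break falls between them. That gives two clusters of four code points each, which is what I expected \X to match one at a time.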