geongeorge / i-hate-regex

The code for iHateregex.io 😈 - The Regex Cheat Sheet
https://iHateRegex.io

Han unification regex is incorrect #107

Open KarolS opened 6 months ago

KarolS commented 6 months ago

The regex tries to use 5-digit Unicode escapes, but a \u escape takes exactly 4 hex digits (unless the regex uses the u flag with the \u{...} form), so the pattern does not match what it is meant to.

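For example, this fragment: \u20000-\u2A6DF is interpreted as 3 separate pieces: the single character \u2000, the range 0-\u2A6D (from the digit 0 up to U+2A6D), and the literal letter F.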
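
A minimal JavaScript sketch to check that reading, assuming the fragment sits inside a character class and the regex has no u flag. The names `broken` and `samples` and the sample strings are made up for illustration and do not come from the site's code:

```javascript
// Without the `u` flag, \uXXXX consumes exactly four hex digits,
// so the fifth digit is read as an ordinary character.
// This class therefore contains: U+2000, the range "0"..U+2A6D, and "F".
const broken = /[\u20000-\u2A6DF]/;

// Hypothetical sample inputs, chosen to probe each piece of the class.
const samples = {
  "U+2000 (en quad)": "\u2000",
  "digit 0": "0",
  "letter A (falls inside 0..U+2A6D)": "A",
  "letter F": "F",
  "U+20000 (the intended start of the range)": "\u{20000}",
};

for (const [label, text] of Object.entries(samples)) {
  console.log(label, "->", broken.test(text));
}
```

If the reading is right, the ASCII samples match and the actual U+20000 character does not, which is the opposite of what the pattern intends.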

I guess the regex should be rewritten using surrogate pairs, like the emoji one.
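
As a rough sketch of what that rewrite could look like for the U+20000-U+2A6DF fragment alone (not the full Han regex, and not necessarily how the maintainers would fix it): in UTF-16, U+20000 is \uD840\uDC00 and U+2A6DF is \uD869\uDEDF, so the range splits into two surrogate alternatives. The variable names are hypothetical, and the u-flag variant is shown only as an alternative that the issue does not propose.

```javascript
// Surrogate-pair form of U+20000..U+2A6DF, usable without the `u` flag.
// High surrogates D840..D868 cover full 1024-code-point blocks;
// D869 covers the final partial block, ending at low surrogate DEDF.
const extBSurrogates =
  /(?:[\uD840-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF])/;

// Alternative (an assumption, not from the issue): code point escapes,
// which require the `u` flag.
const extBCodePoints = /[\u{20000}-\u{2A6DF}]/u;

const first = "\u{20000}";   // first code point of the range
const last = "\u{2A6DF}";    // last code point of the range
const beyond = "\u{2A6E0}";  // one past the end of the range

for (const text of [first, last, beyond, "0"]) {
  console.log(
    text.codePointAt(0).toString(16),
    extBSurrogates.test(text),
    extBCodePoints.test(text)
  );
}
```

The surrogate form keeps working in engines or tools that do not support the u flag, at the cost of being harder to read.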