Open KarolS opened 6 months ago
The regex tries to use 5-digit Unicode escapes, but Unicode escapes take exactly 4 hex digits, so the pattern does not match what it was meant to.
For example, this fragment:
\u20000-\u2A6DF
is interpreted as 3 separate parts:
U+2000 (which is not a Han character)
the range from 0 (U+0030) to U+2A6D (which encompasses tons of various characters, including the entire Latin alphabet, but no Han characters)
F (U+0046)
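The misparse can be reproduced in a small sketch. The issue does not say which regex engine is involved, so this uses Python as an assumed stand-in: Python string literals also read exactly 4 hex digits after \u, which gives the same three parts. The names broken, fixed and han are only illustrative.

```python
import re

# "\u2000" is consumed as one escape and the trailing "0" is a literal digit,
# so the class contains: U+2000, the range "0"..U+2A6D, and the letter "F".
broken_class = "[\u20000-\u2A6DF]"
broken = re.compile(broken_class)

# In Python, \U takes 8 hex digits and can express the intended range directly.
fixed = re.compile("[\U00020000-\U0002A6DF]")

han = "\U00020000"  # first code point of the intended range

# Show the code points that actually ended up inside the brackets.
print([f"U+{ord(c):04X}" for c in broken_class[1:-1]])

print(bool(broken.fullmatch("A")))   # Latin letter is inside "0"..U+2A6D
print(bool(broken.fullmatch(han)))   # the Han character is not matched
print(bool(fixed.fullmatch(han)))    # matched once the range is written correctly
```

Other engines need a different spelling of the fix (for instance surrogate pairs or a code point escape syntax), but the broken class is parsed the same way.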
I guess the regex should be rewritten using surrogates, like the emoji one.
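As a rough sketch of what the surrogate rewrite could look like, the helper below converts the endpoints of a supplementary-plane range into UTF-16 surrogate pairs and builds an equivalent pattern from 4-digit escapes. It assumes the target engine matches on UTF-16 code units; to_surrogates and surrogate_range_pattern are hypothetical names, not part of the project.

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def escape(unit: int) -> str:
    """Format one UTF-16 code unit as a 4-digit \\u escape."""
    return f"\\u{unit:04X}"


def surrogate_range_pattern(start: int, end: int) -> str:
    """Build a pattern matching start..end using only 4-digit escapes."""
    if start > end:
        raise ValueError("start must not exceed end")
    hi_s, lo_s = to_surrogates(start)
    hi_e, lo_e = to_surrogates(end)
    if hi_s == hi_e:
        # Both endpoints share a high surrogate: one pair is enough.
        return f"{escape(hi_s)}[{escape(lo_s)}-{escape(lo_e)}]"
    # First high surrogate: from the start's low surrogate to the top.
    parts = [f"{escape(hi_s)}[{escape(lo_s)}-\\uDFFF]"]
    if hi_s + 1 <= hi_e - 1:
        # High surrogates strictly in between cover every low surrogate.
        parts.append(
            f"[{escape(hi_s + 1)}-{escape(hi_e - 1)}][\\uDC00-\\uDFFF]"
        )
    # Last high surrogate: from the bottom to the end's low surrogate.
    parts.append(f"{escape(hi_e)}[\\uDC00-{escape(lo_e)}]")
    return "|".join(parts)


print(surrogate_range_pattern(0x20000, 0x2A6DF))
# \uD840[\uDC00-\uDFFF]|[\uD841-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF]
```

The result is not minimal (the first two alternatives could be merged into one), and it would need to be wrapped in a group before being combined with the rest of the regex.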