Thank you for the issue. I don't understand what the error with this pattern is. I just copied and pasted the string into EmEditor, ran WordCount, and saw no issue. Please elaborate.
This pattern is for counting characters in the CJK extension blocks of Unicode.
The issue is that it doesn't work with these code points.
Characters like 𠀁 are not matched, so WordCount counts all characters of the selected text.
I'm having trouble understanding because I don't know what programming language this is.
[\x{20000}-\x{2A6D6}\x{2A700}-\x{2B734}\x{2B820}-\x{2CEA1}\x{2CEB0}-\x{2EBE0}\x{30000}-\x{3134A}]
If I enter "𠀀𪛖", I get
Characters 2
Width 4
Words 0
Lines 1
View Lines 1
Pages 1
Which is correct.
Also, which EmEditor version did you use?
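To check the character class independently of EmEditor, here is a minimal Python sketch that rewrites the `\x{...}` ranges with Python's `\U` escapes. It assumes Python's `re` module as a stand-in for the regex engine WordCount uses (the two engines may behave differently), and `count_cjk_ext` is a hypothetical helper name. Python strings are indexed by code point, so a character such as 𠀁 (U+20001) is matched as one character, not as two UTF-16 surrogates.

```python
import re

# The same ranges as the pattern above, written with Python \U escapes.
# Adjacent string literals are concatenated into one character class.
CJK_EXT = re.compile(
    "[\U00020000-\U0002A6D6"
    "\U0002A700-\U0002B734"
    "\U0002B820-\U0002CEA1"
    "\U0002CEB0-\U0002EBE0"
    "\U00030000-\U0003134A]"
)


def count_cjk_ext(text: str) -> int:
    """Count the characters of text that fall inside the pattern's ranges."""
    return len(CJK_EXT.findall(text))


print(count_cjk_ext("\U00020000\U0002A6D6"))  # 𠀀𪛖 -> 2
print(count_cjk_ext("\U00020001"))            # 𠀁 -> 1
```

With this, `count_cjk_ext("𠀀𪛖 jkjlkj")` also returns 2, since the space and the Latin letters fall outside every range.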
Oh, sorry, a correction:
The problem is that if I enter 𠀀𪛖 jkjlkj, I get
Characters 9
My version is 20.1.0.
That should be correct, though EmEditor gives the wrong character count on the status bar.
I asked Yutaka and he said that EmEditor counts UTF-16 surrogate pairs as two characters. WordCount counts a surrogate pair as one character. As of now, EmEditor's behavior will not change.
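The gap between the two counts is easy to reproduce. The sketch below is plain Python, not EmEditor or WordCount code, and the function names are hypothetical. It counts the same text once by code point and once by UTF-16 code unit, where every code point above U+FFFF takes a surrogate pair.

```python
def count_code_points(text: str) -> int:
    # Python str length is measured in code points, so 𠀀 counts as 1
    return len(text)


def count_utf16_units(text: str) -> int:
    # Code points above U+FFFF are stored as a surrogate pair (2 units)
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


sample = "\U00020000\U0002A6D6 jkjlkj"  # the string from the report: 𠀀𪛖 jkjlkj
print(count_code_points(sample))  # 9
print(count_utf16_units(sample))  # 11
```

For `𠀀𪛖 jkjlkj`, the code-point count is 9, matching WordCount, while the UTF-16 count is 11 because each of the two extension ideographs occupies two units.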