Some alphabets with accent are mistakenly treated.

coldnew / pangu-spacing

Emacs minor mode to add spaces between Chinese/Japanese/Korean and English characters.


Some alphabets with accent are mistakenly treated. #4

Status: Closed (kuanyui closed 10 years ago)

kuanyui commented 10 years ago

Some letters with accents are handled incorrectly. For example, "Frédéric Chopin" is converted to "Fr é d é ric Chopin".

Ferada commented 10 years ago

Same issue with öüäß and other diacritics. I don't know exactly what the general fix should look like (e.g. should there be a single space between every non-Chinese and Chinese character?), but using (category latin) instead of (in "[a-zA-Z0-9]") in the regex definitions seems reasonable if you're using the Latin alphabet, i.e. https://github.com/Ferada/pangu-spacing/commit/4a140aa23a6b056acbcfe967c458f66412dea45a, and also (category chinese-two-byte), because at least á is included in the Chinese character class, but I'm assuming this isn't particularly helpful for this mode.
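To see which of these rx forms actually match a given character, a small inspection helper can be sketched as follows. The name my/pangu-char-classes is hypothetical and not part of pangu-spacing, and the (category chinese) entry is only a guess at what "the Chinese character class" refers to. The results come from the category tables of the running Emacs, so nothing is asserted about them here.

```elisp
(require 'rx)

;; Hypothetical helper (not part of pangu-spacing): report which of the
;; rx forms discussed in this thread match a single character.
(defun my/pangu-char-classes (char)
  "Return a list of symbols naming the rx forms that match CHAR."
  (let ((s (char-to-string char))
        (classes nil))
    ;; The set is written without surrounding brackets; inside an rx
    ;; `any'/`in' form, brackets would be taken as literal characters.
    (when (string-match-p (rx (any "a-zA-Z0-9")) s)
      (push 'ascii-alnum classes))
    (when (string-match-p (rx (category latin)) s)
      (push 'latin classes))
    ;; Assumption: "the Chinese character class" may mean this category.
    (when (string-match-p (rx (category chinese)) s)
      (push 'chinese classes))
    (when (string-match-p (rx (category chinese-two-byte)) s)
      (push 'chinese-two-byte classes))
    (nreverse classes)))

;; Print the matching forms and the raw category mnemonics for a few
;; characters from this thread.
(dolist (c '(?e ?é ?ö ?ß ?á ?中))
  (message "%c: %S  categories: %s"
           c
           (my/pangu-char-classes c)
           (category-set-mnemonics (char-category-set c))))
```

Evaluating this in a scratch buffer and reading the *Messages* buffer shows, for each character, whether it would be picked up by the ASCII set, by the Latin category, or by one of the Chinese categories.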

coldnew commented 10 years ago

I think the general fix is to put a single space between every non-Chinese and Chinese character. Using (category latin) is reasonable. However, after I replaced (in "[a-zA-Z0-9]") with (category latin) in the regex, it seems that if there is already a space between Chinese and non-Chinese characters, pangu-spacing-mode still adds a duplicate space between them. I'll find a solution and fix this issue. Thanks :)
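One possible explanation for the duplicate space, offered as an assumption and not something the thread confirms, is that the space character itself belongs to the Latin category, so a pattern built on (category latin) would also match a Chinese character followed by a space. The sketch below checks this in the running Emacs and shows a guard that skips whitespace explicitly. The name my/pangu-needs-space-p is hypothetical.

```elisp
(require 'rx)

;; Check whether a plain space is matched by the Latin category in this
;; Emacs; a non-nil result would be consistent with the duplicate space.
(message "space matches (category latin): %S"
         (string-match-p (rx (category latin)) " "))

;; Hypothetical guard (not part of pangu-spacing): decide whether a
;; space belongs between two adjacent characters.
(defun my/pangu-needs-space-p (left right)
  "Return non-nil if a space should be shown between LEFT and RIGHT."
  (and (string-match-p (rx (category chinese-two-byte))
                       (char-to-string left))
       (string-match-p (rx (category latin))
                       (char-to-string right))
       ;; Exclude whitespace explicitly so an existing space is never
       ;; doubled, whatever categories it happens to carry.
       (not (memq right '(?\s ?\t ?\n)))
       t))
```

With this guard, (my/pangu-needs-space-p ?中 ?\s) returns nil, so text that is already spaced is left alone.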

coldnew commented 10 years ago

Using (category latin) may also make pangu-spacing-mode take more time to generate the virtual spaces. I'll fix all of this and then update the mode.

coldnew commented 10 years ago

In the end, I only use (category chinese-two-byte) to prevent this issue, since using (category latin) makes this mode take more time to parse the buffer, which is not acceptable.