Closed Trey314159 closed 7 years ago
Combining characters (including diacritics and other characters in non-Latin scripts) cause tokens to split. Some examples from various scripts:
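As an illustration only (this is not the project's tokenizer, and these are not the original examples from the report), here is a minimal Python sketch of how this kind of split can arise. It assumes a tokenizer that treats only Unicode letters and digits as token characters; the function name `tokenize` and its `keep_marks` parameter are hypothetical. Combining marks belong to the Unicode general categories `Mn`, `Mc`, and `Me`, so a letters-and-digits rule treats them as separators.

```python
import unicodedata


def tokenize(text, keep_marks=False):
    """Split text into runs of token characters.

    Hypothetical sketch: letters (L*) and numbers (N*) are token
    characters. With keep_marks=True, combining marks (M*) are also
    kept inside tokens instead of acting as separators.
    """
    allowed = ("L", "N", "M") if keep_marks else ("L", "N")
    tokens, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in allowed:
            current.append(ch)
        elif current:
            # Any other character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# "résumé" written with decomposed accents (e + U+0301 COMBINING ACUTE ACCENT)
latin = "re\u0301sume\u0301"
# Devanagari "हिन्दी": consonants interleaved with vowel signs and a virama
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"

print(tokenize(latin))                   # the word is split at each accent
print(tokenize(latin, keep_marks=True))  # the word stays whole
print(tokenize(hindi))                   # each consonant becomes its own token
print(tokenize(hindi, keep_marks=True))  # the word stays whole
```

Treating marks as token characters is only one possible remedy. Normalizing to NFC first would help with the Latin example, but not with scripts such as Devanagari, where most combining signs have no precomposed form.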
I'm closing this issue; it will be fixed in the new tokenizer (#37).