Closed Dan-wanna-M closed 2 months ago
One major problem is whether we want to support invalid UTF-8 bytes. It might be useful given how current BPE tokenizers work, since a single token can end in the middle of a multi-byte character, but it requires significant refactoring of kbnf-syntax.
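To illustrate why invalid UTF-8 matters here, the sketch below splits the UTF-8 encoding of one character across two byte strings, the way a byte-level BPE vocabulary might. The split point and the names `token_a`, `token_b`, and `is_valid_utf8` are hypothetical and only for illustration; they are not taken from kbnf or from any real tokenizer.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 on its own."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+4F60 is three bytes in UTF-8: E4 BD A0.
encoded = "\u4f60".encode("utf-8")

# Hypothetical token boundary inside the character (an assumption,
# not the behavior of a specific tokenizer).
token_a, token_b = encoded[:2], encoded[2:]

print(is_valid_utf8(token_a))            # truncated sequence
print(is_valid_utf8(token_b))            # lone continuation byte
print(is_valid_utf8(token_a + token_b))  # complete character
```

Neither piece is valid UTF-8 by itself, so a grammar that only accepts valid UTF-8 strings cannot match such tokens one at a time. It would have to work on raw bytes instead.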
Basic Unicode support (\uXXXX) was done in v0.1.6.
Full Unicode support is in v0.1.7, which actually supports more features than this issue asked for.
\uXXXX: basic unicode support
\uXXXXXX: full unicode support