Open nelson-liu opened 7 years ago
Encoding Unicode characters as bytes could be a good idea, as an alternative to assigning a single index to each Unicode character.
It would thus be nice if tokenizers that return characters allowed for different character encodings.
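
To make the difference concrete, here is a minimal sketch in plain Python. The function names `char_indices` and `byte_indices` and the `encoding` parameter are hypothetical, chosen only for illustration; they are not an existing tokenizer API.

```python
def char_indices(text):
    """One index per Unicode character: the code point itself.

    The index space is as large as Unicode (over a million possible values).
    """
    return [ord(ch) for ch in text]


def byte_indices(text, encoding="utf-8"):
    """One index per byte of the encoded text.

    Every index is in range(256), whatever characters the text contains.
    The encoding is a parameter, so callers can choose e.g. UTF-8 or UTF-16.
    """
    return list(text.encode(encoding))


print(char_indices("日"))                 # [26085]
print(byte_indices("日"))                 # [230, 151, 165]
print(byte_indices("é", "utf-16-le"))     # [233, 0]
```

With byte encoding the vocabulary is fixed at 256 entries and there are no out-of-vocabulary characters, at the cost of longer sequences for non-ASCII text.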