cyanskies closed this issue 1 year ago
Another issue is Unicode string equality.
We're comparing the encoded bytes, but the same character can be represented by more than one code point sequence (a precomposed code point versus a base letter plus combining marks, for example), so plain byte equality isn't sufficient.
EDIT: this can be solved through Unicode normalisation; see https://unicode.org/reports/tr15/. I'll have to look at the algorithm to see if it's viable to implement in this library.
I incorporated uni-algo to provide this capability.
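To illustrate the equality problem described above, here is a minimal sketch, assuming C++17 and UTF-8 encoded strings. The names `precomposed`, `decomposed` and `bytes_equal` are hypothetical and are not part of this library or of uni-algo. It only demonstrates why byte comparison fails; it does not perform normalisation.

```cpp
#include <string_view>

// Two UTF-8 encodings of the same visible character "é".
// Precomposed form: U+00E9 (LATIN SMALL LETTER E WITH ACUTE), 2 bytes.
inline constexpr std::string_view precomposed = "\xC3\xA9";
// Decomposed form: U+0065 (e) followed by U+0301 (COMBINING ACUTE ACCENT), 3 bytes.
inline constexpr std::string_view decomposed = "e\xCC\x81";

// Plain byte-wise comparison (hypothetical helper standing in for
// "encoded byte equality"). No normalisation is applied.
constexpr bool bytes_equal(std::string_view a, std::string_view b) noexcept
{
    return a == b;
}
```

`bytes_equal(precomposed, decomposed)` is false even though both strings render as the same character. Normalising both sides to the same form (NFC or NFD, as defined in the report linked above) before comparing is what makes them compare equal.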
We need a way to count stream input in Unicode characters (grapheme clusters) rather than chars. Until this is sorted, automatic line breaks and error messages will produce unexpected output for lines that contain some Unicode characters.
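As a rough sketch of the counting problem, assuming C++17 and valid UTF-8 input, the hypothetical helper below counts code points instead of bytes by skipping UTF-8 continuation bytes. This is only an approximation: a base letter followed by a combining mark still counts as two, so a correct column count would need grapheme cluster segmentation (UAX #29), which this sketch does not attempt.

```cpp
#include <cstddef>
#include <string_view>

// Counts Unicode code points in a UTF-8 string by counting every byte that
// is not a continuation byte (continuation bytes have the form 10xxxxxx).
// Assumes the input is valid UTF-8; no validation is performed.
// Hypothetical helper, not part of this library.
constexpr std::size_t count_code_points(std::string_view utf8) noexcept
{
    std::size_t count = 0;
    for (const char c : utf8)
    {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "héllo" with a precomposed é, the string is 6 bytes long but this returns 5. For the decomposed form of é (e followed by U+0301) it returns 2, although the reader sees a single character.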
Ideally I want to avoid bringing in a large Unicode library, but this is probably a non-trivial challenge.
Gives the following incorrect output:
Instead of: