ldci opened 4 weeks ago
I would not say "most Unicode characters". It is mostly the emoticon poison (emoji) that needs code points above 65535 and therefore 32-bit characters. It is possible to extend the range, but it is quite a lot of work and not my priority at the moment.
See Carl's comment here: https://github.com/Oldes/Rebol-issues/issues/683
It is also worth reading Brian's comments: https://github.com/Oldes/Rebol-issues/issues/2024
I have not decided which model to use yet. At the moment, UTF-8 everywhere (the path taken by Ren-C) is slightly ahead, but it is also a huge amount of work. On the other hand, making all strings use 32-bit characters just because someone put an emoticon in a text is not good.
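To put a rough number on that trade-off, here is a small sketch in Python (used only for illustration; it says nothing about how Rebol stores strings internally). It compares the size of one mostly-ASCII string containing a single emoji when stored as UTF-8 versus with a fixed 32 bits per character. The helper names `utf8_size` and `utf32_size` are hypothetical.

```python
# Compare the storage cost of two string models for the same text:
# UTF-8 (variable width, 1-4 bytes per code point) versus a fixed
# 32-bit-per-character layout (like UTF-32).

def utf8_size(text: str) -> int:
    """Bytes needed when the string is stored as UTF-8."""
    return len(text.encode("utf-8"))

def utf32_size(text: str) -> int:
    """Bytes needed when every character takes a 32-bit (4-byte) unit."""
    return 4 * len(text)  # len() counts code points in Python 3

# 140 ASCII characters plus one emoji (U+1F600, code point 128512).
sample = "Hello, world! " * 10 + "\U0001F600"

print(utf8_size(sample))   # 144: 140 ASCII bytes + 4 bytes for the emoji
print(utf32_size(sample))  # 564: 141 characters * 4 bytes
```

With UTF-8 only the emoji itself pays for its width; with fixed 32-bit characters every ASCII character in the string pays as well.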
Implementing the UCS switching model is easier, but using only UTF-8 internally has many advantages. Ideally, both would be implemented and their real-world performance compared.
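A minimal Python sketch of the width-switching idea, assuming it works roughly like this: each string stores all its characters at one fixed unit width (1, 2 or 4 bytes), chosen as the narrowest width that fits its largest code point, and the whole string widens when a wider character is added. `SwitchingString` and `unit_width` are hypothetical names, and the class only models the memory cost; it is not how R3 or Ren-C are implemented.

```python
# Hypothetical model of a width-switching ("UCS-1/2/4") string.
# Every character in a string uses the same unit width, so one wide
# character widens the whole buffer.

def unit_width(code_point: int) -> int:
    """Bytes per unit needed to hold this code point at a fixed width."""
    if code_point <= 0xFF:
        return 1  # UCS-1 (Latin-1 range)
    if code_point <= 0xFFFF:
        return 2  # UCS-2 (Basic Multilingual Plane)
    return 4      # UCS-4 (up to U+10FFFF)

class SwitchingString:
    def __init__(self, text: str = ""):
        self.chars = [ord(c) for c in text]
        # Narrowest width that fits every character already present.
        self.width = max((unit_width(cp) for cp in self.chars), default=1)

    def append(self, ch: str) -> None:
        cp = ord(ch)
        # Widening applies to the whole string, not just the new character.
        self.width = max(self.width, unit_width(cp))
        self.chars.append(cp)

    def byte_size(self) -> int:
        """Modeled buffer size in bytes: one fixed-width unit per character."""
        return self.width * len(self.chars)

s = SwitchingString("Hello")
print(s.width, s.byte_size())  # 1 5
s.append("\u20ac")             # euro sign, U+20AC
print(s.width, s.byte_size())  # 2 12
s.append("\U0001F600")         # emoji, U+1F600
print(s.width, s.byte_size())  # 4 28
```

The switching logic is simple, which fits "easier to implement", but it shows the cost discussed above: a single emoji quadruples the size of an otherwise Latin-1 string.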
`to-char` with an `integer!` argument works in R3, but only for code points in the range 0 to 65535. However, most Unicode characters have code points beyond that range, going up to 1114111. Would it be possible to extend `to-char` to integers greater than 65535?
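For reference, here is what turning any integer up to 1114111 (0x10FFFF) into a UTF-8 encoded character involves. This is a Python illustration only: `encode_utf8` is a hypothetical helper, not part of R3. Rejecting surrogate code points (0xD800 to 0xDFFF) is an assumption about how a wider `to-char` might behave.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point (0..0x10FFFF) as UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp}")
    if 0xD800 <= cp <= 0xDFFF:
        # Assumption: surrogates are rejected, as they are not characters.
        raise ValueError(f"surrogate code point: {cp:#x}")
    if cp < 0x80:            # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:           # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:         # 3 bytes: covers the rest of the 0..65535 range
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: everything above 65535, up to 1114111
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

print(encode_utf8(128512))  # b'\xf0\x9f\x98\x80' (U+1F600)
```

Everything above 65535 lands in the 4-byte branch. In a UTF-8-everywhere model those characters need no extra width handling; in a fixed-width model they force the switch to 32-bit units.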