Oldes / Rebol-issues

Issue tracker for https://github.com/oldes/Rebol3

Extend Unicode range #2618

Open · ldci opened 4 weeks ago

ldci commented 4 weeks ago

to-char with an integer! argument works in R3, but only for code points in the range 0 to 65535. Most Unicode characters, however, have code points beyond that range, going up to 1114111 (0x10FFFF). Would it be possible to extend to-char to accept integers greater than 65535?

Oldes commented 3 weeks ago

I would not say most Unicode characters. It is mostly the emoji poison that requires 32-bit characters. Extending the range is possible, but it is quite a lot of work and not my priority at the moment.

Oldes commented 3 weeks ago

See Carl's comment here: https://github.com/Oldes/Rebol-issues/issues/683

Oldes commented 3 weeks ago

It is also worth reading Brian's comments: https://github.com/Oldes/Rebol-issues/issues/2024

I have not decided which model to use yet. At the moment, UTF-8 everywhere (the path taken by Ren-C) is slightly ahead, but it is also a huge amount of work. On the other hand, forcing all strings to use 32-bit characters just because someone put an emoji in a text is not good either.

Oldes commented 3 weeks ago

Implementing the UCS switching model is easier, but using only UTF-8 internally has many advantages. Of course, the best option would be to implement both and compare their real-world performance.