bitkeeper-scm / little-lang

The Little Programming Language
http://www.little-lang.org

Unicode Strings cannot handle high-plane characters #9

Open HalosGhost opened 8 years ago

HalosGhost commented 8 years ago

This may well be due to how Tcl handles strings underneath L.

Here is an initial PoC:

#!/usr/bin/L

puts("🐼");

Running the above file produces the following output:

ð¼

That is, characters with code points above U+FFFF (outside the Basic Multilingual Plane) appear to be out of range and are not displayed correctly.
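The visible output is consistent with the four UTF-8 bytes of U+1F43C each being shown as a separate Latin-1 character. That is only a guess based on the output, not a confirmed diagnosis of L or Tcl. A minimal Python sketch that reproduces the symptom under that assumption (the variable names are illustrative only):

```python
# Hypothetical reproduction of the symptom in Python; this is not L or Tcl code.
panda = "\U0001F43C"  # the panda emoji used in the PoC

# UTF-8 encodes any code point above U+FFFF as four bytes.
utf8_bytes = panda.encode("utf-8")

# Assumed failure mode: each byte is treated as one Latin-1 character.
mojibake = utf8_bytes.decode("latin-1")

# Two of the four resulting characters are C1 control characters,
# so only the first and last are normally visible.
visible = "".join(ch for ch in mojibake if ch.isprintable())

print(hex(ord(panda)))                  # code point of the original character
print(utf8_bytes.hex())                 # the four UTF-8 bytes
print([hex(ord(ch)) for ch in mojibake])  # one "character" per byte
```

Under this assumption, `visible` holds the two characters U+00F0 and U+00BC, which matches the output shown above.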

This might be because Tcl appears to use UTF-16 (or what it refers to as a “double-byte” representation) internally.
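For context, a single 16-bit unit cannot hold a code point above U+FFFF; UTF-16 proper represents such a character as a pair of surrogate units. The Python sketch below only illustrates that arithmetic. The helper name `to_surrogate_pair` is made up for this example, and nothing here inspects what Tcl actually does internally:

```python
# Illustrative arithmetic only; it does not call into Tcl or L.
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F43C)  # the panda emoji from the PoC
print(hex(high), hex(low))
```

If the string layer stores exactly one 16-bit unit per character and has no surrogate handling, neither half of this pair is a valid character on its own, which would explain why high-plane characters fail.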

As a result, working with any character outside the Basic Multilingual Plane is quite difficult in L.