Currently, input and output of Unicode text is handled inconsistently.
Some operations assume UTF-8, where chars are just bytes and 'é' is `c3 a9`; some assume the internal UTF-16 representation; and others depend on the platform, because that is the (bad) default of many Java methods.
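To make the three assumptions concrete, here is a minimal sketch in plain Java. The class name `EncodingDemo` and the helper `hex` are made up for this illustration and are not part of the PR.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Formats bytes as space-separated lowercase hex, e.g. "c3 a9".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'é' written as an escape so the source file encoding does not matter.
        String e = "\u00e9";
        System.out.println(hex(e.getBytes(StandardCharsets.UTF_8)));    // c3 a9
        System.out.println(hex(e.getBytes(StandardCharsets.UTF_16BE))); // 00 e9
        // No explicit charset: the result depends on the platform default.
        System.out.println(Charset.defaultCharset() + ": " + hex(e.getBytes()));
    }
}
```

The last line is the platform-dependent case: the same call can produce different bytes on different systems.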
Previously:
printChar printed the lowest byte as a character, so if a0 is `a9c3`, then `c3` was interpreted as a code point and Ã was printed.
readChar depended on the mode. In terminal and GUI modes, it seems that UTF-8 was used: bytes were read as is. In popup mode, the code point of the first character read was used, which could even set a0 to a value larger than `0xff`, or even a negative one.
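The old behavior described above can be modeled roughly as follows. This is only a sketch of the description, not the project's actual code; `oldPrintChar` and `oldPopupReadChar` are hypothetical names.

```java
public class LegacyBehaviorDemo {
    // Model of the old printChar: the low byte of a0 is used directly as a code point.
    public static String oldPrintChar(int a0) {
        return String.valueOf((char) (a0 & 0xff));
    }

    // Model of the old popup-mode readChar: the code point of the first character typed.
    public static int oldPopupReadChar(String typed) {
        return typed.codePointAt(0);
    }

    public static void main(String[] args) {
        // Low byte 0xc3 becomes code point U+00C3, which is 'Ã'.
        System.out.printf("U+%04X%n", (int) oldPrintChar(0xa9c3).charAt(0)); // U+00C3
        // 'é' gives 0xe9, which is not a UTF-8 byte of 'é' at all.
        System.out.printf("0x%x%n", oldPopupReadChar("\u00e9")); // 0xe9
        // '€' gives a value that does not fit in one byte.
        System.out.printf("0x%x%n", oldPopupReadChar("\u20ac")); // 0x20ac
    }
}
```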
This PR makes both operations more consistent by using a common UTF-8 encoding:
readChar reads bytes, so if the input is `c3 a9`, readChar consumes and returns `c3` in all modes and on all systems.
printChar prints the lowest byte as is. Since `c3` alone is not a valid UTF-8 sequence, the GUI displays a replacement character.
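A rough sketch of the byte-wise behavior described above, assuming input arrives through an `InputStream` and output is decoded as UTF-8. The `readChar` and `printChar` methods here are simplified stand-ins, not the syscall implementations from this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteWiseDemo {
    // Stand-in for readChar: consume exactly one byte (0..255), or -1 at end of input.
    public static int readChar(InputStream in) throws IOException {
        return in.read();
    }

    // Stand-in for printChar: decode the low byte of a0 on its own as UTF-8.
    // Java substitutes U+FFFD for malformed input such as a lone 0xc3.
    public static String printChar(int a0) {
        byte[] one = { (byte) (a0 & 0xff) };
        return new String(one, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The UTF-8 encoding of 'é' as raw input bytes.
        InputStream in = new ByteArrayInputStream(new byte[] { (byte) 0xc3, (byte) 0xa9 });
        int a0 = readChar(in);
        System.out.printf("0x%x%n", a0); // 0xc3
        String shown = printChar(a0);
        System.out.printf("U+%04X%n", (int) shown.charAt(0)); // U+FFFD
    }
}
```

ASCII bytes are unaffected by this: `printChar(0x41)` still yields `A`, because every byte below `0x80` is a complete UTF-8 sequence.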
Benefits:
Behavior is more consistent.
In case of bugs, a replacement character is likely to be shown instead of some random character, which makes the behavior easier to explain.
Problem that remains:
printChar prints each byte independently, so even if we printChar `c3` then `a9`, two replacement characters are issued instead of a single é.
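The remaining problem can be reproduced outside the simulator with a small sketch. The method names are hypothetical: `decodeSeparately` mimics successive printChar calls, `decodeTogether` shows what a decoder that sees both bytes would produce.

```java
import java.nio.charset.StandardCharsets;

public class SplitSequenceDemo {
    // Decode each byte on its own, as successive printChar calls would.
    public static String decodeSeparately(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(new String(new byte[] { b }, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decode the whole sequence in one go.
    public static String decodeTogether(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] eAcute = { (byte) 0xc3, (byte) 0xa9 };
        // Two code units, both U+FFFD: neither byte is valid alone.
        for (char c : decodeSeparately(eAcute).toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        // One code unit, U+00E9: the two bytes form 'é' together.
        System.out.printf("U+%04X%n", (int) decodeTogether(eAcute).charAt(0));
    }
}
```

Both the lead byte `c3` and the continuation byte `a9` are malformed when decoded alone, which is why two replacement characters appear.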