TheThirdOne / rars

RARS -- RISC-V Assembler and Runtime Simulator

readChar and printChar are more UTF-8 aware #221

Closed privat closed 5 days ago

privat commented 5 days ago

Currently, Unicode handling in input and output is inconsistent.

Some operations assume UTF-8, where chars are just bytes and 'é' is `c3 a9`; some assume the internal UTF-16; and others make platform-dependent assumptions (because that is the bad default of many Java methods).
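To make the difference concrete, here is a minimal Java sketch (not code from RARS; the class name `EncodingDemo` and the helper `hex` are made up for illustration) that prints the bytes of 'é' under an explicit charset:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of RARS.
class EncodingDemo {
    // Format bytes as lowercase hex separated by spaces, e.g. "c3 a9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // 'é'
        // Explicit UTF-8: two bytes.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // c3 a9
        // Explicit UTF-16 (big-endian, no BOM): one 16-bit code unit.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 e9
        // A Java String counts UTF-16 code units, not UTF-8 bytes.
        System.out.println(s.length());                                 // 1
    }
}
```

Calling `s.getBytes()` with no argument uses the JVM's default charset instead, which is the platform-dependent case: the result can differ between machines and JVM versions unless a charset is passed explicitly.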

Previously:

This PR makes both operations more consistent with a common UTF-8 encoding.

Benefits:

Problems that remain:

printChar prints each character independently, so even if we printChar c3 and then a9, two replacement characters will be issued instead of a single é.
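The effect can be reproduced outside RARS with a small Java sketch. This is an illustration under assumptions, not the actual printChar implementation: the class `PerByteDecodeDemo` and both methods are hypothetical, and the example uses the two UTF-8 bytes of é (`c3 a9`).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of RARS.
class PerByteDecodeDemo {
    // Decode every byte on its own, the way a per-character print would.
    static String decodeEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // A lone lead or continuation byte is malformed UTF-8,
            // so the decoder substitutes U+FFFD for it.
            sb.append(new String(new byte[] { b }, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decode the whole byte sequence in one pass.
    static String decodeTogether(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] eAcute = { (byte) 0xc3, (byte) 0xa9 };
        System.out.println(decodeEachByte(eAcute).equals("\ufffd\ufffd")); // true
        System.out.println(decodeTogether(eAcute).equals("\u00e9"));       // true
    }
}
```

Getting a single é from two successive calls would require keeping decoder state between them, for example by buffering incomplete sequences. That is one possible direction, not something this PR does.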