Closed LilithSilver closed 2 years ago
Oh, is this really the only thing that breaks with UTF-8? Quite handy.
I will try it myself this weekend, but it looks promising.
Thanks for the input
Yep, I was surprised as well, but it makes sense considering that UTF-8 was designed for full ASCII compatibility!
Note that if you want the UTF-8 to display properly, you'll have to reinterpret the byte data as UTF-8. Visual Studio, for example, doesn't support UTF-8 and outputs the strings as garbled extended ASCII. But the test confirms that the byte data produced by ink is indeed correct.
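To illustrate the ASCII-compatibility point: in UTF-8, every byte of a multi-byte sequence has its high bit set, so a byte below 0x80 always stands for the ASCII character itself. The helper below is a minimal sketch, not part of ink; the name `count_ascii_bytes` is made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper (not from the ink codebase): counts the bytes in a
   NUL-terminated UTF-8 string that are plain ASCII, i.e. have the high bit
   clear. Lead and continuation bytes of multi-byte sequences are all >= 0x80,
   so they are never counted. */
static size_t count_ascii_bytes(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast first so the mask works the same whether char is signed or not. */
        if (((unsigned char)*s & 0x80u) == 0) {
            n++;
        }
    }
    return n;
}
```

For example, `"caf\xC3\xA9"` (the UTF-8 encoding of "café") contains three ASCII bytes and two non-ASCII bytes, so ASCII-only parsing logic never mistakes part of the "é" for an ASCII character.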
Currently, there is a bug with parsing UTF-8 or extended ASCII characters: the C function `isspace()` doesn't accept negative `char` values (passing one is undefined behavior). The simple fix is to cast the value to an `unsigned char`, which is fine because no ASCII whitespace character can have a negative `char` value anyway.

This PR also adds a test based on a modified version of Markus Kuhn's UTF-8 Demo Page, to ensure that it can parse a variety of characters. The demo is under the CC BY license, which allows unrestricted use with attribution, and the attribution is at the top of the file, so we should be good there.
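The cast described above can be sketched as follows. This is an illustration under assumptions, not the actual diff: `is_space_safe` and `skip_spaces` are hypothetical names, and the sketch assumes the default "C" locale, in which no byte at or above 0x80 is classified as whitespace.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper (name is an assumption). On platforms where char is
   signed, bytes >= 0x80 become negative, and passing a negative value other
   than EOF to isspace() is undefined behavior. Casting to unsigned char
   keeps the argument in the range 0..255. */
static bool is_space_safe(char c) {
    return isspace((unsigned char)c) != 0;
}

/* Hypothetical usage: returns the number of leading whitespace bytes in a
   NUL-terminated string, stopping at the first non-space byte, including
   any UTF-8 lead or continuation byte. */
static size_t skip_spaces(const char *s) {
    size_t i = 0;
    while (s[i] != '\0' && is_space_safe(s[i])) {
        i++;
    }
    return i;
}
```

Wrapping the cast in one helper keeps every call site safe; the same applies to the other `<ctype.h>` classification functions, which share the same argument restriction.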