KelvinShadewing / brux-gdk

Free runtime and development kit using SDL and Squirrel
GNU Affero General Public License v3.0

UTF-8 -> Unicode code points #29

Closed ghost closed 2 years ago

ghost commented 2 years ago

Better solution for #27

Rather than fiddling around with CP437 converters/wrappers, this code reverses the UTF-8 encoding of Unicode code points using bit manipulation (https://en.wikipedia.org/wiki/UTF-8#Examples). Now, when a string is passed to the draw function, the Unicode code point of each character is extracted from its UTF-8 byte representation, and that value is passed as the index into the bitmap instead.
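To illustrate the decoding step described above, here is a minimal sketch in C++. It is not the code from this PR: the function name `decodeUtf8` is hypothetical, the input is assumed to be well-formed UTF-8, and only sequences of up to three bytes (U+0000 to U+FFFF) are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not taken from the PR): turns a UTF-8 byte string
// into a list of Unicode code points. Assumes well-formed input and only
// handles 1- to 3-byte sequences; continuation bytes are not validated.
std::vector<std::uint32_t> decodeUtf8(const std::string& text) {
    std::vector<std::uint32_t> codePoints;
    std::size_t i = 0;
    while (i < text.size()) {
        const auto b0 = static_cast<unsigned char>(text[i]);
        if (b0 < 0x80) {
            // 0xxxxxxx: ASCII, the byte value is the code point.
            codePoints.push_back(b0);
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < text.size()) {
            // 110xxxxx 10xxxxxx: 5 + 6 payload bits.
            const auto b1 = static_cast<unsigned char>(text[i + 1]);
            codePoints.push_back(((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu));
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0 && i + 2 < text.size()) {
            // 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 payload bits.
            const auto b1 = static_cast<unsigned char>(text[i + 1]);
            const auto b2 = static_cast<unsigned char>(text[i + 2]);
            codePoints.push_back(((b0 & 0x0Fu) << 12) |
                                 ((b1 & 0x3Fu) << 6) |
                                 (b2 & 0x3Fu));
            i += 3;
        } else {
            // Unsupported or truncated sequence: skip a single byte.
            i += 1;
        }
    }
    return codePoints;
}
```

For example, the bytes `D0 AF` (Cyrillic "Я") decode to U+042F, which would then be used as the glyph index in a bitmap laid out in code point order.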

Like before, a new bind is needed to handle multi-byte characters, as Squirrel's string length function counts bytes and so over-counts those characters. This bind walks the string the same way as the draw function, but without the bit manipulation.
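A length function along these lines could look like the sketch below. It is an assumption about the approach, not the PR's actual bind: the name `utf8Length` is hypothetical, and it steps over each sequence by inspecting only the range of the lead byte, with no payload extraction.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from the PR): counts characters in a
// UTF-8 string by stepping over whole sequences. Only lead bytes for
// 1- to 3-byte sequences are recognized; anything else advances one byte.
std::size_t utf8Length(const std::string& text) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        const auto lead = static_cast<unsigned char>(text[i]);
        if (lead < 0x80) {
            i += 1;  // ASCII
        } else if (lead >= 0xC0 && lead < 0xE0) {
            i += 2;  // lead byte of a 2-byte sequence
        } else if (lead >= 0xE0 && lead < 0xF0) {
            i += 3;  // lead byte of a 3-byte sequence
        } else {
            i += 1;  // stray continuation byte or unsupported lead
        }
        ++count;
    }
    return count;
}
```

With this, a string holding "A", "Я" and "€" occupies six bytes but is reported as three characters.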

This worked on my end with the Unifont file in SuperTux, as that bitmap has each character drawn in order of Unicode code point, and I tested it using some Cyrillic script from the Bulgarian translation.

CP437 bitmaps will still work to the same extent as before, as ASCII values retain their index due to compatibility. If anything, it'd probably be easier to make them work fully as you could just map the code point to the proper index in that type of bitmap.
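As a rough sketch of that mapping idea, a lookup from code point to CP437 index might look like this. The function name `cp437Index`, the deliberately partial table, and the `?` fallback are all assumptions made for illustration; a real table would need all 128 upper entries.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical helper (not taken from the PR): maps a Unicode code point
// to its glyph index in a CP437-ordered bitmap. The table is intentionally
// incomplete and only lists a handful of well-known entries.
std::uint32_t cp437Index(std::uint32_t codePoint) {
    static const std::unordered_map<std::uint32_t, std::uint32_t> table = {
        {0x00C7, 0x80},  // C with cedilla
        {0x00FC, 0x81},  // u with diaeresis
        {0x00E9, 0x82},  // e with acute
        {0x2591, 0xB0},  // light shade block
        {0x2592, 0xB1},  // medium shade block
        {0x2593, 0xB2},  // dark shade block
        {0x2588, 0xDB},  // full block
    };
    if (codePoint < 0x80) {
        return codePoint;  // ASCII keeps the same index
    }
    const auto it = table.find(codePoint);
    // Assumed fallback: draw '?' for anything the bitmap cannot show.
    return it != table.end() ? it->second : 0x3F;
}
```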

The only catch is that the code supports code points only up to U+FFFF, as I'm only checking for 3 significant bits in the bit operations. It wouldn't be difficult to add support for more planes, but this alone covers the entirety of Plane 0 (the Basic Multilingual Plane) in Unicode.
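For reference, supporting the remaining planes would mean handling four-byte sequences as well. The sketch below shows the extra case in isolation; the names `isFourByteLead` and `decodeFourByte` are hypothetical, the bytes are assumed to be a well-formed sequence, and the bitmap would also need glyphs at indices above 0xFFFF for this to be useful.

```cpp
#include <cstdint>

// Hypothetical helper: true if the byte starts a 4-byte sequence
// (bit pattern 11110xxx).
bool isFourByteLead(unsigned char b0) {
    return (b0 & 0xF8) == 0xF0;
}

// Hypothetical helper: combines the 3 + 6 + 6 + 6 payload bits of a
// 4-byte sequence into one code point. No validation is performed.
std::uint32_t decodeFourByte(unsigned char b0, unsigned char b1,
                             unsigned char b2, unsigned char b3) {
    return ((b0 & 0x07u) << 18) |
           ((b1 & 0x3Fu) << 12) |
           ((b2 & 0x3Fu) << 6) |
           (b3 & 0x3Fu);
}
```

For instance, the bytes `F0 9F 98 80` combine to U+1F600, which lies in Plane 1.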