ISSOtm closed this issue 1 year ago
ICU's license is a bit special, so you might want to consider making it an optional component, which would allow you to not distribute it with RGBDS (even in binary form).
The ICU library (libicudata.a, libicui18n.a, libicuio.a, libicutest.a, libicutu.a, libicuuc.a) is around 38MB and libunistring (libunistring.a) is around 2MB, which is unacceptable for static linking. Both take many minutes to compile even on a good computer and require a lot of dependencies, including Python for ICU. On the other hand libgrapheme (libgrapheme.a) only weighs in at around 40K and is compiled (including Unicode data parsing) in fractions of a second, requiring nothing but a C99 compiler and make(1).
ICU and libunistring offer a lot of functions, and most of their weight comes from the locale data provided by the Unicode standard, which is applied in an implementation-specific way (!) for some things. In such cases, however, the same standard always defines a sane 'default' behaviour as an alternative, which is satisfactory in 99% of cases and which you can rely on.
If the only thing we need more Unicode handling for is case conversion, https://github.com/rust-lang/rust/blob/master/library/core/src/unicode/unicode_data.rs looks portable without needing an entire ICU library.
Honestly I don't think we want to depend on a version of the Unicode Standard, and given RGBASM's existing ASCII reliance, I'm of the opinion that we should define the case conversion functions to only work on ASCII?
Yeah, that would be sensible.
I'm happy with that approach.
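For illustration, the ASCII-only behaviour proposed above could look roughly like the sketch below. It is not taken from the RGBDS sources: the names ascii_strupr and ascii_strlwr are hypothetical, and it assumes the input is a NUL-terminated UTF-8 string. Since every byte of a multi-byte UTF-8 sequence is 0x80 or above, changing only the bytes a-z / A-Z leaves non-ASCII characters intact.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not an RGBDS function: uppercase ASCII letters in place.
 * Bytes outside 'a'..'z' (including all UTF-8 lead and continuation bytes,
 * which are >= 0x80) are left untouched. */
char *ascii_strupr(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c >= 'a' && c <= 'z')
            s[i] = (char)(c - ('a' - 'A'));
    }
    return s;
}

/* Hypothetical helper, not an RGBDS function: lowercase ASCII letters in place. */
char *ascii_strlwr(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c >= 'A' && c <= 'Z')
            s[i] = (char)(c + ('a' - 'A'));
    }
    return s;
}
```

With this sketch, the UTF-8 bytes of "héllo" would become those of "HéLLO": the "é" passes through unchanged rather than being corrupted or converted. The explicit range checks are used instead of toupper()/tolower() so that the result does not depend on the C locale.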
STRUPR and STRLWR do not handle non-ASCII text properly, as in the STRUPR call below:
Processing Unicode correctly beyond UTF-8 encoding/decoding is difficult, so it would probably be best to use an external library for this. 0.4.3 / 0.5.0 already changed dependencies (Yacc → Bison), so this is probably a good opportunity. Two questions, then: