Open HertzDevil opened 1 month ago
A pure Crystal implementation would be lovely. For the sake of the argument, are there alternatives to libiconv?
ICU's `ucnv_*` API; the main problem is that either the source or the destination has to be UTF-16.

Thank you 🙇
The W3C Encoding Standard already sets the bar quite high, but seems to support a good list of general encodings :+1:
There's a part 2 to the comparison article that focuses on C and presents ztd.cuneicode. I'm not saying we should use it, but it sounds like a solid reference, and both articles are a treasure trove of information.
Crystal currently relies on iconv or GNU libiconv for conversions between text encodings. This has a few problems:
- … (`iconvlist` or `libiconvlist` is present in BSD libc and GNU libiconv respectively.) For all we know, an iconv implementation that supports neither UTF-8 nor UTF-16 is still POSIX-compliant. The same goes for the `invalid: :skip` option.
- … `Char` to be equivalent to `Int32`, yet they are not integrated into the usual transcoding APIs like `String#encode` and `IO#set_encoding`. In particular, it makes sense that these encodings should remain supported in those places, even when `-Dwithout_iconv` is defined.

The essence of, for example, UTF-16 to UTF-8 conversion can be implemented on top of `iconv`'s function signature as:

Going in the opposite direction would need something like #13639 to be equally concise, but the point is that we could indeed achieve this without using iconv at all. If both the source and destination encodings are one of UTF-8, UTF-16, UTF-32, or maybe ASCII, then we could use our own native transcoders instead of iconv; or, if we are ambitious enough, we could port the entire set of ICU character set mapping tables in an automated manner and remove our dependency on iconv.
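
To illustrate the idea of a native transcoder, here is a minimal sketch of UTF-16 to UTF-8 conversion that does not call iconv. It is not the snippet from the issue: the method name `utf16_to_utf8` is hypothetical, it works on slices rather than on `iconv`'s pointer-and-remaining-length signature, and replacing unpaired surrogates with U+FFFD is an assumed error policy, not something the issue prescribes.

```crystal
# Hypothetical helper, not part of Crystal's standard library.
# Converts UTF-16 code units (host byte order) to UTF-8 bytes.
# Assumption: unpaired surrogates are replaced with U+FFFD.
def utf16_to_utf8(input : Slice(UInt16)) : Bytes
  io = IO::Memory.new
  i = 0
  while i < input.size
    unit = input[i].to_u32
    i += 1

    codepoint =
      if 0xD800 <= unit <= 0xDBFF && i < input.size && 0xDC00 <= input[i] <= 0xDFFF
        # High surrogate followed by a low surrogate: combine the pair.
        low = input[i].to_u32
        i += 1
        0x10000_u32 + ((unit - 0xD800) << 10) + (low - 0xDC00)
      elsif 0xD800 <= unit <= 0xDFFF
        # Unpaired surrogate: substitute the replacement character.
        0xFFFD_u32
      else
        unit
      end

    # Encode the code point as 1 to 4 UTF-8 bytes.
    if codepoint < 0x80
      io.write_byte(codepoint.to_u8)
    elsif codepoint < 0x800
      io.write_byte(((codepoint >> 6) | 0xC0).to_u8)
      io.write_byte(((codepoint & 0x3F) | 0x80).to_u8)
    elsif codepoint < 0x10000
      io.write_byte(((codepoint >> 12) | 0xE0).to_u8)
      io.write_byte((((codepoint >> 6) & 0x3F) | 0x80).to_u8)
      io.write_byte(((codepoint & 0x3F) | 0x80).to_u8)
    else
      io.write_byte(((codepoint >> 18) | 0xF0).to_u8)
      io.write_byte((((codepoint >> 12) & 0x3F) | 0x80).to_u8)
      io.write_byte((((codepoint >> 6) & 0x3F) | 0x80).to_u8)
      io.write_byte(((codepoint & 0x3F) | 0x80).to_u8)
    end
  end
  io.to_slice
end

# Usage: round-trip a string through the existing String#to_utf16.
bytes = utf16_to_utf8("héllo 😀".to_utf16)
puts String.new(bytes)
```

A real implementation would also need to handle byte order (UTF-16LE versus UTF-16BE) and report incomplete input to the caller the way `iconv` does, which this sketch leaves out.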