Open Artoria2e5 opened 7 years ago
Please feel free to change anything about Simplified Chinese. I am not a native user of it, and the current state is just enough for my previous use cases.
Sure.
Wait... With #17 how did it even work...
You can add or rewrite encoders/decoders, and/or replace or add aliases.
Aliases are defined in https://github.com/buganini/bsdconv/blob/master/modules/from/alias and https://github.com/buganini/bsdconv/blob/master/modules/to/alias
After changing the alias files, running make alias will update:
https://github.com/buganini/bsdconv/blob/master/modules/inter/ALIAS-FROM.txt
https://github.com/buganini/bsdconv/blob/master/modules/inter/ALIAS-INTER.txt
https://github.com/buganini/bsdconv/blob/master/modules/inter/ALIAS-TO.txt
https://github.com/buganini/bsdconv/blob/master/modules/inter/ALIAS-FILTER.txt
Big5 uses UAO250 as its default decoder and CP950 as its default encoder to achieve maximum compatibility in practical use.
bsdconv's GB2312 table, which comes from unicode.org and went missing there after the EASTASIA charts became obsolete, is to some extent similar in quality to Unicode's Big5 table. (I will use unicode.org's hex notation to refer to GB codepoints, so add 0x8080 to get EUC-CN.)
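The 0x8080 offset mentioned above can be sketched as follows. The function name gb2312_to_euc_cn is made up for illustration and is not part of bsdconv.

```python
def gb2312_to_euc_cn(code: int) -> bytes:
    """Convert a 7-bit GB2312 code (e.g. 0x212A) to its EUC-CN byte pair.

    Hypothetical helper for illustration only; not bsdconv code.
    """
    hi, lo = code >> 8, code & 0xFF
    # Both bytes of a GB2312 code lie in the 0x21..0x7E range.
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a GB2312 code: {code:#06x}")
    # Setting the high bit of each byte is the same as adding 0x8080.
    return (code + 0x8080).to_bytes(2, "big")


if __name__ == "__main__":
    for code in (0x212A, 0x2124):
        print(f"{code:04X} -> {gb2312_to_euc_cn(code).hex().upper()}")
```

So 212A and 2124 in this thread correspond to the EUC-CN byte pairs A1 AA and A1 A4.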
In GB2312-1980, 212A is defined as 破折号 (em dash), but the Unicode mapping gives U+2015 (horizontal bar) instead of U+2014, apparently without reading the Chinese text at all. Hence GB2312's decoder should be changed to emit U+2014 so that the punctuation is correct, and the encoder should be made to accept U+2014 too.
By the way, 212A is one of the two places where "Unicode" GB2312-80 is incompatible with GBK; the other one is 2124. You may choose to use a non-fullwidth, regular "middle dot" there, as GBK does and as W3C CLREQ recommends typographically, but for now I only hope the encoder will accept U+00B7.
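For comparison, what GBK does at these two positions can be checked with CPython's built-in gbk codec. This only illustrates GBK as implemented by CPython and says nothing about bsdconv's own tables; the helper name gbk_char is made up for this example.

```python
import unicodedata


def gbk_char(raw: bytes) -> str:
    """Decode one two-byte sequence with CPython's built-in GBK codec."""
    return raw.decode("gbk")


if __name__ == "__main__":
    # A1AA and A1A4 are the EUC-CN forms of GB2312 212A and 2124.
    for raw in (b"\xa1\xaa", b"\xa1\xa4"):
        ch = gbk_char(raw)
        print(raw.hex().upper(), f"U+{ord(ch):04X}", unicodedata.name(ch))
```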