buganini / bsdconv

A simple but powerful DSL for charset/encoding conversion and transformation, pure C implementation with no extra dependencies
https://bsdconv.io/bsdconv/
BSD 2-Clause "Simplified" License

GB2312 is not in any of its interchangeable encodings #17

Open Artoria2e5 opened 7 years ago

Artoria2e5 commented 7 years ago

GB2312 itself is just a character table with no fixed byte representation. It has several interchangeable encodings, such as EUC-CN (the common "gb2312" encoding, which looks like GBK) and HZ (which uses escape sequences). Bsdconv's current hex mappings should all have 0x8080 added to them to produce the actual byte values used in EUC-CN.
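To illustrate the offset, here is a minimal Python sketch. It assumes the table stores each character as a two-byte 7-bit code in the range 0x2121 to 0x7E7E (for example 0x3021 for "啊"); the helper names `gl_to_euc_cn` and `gl_to_hz` are made up for this example and are not part of bsdconv. Python's built-in `gb2312` and `hz` codecs are used only to check the result.

```python
def check_gl_code(code: int) -> None:
    """Reject values that are not a valid two-byte 7-bit GB2312 code."""
    hi, lo = code >> 8, code & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a 7-bit GB2312 code: {code:#06x}")


def gl_to_euc_cn(code: int) -> bytes:
    """EUC-CN sets the high bit of both bytes, i.e. adds 0x8080."""
    check_gl_code(code)
    return (code + 0x8080).to_bytes(2, "big")


def gl_to_hz(code: int) -> bytes:
    """HZ keeps the 7-bit bytes and wraps them in ~{ ... ~} escapes."""
    check_gl_code(code)
    return b"~{" + code.to_bytes(2, "big") + b"~}"


# Assumed table entry: 0x3021 (row 16, cell 1 of GB2312).
euc = gl_to_euc_cn(0x3021)
hz = gl_to_hz(0x3021)
print(euc.hex(), euc.decode("gb2312"))
print(hz, hz.decode("hz"))
```

Both byte sequences decode to the same character, which is the point: the table position is shared, and only the byte-level wrapping differs between EUC-CN and HZ.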

buganini commented 7 years ago

The current table for GB2312 came from http://glyph.iso10646hk.net/download/GB2312.TXT and the table for GBK came from http://icu-project.org/repos/icu/data/trunk/charset/source/gb18030/gbkuni30.txt. The GBK decoder is probably problematic, since it was transposed from the encoder. I forgot how I got them; I only used them to convert some short articles and ID3 tags.