buganini / bsdconv

A simple but powerful DSL for charset/encoding conversion and transformation, pure C implementation with no extra dependencies
https://bsdconv.io/bsdconv/
BSD 2-Clause "Simplified" License

please add cccii decode/from #5

Closed. Thomas-Tsai closed this issue 12 years ago

Thomas-Tsai commented 12 years ago

Is it possible to add CCCII (Chinese Character Code for Information Interchange) support? Many library systems still store their databases in CCCII or Big5, and they should be converted to UTF-8, as the national library has already done. This project would be very helpful for that conversion. Please kindly add CCCII support for these library databases.

Thank You.

buganini commented 12 years ago

Sure, but the link to the table at http://unicode.ncl.edu.tw/ is dead at the moment. Do you have a copy?

buganini commented 12 years ago

BTW, according to http://unicode.ncl.edu.tw/, there are 50,764/46,057 mapped characters, but there are only 19,700 mappings in ftp://unicode.org/Public/5.0.0/ucd/Unihan.txt

buganini commented 12 years ago

19,699 in ftp://unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d3.zip
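
Counts like the ones above can be reproduced with a short script. The following is only a sketch, not the method actually used here: it assumes the Unihan data is tab-separated as `U+XXXX<TAB>field<TAB>value`, that the CCCII mappings are stored under the `kCCCII` field, and that one record per code point is what gets counted. The function name `count_field` and the sample rows are hypothetical.

```python
import io


def count_field(lines, field="kCCCII"):
    """Count Unihan records whose field name equals `field`.

    Assumes each data line looks like: U+4E00<TAB>kCCCII<TAB>213021
    Comment lines (starting with '#') and blank lines are skipped.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) >= 3 and parts[1] == field:
            count += 1
    return count


# Tiny in-memory stand-in for Unihan.txt; the rows are illustrative only.
sample = io.StringIO(
    "# sample header comment\n"
    "U+4E00\tkCCCII\t213021\n"
    "U+4E00\tkBigFive\tA440\n"
    "U+4E01\tkCCCII\t213022\n"
)
print(count_field(sample))  # 2 for this sample
```

To run it on the real data, pass an open file instead of the sample, for example `count_field(open("Unihan.txt", encoding="utf-8"))`, assuming the file has been downloaded and unpacked locally.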

buganini commented 12 years ago

Done, please test.

Thomas-Tsai commented 12 years ago

Here are the conversion result and log: http://www.libthomas.org/~thomas/temp/cccii-test.txt

Here is the CCCII MARC data from a library system: http://www.libthomas.org/~thomas/temp/export.txt

There is some noise included, but we can now check the CCCII-to-UTF-8 conversion!

Thank You.