liyujiang-gzu closed this issue 3 years ago.
By the way, http://jchardet.sourceforge.net can detect it.
I hope GBK can be supported; it is used often.
Hi, I don't know how to work with (or test) Chinese charsets. GB18030 was disabled in 2.0 to fix #9 and #11. Can you try version 1.0.3 and check whether it works for you? Then we can try to re-enable it.
Thank you for your reply.
It's easy to create a file in GBK:
~/gbk-sample.txt
M-x set-buffer-file-coding-system gbk
and save. The result is attached: gbk-sample.txt
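If Emacs isn't handy, a similar sample can probably be produced from plain Java. This is a minimal sketch, assuming a JDK that ships the extended GBK charset; the class name MakeGbkSample and the sample text are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MakeGbkSample {
    // Encodes text as GBK; Charset.forName throws UnsupportedCharsetException if the runtime lacks GBK.
    static byte[] encodeGbk(String text) {
        return text.getBytes(Charset.forName("GBK"));
    }

    public static void main(String[] args) throws IOException {
        // Output path defaults to gbk-sample.txt in the working directory.
        Path out = Paths.get(args.length > 0 ? args[0] : "gbk-sample.txt");
        // Two common Hanzi (U+4E2D U+6587) followed by a newline.
        byte[] bytes = encodeGbk("\u4e2d\u6587\n");
        Files.write(out, bytes);
        System.out.printf("Wrote %d bytes to %s%n", bytes.length, out);
    }
}
```

Each of those two Hanzi becomes a two-byte GBK sequence, while ASCII (including the newline) stays single-byte, which is the mix a detector has to recognise.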
I can confirm that juniversalchardet correctly detects the sample file after uncommenting the line here:
$ mvn test # fails because of false-positive unit tests
$ java -cp target/classes:target/test-classes org/mozilla/universalchardet/example/TestDetector gbk-sample.txt
Detected encoding = GB18030
(It also works for the file posted by @liyujiang-gzu)
I'm not sure what should be done to prevent the false positives. I checked, and the texts in both unit tests don't seem to be valid GBK.
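One way to double-check whether a test input is really valid GBK is to decode it with a strict JDK decoder that reports malformed or unmappable bytes instead of silently replacing them. A minimal sketch, assuming the JDK's GBK tables are an acceptable reference; GbkValidator is a hypothetical name, not part of juniversalchardet.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

public class GbkValidator {
    // Returns true only if the whole input decodes cleanly under the named charset.
    static boolean isValid(byte[] data, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail on illegal byte sequences
                .onUnmappableCharacter(CodingErrorAction.REPORT); // fail on sequences with no mapping
        try {
            // decode(ByteBuffer) treats the buffer as complete input, so a dangling lead byte is an error.
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + " valid GBK: " + isValid(data, "GBK"));
    }
}
```

A strict decode only proves the bytes are well-formed GBK; short or random byte strings can still pass by accident, which is part of why false positives are hard to rule out.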
GB 18030 is backward compatible with GBK, and GBK is backward compatible with GB2312.
GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC) superseding GB2312.[1] As a Unicode Transformation Format[a] (i.e. an encoding of all Unicode code points), GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB2312, CP936,[b] and GBK 1.0.
https://en.wikipedia.org/wiki/GB_18030
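The superset relationship can be observed with the JDK's own charsets: GB2312 and GBK bytes for the same text decode unchanged as GB18030, while only the larger sets can encode characters outside the smaller ones. A sketch, assuming a JDK with the extended charsets (GB2312, GBK, GB18030); ChineseCharsetCompat is a made-up name.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class ChineseCharsetCompat {
    // Encodes text with one charset and decodes the bytes with another.
    static String roundTrip(String text, String encodeAs, String decodeAs) {
        byte[] bytes = text.getBytes(Charset.forName(encodeAs));
        return new String(bytes, Charset.forName(decodeAs));
    }

    // True if the charset's encoder can represent every character in text.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String zh = "\u4e2d\u6587";        // two common simplified Hanzi, present in GB2312
        String traditional = "\u9ad4";     // a traditional Hanzi: in GBK, not in GB2312
        String emoji = "\uD83D\uDE00";     // U+1F600, outside the BMP: only GB18030 covers it
        for (String cs : new String[] {"GB2312", "GBK", "GB18030"}) {
            System.out.println(cs
                    + " bytes=" + Arrays.toString(zh.getBytes(Charset.forName(cs)))
                    + " readableAsGB18030=" + roundTrip(zh, cs, "GB18030").equals(zh)
                    + " traditional=" + canEncode(cs, traditional)
                    + " emoji=" + canEncode(cs, emoji));
        }
    }
}
```

This is consistent with the point that reporting GB18030 is a safe choice for GB2312 or GBK input, since decoding with the superset yields the same text.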
@albfernandez @amake For Chinese, only GB18030 is needed.
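Until the detector reports a single name, a caller could fold the GB family onto GB18030 itself before decoding. This is only a sketch of that idea on the application side, not juniversalchardet behaviour; the class ChineseCharsetNormalizer and the set of names it treats as GB-family are assumptions for illustration.

```java
import java.util.Locale;
import java.util.Set;

public class ChineseCharsetNormalizer {
    // Assumption: the GB-family names a detector may report; GB18030 can decode text in any of them.
    private static final Set<String> GB_FAMILY = Set.of("GB2312", "GBK", "GB18030");

    // Maps any GB-family name to GB18030; other names are returned unchanged and null stays null.
    static String normalize(String detected) {
        if (detected == null) {
            return null;
        }
        return GB_FAMILY.contains(detected.toUpperCase(Locale.ROOT)) ? "GB18030" : detected;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"GB2312", "gbk", "GB18030", "UTF-8", null}) {
            System.out.println(name + " -> " + normalize(name));
        }
    }
}
```

Normalising after detection keeps decoding simple, but it does not fix detection itself: if nothing in the GB family is reported, there is nothing to map.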
After testing, I found that GB2312 cannot be detected either.