Open wesinator opened 6 years ago
Glad to see you.
I'm just a regular user, not an official maintainer, so I'm only sharing some of my own thoughts here.
(By the way, the standard GB 2312-1980 was renamed to GB/T 2312-1980 in 2017.)
In terms of the standards documents, the relationship is: GB/T 2312-1980 ⊊ GBK 1.0 ⊊ GB 18030-2000 ⊊ GB 18030-2005
The latest effective standard is GB 18030-2005. All of the earlier ones have been superseded.
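To make the subset chain concrete, here is a minimal sketch using Python's built-in `gb2312`, `gbk`, and `gb18030` codecs. These codecs are assumed to be reasonable stand-ins for the standards named above, not exact implementations of a particular edition, and the helper `try_encode` and the sample strings are only illustrative.

```python
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


CODECS = ("gb2312", "gbk", "gb18030")

simplified = "汉字"  # both characters are in GB/T 2312-1980
traditional = "漢"   # traditional form: absent from GB/T 2312-1980, present in GBK

# Text inside the smallest set should yield the same bytes in every codec.
simplified_bytes = {c: try_encode(simplified, c) for c in CODECS}

# Text outside GB 2312 should fail there but encode identically in GBK and GB 18030.
traditional_bytes = {c: try_encode(traditional, c) for c in CODECS}

for codec in CODECS:
    print(codec, simplified_bytes[codec], traditional_bytes[codec])
```

Because the encodings are byte-compatible on their shared characters, a file that only uses GB 2312 characters looks exactly the same whichever of the three labels it carries.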
It may be hard to tell whether a file is encoded in GB 18030 unless it contains characters that exist only in GB 18030.
For example, if I create a file in GB 18030 and enter some characters from CJK Unified Ideographs Extension B, which is included in GB 18030-2005, encoding auto-detection fails to decode it correctly.
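The Extension B case can be reproduced in a few lines. This is only a sketch, assuming Python's built-in codecs behave like the encodings discussed here; the helper `decodes_as` is hypothetical and U+20000 is just a convenient sample character from Extension B.

```python
def decodes_as(data, codec):
    """Return True if the bytes decode under the codec without an error."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


ext_b_char = "\U00020000"  # first code point of CJK Unified Ideographs Extension B

plain = "汉字".encode("gb18030")                      # GB 2312 characters only
with_ext_b = ("汉字" + ext_b_char).encode("gb18030")  # adds one Extension B character

# Characters outside the Basic Multilingual Plane use the four-byte form in GB 18030.
four_byte_len = len(ext_b_char.encode("gb18030"))

for label, data in (("plain", plain), ("with_ext_b", with_ext_b)):
    for codec in ("gb2312", "gbk", "gb18030"):
        print(label, codec, decodes_as(data, codec))
```

The `plain` bytes decode under all three codecs, so nothing in them points to GB 18030 specifically. The `with_ext_b` bytes are rejected by the `gb2312` codec, which matches the failure described above when a detector settles on gb2312.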
https://github.com/atom/encoding-selector/issues/65
Steps to Reproduce
https://github.com/malice-plugins/yara/blob/17a4fc946febe8b002e285f591bcb21b92a99e9e/rules/userdb_panda.yar
Expected behavior: Atom detects the encoding of the file as GB18030.
Actual behavior: Atom auto-detects the encoding as gb2312, 'undefined encoding'.
iconv fails to convert from GB2312, but works with GB18030:
iconv -f GB18030 -t UTF-8 userdb_panda.yar
Reproduces how often: Always