CharsetDetector / UTF-unknown

Character set detector built in C# for .NET 5+, .NET Core 2+, .NET Standard 1+ and .NET 4+

Added support for CP949 #59

Closed: HelloWorld017 closed this 5 years ago

HelloWorld017 commented 5 years ago

Imported the CP949 detector from sv24-archive/charade@ae6dd2 (under LGPL 2.1).

Edited the imported CP949 detector, because that code has some bugs (Class 8-related bugs).

Added alternative names in EncodingShortName.

See Also: https://github.com/chardet/chardet/blob/master/chardet/cp949prober.py

What is CP949?

It is a superset of the Korean encoding EUC-KR.
It is the default (ANSI) code page of Korean Windows.

Is it different from EUC-KR?

Yes. CP949 is a superset of EUC-KR: Microsoft added extra characters, most notably the 8,822 precomposed Hangul syllables that EUC-KR (KS X 1001) cannot encode.
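The practical consequence for a detector is that CP949's extra characters live in byte ranges EUC-KR never uses. In EUC-KR, both bytes of a double-byte character fall in 0xA1 to 0xFE. CP949 adds lead bytes 0x81 to 0xC6, with trail bytes 0x41 to 0x5A, 0x61 to 0x7A and 0x81 to 0xA0, for its extension area. An EUC-KR-only prober therefore rejects such text outright. The sketch below is a deliberately simplified structural check, not UTF-unknown's or chardet's actual prober (those use a per-byte-class state machine plus character-distribution analysis). The function names are hypothetical, and the simplification of allowing the extended trail bytes after every lead byte is an assumption.

```python
# Hypothetical, simplified structural check for double-byte Korean text.
# Assumption: user-defined areas and unassigned code points are ignored,
# and CP949's extended trail bytes are accepted after any lead byte.


def _is_euckr_byte(b: int) -> bool:
    # EUC-KR (KS X 1001) uses 0xA1-0xFE for both lead and trail bytes.
    return 0xA1 <= b <= 0xFE


def _is_cp949_trail(b: int) -> bool:
    # CP949 additionally allows A-Z, a-z and 0x81-0xA0 as trail bytes.
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A or 0x81 <= b <= 0xFE


def is_structurally_valid(data: bytes, cp949: bool) -> bool:
    """Return True if every byte sequence fits the EUC-KR (or CP949) ranges."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            i += 1  # single-byte ASCII
            continue
        if i + 1 >= len(data):
            return False  # truncated double-byte character
        trail = data[i + 1]
        if cp949:
            ok = 0x81 <= lead <= 0xFE and _is_cp949_trail(trail)
        else:
            ok = _is_euckr_byte(lead) and _is_euckr_byte(trail)
        if not ok:
            return False
        i += 2
    return True


if __name__ == "__main__":
    print(is_structurally_valid(b"\xb0\xa1", cp949=False))  # True: in both encodings
    print(is_structurally_valid(b"\x81\x41", cp949=False))  # False: CP949 extension
    print(is_structurally_valid(b"\x81\x41", cp949=True))   # True
```

The pair 0x81 0x41 is the first code point of CP949's extension area. Python's own codecs agree with the check: `b"\x81\x41".decode("cp949")` succeeds, while decoding the same bytes with `"euc_kr"` raises `UnicodeDecodeError`.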

But many people confusingly call CP949 "EUC-KR", even when their text is actually encoded as CP949. This is because the WHATWG Encoding Standard names the encoding "EUC-KR", while what it actually specifies is CP949.
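The naming overlap is visible in the WHATWG label table. Several labels, including "windows-949", resolve to the single encoding named "EUC-KR", and the decoder the spec defines under that name accepts lead bytes 0x81 to 0xFE (the CP949 ranges). The following is a minimal sketch of that label lookup, with the label set taken from the spec. The helper name is hypothetical and is not the EncodingShortName API. It shows why a detector's name mapping benefits from alternative names: a common alias such as "cp949" is not itself a WHATWG label.

```python
from typing import Optional

# Labels the WHATWG Encoding Standard maps to the encoding named "EUC-KR".
EUC_KR_LABELS = frozenset({
    "cseuckr", "csksc56011987", "euc-kr", "iso-ir-149", "korean",
    "ks_c_5601-1987", "ks_c_5601-1989", "ksc5601", "ksc_5601", "windows-949",
})


def whatwg_name_for_korean_label(label: str) -> Optional[str]:
    """Hypothetical helper: resolve a label the way the spec does, Korean labels only."""
    # The spec strips leading/trailing ASCII whitespace (tab, LF, FF, CR, space).
    key = label.strip("\t\n\f\r ")
    # The spec lowercases ASCII only; all labels are ASCII, so reject anything
    # else (str.lower would otherwise map e.g. KELVIN SIGN to "k").
    if not key.isascii():
        return None
    key = key.lower()
    return "EUC-KR" if key in EUC_KR_LABELS else None


if __name__ == "__main__":
    print(whatwg_name_for_korean_label(" Windows-949 "))  # EUC-KR
    print(whatwg_name_for_korean_label("cp949"))          # None
```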

UTF-unknown previously supported only EUC-KR, not CP949, so I added CP949 support.