CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

Unrecognized encoded Chinese text file #142 #143

Open melinyi opened 2 years ago

melinyi commented 2 years ago

Unrecognized encoded Chinese text file #142

I have uploaded the corresponding file

304NotModified commented 2 years ago

What is the expected encoding?

melinyi commented 2 years ago

> What is the expected encoding?

Chinese encoding, maybe GB18030

zhuxb711 commented 2 years ago

On my side, GB2312 text is recognized as EUC-JP with confidence 0.99 when the text is short (10 characters), but it is detected correctly when the text is long (>200 characters).
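
One possible explanation for the short-text behaviour, not confirmed anywhere in this thread, is that GB2312 and the two-byte part of EUC-JP share the same byte layout: each non-ASCII character is a pair of bytes in the range 0xA1..0xFE. The sketch below uses only the Python standard library, not UTF-unknown itself, to show that a short GB2312 byte string also fits that layout. The sample string and the helper `is_high_byte_pair_stream` are hypothetical and are not taken from the attached file or from the library.

```python
# Sketch: does a short GB2312 byte string also fit the EUC-JP two-byte layout?
# Assumption: the sample text below is illustrative, not the file from the issue.


def is_high_byte_pair_stream(data: bytes) -> bool:
    """Return True if data is ASCII mixed with byte pairs in 0xA1..0xFE.

    GB2312 (in its EUC-CN form) and the JIS X 0208 part of EUC-JP both
    encode a character as two bytes drawn from this range.
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Plain ASCII byte, valid in both encodings.
            i += 1
            continue
        if i + 1 >= len(data):
            # A lead byte with no trail byte is a truncated pair.
            return False
        lead, trail = data[i], data[i + 1]
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            return False
        i += 2
    return True


sample = "你好世界"  # hypothetical short Chinese input
gb_bytes = sample.encode("gb2312")

# Structural check only: nothing here says which encoding is more likely.
fits_layout = is_high_byte_pair_stream(gb_bytes)

# Try reading the same bytes as EUC-JP. If this succeeds, the result is
# different (Japanese) text, which is what makes short inputs ambiguous.
try:
    as_euc_jp = gb_bytes.decode("euc_jp")
except UnicodeDecodeError:
    as_euc_jp = None

print(gb_bytes.hex(" "), fits_layout, as_euc_jp)
```

With only a handful of characters, a statistical detector has little frequency evidence to separate the two candidates, which would be consistent with the report that longer inputs (>200 characters) are detected correctly.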

Zeugma440 commented 1 year ago

Any chance we're gonna get an update on that one, given the low activity of late?

My library has an open issue depending on it 😅