CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

Difference between SBCSCodePageEncoding and GetEncoding(1252) #131

Closed: dradovic closed this issue 2 years ago

dradovic commented 2 years ago

For my stream, the Detected result reports "Detected windows-1252 with confidence of 0,8796469". Its Encoding property therefore refers to an instance of System.Text.SBCSCodePageEncoding, which does not work for me. However, this encoding does not seem to be equivalent to CodePagesEncodingProvider.Instance.GetEncoding(1252), which would work for me.

Why does the detector return the former, and how does it differ from the latter?

dradovic commented 2 years ago

I realized that I had forgotten to rewind the stream (stream.Position = 0;) after running detection. With that fixed, the detected encoding works fine.
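The pitfall above can be sketched as follows. This is a minimal example, not code from the issue: it assumes the UTF-unknown NuGet package (namespace `UtfUnknown`, `CharsetDetector.DetectFromStream` returning a `DetectionResult` whose `Detected.Encoding` holds the encoding) and a seekable stream. `DetectAndDecode.ReadAll` is a hypothetical helper, and the UTF-8 fallback when nothing is detected is an arbitrary choice for the sketch. Registering `CodePagesEncodingProvider` is only needed so the demo can produce windows-1252 bytes on .NET Core / .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;
using UtfUnknown; // NuGet package "UTF.Unknown" (assumed to be referenced)

public static class DetectAndDecode
{
    // Hypothetical helper: detects the encoding of a seekable stream,
    // rewinds it, and decodes the whole content with the detected encoding.
    public static string ReadAll(Stream stream)
    {
        // Remember where the data starts; detection reads forward from here.
        long start = stream.Position;

        DetectionResult result = CharsetDetector.DetectFromStream(stream);

        // Assumption for this sketch: fall back to UTF-8 if nothing was detected.
        Encoding encoding = result.Detected?.Encoding ?? Encoding.UTF8;

        // Detection consumed the stream, so rewind before decoding.
        // Without this line the reader starts at the end and returns "".
        stream.Position = start;

        // leaveOpen: true so the caller keeps ownership of the stream.
        using var reader = new StreamReader(stream, encoding,
            detectEncodingFromByteOrderMarks: false, bufferSize: 1024, leaveOpen: true);
        return reader.ReadToEnd();
    }
}

public static class Program
{
    public static void Main()
    {
        // On .NET Core / .NET 5+, legacy code pages such as 1252 require this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] bytes = Encoding.GetEncoding(1252).GetBytes("Café, crème brûlée, naïve façade");
        using var stream = new MemoryStream(bytes);

        Console.WriteLine(DetectAndDecode.ReadAll(stream));
    }
}
```

Storing the start position instead of hard-coding `stream.Position = 0` keeps the helper correct when the caller has already advanced the stream past some header bytes. Short samples may be detected as a different single-byte code page than windows-1252, so the exact decoded output depends on what the detector reports.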