CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+
303 stars, 45 forks

File detected as iso-8859-2, but it is Windows-1250 #141

Open · petrbizon opened this issue 2 years ago

petrbizon commented 2 years ago

Hello, I have attached a file that is encoded in Windows-1250, but the library detects it as ISO-8859-2. Thank you for looking into it. Petr B.

Attachment: 774723_PPBP_0510_02_gu.csv

304NotModified commented 2 years ago

Could you please share the code you use to call the library?

It's important to know whether you use DetectFromFile, DetectFromStream or DetectFromBytes, and whether you call it with or without loops.
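
To see whether the entry point makes a difference, the same bytes can be run through all three methods named above and the results compared. This is a minimal sketch, assuming .NET 5+ with top-level statements and the UTF.Unknown NuGet package; the Czech sample sentence is made up for illustration and is not the reporter's attached file.

```csharp
using System;
using System.IO;
using System.Text;
using UtfUnknown; // assumption: NuGet package "UTF.Unknown"

// Windows-1250 is not built into .NET Core / .NET 5+, so register the code-pages provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical sample: Czech text encoded as Windows-1250.
byte[] data = Encoding.GetEncoding(1250).GetBytes("Příliš žluťoučký kůň úpěl ďábelské ódy");

// Detected can be null when no candidate matched.
string? Name(DetectionResult r) => r.Detected?.EncodingName;

// 1. From a byte array already in memory (the reporter's case).
string? fromBytes = Name(CharsetDetector.DetectFromBytes(data));

// 2. From a stream over the same bytes.
string? fromStream;
using (var stream = new MemoryStream(data))
{
    fromStream = Name(CharsetDetector.DetectFromStream(stream));
}

// 3. From a temporary file containing the same bytes.
string path = Path.GetTempFileName();
string? fromFile;
try
{
    File.WriteAllBytes(path, data);
    fromFile = Name(CharsetDetector.DetectFromFile(path));
}
finally
{
    File.Delete(path);
}

Console.WriteLine($"bytes: {fromBytes}, stream: {fromStream}, file: {fromFile}");
```

If the three names agree, the choice of entry point is unlikely to explain a misdetection.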

petrbizon commented 2 years ago

I receive the bytes from a web request (byte[] data) and then process them like this:

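using UtfUnknown;

// data is the byte[] received from the web request
DetectionResult charsetDetectorResult = CharsetDetector.DetectFromBytes(data);
DetectionDetail resultDetected = charsetDetectorResult.Detected; // null if nothing was detected
string encodingName = resultDetected?.EncodingName;
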
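
A self-contained version of the call above can also print every candidate in DetectionResult.Details rather than only Detected, which shows how close Windows-1250 came to ISO-8859-2. This is a sketch under stated assumptions: .NET 5+ top-level statements, the UTF.Unknown NuGet package, and a made-up CSV line standing in for the web-request bytes.

```csharp
using System;
using System.Text;
using UtfUnknown; // assumption: NuGet package "UTF.Unknown"

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical stand-in for the byte[] received from the web request.
byte[] data = Encoding.GetEncoding(1250).GetBytes("Žluťoučký kůň;cena;množství");

DetectionResult result = CharsetDetector.DetectFromBytes(data);

// Detected is the top candidate; guard against null.
if (result.Detected is DetectionDetail best)
    Console.WriteLine($"Best guess: {best.EncodingName} (confidence {best.Confidence:F2})");
else
    Console.WriteLine("Nothing detected");

// Details lists every candidate the detector considered.
if (result.Details != null)
{
    foreach (DetectionDetail detail in result.Details)
    {
        Console.WriteLine($"  {detail.EncodingName,-15} {detail.Confidence:F2}");
    }
}
```

Attaching this output for the real file would show whether the wrong guess was a near tie.
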
petrbizon commented 11 months ago

I opened the file in Notepad++ with the detected code page, ISO-8859-2, and the character 'ž' is displayed incorrectly. When I view the file with code page Windows-1250, 'ž' is displayed correctly.
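
The Notepad++ behavior follows from the code-page tables. Windows-1250 stores 'ž' as byte 0x9E, while ISO-8859-2 stores it as 0xBE and uses 0x80 to 0x9F only for C1 control characters. Letters such as ž, š, Ž, Š, ť and Ť therefore turn into invisible control characters when a Windows-1250 file is read as ISO-8859-2. The sketch below shows this and adds a hypothetical post-processing helper, PreferWindows1250, which is not part of UTF-unknown. It assumes .NET 5+ with top-level statements.

```csharp
using System;
using System.Linq;
using System.Text;

// Windows-1250 and ISO-8859-2 are not built into .NET Core / .NET 5+;
// registering the code-pages provider makes them available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding win1250 = Encoding.GetEncoding(1250);
Encoding iso88592 = Encoding.GetEncoding("iso-8859-2");

// The same letter 'ž' (U+017E) has a different byte in each code page.
byte zWin = win1250.GetBytes("ž")[0];   // 0x9E
byte zIso = iso88592.GetBytes("ž")[0];  // 0xBE
Console.WriteLine($"Windows-1250: 0x{zWin:X2}, ISO-8859-2: 0x{zIso:X2}");

// Decoding the Windows-1250 byte as ISO-8859-2 yields a control character.
char misread = iso88592.GetString(new[] { zWin })[0];
Console.WriteLine($"0x{zWin:X2} read as ISO-8859-2: U+{(int)misread:X4}, control: {char.IsControl(misread)}");

// Hypothetical workaround (not part of UTF-unknown): ISO-8859-2 maps 0x80-0x9F
// to C1 control characters, which rarely appear in text files, so any byte in
// that range suggests Windows-1250 is the better guess.
static string PreferWindows1250(byte[] bytes, string detectedName)
{
    bool isoDetected = string.Equals(detectedName, "iso-8859-2", StringComparison.OrdinalIgnoreCase);
    bool hasC1Range = bytes.Any(b => b >= 0x80 && b <= 0x9F);
    return isoDetected && hasC1Range ? "windows-1250" : detectedName;
}

byte[] sample = win1250.GetBytes("Žluťoučký kůň");
Console.WriteLine(PreferWindows1250(sample, "iso-8859-2"));
```

The rule only decides between these two encodings and would misfire on a genuine ISO-8859-2 file that contains C1 control bytes, which is unusual for a CSV export.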