Closed MIMAXUZ closed 3 years ago
Hello!
It is possible that the library could not detect what encoding the file has.
When I open the file via notepad, Encoding shows ANSI.
Do you mean Notepad++? This library uses a slightly different algorithm, see https://github.com/CharsetDetector/UTF-unknown/issues/80
NullReferenceException is also thrown if the file is empty (file size is 0):
// Detect from File (NET standard 1.3+ or .NET 4+)
DetectionResult result = CharsetDetector.DetectFromFile("path/to/file.txt"); // or pass FileInfo
Maybe it also fails for other methods which accept strings/streams.
Please share a full stack trace, thanks!
I verified detection on empty file/stream/bytes and it works as expected:
[Test]
public void CharsetDetector_EmptyStreamDetection_DetectedShouldBeNull()
{
    const string emptyFile = "empty.txt";
    File.Create(emptyFile).Dispose();
    Assert.IsNull(CharsetDetector.DetectFromFile(emptyFile).Detected);
    Assert.IsNull(CharsetDetector.DetectFromStream(File.Open(emptyFile, FileMode.Open)).Detected);
    Assert.IsNull(CharsetDetector.DetectFromBytes(Array.Empty<byte>()).Detected);
}
@MIMAXUZ It means that encoding detection failed for your file - in this case charsetDetectorResult.Detected.Encoding is null.
Thanks for confirming, @i2van.
Indeed, Detected could be null if the detection failed.
The code from the start of this issue:
int encodeResult = enocder.Detected.Encoding.CodePage
could indeed throw an exception.
Recommended usage:
int? encodeResult = enocder.Detected?.Encoding.CodePage;
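If a null result should not stop processing, one option is to fall back to a known code page. The following is only a minimal sketch: the EncodingFallback class and its Resolve method are hypothetical names, not part of UTF.Unknown, and 1251 is just an assumed default for Cyrillic files. CodePagesEncodingProvider is built into .NET Core / .NET 5+; on .NET Framework it needs the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.Text;

public static class EncodingFallback
{
    // Returns the detected encoding, or the fallback code page when
    // detection produced nothing (detected == null).
    public static Encoding Resolve(Encoding detected, int fallbackCodePage)
    {
        if (detected != null)
        {
            return detected;
        }

        // Makes legacy code pages such as 1251 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(fallbackCodePage);
    }
}

// Usage with UTF.Unknown (assumes the package is referenced):
// DetectionResult result = CharsetDetector.DetectFromFile("path/to/file.txt");
// Encoding encoding = EncodingFallback.Resolve(result.Detected?.Encoding, 1251);
// string text = System.IO.File.ReadAllText("path/to/file.txt", encoding);
```

The fallback only helps when you already know which encoding the undetected files are likely to use. Otherwise it is safer to treat the null result as an error.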
I have several files and I can read each of them in its own encoding. But there is a problem reading a single file. I read a file by first determining which code page its contents use.
I have the following code:
Error:
But no other file had such a problem. When I open the file via notepad, Encoding shows ANSI. The file is not empty and contains mostly text in the Cyrillic alphabet. I tried the 1251 and UTF-8 encodings, but the characters turn into ????. How can the problem be solved? Thank you!
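To check whether the bytes really are Windows-1251, it can help to decode a small sample with both encodings and compare. This is a rough sketch on made-up sample data (the bytes of "Привет" in Windows-1251); CyrillicDecodeDemo and its members are hypothetical names, and the provider registration assumes .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class CyrillicDecodeDemo
{
    // "Привет" encoded as Windows-1251 (illustrative sample, not from the reported file).
    public static readonly byte[] Sample = { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 };

    // Returns the Windows-1251 encoding, registering the code page provider first.
    public static Encoding Windows1251()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1251);
    }

    // Decodes the bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }
}

// Usage:
// string ok  = CyrillicDecodeDemo.Decode(CyrillicDecodeDemo.Sample, CyrillicDecodeDemo.Windows1251());
// string bad = CyrillicDecodeDemo.Decode(CyrillicDecodeDemo.Sample, Encoding.UTF8);
// ok is "Привет"; bad contains U+FFFD replacement characters because the bytes are not valid UTF-8.
```

If the text decodes correctly here but still shows up as ???? later, the characters are probably being lost at a later step, for example when writing to a console or file in an encoding without Cyrillic support.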