CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

Catch NotSupportedException trying GetEncoding #140

Closed. jamesjohnmcguire closed this 2 years ago

jamesjohnmcguire commented 2 years ago

Fix for issue #139 (NotSupportedException while trying GetEncoding)

jamesjohnmcguire commented 2 years ago

As I mentioned in issue #139, I can no longer find a reproducible test case for this. I'm pretty sure it happened on a corrupted buffer, probably with some binary data mixed in, and it was very random. I was in a time crunch at the time, so I just added a catch for NotSupportedException, along with a general exception catch, and things started working smoothly again.
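
For illustration only, a guard of the kind described above might look like the following sketch. It is not the code from this pull request: the `SafeEncoding` class and its `TryGet` method are hypothetical names, and the only library call used is `System.Text.Encoding.GetEncoding(string)`.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not the code from the pull request.
public static class SafeEncoding
{
    // Returns the Encoding for the given name, or null when the name is
    // invalid or the platform cannot provide the encoding.
    public static Encoding TryGet(string name)
    {
        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // The name is null or not a recognized encoding name.
            return null;
        }
        catch (NotSupportedException)
        {
            // The encoding is not supported on this platform.
            return null;
        }
    }
}
```

Returning null rather than rethrowing lets the caller treat "no usable encoding" as an ordinary result, which seems to match the intent of the fix.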

I tried the following:

char[] myChars = new char[] { 'z', 'a', '\u0306', '\u01FD', '\u03B2', '\uD8FF', '\uDCFF' };

byte[] bytes = Encoding.UTF7.GetBytes(myChars);

DetectionResult results = CharsetDetector.DetectFromBytes(bytes);

But it worked; no exception was thrown.

So, I'm somewhat at a loss as to how to make some tests for this. I'm open to suggestions.

304NotModified commented 2 years ago

Just make a unit test by creating an instance of DetectionDetail with an invalid encoding name?

E.g. new DetectionDetail("wrong",...
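
A rough sketch of that idea, written without a test framework so it stays self-contained. Because the remaining DetectionDetail constructor arguments are elided above, the sketch uses a hypothetical stand-in class, `DetailStandIn`, which only assumes that the encoding is resolved from its name inside the constructor. A real test would construct DetectionDetail itself.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for DetectionDetail (assumption: the encoding is
// looked up by name in the constructor and failures are swallowed).
public sealed class DetailStandIn
{
    public string EncodingName { get; }
    public Encoding ResolvedEncoding { get; }

    public DetailStandIn(string encodingName)
    {
        EncodingName = encodingName;
        try
        {
            ResolvedEncoding = Encoding.GetEncoding(encodingName);
        }
        catch (Exception ex) when (ex is ArgumentException || ex is NotSupportedException)
        {
            // Leave ResolvedEncoding as null when the lookup fails.
        }
    }
}

public static class DetailStandInChecks
{
    // Passes when an invalid name neither throws nor yields an encoding.
    public static void InvalidNameDoesNotThrow()
    {
        var detail = new DetailStandIn("wrong");
        if (detail.ResolvedEncoding != null)
        {
            throw new InvalidOperationException("Expected no encoding for an invalid name.");
        }
    }
}
```

Note that `Encoding.GetEncoding("wrong")` raises ArgumentException, so a check like this covers the invalid-name path but may not reach the NotSupportedException path on its own.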

304NotModified commented 2 years ago

Superseded by https://github.com/CharsetDetector/UTF-unknown/pull/146