CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

Refactor ctor DetectionDetail: not all codepage names are supported by .NET #78

Closed: rstm-sf closed this issue 4 years ago

rstm-sf commented 4 years ago

Hello!

Refactor ctor DetectionDetail(string, float, CharsetProber, TimeSpan?, string): not all codepage names are supported by .NET.

Code like Split('(').First().Trim() looks redundant.

To do this, we will need to take into account that not all codepage names are supported by .NET (see CodepageName from #74). Also, the readme should state explicitly that not every Encoding exists.

And maybe it will be possible to get rid of try / catch?

304NotModified commented 4 years ago

> Also, the readme should state explicitly that not every Encoding exists.

Sounds good!

> And maybe it will be possible to get rid of try / catch?

That would be nice if it's 100% safe

rstm-sf commented 4 years ago

> That would be nice if it's 100% safe

The CodepageName class was created based on the .NET Core source :)

https://github.com/CharsetDetector/UTF-unknown/blob/cc2b081a621001aedfac20229bd555585d366e9f/src/Core/CodepageName.cs#L33-L36

All unsupported names can be added to some collection and checked before calling Encoding.GetEncoding(string).
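A minimal sketch of that idea, not the library's actual code: `EncodingLookup`, `GetEncodingOrNull` and the placeholder entry in `UnsupportedNames` are hypothetical names used only for illustration. The real list would have to be derived from CodepageName and the .NET sources. The try/catch is kept here as a safety net, since whether it can be dropped depends on the list being complete for every target runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodingLookup
{
    // Hypothetical list of names the detector may report but the runtime
    // does not know. The single entry is a placeholder, not a real charset.
    private static readonly HashSet<string> UnsupportedNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "example-unsupported-name",
        };

    // Returns the matching Encoding, or null when the name is listed as
    // unsupported or the runtime rejects it.
    public static Encoding GetEncodingOrNull(string name)
    {
        if (string.IsNullOrWhiteSpace(name) || UnsupportedNames.Contains(name))
        {
            return null;
        }

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (Exception e) when (e is ArgumentException || e is NotSupportedException)
        {
            // Safety net for names missing from the list above.
            return null;
        }
    }
}
```

With this shape, `EncodingLookup.GetEncodingOrNull("utf-8")` yields an encoding, while a listed or unknown name yields null without the caller having to handle an exception.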

304NotModified commented 4 years ago

Nice! I checked a lot of files, but missed that :angel:

rstm-sf commented 4 years ago

Is iso-2022-ch not supported? Maybe change it to x-cp50227?

See:

https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

https://github.com/microsoft/referencesource/blob/17b97365645da62cf8a49444d979f94a59bbb155/mscorlib/system/text/iso2022encoding.cs#L30

https://github.com/dotnet/corefx/blob/cf28b7896a762f71c990a5896a160a4138d833c9/src/System.Text.Encoding.CodePages/src/System/Text/EncodingTable.Data.cs#L342

https://github.com/dotnet/corefx/blob/cf28b7896a762f71c990a5896a160a4138d833c9/src/System.Text.Encoding.CodePages/src/System/Text/EncodingTable.Data.cs#L65

https://github.com/mirror/reactos/blob/c6d2b35ffc91e09f50dfb214ea58237509329d6b/reactos/dll/win32/kernel32/winnls/lang/en-US.rc#L153-L154

https://www.ibm.com/support/pages/codepage-conversion-fails-mbcs002e-connectdirect-windows-after-upgrading-os-windows-2000-server

rstm-sf commented 4 years ago

Sorry, but the task is not yet completed. I'm waiting for #86.