CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

How about changing the Shift-JIS codepage name to `shift_jis` #124

Open puran1218 opened 3 years ago

puran1218 commented 3 years ago

It seems shift-jis is not a commonly used name for the Shift-JIS codepage. How about changing the Shift-JIS codepage name to shift_jis?

The change will affect two places: ~shift-jis~ -> shift_jis

  1. https://github.com/CharsetDetector/UTF-unknown/blob/43623b7e6895f328c35624cc90772a691755b50e/src/Core/CodepageName.cs#L151
  2. https://github.com/CharsetDetector/UTF-unknown/blob/43623b7e6895f328c35624cc90772a691755b50e/README.md#L102

304NotModified commented 3 years ago

Sounds good to me, but I guess it's a (semantic) breaking change?

rstm-sf commented 3 years ago

We can check this for the rest of the names as well. Here's an example for shift_jis:

https://github.com/dotnet/runtime/blob/547b2018a9013e18a9128478355380796d7ebccd/src/libraries/System.Text.Encoding.CodePages/tests/EncodingCodePages.cs#L87-L94

304NotModified commented 3 years ago

I also think shift_jis is better, but is there any proposal for changing it without breaking others (e.g. code built on shift-jis)?

rstm-sf commented 3 years ago

Sorry, but I didn't quite understand the question :)

In my opinion, creating the Encoding object shouldn't be affected by this.

It will only affect the EncodingName: https://github.com/CharsetDetector/UTF-unknown/blob/43623b7e6895f328c35624cc90772a691755b50e/src/DetectionDetail.cs#L57

304NotModified commented 3 years ago

This is an issue:

if (encoder.Detected?.EncodingName == "shift-jis")
{
    ...
}

After the change, the name will be "shift_jis", so this comparison will no longer match.

I think this is a good idea, but because it's a semantic breaking change, it should wait for v3.
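
For callers who want to be robust to the rename in the meantime, one possible approach is to compare against both spellings rather than a single literal. The following is only a sketch: `EncodingNameCompat` and `IsShiftJis` are hypothetical names that are not part of UTF-unknown, and the only assumption about the library is that the detected name arrives as a (possibly null) string.

```csharp
#nullable enable
using System;

// Hypothetical helper, NOT part of UTF-unknown: accepts both the current
// "shift-jis" spelling and the proposed "shift_jis" spelling.
public static class EncodingNameCompat
{
    // The two spellings discussed in this issue.
    private static readonly string[] ShiftJisNames = { "shift-jis", "shift_jis" };

    public static bool IsShiftJis(string? encodingName)
    {
        // A null name (e.g. nothing was detected) never matches.
        if (encodingName is null)
        {
            return false;
        }

        foreach (var name in ShiftJisNames)
        {
            // Case-insensitive ordinal comparison, since this is an identifier, not prose.
            if (string.Equals(encodingName, name, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }

        return false;
    }
}
```

With such a helper, the check above could be written as `if (EncodingNameCompat.IsShiftJis(encoder.Detected?.EncodingName))`, which would behave the same before and after the rename.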