dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

String.StartsWith doesn't work if the string contains "AA" when using Norwegian/Danish cultureinfo in .NET 6 #73999

Closed runebrg closed 2 years ago

runebrg commented 2 years ago

Description

String.StartsWith() will sometimes return the wrong value if the string contains "AA" and the culture is set to Norwegian (nb-NO) or Danish (da-DK).

Even though the double A has a special meaning in Norwegian, I would expect s.StartsWith(s.Substring(0, 2)) to always return true.

Example .net fiddle: https://dotnetfiddle.net/h4u01x

Reproduction Steps

var s = "BAAC";
var b = s.StartsWith(s.Substring(0, 2), false, CultureInfo.CreateSpecificCulture("nb-NO"));

Expected behavior

b should be true

Actual behavior

b is false

Regression?

Using .NET Framework 4.7.2, it works as expected for Norwegian, returning true. For Danish, however, it still returns false.

Example: https://dotnetfiddle.net/FCwqGH

Known Workarounds

Specifying InvariantCulture fixes the problem.
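
A minimal sketch of that workaround, assuming C# 9 top-level statements on .NET 5 or later. The variable names are illustrative and not taken from the report. It also shows an ordinal comparison, which is the alternative suggested later in this thread; the invariant culture is still a linguistic comparison, just one that does not treat "AA" as a single unit.

```csharp
using System;
using System.Globalization;

var s = "BAAC";
var prefix = s.Substring(0, 2); // "BA"

// Workaround from the report: compare with the invariant culture
// instead of nb-NO or da-DK.
bool invariant = s.StartsWith(prefix, false, CultureInfo.InvariantCulture);

// Alternative: an ordinal comparison, which ignores culture rules entirely.
bool ordinal = s.StartsWith(prefix, StringComparison.Ordinal);

Console.WriteLine(invariant); // True
Console.WriteLine(ordinal);   // True
```

Ordinal is usually the better fit when the strings are identifiers, keys, or file names rather than text shown to a reader.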

Configuration

No response

Other information

No response

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: runebrg
Assignees: -
Labels: `area-System.Globalization`
Milestone: -

krwq commented 2 years ago

FWIW, I don't know the specifics of Danish/Norwegian culture, but in Polish we also have special phonetic characters (e.g. sz), and I'd be surprised if this:

string test = "Pszczoła";
Console.WriteLine(test.StartsWith(test.Substring(0, 2), false, CultureInfo.CreateSpecificCulture("pl-PL"))); // true

ever returned false (this works as I'd expect for Polish), so it would make sense for this to work consistently across other cultures as well.

tarekgh commented 2 years ago

@runebrg this behavior is defined by the Unicode standard. aa is considered equivalent to Å. You may look at the history for more info. Look at the similar issue https://github.com/dotnet/runtime/issues/72770.

If you disagree with this behavior, you may log a ticket with ICU at https://unicode-org.atlassian.net/jira/software/c/projects/ICU/issues.

runebrg commented 2 years ago

I agree that aa should be considered equivalent to å in many cases in Norwegian (though not always; this is context-dependent). But I still think the .NET framework behaves inconsistently here. Both "aa".StartsWith("a") and "aa".StartsWith("å") are false, but "aa".StartsWith('a') and "aa".Contains("a") are true.

Fiddle: https://dotnetfiddle.net/RcbfSa

svick commented 2 years ago

> But I still think the .NET framework behaves inconsistently here.

It does, but this behavior is documented and I believe it can't be changed, because that would break backwards compatibility.

tarekgh commented 2 years ago

The following operations are performed as ordinal operations, not linguistic ones. You can get the same result from StartsWith with a string argument by passing an ordinal comparison, e.g. "aa".StartsWith("a", StringComparison.Ordinal). This should return true.

        Console.WriteLine("aa".Contains("a")); //True
        Console.WriteLine("aa".StartsWith('a')); //True
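
A short sketch that puts the overloads side by side, assuming C# 9 top-level statements on .NET 5 or later. The variable names are illustrative. The last call goes the other way and opts Contains into a culture-sensitive comparison; its result depends on the current culture and globalization mode, so no expected output is given for it.

```csharp
using System;

const string text = "aa";

// Ordinal by default: these overloads compare UTF-16 code units.
bool containsDefault = text.Contains("a");
bool startsWithChar = text.StartsWith('a');

// StartsWith(string) uses the current culture by default, so the
// ordinal behavior has to be requested explicitly.
bool startsWithOrdinal = text.StartsWith("a", StringComparison.Ordinal);

// Opting in to a linguistic comparison; the result varies with the
// current culture and with ICU vs. NLS.
bool containsLinguistic = text.Contains("a", StringComparison.CurrentCulture);

Console.WriteLine(containsDefault);    // True
Console.WriteLine(startsWithChar);     // True
Console.WriteLine(startsWithOrdinal);  // True
Console.WriteLine(containsLinguistic);
```

Passing a StringComparison on every call makes the intent visible at the call site and avoids depending on which default a given overload happens to use.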

Also, consistency with .NET Framework can be achieved if you enable the NLS mode. We don't recommend that though.
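
When comparing behavior across machines, it can help to know which globalization backend a process is actually using. The sketch below follows the check described in Microsoft's ".NET globalization and ICU" documentation and assumes C# 9 top-level statements on .NET 5 or later. The function name IsIcuMode is illustrative.

```csharp
using System;
using System.Globalization;

// Returns true when the invariant culture's sort version has the shape
// that ICU-backed collation reports; otherwise the process is likely
// running in NLS mode or invariant globalization mode.
static bool IsIcuMode()
{
    SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
    byte[] bytes = sortVersion.SortId.ToByteArray();
    int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
    return version != 0 && version == sortVersion.FullVersion;
}

Console.WriteLine(IsIcuMode() ? "ICU" : "NLS or invariant mode");
```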