Open daverayment opened 5 months ago
In .NET 5.0 and later, we switched to using the ICU library. For more information, please refer to this article.
You may notice some behavioral differences between the legacy NLS (used in .NET Framework) and ICU. In ICU, the StringSort behavior is enabled by default, which makes the StringSort option ineffective. This default is why you consistently see the following order:
bill's
billet
bills
can't
cannot
cant
co-op
con
coop
This behavior is explained in the comment in the code here. We do not plan to change it, as we adhere to ICU behavior, which aligns with the Unicode Standard.
We may add a note about this specific case to the article.
@tarekgh Thank you for the quick response.
Sorry, I do see now that the StringSort option is being applied by default in .NET 5+ rather than not being applied at all.
This still means the CompareOptions documentation is incorrect for .NET 5 and later. The example code says to expect different outputs for the None and StringSort options.
I will raise a separate documentation issue for that page and refer back here. I also thank you for suggesting an update to the ICU article to mention the CompareOptions change - that would be very useful, as I read that article myself while trying to troubleshoot.
Thanks again!
I've raised a new documentation issue for the CompareOptions enum page: https://github.com/dotnet/docs/issues/41052
Description
When comparing strings, CompareOptions.StringSort should apply low sort weights to hyphens and other non-alphanumeric characters. This works in .NET Framework projects. In .NET 5 and later, however, the weightings are not applied and the results of sorting with CompareOptions.StringSort are the same as when CompareOptions.None is chosen.
Note: I am using the default ICU Unicode processing for .NET 5+ testing.
Reproduction Steps
This code is adapted from the CompareOptions Enum documentation page here. The word list has been copied verbatim.
DotNetFiddle for the code here.
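The original snippet is not included in this text. As a stand-in, here is a minimal C# sketch of the kind of comparison being described. It is not the code from the documentation page or the fiddle. The class and method names (StringSortDemo, SortWith) are hypothetical, the word list is taken from the sorted output quoted earlier in this thread, and the use of the invariant culture is an assumption. No expected output is shown because the ordering depends on whether the runtime uses NLS or ICU.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; not the code from the docs page or the fiddle.
public static class StringSortDemo
{
    // Words taken from the output quoted in this thread, in arbitrary order.
    public static readonly string[] Words =
    {
        "cant", "bill's", "coop", "cannot", "billet", "can't", "con", "bills", "co-op"
    };

    // Returns a sorted copy of Words using the invariant culture (an assumption)
    // and the supplied CompareOptions.
    public static string[] SortWith(CompareOptions options)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        string[] copy = (string[])Words.Clone();
        Array.Sort(copy, (a, b) => compareInfo.Compare(a, b, options));
        return copy;
    }

    public static void Main()
    {
        // On .NET Framework (NLS) the two lines are reported to differ;
        // on .NET 5+ with ICU they are reported to be identical.
        Console.WriteLine("None:       " + string.Join(", ", SortWith(CompareOptions.None)));
        Console.WriteLine("StringSort: " + string.Join(", ", SortWith(CompareOptions.StringSort)));
    }
}
```

Running this on both a .NET Framework target and a .NET 5+ target should make it easy to compare the two orderings side by side.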
Expected behavior
CompareOptions.StringSort should apply a correct weighted ordering to the unordered collection of strings. The results are correct in .NET Framework 4.7.2 and Roslyn 4.8.
Actual behavior
In .NET 5 and later, CompareOptions.StringSort is incorrect, producing the same results as CompareOptions.None.
Regression?
According to testing on dotnetfiddle.net, the correct results were produced in .NET Framework 4.7.2 and Roslyn 4.8. .NET 5 and later produce the incorrect sort order.
Known Workarounds
A potential workaround may be to switch from ICU to NLS, but I have not tested this.
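For reference, the documented way to opt a .NET 5+ application on Windows back into NLS is the System.Globalization.UseNls runtime setting. The fragment below is a sketch of how that setting looks in runtimeconfig.json. Whether it restores the .NET Framework ordering for this particular word list is an assumption that has not been verified here.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

This setting only applies on Windows, since NLS is not available on other platforms.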
Configuration
My system:
I don't think the issue is specific to my OS or architecture, as the same problem can be seen via dotnetfiddle.
Other information
No response