Closed: armandsuiska closed this 2 years ago
This is what I got; it seems like a bug in itself.
All of our culture-aware data on Linux comes from ICU (https://icu.unicode.org/home).
If you put the input data here into their online tool at https://icu4c-demos.unicode.org/icu-bin/collation.html, you'll see the same output:
input:
Ādb
Aug
Ārz
Alū
Ada
Aiz
Āda
Ārv
Cda
Ciz
Clū
Cug
Čda
Čdb
Črv
Črz
output:
<1 [5] Ada
<2 [7] Āda
<1 [1] Ādb
<1 [6] Aiz
<1 [4] Alū
<1 [8] Ārv
<1 [3] Ārz
<1 [2] Aug
<1 [9] Cda
<2 [13] Čda
<1 [14] Čdb
<1 [10] Ciz
<1 [11] Clū
<1 [15] Črv
<1 [16] Črz
<1 [12] Cug
But if you switch the locale to cs (Czech), it becomes:
<1 [5] Ada
<2 [7] Āda
<1 [1] Ādb
<1 [6] Aiz
<1 [4] Alū
<1 [8] Ārv
<1 [3] Ārz
<1 [2] Aug
<1 [9] Cda
<1 [10] Ciz
<1 [11] Clū
<1 [12] Cug
<1 [13] Čda
<1 [14] Čdb
<1 [15] Črv
<1 [16] Črz
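The locale dependence shown in the two listings can also be checked from .NET itself. The following is a minimal sketch, not the original repro from this issue: it sorts the same input with a culture-aware comparer built by `StringComparer.Create`, once for the invariant culture and once for `cs`. The class name and the choice of cultures are assumptions for illustration. The result depends on the collation data (ICU or NLS) that the runtime loads, so no expected output is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class CollationDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Same input as in the listing above.
        var words = new List<string>
        {
            "Ādb", "Aug", "Ārz", "Alū", "Ada", "Aiz", "Āda", "Ārv",
            "Cda", "Ciz", "Clū", "Cug", "Čda", "Čdb", "Črv", "Črz",
        };

        // Cultures to compare; "cs" mirrors the Czech example above.
        var cultures = new[]
        {
            CultureInfo.InvariantCulture,
            CultureInfo.GetCultureInfo("cs"),
        };

        foreach (var culture in cultures)
        {
            // Culture-sensitive, case-sensitive comparer for this culture.
            var comparer = StringComparer.Create(culture, ignoreCase: false);
            var sorted = words.OrderBy(w => w, comparer);

            var label = culture.Name.Length == 0 ? "invariant" : culture.Name;
            Console.WriteLine($"{label}: {string.Join(", ", sorted)}");
        }
    }
}
```

Passing the comparer explicitly makes the sort independent of the thread's current culture, which is what `List.Sort()` and `OrderBy()` use for strings when no comparer is given.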
So this looks like just a difference between Windows NLS data and Linux ICU data, which is out of .NET's hands. See https://docs.microsoft.com/en-us/dotnet/core/extensions/globalization-icu
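For reference, the linked page describes a runtime switch that makes .NET 5 and later use NLS instead of ICU on Windows. A sketch of the `runtimeconfig.json` form follows, assuming the app runs on Windows (the switch applies to Windows only, so it would not change the Linux behavior described above):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

This only restores the older Windows collation data; it does not make ICU and NLS agree with each other.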
I have a problem with text ordering in .NET Core.
Neither List.Sort() nor LINQ's OrderBy() orders non-ASCII characters correctly.
Here is a simple .NET 6 console application:
Output:
I see that the letters C and Č are ordered correctly, but A and Ā are not. The same list is ordered correctly in .NET Framework 4.8, so it feels like a bug to me. Can someone check and confirm it?