Closed kkurni closed 4 years ago
CC @saurabh500 @corivera
/cc @eerhardt. Eric just recently landed changes to improve our support for these flags, so it would be good to understand exactly what CoreCLR build you were testing with.
I built it for Linux with msbuild /p:OSGroup=Linux and ran it on Ubuntu 14.04.
Do you also build and copy over a custom coreclr version or do you just use whatever comes by default?
I built this on Windows, copied the whole dnxcore50 to Unix, and ran it there.
I hadn't updated CoreCLR for a while.
Is the fix in CoreCLR?
Yes, the fix is in CoreCLR, not CoreFX. We will need to update the test runtime we use.
I just updated my CoreCLR and built it again, but I'm still getting the same issue.
I was able to repro this issue with the latest CoreCLR as well.
The problem appears to come from the last character in that line, which looks like a dash.
This test fails on Unix:
CultureComparisonTests("ja-JP", "ﾌｧｽﾞ･ｷﾞﾀｰ" , "ファズ・ギター");
But this test passes on Unix:
CultureComparisonTests("ja-JP", "ﾌｧｽﾞ･ｷﾞﾀ" , "ファズ・ギタ");
Good finding.
I've reproduced this in a standalone native application on my Mac OS X El Capitan machine (ICU 55).
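#include <stdio.h>
#include <unicode/ucol.h>

void CompareStrings(const char * localeName)
{
    // ICU functions return immediately if err already holds a failure code,
    // so it must start out as U_ZERO_ERROR.
    UErrorCode err = U_ZERO_ERROR;
    UCollator *pCol = ucol_open(localeName, &err);
    if (U_FAILURE(err))
    {
        printf("ucol_open failed for %s: %s\n", localeName, u_errorName(err));
        return;
    }
    // Compare at secondary strength, which ignores tertiary differences such as case.
    ucol_setAttribute(pCol, UCOL_STRENGTH, UCOL_SECONDARY, &err);
    const UChar halfWidth[] = { 0xff80, 0xff70 }; // "ﾀｰ" (half-width)
    const UChar fullWidth[] = { 0x30bf, 0x30fc }; // "ター" (full-width)
    int result = ucol_strcoll(pCol, halfWidth, 2, fullWidth, 2);
    printf("%d\n", result);
    ucol_close(pCol);
}

int main(int argc, const char * argv[])
{
    CompareStrings("en");
    CompareStrings("ja");
    CompareStrings("zh");
    CompareStrings("ko");
    return 0;
}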
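That code prints the following, one line per locale in the order en, ja, zh, ko (0 means the strings compare as equal, -1 means the first sorts before the second):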
0
-1
0
0
I'll be sending an email to ICU discussion on this issue, but I'm assuming this issue is going to get closed as "external" since this is the underlying ICU behavior.
I reproduced this on Ubuntu 14.04. Does Ubuntu use the same ICU 55? Will the behavior be the same on all Unix systems?
My Ubuntu 14.04 was using ICU 52, which is even older.
@markusicu replied on the ICU support email discussion:
This is the issue: halfwidth vs. fullwidth forms. The Japanese sort order has special rules for length-mark-after-syllable, but only for the regular length mark, not for its halfwidth form. It also does not seem to have a complete duplicate of the rules for the halfwidth syllable (the Ta) compared to its regular form. The data is in CLDR: http://unicode.org/cldr/trac/browser/trunk/common/collation/ja.xml It is curious that we have had this sort order for some twelve years but no one seems to have noticed or cared... If you want the halfwidth forms to be treated like the regular forms, then please submit a ticket at http://unicode.org/cldr/trac/newticket
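Until the CLDR data changes, one possible workaround is to fold half-width forms to their regular forms before comparing. The following is only a minimal sketch of that idea, not anything from CoreCLR or ICU: `fold_halfwidth` and `equal_ignoring_width` are hypothetical helper names, and the mapping covers only the two code points from the repro above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: maps only the two half-width code points used in the
   repro to their full-width forms. All other code units pass through. */
uint16_t fold_halfwidth(uint16_t c)
{
    switch (c) {
    case 0xFF80: return 0x30BF; /* half-width TA -> full-width TA */
    case 0xFF70: return 0x30FC; /* half-width length mark -> full-width length mark */
    default:     return c;
    }
}

/* Hypothetical helper: returns 1 if two UTF-16 strings are equal after
   folding each code unit with fold_halfwidth, otherwise 0. */
int equal_ignoring_width(const uint16_t *a, size_t alen,
                         const uint16_t *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    if (alen != blen) {
        return 0;
    }
    for (size_t i = 0; i < alen; i++) {
        if (fold_halfwidth(a[i]) != fold_halfwidth(b[i])) {
            return 0;
        }
    }
    return 1;
}
```

This per-code-unit approach breaks down for voiced half-width kana, which take two code units (for example ｷﾞ versus ギ). A general solution would more likely apply NFKC normalization to both strings before collating them.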
I'm going to close this issue, since this needs to be fixed in the underlying CLDR data. Once that happens, if someone is using an ICU version with the fixed data, this bug will be fixed.
Thanks, @eerhardt. I assume you opened a ticket per the response?
Yes - I opened http://unicode.org/cldr/trac/ticket/9146 for the underlying CLDR data problem.
There seems to be a bug when ignoring kana type in string comparisons in the ja-JP locale.
Repro
Summary
Test this in UNIX
Expected: the comparison returns true when kana type and case are ignored.
This only failed in