CultureInfo.TextInfo.ListSeparator broken in .Net 5

olmobrutall commented 4 years ago

As part of updating the Csv class in Signum.Utilities to .Net 5, I've realized that the value of ListSeparator has changed for many cultures that use , instead of . as the decimal separator.

foreach (var ci in new [] { "", "en", "en-GB", "en-US", "es", "es-ES", "de", "de-DE", "fr", "fr-FR" })
{
    Console.WriteLine($"Culture {ci}\tListSeparator {CultureInfo.GetCultureInfo(ci).TextInfo.ListSeparator}");
}

netcoreapp3.1

Culture         ListSeparator ,
Culture en      ListSeparator ,
Culture en-GB   ListSeparator ,
Culture en-US   ListSeparator ,
Culture es      ListSeparator ;
Culture es-ES   ListSeparator ;
Culture de      ListSeparator ;
Culture de-DE   ListSeparator ;
Culture fr      ListSeparator ;
Culture fr-FR   ListSeparator ;

net5 rc2

Culture         ListSeparator ,
Culture en      ListSeparator ,
Culture en-GB   ListSeparator ,
Culture en-US   ListSeparator ,
Culture es      ListSeparator .
Culture es-ES   ListSeparator .
Culture de      ListSeparator .
Culture de-DE   ListSeparator .
Culture fr      ListSeparator  
Culture fr-FR   ListSeparator

Both tests were run on the same machine.

Previously I was using ListSeparator as a way to detect the separator that will be used in a Csv file. Is this the intended purpose?

ghost commented 4 years ago

Tagging subscribers to this area: @tarekgh, @safern, @krwq See info in area-owners.md if you want to be subscribed.

tarekgh commented 4 years ago

@olmobrutall In .NET 5.0 we switched to depend on ICU instead of the NLS Win32 APIs. ICU behavior is usually more correct, as it picks its globalization data from CLDR (Unicode Standard). Please have a look at the doc https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu for more details about the change in .NET 5.0. We also offer a config switch that lets apps revert to the old behavior if needed; the doc explains how to use it.
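For readers looking for the switch itself: it is named System.Globalization.UseNls (the name comes up later in this thread). A minimal sketch of the runtimeconfig.json form, assuming the app ships its own runtimeconfig.json; the linked doc lists the other ways to set it (project file, environment variable):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```

This is an app-wide setting, so it affects all globalization behavior in the process, not only ListSeparator.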

Another point: globalization data is not considered constant and can change. It is not right to take any dependency on globalization data or to assume it will never change. If any data doesn't make sense or is incorrect, the issue can be raised with CLDR, where it can be tracked, discussed, and fixed if it is a real issue.

Let me know if you have any questions.

ANahr commented 4 years ago

Sorry @tarekgh, but to me this surely looks like a bug. If you look at the relevant code at https://github.com/dotnet/runtime/blob/0e402bcd28f38a3b65720474dcd7a338cab0841d/src/libraries/Native/Unix/System.Globalization.Native/pal_localeStringData.c#L246 you can see that it isn't even implemented (it falls through to ThousandsSeparator). So either this is a bug in .Net Core, or the info is generally not available in ICU, which would then still be a bug (and would not be an issue for CLDR if it is not part of it). As a native German speaker I can tell you that using . as a list separator does not make any sense.

ANahr commented 4 years ago

After checking our code this would be a .Net 5 blocker for us for the majority of our software products unless we would remove all uses of ListSeparator. A ListSeparator has no connection of any kind (and should not have) to the ThousandsSeparator.

olmobrutall commented 4 years ago

Thanks @ANahr, nice that you found the code.

I've checked .Net Core 3.1 behaviour to see if ListSeparator can be inferred from DecimalsSeparator somehow.

ListSeparator ,   NumberDecimalSeparator . (180)
    , as, as-IN, bn, bn-BD, bn-IN, bo, bo-CN, chr, chr-Cher, chr-Cher-US, en, en-001, en-029, en-AG, en-AI, en-AS, en-AU, en-BB, en-BI, en-BM, en-BS, en-BW, en-BZ, en-CA, en-CC, en-CK, en-CM, en-CX, en-CY, en-DM,
    en-ER, en-FJ, en-FK, en-FM, en-GB, en-GD, en-GG, en-GH, en-GI, en-GM, en-GU, en-GY, en-HK, en-IE, en-IL, en-IM, en-IN, en-IO, en-JE, en-JM, en-KE, en-KI, en-KN, en-KY, en-LC, en-LR, en-LS, en-MG, en-MH, en-MO,
    en-MP, en-MS, en-MT, en-MU, en-MW, en-MY, en-NA, en-NF, en-NG, en-NR, en-NU, en-NZ, en-PG, en-PH, en-PK, en-PN, en-PR, en-PW, en-RW, en-SB, en-SC, en-SD, en-SG, en-SH, en-SL, en-SS, en-SX, en-SZ, en-TC, en-TK,
    en-TO, en-TT, en-TV, en-TZ, en-UG, en-UM, en-US, en-VC, en-VG, en-VI, en-VU, en-WS, en-ZM, en-ZW, es-MX, es-US, gu, gu-IN, he, he-IL, hi, hi-IN, hy, hy-AM, iu, iu-Cans, iu-Cans-CA, iu-Latn, iu-Latn-CA, ja,
    ja-JP, km, km-KH, kn, kn-IN, ko, ko-KR, kok, kok-IN, ks-Deva, ks-Deva-IN, la, la-001, mi, mi-NZ, mn-Mong, mn-Mong-CN, mn-Mong-MN, mni, mni-IN, moh, moh-CA, mr, mr-IN, ne, ne-NP, or, or-IN, pa, pa-Guru, pa-IN, pap,
    pap-029, quc, quc-Latn, quc-Latn-GT, quz-PE, sa, sa-IN, sd-Deva, sd-Deva-IN, syr, syr-SY, ta, ta-IN, te, te-IN, th, th-TH, ug, ug-CN, zh, zh-CN, zh-Hans, zh-Hant, zh-HK, zh-MO, zh-SG, zh-TW

ListSeparator ;   NumberDecimalSeparator . (264)
    aa, aa-DJ, aa-ER, aa-ET, ak, ak-GH, am, am-ET, ar, ar-001, ar-AE, ar-BH, ar-DJ, ar-DZ, ar-EG, ar-ER, ar-IL, ar-IQ, ar-JO, ar-KM, ar-KW, ar-LB, ar-LY, ar-MA, ar-OM, ar-PS, ar-QA, ar-SA, ar-SD, ar-SO, ar-SS, ar-SY,
    ar-TD, ar-TN, ar-YE, asa, asa-TZ, bem, bem-ZM, bez, bez-TZ, bin, bin-NG, bm, bm-Latn, bm-Latn-ML, bo-IN, brx, brx-IN, byn, byn-ER, ce, ce-RU, cgg, cgg-UG, cy, cy-GB, dav, dav-KE, de-CH, de-LI, dje, dje-NE, dz,
    dz-BT, ebu, ebu-KE, ee, ee-GH, ee-TG, es-419, es-BR, es-BZ, es-CU, es-DO, es-GT, es-HN, es-NI, es-PA, es-PE, es-PR, es-SV, ff-Latn-NG, fil, fil-PH, ga, ga-IE, gd, gd-GB, gsw, gsw-CH, gsw-LI, guz, guz-KE, gv,
    gv-IM, ha, ha-Latn, ha-Latn-GH, ha-Latn-NE, ha-Latn-NG, haw, haw-US, ibb, ibb-NG, ig, ig-NG, ii, ii-CN, it-CH, jmc, jmc-TZ, kam, kam-KE, kde, kde-TZ, khq, khq-ML, ki, ki-KE, kln, kln-KE, ko-KP, kr, kr-Latn,
    kr-Latn-NG, ks, ks-Arab, ks-Arab-IN, ksb, ksb-TZ, ku-Arab-IR, kw, kw-GB, lag, lag-TZ, lg, lg-UG, lkt, lkt-US, lrc, lrc-IQ, lrc-IR, luo, luo-KE, luy, luy-KE, mas, mas-KE, mas-TZ, mer, mer-KE, mfe, mfe-MU, mg,
    mg-MG, mgo, mgo-CM, ml, ml-IN, mn, mn-Cyrl, mn-MN, ms, ms-MY, ms-SG, mt, mt-MT, my, my-MM, mzn, mzn-IR, naq, naq-NA, nd, nd-ZW, ne-IN, nso, nso-ZA, nus, nus-SS, nyn, nyn-UG, om, om-ET, om-KE, pa-Arab, pa-Arab-PK, rm,
    rm-CH, rof, rof-TZ, rwk, rwk-TZ, saq, saq-KE, sbp, sbp-TZ, sd, sd-Arab, sd-Arab-PK, ses, ses-ML, si, si-LK, sn, sn-Latn, sn-Latn-ZW, so, so-DJ, so-ET, so-KE, so-SO, ssy, ssy-ER, sw, sw-KE, sw-TZ, sw-UG, ta-LK,
    ta-MY, ta-SG, teo, teo-KE, teo-UG, ti, ti-ER, ti-ET, tig, tig-ER, tn, tn-BW, tn-ZA, to, to-TO, twq, twq-NE, ur, ur-IN, ur-PK, vai, vai-Latn, vai-Latn-LR, vai-Vaii, vai-Vaii-LR, vo, vo-001, vun, vun-TZ, wal,
    wal-ET, xh, xh-ZA, xog, xog-UG, yi, yi-001, yo, yo-BJ, yo-NG, zh-Hans-HK, zh-Hans-MO, zu, zu-ZA

ListSeparator ;   NumberDecimalSeparator , (375)
    af, af-NA, af-ZA, agq, agq-CM, ar-MR, ast, ast-ES, az, az-Cyrl, az-Cyrl-AZ, az-Latn, az-Latn-AZ, ba, ba-RU, bas, bas-CM, be, be-BY, bg, bg-BG, br, br-FR, bs, bs-Cyrl, bs-Cyrl-BA, bs-Latn, bs-Latn-BA, ca,
    ca-AD, ca-ES, ca-ES-valencia, ca-FR, ca-IT, co, co-FR, cs, cs-CZ, cu, cu-RU, da, da-DK, da-GL, de, de-AT, de-BE, de-DE, de-IT, de-LU, dsb, dsb-DE, dua, dua-CM, dyo, dyo-SN, el, el-CY, el-GR, en-ID, eo, eo-001, es,
    es-AR, es-BO, es-CL, es-CO, es-CR, es-EC, es-ES, es-GQ, es-PH, es-PY, es-UY, es-VE, et, et-EE, eu, eu-ES, ewo, ewo-CM, ff, ff-Latn, ff-Latn-BF, ff-Latn-CM, ff-Latn-GH, ff-Latn-GM, ff-Latn-GN,
    ff-Latn-GW, ff-Latn-LR, ff-Latn-MR, ff-Latn-NE, ff-Latn-SL, ff-Latn-SN, fi, fi-FI, fo, fo-DK, fo-FO, fr, fr-029, fr-BE, fr-BF, fr-BI, fr-BJ, fr-BL, fr-CA, fr-CD, fr-CF, fr-CG, fr-CH, fr-CI, fr-CM,
    fr-DJ, fr-DZ, fr-FR, fr-GA, fr-GF, fr-GN, fr-GP, fr-GQ, fr-HT, fr-KM, fr-LU, fr-MA, fr-MC, fr-MF, fr-MG, fr-ML, fr-MQ, fr-MR, fr-MU, fr-NC, fr-NE, fr-PF, fr-PM, fr-RE, fr-RW, fr-SC, fr-SN, fr-SY, fr-TD, fr-TG,
    fr-TN, fr-VU, fr-WF, fr-YT, fur, fur-IT, fy, fy-NL, gl, gl-ES, gsw-FR, hr, hr-BA, hr-HR, hsb, hsb-DE, hu, hu-HU, ia, ia-001, id, id-ID, is, is-IS, it, it-IT, it-SM, it-VA, jgo, jgo-CM, jv, jv-Java, jv-Java-ID,
    jv-Latn, jv-Latn-ID, ka, ka-GE, kab, kab-DZ, kea, kea-CV, kk, kk-KZ, kkj, kkj-CM, kl, kl-GL, ksf, ksf-CM, ksh, ksh-DE, ky, ky-KG, lb, lb-LU, ln, ln-AO, ln-CD, ln-CF, ln-CG, lo, lo-LA, lt, lt-LT, lu, lu-CD, lv, lv-LV, mgh,
    mgh-MZ, mk, mk-MK, ms-BN, mua, mua-CM, nb, nb-NO, nb-SJ, nds, nds-DE, nds-NL, nl, nl-AW, nl-BE, nl-BQ, nl-CW, nl-NL, nl-SR, nl-SX, nmg, nmg-CM, nn, nn-NO, nnh, nnh-CM, no, nr, nr-ZA, oc, oc-FR, os, os-GE, os-RU, pl,
    pl-PL, prg, prg-001, prs, prs-AF, ps, ps-AF, pt, pt-AO, pt-BR, pt-CH, pt-CV, pt-GQ, pt-GW, pt-LU, pt-MO, pt-MZ, pt-PT, pt-ST, pt-TL, rn, rn-BI, ro, ro-MD, ro-RO, ru, ru-BY, ru-KG, ru-KZ, ru-MD, ru-RU, ru-UA, rw,
    rw-RW, sah, sah-RU, se, se-FI, se-NO, se-SE, seh, seh-MZ, sg, sg-CF, shi, shi-Latn, shi-Latn-MA, shi-Tfng, shi-Tfng-MA, sk, sk-SK, sl, sl-SI, sma, sma-NO, sma-SE, smj, smj-NO, smj-SE, smn, smn-FI, sms, sms-FI,
    sq, sq-AL, sq-MK, sq-XK, sr, sr-Cyrl, sr-Cyrl-BA, sr-Cyrl-ME, sr-Cyrl-RS, sr-Cyrl-XK, sr-Latn, sr-Latn-BA, sr-Latn-ME, sr-Latn-RS, sr-Latn-XK, ss, ss-SZ, ss-ZA, st, st-LS, st-ZA, sv, sv-AX, sv-FI,
    sv-SE, sw-CD, tg, tg-Cyrl, tg-Cyrl-TJ, tk, tk-TM, tr, tr-CY, tr-TR, ts, ts-ZA, tt, tt-RU, tzm, tzm-Arab, tzm-Arab-MA, tzm-Latn, tzm-Latn-DZ, tzm-Latn-MA, tzm-Tfng, tzm-Tfng-MA, uk, uk-UA, uz, uz-Arab,
    uz-Arab-AF, uz-Cyrl, uz-Cyrl-UZ, uz-Latn, uz-Latn-UZ, ve, ve-ZA, wae, wae-CH, wo, wo-SN, yav, yav-CM, zgh, zgh-Tfng, zgh-Tfng-MA

ListSeparator ,   NumberDecimalSeparator , (20)
    arn, arn-CL, en-150, en-AT, en-BE, en-CH, en-DE, en-DK, en-FI, en-NL, en-SE, en-SI, en-ZA, gn, gn-PY, quz, quz-BO, quz-EC, vi, vi-VN

ListSeparator ?   NumberDecimalSeparator . (4)
    dv, dv-MV, nqo, nqo-GN

ListSeparator ?   NumberDecimalSeparator / (2)
    fa, fa-IR

ListSeparator ?   NumberDecimalSeparator . (3)
    ku, ku-Arab, ku-Arab-IQ

Doesn't look like it can easily be inferred, there are even a few cultures where they use the same ListSeparator and NumberDecimalSeparator.

tarekgh commented 4 years ago

@olmobrutall @ANahr thanks for your analysis. To clarify, we have switched to using ICU on Windows. Unfortunately, ICU doesn't have a property matching ListSeparator as Windows designed it. That is why, for now, we are using the thousands separator as a fallback for the list separator. I agree this is not the best and we need to improve this experience. We are looking at whether we can derive the list separator from the ICU pattern separator, but this needs more investigation to ensure the outcome is satisfying.

@KalleOlaviNiemitalo thankfully linked the other issue tracking the exact problem we are talking about here. As we are at a very late stage of the 5.0 release, it would be very risky to change that now. For sure we'll address this in the next release.

After checking our code this would be a .Net 5 blocker for us for the majority of our software products unless we would remove all uses of ListSeparator. A ListSeparator has no connection of any kind (and should not have) to the ThousandsSeparator

We always recommend that developers not take any dependency on globalization data or assume the data never changes. Globalization data can change at any time. You may look at the blog of Shawn Steele, who is on the Windows team, describing that. We always recommend using the Invariant culture for formatting and parsing back any data that has globalization properties. Otherwise, the parsing can break at any time when globalization data changes. If it is really an issue for you, you have the option to use the System.Globalization.UseNls config switch to go back to the old behavior. Or you can customize the TextInfo in use and set the ListSeparator to a value that prevents breaking. Or do something like the following extension method:

    public static string GetListSeparator(this TextInfo textInfo) => string.IsNullOrWhiteSpace(textInfo.ListSeparator) ? ";" : textInfo.ListSeparator;
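The other option mentioned above, customizing the TextInfo in use, could look roughly like the sketch below. The helper name WithListSeparator is hypothetical (not a .NET API); it relies only on CultureInfo.Clone and the TextInfo.ListSeparator setter, and it assumes the caller is free to pass the modified culture around instead of the cached one.

```csharp
using System;
using System.Globalization;

public static class ListSeparatorOverride
{
    // Hypothetical helper: returns a writable copy of the culture
    // whose ListSeparator is pinned to the given value.
    public static CultureInfo WithListSeparator(CultureInfo culture, string separator)
    {
        // GetCultureInfo returns a read-only cached instance, so clone it first.
        var copy = (CultureInfo)culture.Clone();
        copy.TextInfo.ListSeparator = separator;
        return copy;
    }
}

public static class Program
{
    public static void Main()
    {
        CultureInfo german = ListSeparatorOverride.WithListSeparator(
            CultureInfo.GetCultureInfo("de-DE"), ";");

        // The override is independent of whether ICU or NLS supplied the original data.
        Console.WriteLine(german.TextInfo.ListSeparator);
    }
}
```

The original cached culture is left untouched, so code that still calls CultureInfo.GetCultureInfo("de-DE") directly keeps seeing the runtime's own value.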

By the way, does your product currently support running on Linux?

Last, I want to say that the current behavior (which I agree is not the best) has been used on Linux for a while now. I agree with you that we should do something here.

We really appreciate your feedback. Please let me know if the current plan I mentioned is reasonable for you.

KalleOlaviNiemitalo commented 4 years ago

I have been using ListSeparator for human-readable output, never for parsing. I suspect that hardcoding "," or ", " would be acceptable for more cultures than copying the ThousandsSeparator.

olmobrutall commented 4 years ago

by the way, does your product currently support running on Linux?

While I have applications running in Docker for production, parsing Csv files is typically done in ETL processes that we run in windows, so it has gone unnoticed for now.

About GetListSeparator: the current implementation of ListSeparator doesn't only return null or string.Empty. It also returns . in a lot of cases.

For now I use:

public static string GetListSeparator(this CultureInfo culture) => culture.NumberFormat.NumberDecimalSeparator == "," ? ";" : ",";

This works for the first group of 180 cultures (English group) and for the group of 375 (Europe group), but leaves the other cultures broken.

Can someone from these groups check what Excel does when you export to CSV?

tarekgh commented 4 years ago

While I have applications running in Docker for production, parsing Csv files is typically done in ETL processes that we run in windows, so it has gone unnoticed for now.

It will be good to fix that now :-)

About GetListSeparator: the current implementation of ListSeparator doesn't only return null or string.Empty. It also returns . in a lot of cases.

You can apply any customization logic to GetListSeparator for now, as you indicated in your code.

This works for the first group of 180 cultures (English group) and for the group of 375 (Europe group), but leaves the other cultures broken.

Could you tell more about the broken cases when doing GetListSeparator?

Can someone from these groups check what Excel does when you export to CSV?

I am wondering about your scenario now. Why don't you use a fixed list separator value (like the one in the Invariant culture) across all data, which is guaranteed to work all the time across all cultures?

olmobrutall commented 4 years ago

Hi @tarekgh,

If you use Excel in Spain or Germany (and most (all?) of continental Europe) and export to CSV, instead of:

Year,Make,Model,Length
1997,Ford,E350,2.35
2000,Mercury,Cougar,2.38

you get

Year;Make;Model;Length
1997;Ford;E350;2,35
2000;Mercury;Cougar;2,38

Notice how the decimal numbers use ,, so we use ; to separate cells.

I'm sure there are ways to configure Excel to export the English way, but CSV is typically used for one-time data-loading scenarios connecting different departments: files provided by the customer or another third-party company, downloaded from the internet, etc. It is very convenient to be able to set the culture once and have the CSV library switch number format, date format, and separator.

Csv.ReadFile<ProductInfoCsv>("productos.csv", culture: CultureInfo.GetCultureInfo("es-ES")).Select(prod => ...)

By using the code that I mentioned (oops... corrected now):

public static string GetListSeparator(this CultureInfo culture) => culture.NumberFormat.NumberDecimalSeparator == "," ? ";" : ",";

This will work for the European cultures and for the English cultures, but if someone from the second group (for example Latin America: es-DO, es-GT, es-HN, es-NI, es-PA, es-PE, es-PR) exports from Excel to CSV, maybe it will produce something like this:

Year;Make;Model;Length
1997;Ford;E350;2,35
2000;Mercury;Cougar;2.38

(notice the . in the decimal number. Confirmation of this behavior pending!)

And my simple heuristic won't work, because it will assume that the file is English style and expect , to be the separator.
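To make the failure mode concrete, here is a small sketch of the same heuristic applied to hand-built number formats, so the result does not depend on whether ICU or NLS supplies the culture data. The class name CsvSeparatorHeuristic is hypothetical, and the "second group" case simply mirrors the unconfirmed guess above (decimal point combined with a ; delimiter).

```csharp
using System;
using System.Globalization;

public static class CsvSeparatorHeuristic
{
    // Hypothetical helper: "decimal comma implies ';' as the CSV delimiter".
    public static string Guess(NumberFormatInfo numberFormat) =>
        numberFormat.NumberDecimalSeparator == "," ? ";" : ",";
}

public static class Program
{
    public static void Main()
    {
        // Europe group: decimal comma, so the guess is ';'.
        var decimalComma = new NumberFormatInfo { NumberDecimalSeparator = ",", NumberGroupSeparator = "." };

        // English group and the "second group" both use a decimal point, so the
        // heuristic cannot tell them apart and answers ',' for both.
        var decimalPoint = new NumberFormatInfo { NumberDecimalSeparator = ".", NumberGroupSeparator = "," };

        Console.WriteLine(CsvSeparatorHeuristic.Guess(decimalComma));
        Console.WriteLine(CsvSeparatorHeuristic.Guess(decimalPoint));
    }
}
```

If the second group really exports ;-delimited files with decimal points, the second call returns the wrong delimiter for them, which is exactly the gap described above.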

@KalleOlaviNiemitalo I don't think using ", " or "," is useful for anybody, why use CultureInfo.Current.TextInfo.ListSeparator instead of just ", "? Just so maybe they change it in the future and what you formatted is completely crazy in other languages?

Remember that in Europe we use , to enumerate stuff, not ;. The ; is just an Excel CSV thing.

Also, searching for TextInfo.ListSeparator on Google, it looks like the people using it right now are CSV libraries:

https://dotnetfiddle.net/MAm1t1 https://github.com/JoshClose/CsvHelper/issues/918

tarekgh commented 4 years ago

@olmobrutall your logic is already broken. It is very possible for Excel to use a separator that does not necessarily match what is stored in the culture; Excel settings allow that. I believe the way to get what Excel is using is to call the Excel APIs, something like Application.International(xlListSeparator). Without asking Excel for the separator, your logic will be fragile and easy to break.

The other idea could be to always include some fixed data in the Excel sheet from which you can always derive the separator. For example, if the headers are always constant, like Year;Make;Model;Length, then parse the header and get the separator.
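A minimal sketch of that header-based idea, assuming the expected column names are known in advance. The class name CsvDelimiterSniffer and the candidate set (; , and tab) are assumptions made for illustration, and quoted header cells are not handled.

```csharp
using System;
using System.Linq;

public static class CsvDelimiterSniffer
{
    // Candidate delimiters to try; this set is an assumption for illustration.
    private static readonly char[] Candidates = { ';', ',', '\t' };

    // Returns the candidate that splits the header into exactly the
    // expected column names, or null when none of them does.
    public static char? Detect(string headerLine, string[] expectedColumns)
    {
        foreach (char candidate in Candidates)
        {
            var cells = headerLine.Split(candidate).Select(cell => cell.Trim());
            if (cells.SequenceEqual(expectedColumns, StringComparer.OrdinalIgnoreCase))
                return candidate;
        }
        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] expected = { "Year", "Make", "Model", "Length" };
        Console.WriteLine(CsvDelimiterSniffer.Detect("Year;Make;Model;Length", expected));
        Console.WriteLine(CsvDelimiterSniffer.Detect("Year,Make,Model,Length", expected));
    }
}
```

This only works when the reader controls or knows the header, which is the assumption the rest of the thread argues about.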

olmobrutall commented 4 years ago

... it’s not broken. We have used it on hundreds of different CSV files (mainly European and English, that's true).

Yes, you can configure Excel to do non-standard stuff, and if we ever need to parse some heterogeneous CSV like that we can add some configuration for it, but that hasn't happened yet.

About checking Application.International(xlListSeparator): we very often parse CSV files that were generated on another machine, and sometimes the same application parses CSV files from different cultures. We may also parse the CSV on a machine without Excel installed.

We just provide some extra configuration to the CSV library, like the encoding, the culture, how many header lines to ignore (typically 0 or 1).

You could use some heuristics to try to detect this information, but this is error prone and potentially slow. Excel tries to do it and always gets it wrong :)

tarekgh commented 4 years ago

@olmobrutall I cannot think of any approach more solid than capturing the information you need in the collected CSV document and then using it for parsing. If you are providing the culture to the CSV library, why don't you provide the Invariant culture all the time to guarantee specific formats?

Anyway, I don't think anything we do in .NET here will help you much, as the problem looks more specific to how you parse Excel data.

You could use some heuristics to try to detect this information, but this is error prone and potentially slow. Excel tries to do it and always gets it wrong :)

If you can always have some fixed data in the doc, then it shouldn't be error prone and should be very predictable.

Considering the discussion here, do you mind if we close this issue now, as we have another issue tracking the fix for TextInfo.ListSeparator in the next release? Feel free to send more questions or asks and we'll be happy to help.

olmobrutall commented 4 years ago

If you are providing the culture to the CSV library, why don't you provide the Invariant culture all the time to guarantee specific formats?

Why does File.ReadAllLines take an Encoding encoding parameter instead of always using UTF8? Because you don't control the files that need to be opened; they come from third-party sources. It's a general-purpose Csv reader (and writer) library that will be used in ETL processes, like a small C# script that reads data from a few sources and writes to a database, not an application that needs to save and open a well-defined format.

If you can always have some fixed data in the doc, then it shouldn't be error prone and should be very predictable.

You don't control the source, for example imagine you want to import the CSV files in this page: https://portalestadistico.com/?pn=portalestadistico&pc=AAA00&idp=10003&idpl=100004&idioma=

Considering the discussion here, do you mind if we close this issue now, as we have another issue tracking the fix for TextInfo.ListSeparator in the next release? Feel free to send more questions or asks and we'll be happy to help.

I already have a solution that is... Ok(ish) for me. But the current behavior of ListSeparator is just a fall-through. I understand that the .Net 5.0 needs to be released, maybe the best solution will be to mark the property as [Obsolete] or throw a NotImplementedException for now.

tarekgh commented 4 years ago

But the current behavior of ListSeparator is just a fall-through. I understand that the .Net 5.0 needs to be released, maybe the best solution will be to mark the property as [Obsolete] or throw a NotImplementedException for now.

I don't think it is a good idea to throw or mark it obsolete. We are going to fix it in the next releases anyway. Anyone can switch back to the old behavior using the config switch. And the current behavior has already been used on Linux for a long time now.

Thanks for all your feedback and discussion, and feel free to ping us if you have any more questions.

ANahr commented 4 years ago

This is a regression for .Net Core 3.1 on Windows -> .Net 5
This is a regression for ANY .Net Framework -> .Net 5

Imho this is a serious issue that should be fixed asap. It is just hard to spot because a) by chance it does not hit any EN cultures in default cases (no user overrides) and b) it affects processes that are untestable or at least problematic to test automatically.

As much as I hate this option, even throwing a PlatformNotSupportedException would be far better than the current situation. For the short term, personally I'd be pragmatic and try to find a heuristic that at least handles the 95% of cases (like all EN cultures get ",", all others get ";").
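A rough sketch of the pragmatic heuristic described above (English cultures get ",", everything else gets ";"). The name PragmaticListSeparator is hypothetical, treating the invariant culture like English is an added assumption, and the NLS table earlier in this thread shows exceptions that this simple rule gets wrong (for example en-ID, or ja and zh).

```csharp
using System;
using System.Globalization;

public static class PragmaticListSeparator
{
    // Hypothetical fallback: "," for English (and invariant) cultures, ";" otherwise.
    public static string Guess(CultureInfo culture)
    {
        bool isInvariant = string.IsNullOrEmpty(culture.Name);
        bool isEnglish = culture.TwoLetterISOLanguageName == "en";
        return isInvariant || isEnglish ? "," : ";";
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string name in new[] { "", "en-GB", "de-DE", "fr-FR" })
        {
            CultureInfo culture = CultureInfo.GetCultureInfo(name);
            Console.WriteLine($"Culture {name}\tguess {PragmaticListSeparator.Guess(culture)}");
        }
    }
}
```

Because it only looks at the language code, the result is the same under ICU and NLS, but it also ignores any user locale overrides on Windows.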

I have seen three types of usage in our code and the current situation breaks all of them:

1) User-(UI)-Text output of list elements. This is generally the most unproblematic, however would be very annoying because e.g. "." is in NO culture worldwide a reasonable list separator, however several cultures (like DE) deliver that. Imagine: "Data: 34,43.34.42,43"

2) CSV Export/Import: CSV formats are usually/always culture specific. Using the ListSeparator is a reasonable default that will work in the majority of cases worldwide (or at least in countries for which we develop software).

3a) Excel-"Interfacing" through "excel.exe export.csv": This currently works flawlessly because Excel uses the same system list separator. This would be very inconvenient to implement yourself because it would require native dependencies to get user locale overrides (which Excel also uses).

3b) Excel-Interfacing through COM-Automation: Although Excel offers functions to get internationalization settings, these are known to not work in lots of situations (e.g. multilanguage Excel installs, but many more), whereas the list separator basically always works (unless there are VERY stupid user locale overrides, in which case Excel itself will not work correctly anyway).

Considering the "workarounds":

1) Using Invariant: This is no workaround. We are talking about culture specific functions in the first place. If you could use Invariant there would be no need to use any culture-specific function at all.

2) Using the Config-Switch: This is highly problematic. CSV/Export is usually defined in libraries. As far as I see this is only available as an app-wide switch. Internally we have several hundred applications and there are more which are developed by customers over which we have no control at all. Even just internally this would be a huge undertaking just to inform all product owners, find out who even uses these libraries, etc. and it would be terribly error-prone.

3) Stop using ListSeparator and implement our own: Partially possible but a really bad situation for the following reasons: A) Code duplication B) Risk of multiple differing implementations and worst: C) would likely require native code/pinvokes for getting user overrides from the OS (which comes with a lot of additional problems). D) Effort E) Additional error source (might be overlooked in the future)

tarekgh commented 4 years ago

This is a regression for .Net Core 3.1 on Windows -> .Net 5

This is the effect of switching to ICU. You still have the option to go back to the old behavior by using the config switch if you don't like the new behavior. As I mentioned before, we'll look at improving this in the next release.

This is a regression for ANY .Net Framework -> .Net 5

This is not true. .NET Core running on Linux has always had this behavior.

As much as I hate this option but even throwing a PlatformNotSupportedException would be far better than the current situation.

I am not sure how this is going to help at all. If someone runs into this exception, how do you think they can handle it? And what about the libraries that call it, over which you don't have control? This will be much worse.

Using Invariant: This is no workaround. We are talking about culture specific functions in the first place. If you could use Invariant there would be no need to use any culture-specific function at all.

Well, you are expecting .NET to handle what to expect from Excel as a separator. As I mentioned before, whatever you do in .NET will not make this robust, as the design of parsing data from Excel using the .NET separator is wrong. Our guidelines are always to have predictable formats that you can be sure will parse. Invariant is what gives you that. If Invariant cannot be used in such cases, the apps/libs should have a way to determine what format Excel used.

Using the Config-Switch: This is highly problematic. CSV/Export is usually defined in libraries. As far as I see this is only available as an app-wide switch. Internally we have several hundred applications and there are more which are developed by customers over which we have no control at all. Even just internally this would be a huge undertaking just to inform all product owners, find out who even uses these libraries, etc. and it would be terribly error-prone.

I am wondering how you have been running on Linux before, which has the exact behavior we are talking about here?

Stop using ListSeparator and implement our own. Partially possible but a really bad situation for the following reasons: A) Code duplication B) Risk of multiple differing implementations and worst: C) would likely require native code/pinvokes for getting user overrides from the OS (which comes with a lot of additional problems). D) Effort E) Additional error source (might be overlooked in the future)

I wouldn't recommend that either but at least you can add your own fallback mechanism when using it.

tarekgh commented 4 years ago

@ANahr could you please tell us more about the scenario that is broken for you? Is your library/app doing the same as in @olmobrutall's scenario? More information here can help us suggest more workarounds if needed.

ANahr commented 4 years ago

This is a regression for ANY .Net Framework -> .Net 5

This is not true. .NET Core running on Linux has always had this behavior.

Sorry if I didn't make that clear: I was talking about .Net Framework (4.7/4.8) which for me implies running on Windows.

As much as I hate this option but even throwing a PlatformNotSupportedException would be far better than the current situation.

I am not sure how this is going to help at all? if someone run into this exception, what you think how they can handle it? and what about the libraries that call it and you don't have control over it? this will be much worse.

E.g. we did some tests to "port" applications from .Net 4.7 to .Net 5.0. This problem however did not pop up in the tests because the changed outputs were unnoticed (likely because they are part of complex intra-app processes that are not easily testable). It was just by chance that I saw this issue and tested specifically and found out these things don't work for us anymore. If it would have thrown exceptions the tests would have found the problems and we would have been aware of them. If the portability analyzer had them we would also have seen the problem.

Using the Config-Switch: This is highly problematic. CSV/Export is usually defined in libraries. As far as I see this is only available as an app-wide switch. Internally we have several hundred applications and there are more which are developed by customers over which we have no control at all. Even just internally this would be a huge undertaking just to inform all product owners, find out who even uses these libraries, etc. and it would be terribly error-prone.

I am wondering how you have been running on Linux before, which has the exact behavior we are talking about here?

None of these ever ran on Linux (and some of them never will).

ANahr commented 4 years ago

@ANahr could you please tell us more about the scenario that is broken for you? Is your library/app doing the same as in @olmobrutall's scenario? More information here can help us suggest more workarounds if needed.

As written: All are broken:

Simple output because it is not reasonable to use . as a separator
CSV because none of the applications that we create output for would be able to handle a . or a space as a separator (, or ; would be ok)
Excel because situations with multiple ranges (e.g. for conditional formatting and merging) just don't work anymore.

olmobrutall commented 4 years ago

@ANahr thanks for your support. It's obvious that they are under a lot of pressure to ship .Net 5.

Maybe adding this Excel-CSV specific property to the CultureInfo API was not a good idea in .Net Framework 1.1, but now people are relying on it.

I would prefer a NotImplementedException / Obsolete over silently changing the behavior.

@tarekgh I don't think the Linux argument is very conclusive... CSV imports from Excel are more likely happening in offices using Windows, not in Docker containers in production.

More usages of the property, related to Excel: https://stackoverrun.com/de/q/1727125 https://github.com/dotnet/runtime/issues/536 https://aakinshin.net/posts/how-listseparator-depends-on-runtime-and-operating-system/ http://winintro.ru/windowspowershell2corehelp.en/html/584a477e-4bf4-4981-85f1-4542a2639177.htm

tarekgh commented 4 years ago

@ANahr I am not sure I understand your scenario yet. I was asking: do you have a library or app that does what you listed (simple output, CSV)? Could you share some more details about the problem you run into because of that?

Sorry if I didn't make that clear: I was talking about .Net Framework (4.7/4.8) which for me implies running on Windows.

Thanks for clarifying. Windows data can change from release to release. Look at docs.microsoft.com/en-us/archive/blogs/shawnste/locale-culture-data-churn, which was written by the Windows team to warn about that. Assuming globalization data will never change is wrong, and it is possible to get different data when using different Windows versions, regardless of whether you use .NET Framework or .NET Core.

I would prefer a NotImplementedException / Obsolete over silently changing the behavior.

This will make it much worse. Code throwing an exception that it was not throwing before is a much bigger concern than this property returning an inaccurate value. Maybe the best option here is to have a code analyzer that points out where users have to pay more attention. I am not sure how this would help you in your scenario either.

I don't think the Linux argument is very conclusive... CSV imports from Excel are more likely happening in offices using Windows, not in Docker containers in production.

I frankly disagree here. .NET is used on macOS too, which has Office support as well. Also, I don't want to scope this to an Excel issue. As I mentioned before, your design of parsing Excel data with the assumption that globalization data cannot change is totally wrong and broken. We always recommend what I said before: don't take any dependency on cultural globalization data, and when necessary depend on fixed formats (like Invariant).

KalleOlaviNiemitalo commented 4 years ago

I don't think using ", " or "," is useful for anybody, why use CultureInfo.Current.TextInfo.ListSeparator instead of just ", "?

According to CLDR data, most locales use {0}, {1} as the middle pattern of "and" lists, the middle pattern of short "and" lists, or the middle pattern of narrow "and" lists. Some locales use different patterns though; e.g. Japanese has {0}、{1}. So if .NET were to temporarily hardcode , as the ListSeparator when using ICU, and applications used this ListSeparator to format lists for users, then I think the results would be OK for many locales, and a later version of .NET would be able to correct the ListSeparator for the remaining locales without needing more changes in applications.

Anyway, it seems the plan is to keep the bad ListSeparator in .NET 5 (https://github.com/dotnet/runtime/pull/43813 was closed without merging), fix ListSeparator properly in .NET 6 (https://github.com/dotnet/runtime/issues/536), and not make such a partial fix in between.

KalleOlaviNiemitalo commented 4 years ago

3a) Excel-"Interfacing" through "excel.exe export.csv"

When exporting CSV files specifically for Excel to read, a sep=, line at the top of the file (https://superuser.com/questions/773644/what-is-the-sep-metadata-you-can-add-to-csvs) might be a more reliable solution, as it makes the list separator independent of locale settings. That does not help with the decimal separator, though.
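A rough sketch of writing such a file, assuming the `sep=` line behaves as described in the linked Super User question. `BuildCsvWithSepHint` and `Quote` are hypothetical helper names, and the fields are passed in as already-formatted strings, so the decimal separator problem is left untouched:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: wraps a field in double quotes when it contains the separator,
// a quote, or a line break, and doubles any embedded quotes.
static string Quote(string field, char separator)
{
    bool needsQuotes = field.IndexOfAny(new[] { separator, '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
}

// Hypothetical helper: emits "sep=<separator>" as the first line, then the rows.
static string BuildCsvWithSepHint(IEnumerable<string[]> rows, char separator)
{
    var sb = new StringBuilder();
    sb.Append("sep=").Append(separator).Append("\r\n");
    foreach (string[] row in rows)
    {
        sb.Append(string.Join(separator, row.Select(f => Quote(f, separator)))).Append("\r\n");
    }
    return sb.ToString();
}

string csv = BuildCsvWithSepHint(new[]
{
    new[] { "Name", "Amount" },
    new[] { "Widget, large", "42" },
}, ',');
Console.Write(csv);
```

Other CSV consumers may treat the `sep=` line as an ordinary data row, so this is only a reasonable choice when the file is meant for Excel.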

rubenprins commented 4 years ago

I think that focusing on Excel and CSV files all the time muddles the issue: using the thousands separator as a list separator is the worst possible fallback that could have been used. Hard-coding a comma would even be better (albeit not what Excel would expect).

A "." or non-breaking space is never a list separator, AFAICT. The choice of the thousands separator works exclusively for locales using English-based number formatting, which happen to use the comma, almost by accident.

olmobrutall commented 4 years ago

I think this conversation is running in circles... in one camp there are people who are using this property today to solve real-world problems (me and @ANahr), in the other there are people who have never used this property and never will, because it solves a problem that they don't have.

@KalleOlaviNiemitalo

So if .NET were to temporarily hardcode , as the ListSeparator when using ICU, and applications used this ListSeparator to format lists for users, then I think the results would be OK for many locales, and a later version of .NET would be able to correct the ListSeparator for the remaining locales without needing more changes in applications.

Yes, maybe using ", " or "," is more intuitive behavior, but it is useless, because if you really want to use it to display data in different cultures you need all the complexity in: https://unicode-org.github.io/cldr-staging/charts/37/by_type/miscellaneous.displaying_lists.html#4f8acaf2d32aff3a. This requires something like 16 different properties.

If you want a solution that is good enough, just hard-code ", " like everybody does.

When this property was added in .Net Framework 1.1, they were not thinking of a humanized list display library, they were thinking of parsing CSV (@tarekgh maybe you have access to the source code/internal documentation and can confirm it). At least this is what it looks like if you search for TextInfo.ListSeparator on Google: most of the results are related to importing/exporting CSV-like files.

@tarekgh

I frankly disagree here. .NET is used on macOS too, which has Office support as well.

In this ad, who is more likely to be parsing CSV files from Excel? PC Guy or Apple Guy?

As I mentioned before your design of parsing Excel with the assumption that globalization data cannot change is totally wrong and broken.

There are way too many different DateTime formats/numeric formats/list separators/time zones. That's the reason there are databases to keep all this information and free developers from the burden of maintaining it themselves. And yes, of course localization data could be corrected or changed for political reasons. And when that happens, it is good that all the applications change more or less in sync, like Excel exporting with a new format and the application importing with the new format as well.

This is not the case here... we are talking about a change that is made for technical reasons. You do realize that you are defending a fall-through implementation, right?

If you don't want people to take a dependency on your code... have you considered not releasing it?

We always recommend what I said before, don't take any dependency on cultural globalization data, when necessary depend on fixed formats (like Invariant).

I'm trying to open CSV files generated from Excel, an application that is built by your employer, and that not only localizes the CSV, it even localizes the function names in the formulas: http://es.excelfunctions.eu/

I would love that all the files that Excel generates use , as cell separator, . as decimals separator and are encoded in UTF8. But I do not control the format (This is also an argument for the sep=).

Again, just imagine you need to import one of the files in https://portalestadistico.com/?pn=portalestadistico&pc=AAA00&idp=10003&idpl=100004&idioma=.

This is not a cherry-picked example. This is how CSV works in continental Europe (and many other countries).

@rubenprins

I think that focusing on Excel and CSV files all the time muddles the issue: using the thousands separator as a list separator is the worst possible fallback that could have been used. Hard-coding a comma would even be better (albeit not what Excel would expect).

I think you are mixing two things here. Using . as list separator is the current crappy fall-through implementation, not what Excel produces/requires in any culture.

Conclusion

I agree that having Excel-CSV specific stuff in System.Globalization is... strange. But I think this was the original intent anyway.

Implementing it using ICU is hard / impossible, because of course they are not going to have Excel-CSV specific stuff in a Unix library.

The property is not that important as to stop or delay .Net 5 or the migration to ICU.

Solution: Just deprecate the property so people know that they have to find their way to parse localized Excel CSV from now on.

olmobrutall commented 4 years ago

A few more links about TextInfo.ListSeparator and CSV:

https://stackoverflow.com/questions/3245387/automatically-generate-a-locale-csv-file-for-excel-in-c/3246947#3246947

It looks like it is related to LOCALE_SLIST in Windows, and everything is related to CSV:

https://community.dynamics.com/ax/f/microsoft-dynamics-ax-forum/211925/find-locale-list-and-decimal-seperator-locale_slist-locale_sdecimal?pifragment-96834=1
https://stackoverflow.com/questions/62705100/table-of-locale-slist-and-locale-sdecimal-by-language
https://www.perlmonks.org/?node_id=850181

tarekgh commented 4 years ago

[edited]

I agree that having Excel-CSV specific stuff in System.Globalization is... strange. But I think this was the original intent anyway.

This is not true. Nobody has claimed before that TextInfo.ListSeparator is meant for parsing CSV files.

Implementing it using ICU is hard / impossible, because of course they are not going to have Excel-CSV specific stuff in a Unix library.

I am repeating myself here: your decision to use ListSeparator to parse CSV files written in different cultures is wrong. .NET never promised to support that, and I cannot even think of a way it could be supported while users can customize Excel settings. You should get the separator from Excel to have reliable parsing.

The property is not that important as to stop or delay .Net 5 or the migration to ICU.

It is important but not a blocker.

Solution: Just deprecate the property so people know that they have to find their way to parse localized Excel CSV from now on.

I suggested before that we can add an analyzer for that. Also, we can enhance the docs if needed. Also, I can see this property being useful in other scenarios, such as formatting your own list.

olmobrutall commented 4 years ago

It's true that the documentation of ListSeparator and LOCALE_SLIST is intentionally ambiguous, but here is more evidence that THE WORLD is using ListSeparator and LOCALE_SLIST for localized Excel CSVs.

https://community.intel.com/t5/Intel-Fortran-Compiler/CSV-delimiter-detection/td-p/956160
http://www.dataaccess.com/KBasePublic/KBPrint.asp?ArticleID=1388
https://it.mathworks.com/matlabcentral/answers/343998-how-can-i-recognize-automatically-the-list-separator-in-the-region-and-languages-settings
http://www.vbaexpress.com/forum/showthread.php?42769-save-as-csv-file-format/page2
https://www.mrexcel.com/board/threads/reading-large-dataset-in-excel.929526/
https://forums.ni.com/t5/LabWindows-CVI/Creating-CSV-file-according-with-the-international-settings/td-p/3184019?profile.language=en
http://archives.miloush.net/michkap/archive/2007/08/15/4396922.html
https://engineertips.wordpress.com/2018/06/07/local-csv-separator-in-delphi/
https://www.generacodice.com/en/articolo/130293/How-to-read-%27List-separator%27-from-OS-in-Java
http://www.office-loesung.de/ftopic87375_0_0_asc.php
https://codereview.stackexchange.com/questions/126146/change-list-separator-parse-file-restore-list-separator-to-original-value
https://www.experts-exchange.com/questions/24491939/Convert-a-xls-File-to-a-CSV-file-SemiColon-Seperated.html
https://living-sun.com/excel/249689-how-to-export-excel-to-csv-file-with-ldquordquo-delimted-and-utf-8-code-excel-vba-csv-utf-8.html
http://www.delphigroups.info/2/10/861201.html
https://forums.codeguru.com/showthread.php?348853-Regional-config-of-panel-control

Do you really need more evidence?

tarekgh commented 4 years ago

It's true that the documentation of ListSeparator and LOCALE_SLIST is intentionally ambiguous, but here is more evidence that THE WORLD is using ListSeparator and LOCALE_SLIST for localized Excel CSVs.

Thanks for the links. Sorry if I seemed to be giving you a hard time; that was not my intention.

I totally understand people used this API in the wrong way when they used it for CSV. I was trying to say neither .NET nor Windows documented anything saying that. I agree there is a problem here that we need to help with. That is why I suggested having a code analyzer that warns users when they use the API, to ensure their usage is correct and to clarify the problems they can run into. Also, we should clarify our docs too.

Thanks again for the whole feedback you sent.

olmobrutall commented 4 years ago

What do you have in mind for the analyzer?

ListSeparator is broken in .Net 5, wait for .Net 6 and we will fix it <-- OK
ListSeparator does not return your Excel CSV separator, instead it returns a constant ',' so it's quite useless anyway, use it if you are paid by lines of code. <-- Not OK
ListSeparator does not return your Excel CSV separator, instead it returns the ThousandSeparator, have fun!. <-- Not OK

:D

tarekgh commented 4 years ago

What I am thinking is something like:

ListSeparator is not intended for parsing arbitrary data that is not guaranteed to be formatted using the exact same separator value.

The analyzer should be kept in future releases too, to avoid the original problem, in which users used the property to parse CSV files although it is not designed for that.

olmobrutall commented 4 years ago

And what is it intended for then? The first 1% of a library for displaying lists localized for humans?

How do you plan to fit https://unicode-org.github.io/cldr-staging/charts/37/by_type/miscellaneous.displaying_lists.html#4f8acaf2d32aff3a into one string property?

On top of already ambiguous documentation ("Gets or sets the string that separates items in a list.") you put a warning discouraging the usage. Is there any real use case for this property with the new implementation? And how do you justify the . being returned for es/de/fr/...? There is no language that uses . as a list separator, no matter the interpretation.

I think the problem is clear:

The code is clearly broken, it was identified as broken in .Net Core 3.1 on Unix and now is going to affect many people.

The proper implementation is not trivial. If ICU had an equivalent of LOCALE_SLIST, nobody would be trying to redefine the meaning of ListSeparator, but it doesn't.

Also, you all are about to release .Net 5 and you don't want to/can't make changes in the code, like marking it as Obsolete, so you're trying to sell me that this is not a bug, it's a preciously crafted fallback implementation that has been carefully designed.

I can understand deadlines, but just be honest.

KalleOlaviNiemitalo commented 4 years ago

I totally understand people used this API in the wrong way when they used it for CSV. I was trying to say neither .NET nor Windows documented anything saying that.

Windows PowerShell 5.1 documents the use of ListSeparator in CSV, and PowerShell 7 continues that; see the -UseCulture switch in ConvertFrom-Csv, ConvertTo-Csv, Export-Csv, and Import-Csv. I don't think it has been filed as a bug, although the behavior is mentioned in https://github.com/PowerShell/PowerShell/issues/11754.

olmobrutall commented 4 years ago

@krwq looks like you are from Poland. Maybe you are aware of the ; separated CSV files in continental Europe?

KalleOlaviNiemitalo commented 4 years ago

if you really want to use it to display data in different cultures you need all the complexity in: https://unicode-org.github.io/cldr-staging/charts/37/by_type/miscellaneous.displaying_lists.html#4f8acaf2d32aff3a. This requires something like 16 different properties.

I have been thinking about .NET API for formatting lists with UTS 35 list patterns, including the Spanish and Hebrew special rules. Something like this:

namespace System.Globalization
{
    public enum ListPatternType
    {
        Standard,
        StandardShort,
        StandardNarrow,
        Or,
        OrShort,
        OrNarrow,
        Unit,
        UnitShort,
        UnitNarrow,
    };

    public enum ListPatternPartType
    {
        Two,
        Start,
        Middle,
        End,
    }

    public partial class TextInfo
    {
        public string this[ListPatternType patternType, ListPatternPartType partType] { get; set; }
        public string FormatList(ListPatternType patternType, IEnumerable<string> values);
    }
}

Work items:

ListPatternType.Or would clash with a Visual Basic keyword.
A separate enum ListPatternWidth { Default, Short, Narrow } might be nicer, or maybe not.
Should this feature be part of class TextInfo, or a separate class queryable from CultureInfo?
Should perhaps have more overloads, like in String.Join.
Add a method that appends this list to StringBuilder with minimal allocations, like StringBuilder.AppendJoin. Would that take CultureInfo, TextInfo, or IFormatProvider? Currently, CultureInfo.GetFormat(Type) never returns TextInfo.
Add a method that wraps an IEnumerable<string> as IFormattable, which can then be used in String.Format calls without specifying the culture more than once, and may implement ISpanFormattable (https://github.com/dotnet/runtime/issues/26374) as well. The Default/Short/Narrow width of ListPatternType could be specified in the format string, but the Standard/Or/Unit semantic might be better specified when the IFormattable is created.
I don't know whether the list patterns are even available from the ICU API yet.
What are these going to do on downlevel Windows versions where ICU is not available?
Add properties for checking whether FormatList uses Spanish or Hebrew special rules, and for enabling those rules in custom cultures.
Instead of enum ListPatternPartType, should the API have a class ListPattern with four string properties? That way, an application could construct an instance of class ListPattern and format lists with it, without having to host it in a TextInfo.
In Spanish special rules, how would FormatList recognize that "the numeric value is 11 × 10^(3×y)"? If it parses the string for that, it may need NumberFormatInfo.NumberGroupSeparator, which is not part of TextInfo.
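To make the proposal more concrete, here is a minimal sketch of how a FormatList-style method could combine the four pattern parts (two, start, middle, end), following the general shape of the UTS 35 list pattern algorithm. The free-standing `FormatList` function and its parameters are hypothetical and are not the API proposed above; the patterns are passed in as plain strings, and the Spanish and Hebrew special rules are not implemented:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: each pattern contains the placeholders {0} and {1}.
static string FormatList(IReadOnlyList<string> items, string two, string start, string middle, string end)
{
    if (items.Count == 0) return string.Empty;
    if (items.Count == 1) return items[0];
    if (items.Count == 2) return string.Format(two, items[0], items[1]);

    // Build from the right: the last two items use the "end" pattern...
    string result = string.Format(end, items[items.Count - 2], items[items.Count - 1]);
    // ...each earlier item except the first is prepended with the "middle" pattern...
    for (int i = items.Count - 3; i >= 1; i--)
    {
        result = string.Format(middle, items[i], result);
    }
    // ...and the first item is prepended with the "start" pattern.
    return string.Format(start, items[0], result);
}

// English-style patterns, used here purely as sample input.
string english = FormatList(new[] { "a", "b", "c", "d" },
    two: "{0} and {1}", start: "{0}, {1}", middle: "{0}, {1}", end: "{0}, and {1}");
Console.WriteLine(english); // prints "a, b, c, and d"
```

A single ListSeparator string can only stand in for the middle part of such a pattern, which is why the work items above talk about a separate API.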

ANahr commented 4 years ago

I won't repeat myself about the sources for the problem but in the end I see three issues:

Issue 1: The current output is unreasonable for major parts of the world. I would NOT be complaining about the change if it had changed to something reasonable. But it did not. At least not for several/most non-en cultures.

Issue 2: Usage as a CSV separator: @tarekgh understands that "people used this API in the wrong way when they used it for CSV". It doesn't seem to matter that people have used ListSeparator for this purpose for over two decades. Excel uses it, SPSS uses it, PowerShell uses it (and yes, I know, NOT for 20 years), basically every general-purpose CSV processor uses and/or supports it, but they are all doing it wrong. And they all did it wrong for 20 years and more and didn't even notice. But irony aside: this is a special case of Issue 1: the current output is unreasonable for use as a CSV separator for a major/relevant part of the cultures.

Issue 3: Usage for Excel-COM-Interop: That would really be an issue if ICU defined a list separator that differed from the Windows one. However, it doesn't even define one, so this seems pretty made up. And IF it defined one, I would be pretty confident that it would at least be a reasonable one for the corresponding culture.

And just something that made me smile (no good smile): @tarekgh's "I was trying to say neither .NET nor Windows documented anything saying that." With that argument you can also say that you must not use a String to store song lyrics because I bet that "neither .NET nor Windows documented anything saying that."

KalleOlaviNiemitalo commented 4 years ago

AFAICT, this issue does not affect the CSV cmdlets of PowerShell 7.1.0-rc.2 on Windows. Those are using CultureInfo.CurrentCulture.TextInfo.ListSeparator, which respects user overrides rather than the ICU data, so it matches what Control Panel shows.
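A small sketch for comparing the two lookups on a given machine. It relies only on `CultureInfo.CurrentCulture` and the `CultureInfo(string, bool)` constructor; whether the two values actually differ depends on the platform and on the user's regional settings, so no particular output is claimed here:

```csharp
using System;
using System.Globalization;

CultureInfo current = CultureInfo.CurrentCulture;

// Separator as seen through the current culture (may include user overrides on Windows).
string withOverrides = current.TextInfo.ListSeparator;

// Same culture name, but explicitly ignoring user overrides.
string withoutOverrides = new CultureInfo(current.Name, useUserOverride: false).TextInfo.ListSeparator;

Console.WriteLine($"Culture '{current.Name}', UseUserOverride={current.UseUserOverride}");
Console.WriteLine($"CurrentCulture ListSeparator:        [{withOverrides}]");
Console.WriteLine($"Without user override ListSeparator: [{withoutOverrides}]");
```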

tarekgh commented 4 years ago

Just to update, I'll try to look at getting some fix into a 5.0 servicing release (something like 5.0.1) so we don't have to wait for 6.0.

svick commented 4 years ago

@tarekgh

Just to update, I'll try to look at getting some fix into a 5.0 servicing release (something like 5.0.1) so we don't have to wait for 6.0.

Doesn't that mean this issue should be reopened?

tarekgh commented 4 years ago

@svick

Doesn't that mean this issue should be reopened?

We have it tracked here https://github.com/dotnet/runtime/issues/536 to include fixing it for Linux too.

ANahr commented 4 years ago

Maybe, if #536 is kept, it should be renamed and rewritten, because it currently speaks specifically about Unix, whereas most/all problems mentioned in this issue are either Windows-only problems (e.g. Excel COM) or regressions from .Net 4.7/4.8 -> .Net 5 or .Net Core 3.1 on Windows -> .Net 5 on Windows.

danmoseley commented 4 years ago

@tarekgh this might be worth adding to https://github.com/dotnet/core/blob/master/release-notes/5.0/5.0-known-issues.md

tarekgh commented 4 years ago

@danmosemsft I'll add something there. Thanks!

tarekgh commented 4 years ago

@danmosemsft I added Preview 4 section in the doc and listed this separator issue under it.

tarekgh commented 4 years ago

I have been looking at CLDR/ICU data to see what separator can be used for different cultures. I found the following, which looks more reasonable than what NLS currently provides. So I want your feedback regarding making .NET 5.0 use such data. Please advise if you see any problem with doing that.

All cultures use , as a list separator except the cultures of the following languages:

| Language | Culture name | Separator | Description |
|---|---|---|---|
| Arabic | ar | \u060c ، | ARABIC COMMA |
| Amharic | am | \u1363 ፣ | ETHIOPIC COMMA |
| Urdu | ur | \u060c ، | ARABIC COMMA |
| Pashto | ps | \u060c ، | ARABIC COMMA |
| Persian | fa | \u060c \u200f ، | ARABIC COMMA, RIGHT-TO-LEFT MARK |
| Dzongkha | dz | \u0f51 \u0f44 \u0f0b དང་ | TIBETAN LETTER DA, LETTER NGA, MARK INTERSYLLABIC TSHEG |
| Burmese | my | \u0020 | SPACE |
| Thai | th | \u0020 | SPACE |
| Tongan | to | \u0020 | SPACE |
| Japanese | ja | \u3001 、 | IDEOGRAPHIC COMMA |
| Chinese | zh | \u3001 、 | IDEOGRAPHIC COMMA |

Just for the record, here is what NLS is currently returning. I think the CLDR data would be better to use instead:


',':   
arn, arn-CL, as, as-IN, bn, bn-BD, bn-IN, bo, bo-CN, chr, chr-Cher, chr-Cher-US, en, en-001, en-029, en-150, en-AE, en-AG, en-AI, en-AS, en-AT, en-AU, en-BB, en-BE, en-BI, en-BM, en-BS, en-BW, en-BZ, en-CA, en-CC, en-CH, en-CK, en-CM, en-CX, en-CY, en-DE, en-DK, en-DM, en-ER, en-FI, en-FJ, en-FK, en-FM, en-GB, en-GD, en-GG, en-GH, en-GI, en-GM, en-GU, en-GY, en-HK, en-IE, en-IL, en-IM, en-IN, en-IO, en-JE, en-JM, en-KE, en-KI, en-KN, en-KY, en-LC, en-LR, en-LS, en-MG, en-MH, en-MO, en-MP, en-MS, en-MT, en-MU, en-MW, en-MY, en-NA, en-NF, en-NG, en-NL, en-NR, en-NU, en-NZ, en-PG, en-PH, en-PK, en-PN, en-PR, en-PW, en-RW, en-SB, en-SC, en-SD, en-SE, en-SG, en-SH, en-SI, en-SL, en-SS, en-SX, en-SZ, en-TC, en-TK, en-TO, en-TT, en-TV, en-TZ, en-UG, en-UM, en-US, en-VC, en-VG, en-VI, en-VU, en-WS, en-ZA, en-ZM, en-ZW, es-MX, es-US, gn, gn-PY, gu, gu-IN, he, he-IL, hi, hi-IN, hy, hy-AM, iu, iu-Cans, iu-Cans-CA, iu-Latn, iu-Latn-CA, ja, ja-JP, km, km-KH, kn, kn-IN, ko, ko-KR, kok, kok-IN, ks-Deva, ks-Deva-IN, la, la-001, mi, mi-NZ, mn-Mong, mn-Mong-CN, mn-Mong-MN, mni, mni-IN, moh, moh-CA, mr, mr-IN, ne, ne-NP, or, or-IN, pa, pa-Guru, pa-IN, pap, pap-029, quc, quc-Latn, quc-Latn-GT, quz, quz-BO, quz-EC, quz-PE, sa, sa-IN, sd-Deva, sd-Deva-IN, syr, syr-SY, ta, ta-IN, th, th-TH, ug, ug-CN, vi, vi-VN, zh, zh-CN, zh-Hans, zh-Hant, zh-HK, zh-MO, zh-SG, zh-TW,

';':   
aa, aa-DJ, aa-ER, aa-ET, af, af-NA, af-ZA, agq, agq-CM, ak, ak-GH, am, am-ET, ar, ar-001, ar-AE, ar-BH, ar-DJ, ar-DZ, ar-EG, ar-ER, ar-IL, ar-IQ, ar-JO, ar-KM, ar-KW, ar-LB, ar-LY, ar-MA, ar-MR, ar-OM, ar-PS, ar-QA, ar-SA, ar-SD, ar-SO, ar-SS, ar-SY, ar-TD, ar-TN, ar-YE, asa, asa-TZ, ast, ast-ES, az, az-Cyrl, az-Cyrl-AZ, az-Latn, az-Latn-AZ, ba, ba-RU, bas, bas-CM, be, be-BY, bem, bem-ZM, bez, bez-TZ, bg, bg-BG, bin, bin-NG, bm, bm-Latn, bm-Latn-ML, bo-IN, br, br-FR, brx, brx-IN, bs, bs-Cyrl, bs-Cyrl-BA, bs-Latn, bs-Latn-BA, byn, byn-ER, ca, ca-AD, ca-ES, ca-ES-valencia, ca-FR, ca-IT, ccp, ccp-Cakm, ccp-Cakm-BD, ccp-Cakm-IN, ce, ce-RU, ceb, ceb-Latn, ceb-Latn-PH, cgg, cgg-UG, co, co-FR, cs, cs-CZ, cu, cu-RU, cy, cy-GB, da, da-DK, da-GL, dav, dav-KE, de, de-AT, de-BE, de-CH, de-DE, de-IT, de-LI, de-LU, dje, dje-NE, dsb, dsb-DE, dua, dua-CM, dyo, dyo-SN, dz, dz-BT, ebu, ebu-KE, ee, ee-GH, ee-TG, el, el-CY, el-GR, en-ID, eo, eo-001, es, es-419, es-AR, es-BO, es-BR, es-BZ, es-CL, es-CO, es-CR, es-CU, es-DO, es-EC, es-ES, es-GQ, es-GT, es-HN, es-NI, es-PA, es-PE, es-PH, es-PR, es-PY, es-SV, es-UY, es-VE, et, et-EE, eu, eu-ES, ewo, ewo-CM, ff, ff-Latn, ff-Latn-BF, ff-Latn-CM, ff-Latn-GH, ff-Latn-GM, ff-Latn-GN, ff-Latn-GW, ff-Latn-LR, ff-Latn-MR, ff-Latn-NE, ff-Latn-NG, ff-Latn-SL, ff-Latn-SN, fi, fi-FI, fil, fil-PH, fo, fo-DK, fo-FO, fr, fr-029, fr-BE, fr-BF, fr-BI, fr-BJ, fr-BL, fr-CA, fr-CD, fr-CF, fr-CG, fr-CH, fr-CI, fr-CM, fr-DJ, fr-DZ, fr-FR, fr-GA, fr-GF, fr-GN, fr-GP, fr-GQ, fr-HT, fr-KM, fr-LU, fr-MA, fr-MC, fr-MF, fr-MG, fr-ML, fr-MQ, fr-MR, fr-MU, fr-NC, fr-NE, fr-PF, fr-PM, fr-RE, fr-RW, fr-SC, fr-SN, fr-SY, fr-TD, fr-TG, fr-TN, fr-VU, fr-WF, fr-YT, fur, fur-IT, fy, fy-NL, ga, ga-IE, gd, gd-GB, gl, gl-ES, gsw, gsw-CH, gsw-FR, gsw-LI, guz, guz-KE, gv, gv-IM, ha, ha-Latn, ha-Latn-GH, ha-Latn-NE, ha-Latn-NG, haw, haw-US, hr, hr-BA, hr-HR, hsb, hsb-DE, hu, hu-HU, ia, ia-001, ibb, ibb-NG, id, id-ID, ig, ig-NG, ii, ii-CN, is, is-IS, it, it-CH, it-IT, it-SM, it-VA, 
jgo, jgo-CM, jmc, jmc-TZ, jv, jv-Java, jv-Java-ID, jv-Latn, jv-Latn-ID, ka, ka-GE, kab, kab-DZ, kam, kam-KE, kde, kde-TZ, kea, kea-CV, khq, khq-ML, ki, ki-KE, kk, kk-KZ, kkj, kkj-CM, kl, kl-GL, kln, kln-KE, ko-KP, kr, kr-Latn, kr-Latn-NG, ks, ks-Arab, ks-Arab-IN, ksb, ksb-TZ, ksf, ksf-CM, ksh, ksh-DE, ku-Arab-IR, kw, kw-GB, ky, ky-KG, lag, lag-TZ, lb, lb-LU, lg, lg-UG, lkt, lkt-US, ln, ln-AO, ln-CD, ln-CF, ln-CG, lo, lo-LA, lrc, lrc-IQ, lrc-IR, lt, lt-LT, lu, lu-CD, luo, luo-KE, luy, luy-KE, lv, lv-LV, mas, mas-KE, mas-TZ, mer, mer-KE, mfe, mfe-MU, mg, mg-MG, mgh, mgh-MZ, mgo, mgo-CM, mk, mk-MK, ml, ml-IN, mn, mn-Cyrl, mn-MN, ms, ms-BN, ms-MY, ms-SG, mt, mt-MT, mua, mua-CM, my, my-MM, mzn, mzn-IR, naq, naq-NA, nb, nb-NO, nb-SJ, nd, nd-ZW, nds, nds-DE, nds-NL, ne-IN, nl, nl-AW, nl-BE, nl-BQ, nl-CW, nl-NL, nl-SR, nl-SX, nmg, nmg-CM, nn, nn-NO, nnh, nnh-CM, no, nr, nr-ZA, nso, nso-ZA, nus, nus-SS, nyn, nyn-UG, oc, oc-FR, om, om-ET, om-KE, os, os-GE, os-RU, pa-Arab, pa-Arab-PK, pl, pl-PL, prg, prg-001, prs, prs-AF, ps, ps-AF, ps-PK, pt, pt-AO, pt-BR, pt-CH, pt-CV, pt-GQ, pt-GW, pt-LU, pt-MO, pt-MZ, pt-PT, pt-ST, pt-TL, rm, rm-CH, rn, rn-BI, ro, ro-MD, ro-RO, rof, rof-TZ, ru, ru-BY, ru-KG, ru-KZ, ru-MD, ru-RU, ru-UA, rw, rw-RW, rwk, rwk-TZ, sah, sah-RU, saq, saq-KE, sbp, sbp-TZ, sd, sd-Arab, sd-Arab-PK, se, se-FI, se-NO, se-SE, seh, seh-MZ, ses, ses-ML, sg, sg-CF, shi, shi-Latn, shi-Latn-MA, shi-Tfng, shi-Tfng-MA, si, si-LK, sk, sk-SK, sl, sl-SI, sma, sma-NO, sma-SE, smj, smj-NO, smj-SE, smn, smn-FI, sms, sms-FI, sn, sn-Latn, sn-Latn-ZW, so, so-DJ, so-ET, so-KE, so-SO, sq, sq-AL, sq-MK, sq-XK, sr, sr-Cyrl, sr-Cyrl-BA, sr-Cyrl-ME, sr-Cyrl-RS, sr-Cyrl-XK, sr-Latn, sr-Latn-BA, sr-Latn-ME, sr-Latn-RS, sr-Latn-XK, ss, ss-SZ, ss-ZA, ssy, ssy-ER, st, st-LS, st-ZA, sv, sv-AX, sv-FI, sv-SE, sw, sw-CD, sw-KE, sw-TZ, sw-UG, ta-LK, ta-MY, ta-SG, te, te-IN, teo, teo-KE, teo-UG, tg, tg-Cyrl, tg-Cyrl-TJ, ti, ti-ER, ti-ET, tig, tig-ER, tk, tk-TM, tn, tn-BW, tn-ZA, to, to-TO, tr, tr-CY, 
tr-TR, ts, ts-ZA, tt, tt-RU, twq, twq-NE, tzm, tzm-Arab, tzm-Arab-MA, tzm-Latn, tzm-Latn-DZ, tzm-Latn-MA, tzm-Tfng, tzm-Tfng-MA, uk, uk-UA, ur, ur-IN, ur-PK, uz, uz-Arab, uz-Arab-AF, uz-Cyrl, uz-Cyrl-UZ, uz-Latn, uz-Latn-UZ, vai, vai-Latn, vai-Latn-LR, vai-Vaii, vai-Vaii-LR, ve, ve-ZA, vo, vo-001, vun, vun-TZ, wae, wae-CH, wal, wal-ET, wo, wo-SN, xh, xh-ZA, xog, xog-UG, yav, yav-CM, yi, yi-001, yo, yo-BJ, yo-NG, zgh, zgh-Tfng, zgh-Tfng-MA, zh-Hans-HK, zh-Hans-MO, zu, zu-ZA,

'060c':   
dv, dv-MV, nqo, nqo-GN,

'061b':   
fa, fa-IR, ku, ku-Arab, ku-Arab-IQ,

danmoseley commented 4 years ago

Just curious @tarekgh what part of CLDR does that come from?

tarekgh commented 4 years ago

Just curious @tarekgh what part of CLDR does that come from?

Every locale has properties listed under the listPattern section, something like:

        <listPattern>
            <listPatternPart type="start">{0}, {1}</listPatternPart>
            <listPatternPart type="middle">{0}, {1}</listPatternPart>
            <listPatternPart type="end">{0}, and {1}</listPatternPart>
            <listPatternPart type="2">{0} and {1}</listPatternPart>
        </listPattern>

You can reach the data if you download CLDR from http://unicode.org/Public/cldr/38/cldr-common-38.0.zip and then open the main folder, which includes the locale files. You can look at the root locale files (e.g. en.xml) to find all properties, including the list pattern.

What I actually did is call the ICU APIs that format a list and then extract the separator for every locale from the API output. Let me know if you need more details.
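The extraction step could be sketched roughly as follows. This is not the code that was actually used: the ICU call itself is not shown, `ExtractSeparator` is a hypothetical helper, and the formatted strings below are sample inputs built from the English pattern above and the Japanese {0}、{1} pattern mentioned earlier in the thread:

```csharp
using System;

// Hypothetical helper: given formatted list text and its first two known items,
// return whatever text sits between them.
static string ExtractSeparator(string formatted, string first, string second)
{
    int start = formatted.IndexOf(first, StringComparison.Ordinal);
    if (start < 0) throw new ArgumentException("First item not found.", nameof(first));
    start += first.Length;

    int end = formatted.IndexOf(second, start, StringComparison.Ordinal);
    if (end < 0) throw new ArgumentException("Second item not found.", nameof(second));

    return formatted.Substring(start, end - start);
}

// Sample inputs standing in for the output of a list formatter.
string englishSeparator = ExtractSeparator("A, B, and C", "A", "B");
string japaneseSeparator = ExtractSeparator("A、B、C", "A", "B");

Console.WriteLine($"[{englishSeparator}] [{japaneseSeparator}]");
```

Whether trailing whitespace should be trimmed from the extracted text (", " versus ",") is a separate decision that this sketch leaves open.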

safern commented 4 years ago

FWIW, you can also find that data here: https://github.com/unicode-org/icu/tree/master/icu4c/source/data/locales

e.g., for the en locale: https://github.com/unicode-org/icu/blob/master/icu4c/source/data/locales/en.txt#L2014-L2019

KalleOlaviNiemitalo commented 4 years ago

Issue 3: Usage for Excel-COM-Interop

@ANahr, is the .NET 5 ListSeparator actually wrong for Excel COM interop? I expect this scenario would always use CultureInfo.CurrentCulture rather than look up a culture by name; then, TextInfo.ListSeparator gets the user override from [HKEY_CURRENT_USER\Control Panel\International] sList instead of using the CLDR data, so I think it will match what Excel uses. Does your application run on a user account that does not have the sList Registry value at all? AFAICT, the Region control panel applet (intl.cpl) adds or updates the sList value whenever I select a different locale from the "Format" dropdown list and click Apply.
