Open mikernet opened 4 years ago
Thanks for opening this issue, @mikernet.
@tarekgh, it seems that on Linux, even a CultureInfo whose name is "aaa" doesn't throw a CultureNotFoundException. On all platforms, does the culture returned depend on the underlying OS (on Windows) and ICU library (on Unix-based systems)?
Based on our other discussion threads and my testing, my understanding is that the exception is only thrown for invalid/badly formed culture names. Any culture that is formatted properly will not throw the exception even if it doesn't exist.
does the culture returned depend on the underlying OS (on Windows) and ICU library (on Unix-based systems)?
Yes, we just use ICU to create the culture on Linux. We just support whatever behavior Windows and ICU support.
Looking at the source code, it doesn't just pass directly to OS or ICU though...it clearly does some massaging here to give consistent behavior:
It also looks like region to culture mapping is hard-coded and not just reliant on ICU/windows:
Invariant behavior also looks hard coded for consistency:
There are a lot of spots in CultureData files that appear to be trying to minimize the differences in behavior between Unix and Windows.
it clearly does some massaging here to give consistent behavior:
You are pointing to other place which doesn't create a culture. the code you are pointing it is just a code mapping a culture to a region.
It also looks like region to culture mapping is hard-coded and not just reliant on ICU/windows:
Same answer.
Invariant behavior also looks hard coded for consistency:
That is right except for collation behavior.
Not sure if this is the most appropriate place to post, but close enough.
I concur with (user) mikernet . Also see here for instance:
This is a serious breaking change, since "System.Globalization.CultureInfo" now identifies languages as valid in Windows 10 that previously resulted in an exception in Windows 7 and earlier. I only just discovered the problem in fact and Windows 10 has been out for years now!! It was a serious accident waiting to happen and just did (for one of my customers).
How can an unknown (non-existent) language code be accepted simply because it's considered well-formed (and even that's confusing, because a language code normally originates from ISO 639-1 which consists of two lowercase characters only AFAIK, and even the "CultureInfo::Name" docs currently indicates that - a few very quick and informal tests for instance shows that the "CultureInfo" constructor accepts any language code string <= 3 characters, like "aaa" or "a", while rejecting anything with 4 or more characters, assuming you're passing just the language code only and not the country/regioncode code as well).
In any case, passing something like "aaa" to the "CultureInfo" constructor would cause an exception in Windows 7 and earlier but now succeeds in Windows 10. That's a serious breaking change for anyone relying on the exception to determine an invalid language code (and who wouldn't be). Whether "CultureNotFoundException" or just plain "ArgumentException" (which user mikernet addresses), why is an unknown (invalid) language code even being accepted at all (and as mentioned in the link I cited at the outset, the resulting "CultureInfo::DisplayName" object even shows "Unknown Language (aaa)"). That's absurd behavior.
So how do you consistently check for an invalid (unknown) language? Sometimes an exception results, but for something like "aaa", it doesn't (in Windows 10 only AFAIK). If no exception should I therefore check "CultureInfo::LCID" for LOCALE_CUSTOM_UNSPECIFIED (4096), but if so, what about user-defined languages? Can they also return LOCALE_CUSTOM_UNSPECIFIED (I need to investigate). Note that "CultureTypes::UserCustomCulture" is officially deprecated so presumably I can't (reliably) rely on "CultureInfo::CultureTypes" to check for it (and it's unclear whether it would be reliable anyway when dealing with unknown languages).
The simple exception that used to easily identify non-existent languages is no longer reliable and the situation is now very confusing (in addition to how many people's code must have been quietly broken by this change).
@lsmith999999 .NET in general depends on the underlying OS to get the globalization behavior. It is Windows which decided to go with this behavior allowing any locale name that conform to BCP-47 standard.
In .NET 5.0 we have exposed the API CultureInfo.GetCultureInfo which can help you creating only cultures that the OS has a real data for.
@tarekgh: Thanks for the feedback but the situation still remains confusing and the reason for the change bewildering, even if .NET depends on the OS itself. It doesn't have to even if it runs afoul of what the OS is doing, since it's a significant breaking change from earlier behavior going back many years. The change is also potentially very serious in my particular case (and possibly others), and it just quietly started without warning.
Programs don't read (updated) docs before they run, and even now (for us human readers), the docs don't mention that language codes which were were previously rejected as invalid are now considered valid (as if I and others routinely check the docs for unexpected breaking changes like this anyway). The "Notes to Callers" section under the "CultureInfo" constructor gives no clear indication of this whatsoever. It's unclear what the purpose of this change was anyway, even if it occurs at the OS level, and how it's beneficial. From my perspective (and others) it's problematic and confusing (see my previous response).
It also (still) remains unclear how to check for a valid (existing) language code now. Migrating to .NET 5.0 is hardly an option assuming the new "CultureInfo.GetCultureInfo" you mentioned actually works as expected, so it appears I may have to do a language lookup via "CultureInfo.GetCultures()" (the only reliable alternative I can think of off-hand).
Note that I looked at the link you posted (https://github.com/dotnet/runtime/pull/42496) and I'm not comfortable with this approach, which was in a state of flux at the time of posting anyway (and therefore potentially unstable at the time). It was a hack anyway (checking "CultureInfo.EnglishName" for a raw string like "Unknown Language" which had already occurred to me as possible solution anyway, before your post), but even relying on LOCALE_ICONSTRUCTEDLOCALE (the newer approach in the latter link) remains unclear. Where does this originate from in .NET without passing my language string to unmanaged code. For an invalid language like "aaa", the "CultureInfo.LCID" property is returning LOCALE_CUSTOM_UNSPECIFIED (as documented), not LOCALE_ICONSTRUCTEDLOCALE (this is an LCID is it not?). It's not even clear if LOCALE_CUSTOM_UNSPECIFIED can (will) be returned for custom cultures as well (like in earlier versions of Windows), but if it is, then LOCALE_CUSTOM_UNSPECIFIED can't be relied on since you then can't distinguish between a valid custom culture and one that doesn't actually exist.
The situation as it stands is ridiculous but if there's a clear fix then can someone at least document it (clearly).
It was a hack anyway (checking "CultureInfo.EnglishName" for a raw string like "Unknown Language" no-one should take dependency on "Unknown Language".
Yes, and that is why we fixed it.
but even relying on LOCALE_ICONSTRUCTEDLOCALE (the newer approach in the latter link) remains unclear.
I am not sure why you think so. it is very clear telling if the returned culture has a real data or not.
We discourage all users of using LCIDs property in general. it is kind of obsolete and you always should rely on the culture names instead. There are many scenarios that LCID wouldn't work.
edited ~Suggesting a change in the docs to clarify things is a reasonable ask, please feel free to log issue in https://github.com/dotnet/dotnet-api-docs/ and we'll be happy to do so.~
this issue is suggesting the change in the doc and we'll be happy to address that.
@tarekgh
I am not sure why you think so. it is very clear telling if the returned culture has a real data or not.
By "unclear" I meant unclear where to pull LOCALE_ICONSTRUCTEDLOCALE from in .NET (using managed code). You go on to say however:
We discourage all users of using LCIDs property in general. it is kind of obsolete and you always should rely on the culture names instead. There are many scenarios that LCID wouldn't work.
Agreed (that LCIDs are antiquated), but it seems to contradict your previous statement nevertheless, that "... it is very clear telling if the returned culture has a real data or not" (by relying on LOCALE_ICONSTRUCTEDLOCALE). Is the latter intended for your own internal use only (to drive functions like "CultureInfo.GetCultureInfo()"), or is it also intended for client developers. Not likely (you said so yourself), so what am I still misunderstanding. I still don't know what the recommended technique is for determining that "aaa" isn't a recognized language, presumably from BCP 47 normally (though other standards are sometimes mentioned such as RFC 4646, which is what "CultureInfo" is officially based on - I'm not an expert in all these standards though), or user-defined.
I can't rely on exceptions, or LCIDs, so what's the solution (in .NET 4.X)?
I am not sure why you are relating LOCALE_ICONSTRUCTEDLOCALE
with the LCIDs? LOCALE_ICONSTRUCTEDLOCALE
has nothing to do with LCIDs in general.
I can't rely on exceptions, or LCIDs, so what's the solution (in .NET 4.X)?
You can something like:
[DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
internal static extern unsafe int GetLocaleInfoEx(string lpLocaleName, uint LCType, void* lpLCData, int cchData);
public static unsafe bool IsPredefinedCulture(string cultureName)
{
const uint LOCALE_ICONSTRUCTEDLOCALE = 0x0000007d;
const uint LOCALE_RETURN_NUMBER = 0x20000000;
int value = 0;
GetLocaleInfoEx(cultureName, LOCALE_ICONSTRUCTEDLOCALE | LOCALE_RETURN_NUMBER, (char*)&value, sizeof(int));
return value != 1;
}
@tarekgh
_I am not sure why you are relating LOCALE_ICONSTRUCTEDLOCALE with the LCIDs? LOCALEICONSTRUCTEDLOCALE has nothing to do with LCIDs in general
Because while I spend most of the time working in the WinAPI (in C++), I haven't looked (closely) at the Internationalization functions in some years now, and admittedly took the chance you were referring to some pre-defined LCID constant. It sounded that way though I wasn't certain (and didn't have the time confirm it).
Thank you for the function. It's appreciated. I'll take a look but am still unsure at this point. Most .NET developers aren't going to be familiar with this, and wouldn't expect to have to turn to something so low-level (and may not even have the C/C++ background usually required to deal with unsafe code). I'm only working in .NET for one particular (commercial localization) application I wrote many years ago (that I need to tend to periodically), but still find it very surprising I have to rely on something like this when dealing with a managed class like "CultureInfo" (to determine if a language string is valid). What's the point of the documented exceptions it throws if they can't be reliably used in all cases.
I'm very experienced in both the (unmanaged) WinAPI (30+ years) and .NET so can deal with the code without issue, but there's gotta be a cleaner (managed) way (one would think). Most would expect it.
@tarekgh
Just a follow-up that I tested the function you posted after reading the docs and the behavior is still mysterious. I haven't dug into the details to see what's really going on (the reason for this behavior), but it's effectively identical to "CultureInfo" itself (when passing an unknown culture name to its constructor for instance). Presumably "CultureInfo" is relying on "GetLocaleInfoEx()" (or equivalent) but again, I haven't looked into it.
For instance, the following 4 invalid (unknown) strings (cultureName) generates these results:
cultureName "GetLocaleInfoEx()" result "CultureInfo" constructor
----------- -------------------------- -------------------------
"w" Fails. "GetLastError()" returns ERROR_INVALID_PARAMETER Fails with "CultureNotFoundException"
"wx" Succeeds. "value" contains 1 ("wx" is a constructed locale) Succeeds but "CultureInfo.EnglishName" shows "Unknown Language (wx)" and CultureInfo.LCID shows 4096 (LOCALE_CUSTOM_UNSPECIFIED)
"wxy" Succeeds. "value" contains 1 ("wxy" is a constructed locale) Succeeds but "CultureInfo.EnglishName" shows "Unknown Language (wxy)" and CultureInfo.LCID shows 4096 (LOCALE_CUSTOM_UNSPECIFIED)
"wxyz" Fails. "GetLastError()" returns ERROR_INVALID_PARAMETER` Fails with "CultureNotFoundException"
Presumably you specialize in this particular area or at least have deeper knowledge than myself (haven't looked at this stuff in years), so it may all seem perfectly normal (correct). It seems to confirm much of what's been discussed here but from my perspective (a client developer), it's not normal (expected or reasonable behavior), even if there's a legitimate reason for it. It may be the OS itself but it doesn't matter. At this stage I'll have to rely on a lookup of "cultureName" in "CultureInfo.GetCultures()" which has certain drawbacks of its own (including possible perfomance issues even if I cache "CultureInfo.GetCultures()" in a "HashSet").
Anyway, thanks for your assistance (appreciated).
I tweaked the code a little bit here. This code should give you the right result.
[DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
internal static extern unsafe int GetLocaleInfoEx(string lpLocaleName, uint LCType, void* lpLCData, int cchData);
public static unsafe bool IsPredefinedCulture(string cultureName)
{
const uint LOCALE_ICONSTRUCTEDLOCALE = 0x0000007d;
const uint LOCALE_RETURN_NUMBER = 0x20000000;
int value = 0;
return GetLocaleInfoEx(cultureName, LOCALE_ICONSTRUCTEDLOCALE | LOCALE_RETURN_NUMBER, (char*)&value, sizeof(int)) > 0 && value != 1;
}
@tarekgh
Better, thanks. I might have picked up on that myself but LOCALE_ICONSTRUCTEDLOCALE gets into a fuzzy area. Based on my interpretation at this point, it presumably works like this:
Is this the final story and if so, is that the (exact) same behavior of "CultureInfo" (for determining a known culture). Presumably the .NET 5.0 version of "CultureInfo.GetCultureInfo()" (with the "predefinedOnly" arg) relies on it as well (based on the earlier link you posted, and I (safely) can too now (via LOCALE_ICONSTRUCTEDLOCALE)?
Is this the final story and if so, is that the (exact) same behavior of "CultureInfo" (for determining a known culture). Presumably the .NET 5.0 version of "CultureInfo.GetCultureInfo()" (with the "predefinedOnly" arg) relies on it as well (based on the earlier link you posted, and I (safely) can too now (via LOCALE_ICONSTRUCTEDLOCALE)?
the answers for all questions here is Yes :-)
Great, thanks for all your help (appreciated!!)
@tarekgh
Sorry, last word I hope (no need to respond unless req'd). I updated (corrected) bullet 2 in my last response in case anyone else relies on it. I originally reversed the meaning of 0 and 1 due to the confusing interpretation of "Is a constructed locale" vs "Not constructed" in the LOCALE_ICONSTRUCTEDLOCALE docs ("constructed" meaning the culture actually doesn't exist, not constructed meaning it does - easy to get this backwards and I did - mea culpa)
I suggest instead of saying constructed
, you say culture which the underlying OS may not have a complete information
. This could be more explicit and easier to understand.
Yes, I actually do get it, but there's a natural tendency to think that something which is "constructed" exists and something that's not constructed doesn't (just from the raw meaning of the word "constructed" coupled with the word "exists"). It's was just my brain on auto-pilot as it processed those two words in normal everyday use. In this case things work in reverse though (or rather you have to interpret it in a different context) but as a programmer of course you have to just look at the actual (documented) codes you're dealing with to overcome what your brain is thinking :)
Anyway, I think we're finally done. Thanks again!
Just found this issue, my code expected to throw when the string in the constructure is not valid.
The docs further state that CreateSpecificCulture() is just a wrapper, but in my testing on Win11 and dotnet 7.0.3 the returned object is distinctly different.
If the behavior is expected to be that "foo" "bar" "etc" will succeed simply because it looks like it could be ok, how do I test the attributes of the returned CI to detect if the culture was created on the fly, vs. actually exists?
how do I test the attributes of the returned CI to detect if the culture was created on the fly, vs. actually exists?
Use GetCultureInfo(String, Boolean) and pass true
for the predefinedOnly
parameter.
I ended up there via StackOverflow, and it seems to mostly work mostly, but for "und" it succeeds but returns a "ZZZ" for ThreeLetterWindowsLanguageName and UserCustomCulture, so I still need to add validation logic.
I know ISO639-3 "und" may be a special case, as it could translate to "UND" in RFC 5646, but I'd still like to know it is not valid, and let my logic test for "und" or "zxx", etc.
The solution I mentioned here https://github.com/dotnet/dotnet-api-docs/issues/3110#issuecomment-1470120329 is more reliable than what is suggested in the stack overflow. Why can't you use that?
I wasn't clear, I did end up using GetcultureInfo(language, true), but it does not work as expected for "und" and "zxx", so I still either have to hard code special codes, and I don't know them all, or I generically test for "ZZZ" and UserCustomCulture.
Issue description
CultureInfo
constructor andCultureInfo.CreateSpecificCulture()
documentation is highly misleading and confusing.On both pages it states that the name parameter must be an existing culture, but it doesn't - it can be any well formed culture name, whether it exists or not.
The exceptions section for
CultureInfo.CreateSpecificCulture()
states that it throwsCultureNotFoundException
when:I think it would be useful to note that in the docs this method returns a blank named
CultureInfo
object if the culture specified does not exist because that's rather counter-intuitive. Apparently it seems like sometimes the invariant culture behaves like a neutral culture and other times it doesn't (see https://github.com/dotnet/corefx/issues/40698#issuecomment-526785405)Why the heck was this changed? This seems like a terrible design decision. A badly formatted parameter should be throwing an argument exception, not an exception that indicates that it couldn't be found. The docs for
CultureNotFoundException
states:That's clearly not what is happening here. These methods will happily work fine with cultures that are not available as long as the name is well formed.
Regardless, seems that the docs should clarify this somewhat because the current behavior is highly unexpected.
CultureInfo
constructor:As stated above, it does not do this...it returns a
CultureInfo
object with the specified name which falls back to the invariant culture regardless of whether it exists in the OS or not .Target framework