Closed mklement0 closed 1 year ago
| Author: | mklement0 |
|---|---|
| Assignees: | - |
| Labels: | `area-System.Globalization` |
| Milestone: | - |
.NET depends on the ICU version installed on your system. If you run on systems with different ICU versions, some differences are expected. We don't ship ICU as part of .NET because the OS already ships it. If you want to guarantee consistency when running on different OSes, you can use App-local ICU and ship your own copy of ICU with your app.
I understand that the OS' version is used, hence one of my questions above was:
@tarekgh:
Also, my W11 22H2 machine has a 68.2.0.10 version of icu.dll.
The 68.2 release of ICU has three decimal digits for en, not two, as actually used in .NET, so there's clearly a problem here.
How do you determine the version actually in use, ideally via a .NET API?
Why do you need this version programmatically? We don't have any public API to retrieve the ICU version. You can use the following code to get it, though.
// Requires: using System; using System.Reflection;
// Relies on a non-public runtime API, so it may break in any future .NET version.
Type? interopGlobalization = Type.GetType("Interop+Globalization, System.Private.CoreLib");
if (interopGlobalization != null)
{
    MethodInfo? methodInfo = interopGlobalization.GetMethod("GetICUVersion", BindingFlags.NonPublic | BindingFlags.Static);
    if (methodInfo != null)
    {
        // The version is packed into one int, one byte per component (major first).
        int version = (int)methodInfo.Invoke(null, null)!;
        Console.WriteLine($".... ICU Version: {new Version(version >> 24, (version >> 16) & 0xFF, (version >> 8) & 0xFF, version & 0xFF)}");
    }
}
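The shifts and masks in the snippet above split one packed int into four version components. Here is a minimal, self-contained sketch of just that step, assuming the byte layout the snippet implies (major in the most significant byte). `UnpackVersion` is a hypothetical helper name, and the sample value is packed by hand rather than read from the runtime. Unlike the snippet, the sketch also masks the top byte, so a value with the high bit set would not produce a negative major component.

```csharp
using System;

static class IcuVersionDemo
{
    // Hypothetical helper: splits a packed 32-bit value into four 8-bit
    // fields, most significant byte first.
    public static Version UnpackVersion(int packed) =>
        new Version(
            (packed >> 24) & 0xFF,
            (packed >> 16) & 0xFF,
            (packed >> 8) & 0xFF,
            packed & 0xFF);

    static void Main()
    {
        // 68.2.0.10 packed by hand for illustration only.
        int packed = (68 << 24) | (2 << 16) | (0 << 8) | 10;
        Console.WriteLine(UnpackVersion(packed)); // prints "68.2.0.10"
    }
}
```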
The 68.2 release of ICU has three decimal digits for en, not two, as actually used in .NET, so there's clearly a problem here.
Running the following code on my Windows machine (Microsoft Windows 10.0.22621) produces 3, and I have ICU Version: 68.2.0.10.
Console.WriteLine($"{new System.Globalization.CultureInfo("en-US", false).NumberFormat.NumberDecimalDigits}");
Could you please run the following code and try to send the output?
// Requires: using System; using System.Diagnostics; using System.Globalization;
// using System.Reflection; using System.Runtime.InteropServices;
public static void PrintEnvironmentInfo()
{
    // Touch a culture first, presumably to force the globalization library to load before modules are listed.
    try { CultureInfo ci = CultureInfo.GetCultureInfo("ja-JP"); } catch {}
    Console.WriteLine($"{RuntimeInformation.OSDescription}");
    Console.WriteLine($"{RuntimeInformation.FrameworkDescription}");
    try
    {
        // List loaded native modules whose path contains "icu".
        foreach (ProcessModule module in Process.GetCurrentProcess().Modules)
        {
            if (module.FileName.IndexOf("icu", StringComparison.OrdinalIgnoreCase) >= 0)
                Console.WriteLine($".... {module.FileName}: {module.FileVersionInfo.FileVersion}");
        }
        Type? interopGlobalization = Type.GetType("Interop+Globalization, System.Private.CoreLib");
        if (interopGlobalization != null)
        {
            MethodInfo? methodInfo = interopGlobalization.GetMethod("GetICUVersion", BindingFlags.NonPublic | BindingFlags.Static);
            if (methodInfo != null)
            {
                int version = (int)methodInfo.Invoke(null, null)!;
                Console.WriteLine($".... ICU Version: {new Version(version >> 24, (version >> 16) & 0xFF, (version >> 8) & 0xFF, version & 0xFF)}");
            }
        }
        // Non-public globalization mode flags, read via reflection.
        Console.WriteLine($".... UseNls: {typeof(object).Assembly.GetType("System.Globalization.GlobalizationMode")!.GetProperty("UseNls", BindingFlags.Static | BindingFlags.NonPublic)!.GetValue(null)} ....");
        Console.WriteLine($".... Invariant: {typeof(object).Assembly.GetType("System.Globalization.GlobalizationMode")!.GetProperty("Invariant", BindingFlags.Static | BindingFlags.NonPublic)!.GetValue(null)} ....");
        Console.WriteLine($".... PredefinedCulturesOnly: {typeof(object).Assembly.GetType("System.Globalization.GlobalizationMode")!.GetProperty("PredefinedCulturesOnly", BindingFlags.Static | BindingFlags.NonPublic)!.GetValue(null)} ....");
    }
    catch
    {
        Console.WriteLine($".... UseNls: Couldn't Evaluate it ....");
    }
    Console.WriteLine($"{new System.Globalization.CultureInfo("en-US", false).NumberFormat.NumberDecimalDigits}");
}
Why do you need this version programmatically?
So that discrepancies can be diagnosed, pinpointed, and traced back to a specific ICU version.
Your ICU-version-printing code doesn't work on my Windows ARM64 VM (W11, 22H2) (running on an M1 Mac, where the code does work).
Note that Microsoft Windows 10.0.22621 is a value frozen for backward compatibility, so even a Windows 11 machine reports that value.
I'll get back to you on the other code.
Note that Microsoft Windows 10.0.22621 is a value frozen for backward compatibility, so even a Windows 11 machine reports that value.
I have 22H2 (OS Build 22621.1105)
Your ICU-version-printing code doesn't work on my Windows ARM64 VM (W11, 22H2) (running on an M1 Mac, where the code does work).
What do you get when running it?
I'll get back to you on the other code.
Thanks!
Thanks for digging deeper:
It seems that it is PowerShell that is interfering, but it would be good to understand exactly how:
In PowerShell Core 7.4.0-preview.1, the following reports 2, not 3:
[cultureinfo]::InvariantCulture.numberFormat.numberDecimalDigits
By contrast, compiling and running your code with a .NET SDK 7.0.102 project does yield what you expect:
Microsoft Windows 10.0.22621
.NET 7.0.2
.... C:\Windows\SYSTEM32\icu.dll: 68, 2, 0, 10 (WinBuild.160101.0800)
.... ICU Version: 68.2.0.10
.... UseNls: False ....
.... Invariant: False ....
.... PredefinedCulturesOnly: False ....
3
The baffling thing is that I did try to verify that PowerShell is using ICU, as follows, and it does indicate true:
(Add-Type -PassThru -NameSpace net.same2u -Name Aux -UsingNamespace System.Globalization -MemberDefinition @'
    public static bool ICUMode(){
        SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
        byte[] bytes = sortVersion.SortId.ToByteArray();
        int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
        return version != 0 && version == sortVersion.FullVersion;
    }
'@)::ICUMode()
However, even though this code runs in-process, perhaps it doesn't reflect PowerShell's true ICU-vs.-NLS setting? Again, it would be good to understand what, specifically, is going on.
I believe your PowerShell session is running in NLS mode. Could you please try to run the following at your PowerShell prompt:
$wid=""
[TimeZoneInfo]::TryConvertIanaIdToWindowsId("America/Los_Angeles", [ref] $wid)
Write-Output $wid
On my machine, running inside PowerShell 7.3.2, I am getting:
True
Pacific Standard Time
If running with NLS, TryConvertIanaIdToWindowsId is expected to fail and return false.
On my W11 22H2 ARM64 machine I get
True
Pacific Standard Time
too, which suggests ICU, right?
(As an aside: you don't usually need an explicit output command in PowerShell; just $wid instead of Write-Output $wid will do.)
Here's the content of "$PSHOME/pwsh.runtimeconfig.json", which does not suggest that an NLS opt-in is in place (there's also no DOTNET_SYSTEM_GLOBALIZATION_APPLOCALICU environment variable defined):
{
  "runtimeOptions": {
    "tfm": "net7.0",
    "includedFrameworks": [
      {
        "name": "Microsoft.NETCore.App",
        "version": "7.0.1"
      }
    ],
    "rollForwardOnNoCandidateFx": 2,
    "configProperties": {
      "System.Reflection.Metadata.MetadataUpdater.IsSupported": false,
      "System.Runtime.TieredCompilation": true,
      "System.Runtime.TieredCompilation.QuickJit": true,
      "System.Runtime.TieredCompilation.QuickJitForLoops": true
    }
  }
}
too, which suggests ICU, right?
That is right. The called methods succeed only if running with ICU.
Was your PowerShell script calling new System.Globalization.CultureInfo("en-US", false) exactly, that is, passing false? I am asking to ensure you are not retrieving the user override values. You can also use System.Globalization.CultureInfo.GetCultureInfo("en-US") to ensure that.
It would be a good idea to log an issue in https://github.com/PowerShell/PowerShell and get the PowerShell folks' help with that.
Thanks, @tarekgh - the useUserOverride parameter is indeed the culprit, and there's either a .NET bug, or the documentation is lacking / the API is counterintuitive.
So let's get some clarity before I follow up in the PowerShell repo - note that the intended PowerShell behavior is to use ICU on Windows, and it seems that at least in some earlier version that was the case - see https://github.com/PowerShell/PowerShell/issues/12755#issuecomment-644678916; also, there's nothing I could find in the project configuration or source code that would indicate an NLS opt-in.
If I run the following, with en-US as the current culture, in PowerShell Core 7.4.0-preview.1 on .NET 7.0.1:
Console.WriteLine($"# of decimal digits with override=false: {new System.Globalization.CultureInfo("en-US", false).NumberFormat.NumberDecimalDigits}");
Console.WriteLine($"# of decimal digits with override=true: {new System.Globalization.CultureInfo("en-US", true).NumberFormat.NumberDecimalDigits}");
Console.WriteLine($"# of decimal digits with name-only constructor: {new System.Globalization.CultureInfo("en-US").NumberFormat.NumberDecimalDigits}");
Console.WriteLine($"# of decimal digits with current culture: {System.Globalization.CultureInfo.CurrentCulture.NumberFormat.NumberDecimalDigits}");
I get:
# of decimal digits with override=false: 3
# of decimal digits with override=true: 2
# of decimal digits with name-only constructor: 2
# of decimal digits with current culture: 2
Is it really the intent that only with false for override you get ICU opt-in, and that by default you get NLS?
While the NLS settings could be considered an override of sorts - of ICU as the default - it isn't documented as such, and shouldn't the name-only constructor and System.Globalization.CultureInfo.CurrentCulture use ICU?
When useUserOverride is set to true during culture creation, you are asking for the values of those culture properties that the user has overridden through the UI (by running intl.cpl). Note that useUserOverride affects only the culture that the user chose as the default (which is CultureInfo.CurrentCulture). useUserOverride is not about opting in to ICU or NLS. It is about choosing between the user's preferences and the default culture data that comes from either NLS or ICU.
The Remarks in the docs also have some useful info.
That was my original understanding and it makes sense, but the results clearly contradict that:
Clearly, NLS is the default in the above calls, not ICU - both in CultureInfo.CurrentCulture and with new CultureInfo("en-US") - in the absence of any NLS opt-in via environment variables or app-specific settings.
So this is a bug, right?
Based on what you're saying, the useUserOverride parameter should work as follows, which is currently not the case:
Non-Windows
Windows:
ignored, if ICU is in effect - which should be the default
honored only with NLS opt-in in effect
That is wrong. useUserOverride is in effect whether you are running with ICU or NLS. I don't think there is any bug here.
What does useUserOverride do if ICU is in effect, given that such user overrides are specified via Control Panel, and therefore NLS-based?
If useUserOverride is unrelated to ICU vs. NLS, why does it clearly trigger a switch to ICU when given false, as shown above?
Leaving useUserOverride out of the picture, the bigger issue is:
$wid="";[TimeZoneInfo]::TryConvertIanaIdToWindowsId("America/Los_Angeles", [ref] $wid); $wid
in PowerShell Core 7.4.0-preview.1 on Windows 11 shows.
Think of NLS and ICU as sources of globalization data. NLS/ICU have nothing to do with the user overrides. User overrides are the settings the user changes on the system, which get stored in the Windows Registry. Here is the explanation of the behavior you are seeing:
On your machine, the setting of NumberDecimalDigits is 2:
Then when calling
Console.WriteLine($"# of decimal digits with override=false: {new System.Globalization.CultureInfo("en-US", false).NumberFormat.NumberDecimalDigits}");
You are asking NOT to get the user override. You are running with ICU, which happens to have the value 3 here.
When calling:
Console.WriteLine($"# of decimal digits with override=true: {new System.Globalization.CultureInfo("en-US", true).NumberFormat.NumberDecimalDigits}");
You explicitly ask for the user override value because you are passing true for user override. The value set by the user is 2.
When calling
Console.WriteLine($"# of decimal digits with name-only constructor: {new System.Globalization.CultureInfo("en-US").NumberFormat.NumberDecimalDigits}");
This is the same as passing true for the user override. Look at the doc, which clearly states in the Remarks section: The UseUserOverride property is always set to true.
When calling
Console.WriteLine($"# of decimal digits with current culture: {System.Globalization.CultureInfo.CurrentCulture.NumberFormat.NumberDecimalDigits}");
If the current culture has not been set manually, we read the user settings to get the default culture and then create it with new CultureInfo(name), which picks up the user overrides too.
Let me know if there is anything unclear here.
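The four cases described above can be put side by side in a small sketch. It only prints what each construction path reports on the machine it runs on; no output is shown here because the digit counts depend on the OS, the ICU version, and the user's regional settings. `Describe` is a hypothetical helper name introduced for illustration.

```csharp
using System;
using System.Globalization;

static class UserOverrideDemo
{
    // Hypothetical helper: formats the two properties discussed in the thread.
    public static string Describe(string label, CultureInfo culture) =>
        $"{label}: UseUserOverride={culture.UseUserOverride}, " +
        $"NumberDecimalDigits={culture.NumberFormat.NumberDecimalDigits}";

    static void Main()
    {
        // Explicitly without user overrides: default culture data only.
        Console.WriteLine(Describe("override=false", new CultureInfo("en-US", false)));
        // Explicitly with user overrides.
        Console.WriteLine(Describe("override=true", new CultureInfo("en-US", true)));
        // The name-only constructor requests user overrides as well.
        Console.WriteLine(Describe("name-only", new CultureInfo("en-US")));
        // GetCultureInfo returns a cached, read-only culture without user overrides.
        Console.WriteLine(Describe("GetCultureInfo", CultureInfo.GetCultureInfo("en-US")));
        // The current culture, as picked up from the user's settings.
        Console.WriteLine(Describe("CurrentCulture", CultureInfo.CurrentCulture));
    }
}
```

Per the explanation above, overrides only come into play when en-US is the culture the user chose as the default, so the override=true and name-only lines can differ from the override=false line only in that case.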
On your machine, the settings of NumberDecimalDigits is 2
It is, but not due to customization - it is showing what I presume to be the NLS default for en-US.
Thus, when running with useUserOverride set to true, there is no user override to apply.
Why does that then exhibit NLS behavior?
(Thanks for clarifying that user overrides are applied by default; I never noticed that CultureInfo.CurrentCulture.UseUserOverride is always true by default, even on Unix. However, that isn't relevant to the discussion, given that there are no overrides at play here.)
When would it ever make sense to mix ICU and NLS settings, which is what currently happens by default, as discussed (no user overrides in place, no ICU/NLS opt-ins in place)?
Are you saying that the only way to get ICU behavior consistently is by explicitly constructing a culture-info object with useUserOverride set to false, and assigning that to the running thread?
What appears to be happening:
In a pristine system and whenever you switch to a different predefined culture, the NLS settings get copied to HKEY_CURRENT_USER\Control Panel\International.
While the registry values there can be customized by the user, e.g., via intl.cpl, in the absence of actual customization it is inappropriate to consider the information there a user override.
The upshot of the current behavior is that the de-facto default is NLS.
It is, but not due to customization - it is showing what I presume to be the NLS default for en-US.
These are the default user settings, whose value Windows keeps for compatibility reasons. Try to avoid using the term NLS default. Windows can decide to change the NLS data at any time and keep the user setting at 2.
Thus, when running with useUserOverride set to true, there is no user override to apply.
Well, Windows is the one that sets this override value, and it treats it as a user override. Users are free to delete it or just change it.
(As an aside: isn't the NLS data stored in the registry too, not just user overrides?)
No. NLS data is stored in a file called C:\Windows\System32\locale.nls.
Why does that then exhibit NLS behavior?
It just so happens that the NLS data stored in the file is the same as the user override value stored in the registry, which is 2.
When would it ever make sense to mix ICU and NLS settings, which is what currently happens by default, as discussed (no user overrides in place, no ICU/NLS opt-ins in place)?
You still think NLS is the same as user overrides, which is not the case. We are not mixing NLS and ICU. We just honor whatever the user settings on the machine are. Devs have the flexibility to retrieve the desired data.
Are you saying that the only way to get ICU behavior consistently is by explicitly constructing a culture-info object with useUserOverride set to false, and assigning that to the running thread?
You can ensure you get the ICU data by setting CurrentCulture and DefaultThreadCurrentCulture in your app startup code, so that this applies to the whole app. Unfortunately, this behavior will be hard to change for compatibility reasons. Note that CurrentCulture is not tied to the thread, as it can travel through async calls inside the execution context.
You can ensure getting the ICU data by setting CurrentCulture and DefaultThreadCurrentCulture in your app startup
Good to know, thanks.
You still think NLS is same as User Override which is not the case.
I'm not saying it's the same thing, I'm saying that de facto you get NLS settings, based on the mechanism I described.
And it is utterly confusing to call these default values user overrides - both in general, and in particular because of how user overrides are discussed in the documentation:
The user might choose to override some of the values associated with the current culture of Windows through the regional and language options portion of Control Panel.
In .NET Framework, life was simple:
useUserOverride = false meant: get the NLS defaults for the culture (there was no ICU)
useUserOverride = true meant: get the settings from HKEY_CURRENT_USER\Control Panel\International, which default to the NLS defaults, but may have been modified by the user.
In .NET, we now have a bewildering situation on Windows:
useUserOverride = false means: get the ICU defaults for the culture, on all platforms.
useUserOverride = true - the default - means: get the settings from HKEY_CURRENT_USER\Control Panel\International, which still default to the NLS defaults.
useUserOverride = false
The upshot is the following - and I hope you'll agree that neither the terminology nor the current documentation do this behavior justice. Please review for technical accuracy; I may use the result to suggest improvements to the documentation:
On Windows, the de-facto default is still NLS with respect to number / date-time / currency settings, because the statically-copied-from-NLS-by-default values in the HKEY_CURRENT_USER\Control Panel\International registry key are honored (irrespective of whether a user override is in effect, through subsequent customization of these values).
That is, backward compatibility was prioritized over consistent cross-platform behavior.
By contrast, string comparisons, sorting behaviors (other behaviors?) do use ICU as the default.
In effect, this makes for an awkward mix of NLS and ICU behaviors.
To fully default to ICU on Windows, opt-in is required:
By placing the following in your application's startup code (using C# syntax):
CultureInfo.CurrentCulture = CultureInfo.DefaultThreadCurrentCulture = CultureInfo.GetCultureInfo(CultureInfo.CurrentCulture.Name);
[updated per discussion below]
Note:
false argument is passed to the useUserOverride argument of the CultureInfo constructor.
Confusingly, .NET considers the presence of values in HKEY_CURRENT_USER\Control Panel\International a user override - even if no explicit customization of these values by the user ever took place.
true and false argument passed to useUserOverride if no actual override - such as via the user using Control Panel to modify settings - was in place.
false serves as a (full) ICU opt-in.
false as the useUserOverride argument of the CultureInfo constructor
I don't really like saying the de-facto default is still NLS, or mentioning NLS at all. The NLS data can change at any time and the user overrides can stay. Please don't tie the two together as if they will always be the same. Let's focus on describing the case that Windows comes with default locale settings that can override the culture property values.
By contrast, string comparisons, sorting behaviors (other behaviors?) do use ICU as the default.
Instead of saying this, we can clarify that the user overrides only override some culture data related to date, time, and number formatting.
In effect, this makes for an awkward mix of NLS and ICU behaviors.
I don't think this is an awkward mix. We need to think of user overrides as user settings that override the default culture data. We need to move away from thinking that we are mixing NLS and ICU. Consider the case where users/admins have changed these settings: at that point there is no relation to NLS at all.
new CultureInfo(CultureInfo.CurrentCulture.Name, false);
Better just use CultureInfo.GetCultureInfo(...)
Confusingly, .NET considers the presence of values in HKEY_CURRENT_USER\Control Panel\International a user override - even if no explicit customization of these values by the user ever took place.
This is mostly for UI Windows settings to pick the overrides from some place.
I don't really like saying the de-facto default is still NLS, or mentioning NLS at all. The NLS data can change at any time and the user overrides can stay.
It is still of vital importance to note that by default - without any actual user overrides - you get NLS settings, albeit in the form of a static copy - locked in at the time the machine is set up or whenever you change to a different predefined locale later.
we can clarify that the user overrides only override some culture data related to date, time, and number formatting.
I think it's better to explicitly enumerate which behaviors are governed by ICU by default, to understand whose rules apply where.
Better just use CultureInfo.GetCultureInfo(...)
Noted, thanks.
This is mostly for UI Windows settings to pick the overrides from some place.
I don't understand.
My point is that it is highly confusing to refer to default settings as user overrides (even though actual user overrides end up writing to the same location).
The current behavior erases the distinction between default values and intentional overrides of these values, because useUserOverride = true - the default - always reads from this location (via WinAPIs, I presume), right?
While that applies to .NET Framework too, in the absence of actual modification by the user there was no effective difference between using and not using a user override, as long as the current NLS data still matched the static copy in the registry.
(While these two could diverge, it still is confusing to talk about a user override when it comes to something the user never modified.)
Now, in .NET, opting out of the "user override" results in something fundamentally different - the switch to ICU data.
So let's get some clarity before I follow up in the PowerShell repo - note that the intended PowerShell behavior is to use ICU on Windows, and it seems that at least in some earlier version that was the case - see PowerShell/PowerShell#12755 (comment); also, there's nothing I could find in the project configuration of source code that would indicate an NLS opt-in.
To be very precise, we use the .Net defaults. After .Net started using ICU by default, pwsh also uses ICU by default. There is only one exception I know of for formatting double/float numbers.
@mklement0 Let me rephrase what @tarekgh is saying. Each higher-level API can have its own defaults that change the defaults of lower-level APIs. I specifically pointed out the PR from the PowerShell repository - it clearly shows that no matter what the NLS, ICU, or .Net format is - pwsh changes the format of the numbers as it needs. The same is true for the Windows API - it doesn't matter whether NLS or ICU is used as the low-level API - the Windows API is higher-level and has its own configuration repository, which takes precedence over low-level API settings.
@tarekgh For my education, is there any way to change the ICU settings/defaults? For example, a config file in the pwsh folder which will be the same on Windows and on Unix.
For my education, is there any way to change the ICU settings/defaults? For example, a config file in the pwsh folder which will be the same on Windows and on Unix.
Currently we don't provide any customization for ICU defaults. You'll need to handle that programmatically by creating the culture and customize whatever properties you want.
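A minimal sketch of the programmatic route described above, under the assumption that the goal is a fixed NumberDecimalDigits value regardless of platform. `WithDecimalDigits` is a hypothetical helper name, and the choice of en-US and 3 digits is for illustration only.

```csharp
using System;
using System.Globalization;

static class CustomCultureDemo
{
    // Hypothetical helper: returns a writable copy of the named culture
    // with the requested default number of decimal digits.
    public static CultureInfo WithDecimalDigits(string name, int digits)
    {
        // GetCultureInfo returns a cached, read-only instance without user
        // overrides; Clone() produces a writable copy of it.
        CultureInfo culture = (CultureInfo)CultureInfo.GetCultureInfo(name).Clone();
        culture.NumberFormat.NumberDecimalDigits = digits;
        return culture;
    }

    static void Main()
    {
        CultureInfo custom = WithDecimalDigits("en-US", 3);

        // Apply to the current execution context and to threads created later.
        CultureInfo.CurrentCulture = custom;
        CultureInfo.DefaultThreadCurrentCulture = custom;

        // The standard "N" format now uses the customized digit count.
        Console.WriteLine(1234.5.ToString("N"));
    }
}
```

Because the customization lives in application code rather than in a config file, it would need to run in the startup path of every app that should share the same defaults.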
@iSazonov, yeah, I got confused by the move to ICU, which I took to mean that it is used by default, which is clearly not the case with respect to the settings that matter most:
Again, in effect you're getting (static copies of) NLS settings by default with respect to date-time / number / calendar / currency formats, and that's what must be communicated clearly.
(That you're free to implement actual user overrides later by modifying the registry location where the copied-from-NLS defaults are stored is a separate issue.)
While there is one abstract hint in https://learn.microsoft.com/en-us/dotnet/core/extensions/globalization-icu, I don't think it's enough.
Even when using ICU, the CurrentCulture, CurrentUICulture, and CurrentRegion members still use Windows operating system APIs to honor user settings.
Again, in effect you're getting (static copies of) NLS settings by default with respect to date-time / number / calendar / currency formats, and that's what must be communicated clearly.
In effect we're getting (static copies of) Windows settings by default. As @tarekgh pointed out, we don't even know which defaults in NLS are used now and which ones were used before, because the code is closed. This is true for ICU too - no doubt Windows configures ICU before using it with a specific config which is closed, and we don't know what is there and which default values are changed. Moreover, it does not matter which dll is used - NLS, ICU or QQQ - the Windows config/defaults will always be used. So it is obvious that .Net uses the Windows API to honor user settings.
no doubt Windows configures ICU before using it with a specific config which is closed
That would be highly surprising, given the point of ICU is standardization.
we don't even know which defaults in NLS are used now and which ones before
I don't think that is relevant; what matters is knowing that NLS is the default source, and that that default source can differ from ICU.
As a user (not concerned with customizing settings), what you want to know is: do I get ICU settings (standardized, across platforms) or do I get NLS (Windows-specific) settings? Of course, both can change over time, so that is irrelevant to the discussion.
So it is obvious that .Net uses the Windows API to honor user settings.
While that is de facto true, it is far from obvious, based on the API terminology and the current documentation - at least to me, but I suspect I'm not alone.
In particular:
CultureInfo constructor.
CultureInfo.CurrentCulture = CultureInfo.DefaultThreadCurrentCulture = CultureInfo.GetCultureInfo(CultureInfo.CurrentCulture.Name);
as the opt-in instead, which looks like a no-op, where the UseUserOverride logic is implicit.
Description
#45308 reported for .NET Core 3.1 that new System.Globalization.CultureInfo("en-US", false).NumberFormat.NumberDecimalDigits, i.e. the default number of decimal places used when formatting with the standard N number format, yields platform-specific values:
3 on Unix-like platforms
2 on Windows
In https://github.com/dotnet/runtime/issues/45308#issuecomment-735438395, @tarekgh states that "In .NET 5.0 we have started to use ICU on Windows too and that should show more consistency when running on different OS."
However, as of .NET 7.0.1, the above discrepancy still exists, which is surprising and inconvenient (even though it is understood that culture-specific representations shouldn't be relied on programmatically).
What accounts for this discrepancy?
Conceivably, a discrepancy can come from different platforms using different ICU library versions. How can the specific ICU version used by a given .NET application be determined?
System.Globalization.CultureInfo.CompareInfo.Version has a FullVersion property, but it doesn't reveal a specific version number:
Reproduction Steps
Expected behavior
Consistent results across platforms, due to use of ICU (unless the NLS opt-in is in effect on Windows)
Actual behavior
Windows reports a value that differs from that on Unix-like platforms.
Regression?
No response
Known Workarounds
No response
Configuration
.NET 7.0.1
Other information
No response