dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

IdnMapping.GetAscii throws for some inputs in GlobalizationMode.Invariant #37349

Open stephentoub opened 4 years ago

stephentoub commented 4 years ago
using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        //Environment.SetEnvironmentVariable("DOTNET_SYSTEM_GLOBALIZATION_USENLS", "1");
        //Environment.SetEnvironmentVariable("DOTNET_SYSTEM_GLOBALIZATION_INVARIANT", "1");
        Console.WriteLine(new IdnMapping().GetAscii("\xD83D\xDE00"));
    }
}

This works fine with both NLS and ICU. But when GlobalizationMode.Invariant is enabled (uncomment the relevant line above), it throws an exception:

Unhandled exception. System.ArgumentException: Left to right characters may not be mixed with right to left characters in IDN labels. (Parameter 'unicode')
   at System.Globalization.IdnMapping.PunycodeEncode(String unicode)
   at System.Globalization.IdnMapping.GetAsciiInvariant(String unicode, Int32 index, Int32 count)
   at System.Globalization.IdnMapping.GetAscii(String unicode, Int32 index, Int32 count)
   at System.Globalization.IdnMapping.GetAscii(String unicode, Int32 index)
   at System.Globalization.IdnMapping.GetAscii(String unicode)
ghost commented 4 years ago

Tagging subscribers to this area: @tarekgh, @safern, @krwq Notify danmosemsft if you want to be subscribed.

tarekgh commented 4 years ago

@stephentoub We don't promise any correct IDN behavior when enabling GlobalizationMode.Invariant. even it is not right anyone should depend on IDN behavior when GlobalizationMode.Invariant is enabled. We just got basic functionality working there and no promise to have any consistent result with disabling GlobalizationMode.Invariant. I suggest closing this one.

stephentoub commented 4 years ago

even if it is not right no one should depend on IDN behavior when GlobalizationMode.Invariant is enabled

We use this in higher-level APIs like HttpClient. This means that with invariant mode enabled, you'll get exceptions just by trying to access sites that cause IdnMapping to throw.

FYI, @richlander as you consider configurations for docker images and the tradeoffs involved.

tarekgh commented 4 years ago

I would argue it is good to get the exception at that time better than getting wrong behavior. I would worry about security more at that time.

stephentoub commented 4 years ago

My point is saying "We don't promise any correct IDN behavior when enabling GlobalizationMode.Invariant" means we also don't promise any correct behavior for anything higher in the stack, whether it be SslStream or HttpClient or anything else that uses it. I don't know how to reason about that. Can we be more specific about what will and won't work?

tarekgh commented 4 years ago

@stephentoub I understand your point. I was just commenting on what would be the choice between throwing and not throwing on the level of the IDN layer.

For the higher stack with the Invariant mode, we can have the choice to change the Invariant mode behavior when the system already has NLS/ICU. This will be the case always on Windows and MacOS and only be broken on Linux images not having ICU installed. the concern with that is we are not fully solving the problem and we are losing the consistency promise when enabling Invariant. Note, I am talking about the IDN only and not the whole globalization.

stephentoub commented 4 years ago

I understand your point.

👍

tarekgh commented 4 years ago

I am moving this to 6.0 release as this is not a new behavior for invariant mode. we can revisit later to enhance Invariant mode if needed.

ilonatommy commented 1 year ago

Update: The sample in description works already (returns xn--e28h in Invariant). The exception is thrown e.g. for àא that should be coded as xn--0ca24w according to unicode13.

Beyond throwing, there are cases where we return incorrect result and these are e.g. from Unicode13 tests:

0à.א; 0à.א; [B1]; xn--0-sfa.xn--4db; ; ;  # 0à.א  (returns: xn--0a-6tb.xn--4db)
0À.א; 0à.א; [B1]; xn--0-sfa.xn--4db; ; ;  # 0à.א  (returns: xn--0a-6tb.xn--4db)
0À.א; 0à.א; [B1]; xn--0-sfa.xn--4db; ; ;  # 0à.א  (returns: xn--0-yda.xn--4db)

all of the above, instead of returning אà.א, throw when toUnicode is done on source,;

tarekgh commented 1 year ago

@ilonatommy we cannot have a perfect IDN behavior in Invariant mode without having some way to support the normalization. Look at my comment https://github.com/dotnet/runtime/issues/37349#issuecomment-638348089.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-globalization See info in area-owners.md if you want to be subscribed.

Issue Details
```C# using System; using System.Globalization; class Program { static void Main() { //Environment.SetEnvironmentVariable("DOTNET_SYSTEM_GLOBALIZATION_USENLS", "1"); //Environment.SetEnvironmentVariable("DOTNET_SYSTEM_GLOBALIZATION_INVARIANT", "1"); Console.WriteLine(new IdnMapping().GetAscii("\xD83D\xDE00")); } } ``` This works fine with both NLS and ICU. But when GlobalizationMode.Invariant is enabled (uncomment the relevant line above), it throws an exception: ``` Unhandled exception. System.ArgumentException: Left to right characters may not be mixed with right to left characters in IDN labels. (Parameter 'unicode') at System.Globalization.IdnMapping.PunycodeEncode(String unicode) at System.Globalization.IdnMapping.GetAsciiInvariant(String unicode, Int32 index, Int32 count) at System.Globalization.IdnMapping.GetAscii(String unicode, Int32 index, Int32 count) at System.Globalization.IdnMapping.GetAscii(String unicode, Int32 index) at System.Globalization.IdnMapping.GetAscii(String unicode) ```
Author: stephentoub
Assignees: -
Labels: `design-discussion`, `area-System.Globalization`
Milestone: Future