dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.24k stars 4.73k forks source link

Uri.IsWellFormedOriginalString false negatives #72632

Open MihaZupan opened 2 years ago

MihaZupan commented 2 years ago

IsWellFormedOriginalString (and therefore also IsWellFormedUriString) will report false negatives in the following cases:

Case 1. occurred in #70929. Case 2. occurred in #21626, #34031, #37634, #64249, VS Feedback.

My suggestion as a workaround is to not use IsWellFormedOriginalString at all. Odds are that in most cases you don't care about the sort of validation that it does anyway. Consider using something like this instead:

static bool IsValidHttpUri(string uriString, out Uri uri) =>
    Uri.TryCreate(uriString, UriKind.Absolute, out uri) &&
    (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
`IsWellFormedOriginalString` will report false negatives in the following cases: - Case 1 - Occurs if the Uri contains both * Anything that triggers internal re-creation of the string ([Unicode or escaped unreserved chars](https://github.com/dotnet/runtime/blob/cf0f14cd43708df7a28eb1e4309915ba51e418dd/src/libraries/System.Private.Uri/src/System/UriExt.cs#L220)) * An escaped character from the ``" <>`^{|}`` set - This happens because [`EscapeUnescapeIri`](https://github.com/dotnet/runtime/blob/5b8d9eb3876a733963ebddbb71fc4449695f59c9/src/libraries/System.Private.Uri/src/System/IriHelper.cs#L108) removes the escaping from these characters, but [`CheckCanonical`](https://github.com/dotnet/runtime/blob/5b8d9eb3876a733963ebddbb71fc4449695f59c9/src/libraries/System.Private.Uri/src/System/Uri.cs#L4362-L4379) later reports that they should have been escaped (which they were, but we unescaped them) - Example: `http://a/%41%22` (`%41` is `A` (unreserved) and `%22` is `"`) - Case 2 - Occurs if the Uri contains both * A (not escaped) non-ASCII character (except those outside the [IRI range](https://github.com/dotnet/runtime/blob/0a7d601180256e1edd83631c0a42fd9128929a22/src/libraries/System.Private.Uri/src/System/IriHelper.cs#L51-L66)) * An escaped character from the `[0, 1F]` range, the `;/?:@&=+$,#[]!'()*%\` set ([reserved + `%`, `\`](https://github.com/dotnet/runtime/blob/cf0f14cd43708df7a28eb1e4309915ba51e418dd/src/libraries/System.Private.Uri/src/System/UriHelper.cs#L501-L502)), the ``" <>`^`` set or outside the IRI range - This happens because `EscapeUnescapeIri` removes some escaping and `CheckCanonical` then sees a mismatch of escaped, unescaped and non-ASCII characters - Example: `http://a/ű%20` (`ü` is non-ASCII and `%20` is ` `) Case 1. occurred in #70929. Case 2. occurred in #21626, #34031, #37634, #64249.
Author: MihaZupan
Assignees: MihaZupan
Labels: `bug`, `area-System.Net`, `tenet-compatibility`
Milestone: -
karelz commented 2 years ago

Triage:

Given all the above, it is not worth it to fix it this late of 7.0 release (non-trivial risk, non-trivial cost, medium-benefit). We should address it early in 8.0 to get bake time due to risk of the change.

tmenier commented 2 years ago

@karelz @MihaZupan All 4 of those issues closed "in favor of" this one are referring to Uri.IsWellFormedUriString, not Uri.IsWellFormedOriginalString, so just to be clear they are not duplicates. I'm not sure if the rationale in the triage comments above applies to both or not:

We have hard time to imagine almost ANY scenario where IsWellFormedOriginalString is needed to be called by the user explicitly. We believe most of the cases use it just because it looks like a good thing to call.

I'm less familiar with IsWellFormedOriginalString, but IsWellFormedUriString is static and the use case is fairly obvious: validation. This is the one people are complaining about, and I would suggest you re-open one of the 4 that were closed, or at least provide different rationale for closing them. I'd suggest reopening #21626 as it has the most activity and upvotes. Thanks.

MihaZupan commented 2 years ago

The two methods share the same issues. IsWellFormedUriString is implemented as

Uri.TryCreate(uriString, uriKind, out result) && result.IsWellFormedOriginalString()

IsWellFormedUriString is static and the use case is fairly obvious: validation.

Right, but was the intended validation exactly what IsWellFormedOriginalString is doing, or whether a Uri seems valid and can be created via Uri.TryCreate?

tmenier commented 2 years ago

Thanks for the clarification. Karel's comment that "it looks like a good thing to call" is certainly true in the case of IsWellFormedUriString. Without that knowledge of the implementation details, it seems appropriate when you just want to make sure a user-entered URI string is valid and you don't need a Uri object in the moment. (Perhaps someone is entering their web site URL in a form and you just want validate and save it, for example.)

I just think the general tone that "we can't imagine why anyone would ever call this" is likely much more applicable to IsWellFormedOriginalString than it is to IsWellFormedUriString....again, without knowledge of the implementation. :)

AntiGuideAkquinet commented 2 years ago

Just to give some context on how people could end up experiencing this problem:

Microsofts OData.NET library uses Uri.IsWellFormedUriString (matching issue) In my case it is used to talk to Microsofts Business Central webservices which contain tenant names in their URL that can contain non ASCII characters, spaces etc.. This issue is specific to the library and could be fixed with a workaround but has a big impact on a fraction of users in the meantime.

Example URL: "http://192.168.0.1:1234/Instance/ODataV4/Company('123-Customer Place Süd-Ost')/"

MihaZupan commented 2 years ago

Thanks for linking the issue. From the linked code, it does seem like this is one of those usages that fall into the We believe most of the cases use it just because it looks like a good thing to call. category.

The OData library should consider dropping this check entirely. As-is, it is checking Uri.IsWellFormedUriString(baseUri.AbsoluteUri, UriKind.Absolute) which should just always be true, as Uri.AbsoluteUri is always generating a well-formed string. That is, this check is doing nothing in this case, except for potentially hitting this false-negative bug.

levicki commented 1 year ago

@karelz

I am not sure if I should open a new issue (this one looks similar enough), but Uri.TryCreate seems fundamentally broken to me.

Minimal repro (tested on .Net Framework 4.8):

Uri.TryCreate(@"<li>No certified downloads were found for this configuration. To include beta downloads in your search, click <a href='Find.aspx?lang=en-us'>here</a>.</li>", UriKind.RelativeOrAbsolute, out Uri Result);

To my surprise that call returns true, and the result is... well, see for yourself: image

MihaZupan commented 1 year ago

This issue is only talking about IsValidOriginalString, so it's unrelated to your problem.

If you try to create a relative Uri, we'll accept most everything as valid input. If you don't want that, you should likely specify UriKind.Absolute instead.

levicki commented 1 year ago

This issue is only talking about IsValidOriginalString, so it's unrelated to your problem.

Thanks for clarification, I will create a new issue.

If you try to create a relative Uri, we'll accept most everything as valid input. If you don't want that, you should likely specify UriKind.Absolute instead.

What should I do if I want both absolute or relative? Roll my own check? I mean, why is there even an option for relative Uri TryCreate when there's absolutely no validation for it? I'd expect at least to check against allowed character list and return false for those that aren't (taking also invalid UTF-8 encodings into account).

julian94 commented 1 year ago

Just for reference: Uri.IsWellFormedUriString(@"<li>No certified downloads were found for this configuration. To include beta downloads in your search, click <a href='Find.aspx?lang=en-us'>here</a>.</li>", UriKind.RelativeOrAbsolute); Returns False, as it should.

levicki commented 1 year ago

@julian94 I understand.

However, the documentation for Uri.TryCreate does not say what kind of validation (if any) is performed when constructing an Uri. At the minimum, this should be clarified. I also believe that Uri.TryCreate should not succeed in creating an invalid Uri (relative or absolute).

I did create an issue just in case someone would like to close it with wontfix with extreme prejudice :-) https://github.com/dotnet/runtime/issues/78381

karelz commented 1 year ago

Sadly, we won't make it in 8.0 due to circumstances. Moving to Future for now.