Redirect should encode URLs if it needs to

miloush commented 6 years ago

All these issues are result of people trying to redirect to a URL that is not all ASCII. Resolution seems to be "you should have encoded your URL".

However, the fact that the RedirectResult puts the URL into a HTTP header (which is the ultimate source of the issue) is an implementation detail that the caller cannot magically assume. It is therefore responsibility of e.g. RedirectResultExecutor to meet the requirements and ensure the URL is properly formatted in case it decides to use HTTP headers for redirection.

mkArtakMSFT commented 6 years ago

Thanks for contacting us, @miloush. @Tratcher, what are your thoughts regarding this?

Tratcher commented 6 years ago

We have helpers for generating properly encoded urls like UriHelper.BuildAbsolute. By the time you have combine everything into one string it's too late automatically encode, some parts are ambiguous like ? and # if present in the path.

Perhaps overhauling RedirectResult to take in PathString, QueryString, etc. rather than a full string?

miloush commented 6 years ago

Please note there are two ideas of "encoding" and I agree standard URL encode is too late for Redirect to do.

Perhaps overhauling RedirectResult to take in PathString, QueryString, etc. rather than a full string?

That wouldn't help much as you will face the same problem just with individual parts. The issue is you don't want a typical UrlEncode, you just want to fast-check and encode non-ASCII chars.

We have helpers for generating properly encoded urls

I am not disputing that, that is basically "you should have encoded your URL". But you can have Unicode URLs on the page itself (which is often preferred), it's only limitation of the Location HTTP header value used for redirecting that ASP.NET Core is enforcing, and the code that decides to use this mechanism should take care of that requirement.

Unless Unicode is allowed in other HTTP headers, this is a HTTP problem, not URL one. Perhaps FrameResponseHeaders.SetValueFast is the place where this should be happening rather than throwing.

Tratcher commented 6 years ago

Right, we need to be specific about the kind of encodings. To create an unambiguous url there are several syntax characters you may need to escape like /?#:%.& etc. since they affected the meaning of the individual parts. Then you may need to encode it for transport / embedding in your specific, especially unicode, but # and other symbols can also be problematic in certain context like HTML.

PathString and company are specifically designed to encode URLs for use in HTTP headers. They take care of both the syntax escaping when you combine components, as well as fully encoding any unicode. It's important to do both of these at the same time so you don't double encode anything.

Unicode has a long messy relationship with HTTP headers and it's best to be explicit about the encoding.

Tratcher commented 6 years ago

I've filed https://github.com/aspnet/Home/issues/3574 for a Response.Redirect overload that accepts the components.

miloush commented 6 years ago

Okay so nowadays when one wants to fix this issue, they need to create a special ASCII encoded URL to pass to the Redirect method, and use the Unicode URL in links and on pages, e.g. https://github.com/madskristensen/Miniblog.Core/commit/14ec33b42fcd1d9e5f44ceff85749189a614d8dd

Imagine you add the suggested overload to Response.Redirect that takes the components, how would that eliminate the need for such fix?

Tratcher commented 6 years ago

UriHelper.GetDisplayUrl is what I refer people to when they want the friendly display for a request url. The signature doesn't work well for self-generated urls (it takes an HttpRequest), but that's another case where adding an overload that takes in the same components and renders them in a user displayable format would make sense. Note that one doesn't do the syntax escaping, you'd only use it for the display text of a link and use the encoded version for the actual link.

miloush commented 6 years ago

Okay so for self-generated urls, you would recommend a sort of GetLink() method that returns (scheme, HostString, PathString, QueryString, FragmentString) and then feeding it to either Redirect or GetDisplayUrl? I am worried that just sticking to a string instead is much more convenient...

Tratcher commented 6 years ago

Understood on the convenience of a string. The trouble is that it doesn't provide any context, the user has to know if it's encoded or decoded and handle it accordingly. I'm starting to think we need a url structure that aggregates all of the components for easy flow. (Not System.Uri because that causes quite a few transformations and doesn't encode appropriately for HTTP headers or HTML pages).

Tratcher commented 6 years ago

@mkArtakMSFT Looking at RedirectResult/Executor it's a 1:1 mapping to Response.Redirect. @miloush is certainly right on one main point, these API do not tell you what to do nor help you do it successfully. The doc comments make no reference to any encoding requirements. I will dispute the bit about putting the value into a HTTP header being an implementation detail, that's the core mechanic for HTTP redirects, this API is only sugar around that. Unfortunately it's not enough sugar. Changing the API to take in PathString, QueryString, etc. or a higher level construct would give users much more guidance here.

mkArtakMSFT commented 6 years ago

Thanks @Tratcher. @javiercn, can you please handle the proposed change? Thanks!

javiercn commented 6 years ago

I understand the concerns here. As far as I understand this is just a limitation for the location header, on other places (other than HTML encoding) is totally valid to have non ASCII characters on the Uri.

That said, I'm not sure adding new APIs will do anything to solve this problem. Most people will simply keep calling the string based API (as is the most convenient one) to later realize that the Url is not properly encoded.

When that happens, I don't really see an advantage in adding one overload for each of the redirect methods that takes all the individual components as HostString, PathString, etc compared to using UriHelper before calling on the redirect methods as suggested above.

I think the main thing we can change here is to make generating correctly encoded urls in MVC easier. This is something that we can add to UrlHelper. That way, we don't needs these overloads and this just becomes or something similar.

return Redirect(UrlHelper.CreateUrl(host,port,base,path,query,fragment));

Tratcher commented 6 years ago

Efficient. It's still lacking in discoverability though.

miloush commented 6 years ago

I am not going to object any new overloads but I am with Javier that I don't think it will do much to solve the problem. "I'm starting to think we need a url structure" sounded like closest to what would help.

But I understand and value the framework is built on simple types and it is predictable what gets output.

My proposal was that instead of ValidateHeaderCharacters throwing, it would ASCII encode the header value. Why would that be a bad idea? You get zero performance loss in already conforming cases since the check needs to be done anyway.

rynowak commented 6 years ago

This would double-encode someone's already-endcoded string

miloush commented 6 years ago

This would double-encode someone's already-endcoded string

No. If somebody already encoded the string it would be all ASCII and nothing would get changed on the way.

Tratcher commented 6 years ago

Encoding deals with more than non-ascii characters. You can't really separate the encoding of syntax characters from the encoding of unicode characters, both need to be done at the same time.

miloush commented 6 years ago

Okay thank you, my bad - you cannot do it widely across all HTTP headers, because you need to decide what to do with the non-ASCII characters and URL-encoding them is only valid for URLs. Is there a reason why the RedirectResult/Executor shouldn't do that then?

javiercn commented 6 years ago

As @rynowak mentioned.

This would double-encode someone's already-endcoded string

Having RedirectResult do this for you would be a breaking change, so I don't think we want to go that way.

We should just make this simpler by adding a convenience API on UrlHelper (The place used to generate Urls in MVC) and maybe if possible an analyzer to detect simple bad patterns. (Like constructing the url on the argument to redirect through + or as a string literal, or in a variable)

miloush commented 6 years ago

As I said that is not true, unless I am missing something obvious. Could you give me an example where this would break something?

javiercn commented 6 years ago

@miloush It changes the response in an observable way. That has the potential to break clients relying on the current behavior.

This can be done, but it would require at least to be opt-in and require a compatibility flag to prevent breaking people upgrading.

miloush commented 6 years ago

The current behavior is to throw an exception - that is the only case when there would be a difference, all current cases which result in an actual response would stay intact. Does that count as changing response in an observable way and worth compatibility flag?

I am not religious about doing this, but people keep running into the problem and this seems to be the right place to fix it to make everyone's life easier.

sksk571 commented 2 years ago

Any update on this issue? We are migrating .NET framework app and this looks like a regression compared to .NET framework. Documentation for Redirect doesn't state that the URL should be properly encoded, therefore it's a responsibility of Redirect to encode it before passing into header.

Looking at the .NET4.8 source code, it appears that Response.Redirect selectively encodes whitespace and non-ASCII characters in URL by calling UrlEncodeRedirect: https://referencesource.microsoft.com/#System.Web/HttpResponse.cs,fb8b07aa51863d27

rjgotten commented 2 years ago

I fail to see how applying a percent-encoding on characters that are not ASCII-safe can lead to any kind of double-encoding bug.

Double encoding happening at all requires that (part of) the encoding's encoded sequences are again recognized as something that needs encoding. In the general sense this would be true of percent-encoding and the percentage-sign.

However the case being discussed here is different: it's only characters that are not ASCII-safe which need to be encoded. And existing percent-encoded sequences are entirely composed of ASCII-safe characters and thus would not be touched. It is literally impossible to hit a double-encoding problem.

Moreover; characters that have syntactic value in HTTP URLs are all ASCII-safe and would not be touched either, leading to zero jeopardy of accidentally producing a malformed version of the input URL.

A conceptual Uri.EscapeAsciiSafeUriString static method should have no problems whatsoever; and it should be perfectly safe to apply that automatically when setting the Location header.

Sure, have signature overloads for Controller.Redirect; HttpResponse.Redirect; etc. that take PathString ; QueryString; etc. arguments. But: the signature that takes the string type argument should do this fix-up automatically and be more permissive in what it accepts, since it's actually the signature with the least amount of confidence that the user knows what they're doing.

It's this conceptual new family of signatures that use stronger types, which should forego this fix-up and let the user burn for their mistake when the use of the stronger types indicates they knew what they were getting in to and had to be precise.

Also; fun fact:

new Uri("https://example.com/ë%20e").GetComponents(UriComponents.SerializationInfoString, UriFormat.UriEscaped)

produces:

"https://example.com/%C3%AB%20e"

Meaning it already leaves existing escape sequences alone - probably by virtue of the fact that constructing the URI unescapes the existing escape sequences. (Note the persistent %20 which is not double-encoded.) You just ... have to use it?

mirhagk commented 2 years ago

Yeah there definitely seems to be a lot of confusion in this discussion. The escaping being discussed is not a general URI escaping but just a non-ASCII specific escaping. There is no chance of a double-encode as a properly encoded string would not contain any non-ASCII characters. The only "breaking change" is that RedirectResult actually works now instead of throwing exceptions, and the chances that someone is depending on the framework throwing an exception here seems pretty low.

It's very counter-intuitive, especially as IRIs gain more popularity and browsers, javascript and the old .NET framework all support them fine without prior encoding. It seems odd that .NET core would take such a big step back in terms of internationalization.

ghost commented 2 years ago

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

dotnet / aspnetcore

Redirect should encode URLs if it needs to #4919