dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.08k stars 1.56k forks source link

Uri.toString() for "mailto" does not encode spaces properly (`+` instead of `%20`) #43838

Open Delgan opened 3 years ago

Delgan commented 3 years ago

Hi.

This ticket was first opened on flutter bug tracker: https://github.com/flutter/flutter/issues/68406

It seems Uri.toString() convert spaces to + in subject and body for mailto scheme.

final uri = Uri(
  scheme: 'mailto',
  path: 'smith@example.com',
  queryParameters: {
    'subject': 'Example Subject & Symbols are allowed!'
  }
);

print(uri.toString());
// mailto:smith@example.com?subject=Example+Subject+%26+Symbols+are+allowed%21

This seems to be a problem, when the link is opened with url_launcher for example then the subject displayed in the mail app is like this:

Example+Subject+&+Symbols+are+allowed!

For some reason, the app does not handle + symbols in the uri as expected. However, it works fine if %20 is used instead.

I tested this with url_launcher v5.7.5, flutter v1.23.0, dart v2.10.0, Android v8.0.0 and opening the link with Gmail app.

Should the spaces be automatically converted to %20 instead to ensure portability?

lrhn commented 3 years ago

The Dart Uri class does not recognize most schemes specially. The mailto: scheme is treated the same as any other scheme, so the query part is encoded using URL encoding, which is why it uses + for spaces.

SamadiPour commented 3 years ago

Same here. How to fix it?

lrhn commented 3 years ago

The lazy approach would be:

print(uri.toString().replaceAll("+", "%20"));

A more intricate approach could be:

final uri = Uri.parse('mailto:smith@example.com?subject=' + 
   'Example Subject & Symbols are allowed!'.replaceAll(" ", "%20"));

print(uri.toString());

or

final uri = Uri(
  scheme: 'mailto',
  path: 'smith@example.com',
  query: 
    'subject=Example Subject & Symbols are allowed!'.replaceAll(" ", "%20")
);

print(uri.toString());

There currently is no way to use the queryParameters argument and get what you want. There is no way to configure the URI to not use standard URL encoding for query parameters.

(If we do a complete rewrite of the Uri class at some point, which we want to for several reasons, mainly compatibility with the browser Url class, then we will likely start recognizing "web" URIs - http and web sockets - specifically, and not treat all other URIs as if they are necessarily the same as HTTP. Until such time, we are unlikely to make any breaking changes to the Uri class.)

Delgan commented 3 years ago

I'm personally constructing the uri string myself by calling Uri.encodeComponent() for each parameter. It encodes " " to "%20" while Uri.encodeQueryComponent() (which I guess is used internally by Uri.toString()) encodes it to "+".

final subject = Uri.encodeComponent("Example Subject & Symbols are allowed!")
final uri = "mailto:smith@example.com?subject=$subject";

It seems to work fine.

stuartmorgan commented 3 years ago

The Dart Uri class does not recognize most schemes specially. The mailto: scheme is treated the same as any other scheme, so the query part is encoded using URL encoding, which is why it uses + for spaces.

This is backwards though, as the generic form of URL encoding is percent encoding (click on your link there, then click on "URL encoding" in that text, which leads to https://en.wikipedia.org/wiki/Percent-encoding). The issue isn't that mailto isn't getting special handling, it's that all schemes are being treated as if they are HTML form query strings, which is where the use of + comes from.

There is no way to configure the URI to not use standard URL encoding for query parameters.

The correct statement is that there is no way to configure Uri to use standard URL encoding for query parameters, rather than applying HTML form encoding to every scheme.

A more intricate approach could be:

final uri = Uri.parse('mailto:smith@example.com?subject=' + 
   'Example Subject & Symbols are allowed!'.replaceAll(" ", "%20"));

print(uri.toString());

or

final uri = Uri(
  scheme: 'mailto',
  path: 'smith@example.com',
  query: 
    'subject=Example Subject & Symbols are allowed!'.replaceAll(" ", "%20")
);

print(uri.toString());

These suggestions are both incorrect, as they will not encode anything else. Notably in this case, the subject will just be "Example Subject ", because the unencoded & will end that query argument.

Until such time, we are unlikely to make any breaking changes to the Uri class.

Perhaps an alternate constructor that does the right thing as an interim solution? This issue makes it very hard for people to correctly use the Uri class for anything that's not an http* URI. (Exhibit A: the fact that your example workarounds are not actually right, and will break things in other ways.)

Or, alternately, switch on the scheme and keep the existing behavior for http/https, and do a breaking change for any others? While it is a breaking change, it's hard to imagine many scenarios where anyone would prefer the current behavior for other schemes.

stuartmorgan commented 3 years ago

(Also, perhaps the encodeComponent docs could be updated in the short term; currently they say "For encoding the query part consider using encodeQueryComponent." which is actively misleading advice for any non-http URI. It could say "For encoding the query part of an HTTP or HTTPS URI, consider ...")

stuartmorgan commented 3 years ago

The simplest safe solution I can see at this point to use Uri for anything that has query parameters but isn't an http* URL is to manually re-implement what queryParameters should do inline, using encodedComponent. E.g.:

final parameters = <String, String>{
  'subject': 'Example Subject & Symbols are allowed!',
  'body': 'Here too! &+=',
};
final uri = Uri(
  scheme: 'mailto',
  path: 'smith@example.com',
  query: parameters.entries.map((e) => '${Uri.encodeComponent(e.key)}=${Uri.encodeComponent(e.value)}').join('&'),
);
print(uri.toString());

For url_launcher I think we'll probably need to provide a utility method to do that query encoding, since that's a lot to put in a recommended code samples.

Hixie commented 3 years ago

https://url.spec.whatwg.org/ is the controlling standard here and it describes our current behaviour as invalid. Since we are not following the standard, we should fix our behaviour accordingly.

lrhn commented 3 years ago

Changing our behavior to match the whatwg spec was one of our plans for Dart 2.0, but it didn't pan out (we significantly over-estimated how much change we could do in the time available). See https://github.com/dart-lang/sdk/issues/29420

It's still a worthy goal, and would make web compilers save a lot of code, but also a highly breaking change to anyone using the existing Uri class. One strategy (which we have used before with varying success) could be to copy the existing Uri class to a package, and re-implement the API with whatwg-URL compatible semantics in the platform libraries. People depending heavily on the existing behavior can migrate to the package then.

It's going to be a big undertaking. The Uri class is the largest class in the platform libraries, by a significant margin. The uri.dart file is larger than all other .dart files in the SDK except the auto-generated html_dart2js.dart. It's bigger than http_impl.dart!

(Reducing the complexity was one of the reasons for rewriting Uri, to not have to do copious amounts of up-front normalization by default).

stuartmorgan commented 3 years ago

Can you comment on the possiblity of doing one of the two small changes I suggested above (a new named constructor for non-web Uris, or a breaking change for the constructor just for non-web schemes)?

I understand there are other issues that make a major rewrite desirable, but if we could avoid letting the perfect being the enemy of the good in fixing this particular issue in the short term it would make it feasible to use Uri directly for common cases like mailto:, which currently is not the case.

Hixie commented 3 years ago

a highly breaking change to anyone using the existing Uri class.

I don't see how making our behaviour less broken could be considered a breaking change. Could you elaborate on the flow that you're concerned would break?

lrhn commented 3 years ago

If we don't normalize on creation (and the whatwg spec doesn't), then people relying on creating two Uris and checking whether they are equal will start seeing different results for things that were equivalent, but not equal. That is, unless we make == normalize first, or do a comparison which recognize equivalences of characters, which is probably as expensive as normalizing (except for not allocating the result, but therefore also having to do it every time you call ==).

lrhn commented 3 years ago

A new constructor which does only rudimentary normalization (and the normalize() would have to do full normalization) could be an option. Perhaps Uri.raw(String scheme, String content) where we behave "normally" for http, https, ws, wss, and file URIs, perhaps also package ones (with normal being Uri.parse("$scheme;$content")). For the rest, we create a URI with scheme as scheme and content as path, no matter what the content contains. I guess we need to make % escapes valid, and escape non-printable ASCII characters, possibly also encode non-ASCII letters. Something like:

  /// Creates a URI with the given scheme and content.
  ///
  /// The [scheme] must be a valid, non-empty scheme and
  /// the [content] is everything which occurs
  /// after the colon following the scheme.
  ///
  /// If [scheme] is a *recognized scheme*,
  /// one of `http`, `https`, `ws`, `wss`, `file` or `package`,
  /// the URI is normalized as normal for `Uri.parse("$scheme:$content")`.
  /// Otherwise the URI is assumed to be non-hierarchical and
  /// the only normalization which happens is scheme normalization,
  /// percent-escape normalization of existing escapes,
  /// and percent-escaping of non-visible-ASCII characters of the content,
  /// where non-ASCII characters are UTF-8 encoded first.
  ///
  /// The entirety of the normalized content will be accessible as the [path]
  /// of the created URI.
  ///
  /// Example:
  /// ```dart
  /// var uri = Uri.raw("Unknown", "#!%%0a?/\xa0::\n[]");
  /// print(uri.path); // #!%25%0A?/%C2%A0::%0A[]
  /// ```
  factory Uri.raw(String scheme, String content) => ...

That should cover mailto URIs and other unknown URI schemes. WDYT?

stuartmorgan commented 3 years ago

A new constructor which does only rudimentary normalization (and the normalize() would have to do full normalization) could be an option.

I'm confused; I don't think this issue has anything to do with normalization. AFAICT it's purely a question of whether queryParameters provided in the constructor are encoded using percent escaping (standard URI encoding; what encodeComponent currently does) or HTML form encoding (what encodeQueryComponent currently does).

Consider this code:

// Case a.
final parameters = <String, String>{
  'subject': 'a & b',
  'body': 'c & d',
};
var uri = Uri(
  scheme: 'mailto',
  path: 'smith@example.com',
  queryParameters: parameters,
);
print(uri.toString());

// Case b.  
uri = Uri(
  scheme: 'mailto',
  path: 'smith@example.com',
  query: parameters.entries.map((e) => '${Uri.encodeQueryComponent(e.key)}=${Uri.encodeQueryComponent(e.value)}').join('&'),
);
print(uri.toString());

// Case c.
uri = Uri(
  scheme: 'mailto',
  path: 'smith@example.com',
  query: parameters.entries.map((e) => '${Uri.encodeComponent(e.key)}=${Uri.encodeComponent(e.value)}').join('&'),
);
print(uri.toString());

It prints:

mailto:smith@example.com?subject=a+%26+b&body=c+%26+d
mailto:smith@example.com?subject=a+%26+b&body=c+%26+d
mailto:smith@example.com?subject=a%20%26%20b&body=c%20%26%20d

The first two (a and b) are identical (presumably the first is actually using encodeQueryCompontent internally?), and wrong. The third (c) is the correct encoding—if it's being normalized internally on construction, that normalization is harmless for the purposes of this issue.

I'm simply suggesting a new named constructor that is identical to the Uri constructor except that it does percent encoding—not HTML form encoding—for queryParameters.

Perhaps Uri.raw(String scheme, String content) [...] That should cover mailto URIs and other unknown URI schemes. WDYT?

That seems equivalent to Uri.parse(Uri.encodeFull('$scheme://$content')), so doesn't really solve this problem; the developer is still entirely on their own in correctly handling query arguments. (It also doesn't seem to improve at all over simply telling people to use query rather than queryParameters in the current Uri constructor.)

stuartmorgan commented 3 years ago

To elaborate on the background here, as an illustration of the use case I'm trying to solve: the url_launcher plugin for Flutter takes a URL (currently as a String, but ideally a Uri in the future), and hands it off to the OS for handling. These can be anything, but are commonly http(s), mailto, or sms. For mailto (and to a lesser extend sms) we frequently see issues because many, many developers—unsurprisingly, as it's non-trivial—do not understand how to correctly construct non-trivial URLs. The two most common failures are:

To help users avoid these pitfalls, we added an example to the plugin README that demonstrates how to correctly (we thought; I didn't realize that Uri used the wrong encoding method for queryParameters) construct a URL with query parameters using Uri without having to understand intricacies of how individual components of URLs are encoded. That example is the code block that's in the OP here.

My goal here is to have a similarly-high-level example code to replace that incorrect example. Given that the issue here is with Uri, my preference would be for that similarly-high-level code to be part of Uri rather than the plugin providing its own URL construction helper.

Anything that involves the developer already having had to correctly construct a properly encoded query string from individual, unencoded components doesn't meet that goal.

Hixie commented 3 years ago

I don't see why we need a new constructor. The current one is just behaving incorrectly. I could see an argument for accepting an optional named boolean parameter, which is basically what the standard does, that enables the +-for-space thing, but that's about it IMHO.

stuartmorgan commented 3 years ago

I could see an argument for accepting an optional named boolean parameter, which is basically what the standard does, that enables the +-for-space thing

That works for me as well. I'm just trying to find an option that doesn't involve waiting an indefinitely long time to rewrite everything, given that it sounds like changing behavior—even incorrect behavior—before that point is a non-starter. (I'm not saying I agree that we shouldn't just change the behavior to be right, just that if that's a non-negotiable, I would like some other option that can be implemented in the short term.)

Hixie commented 3 years ago

Changing broken behaviour should not be a non-starter. What we have today is a bug. We are corrupting user data.

lrhn commented 3 years ago

The Uri class was never WHATWG URL compliant, and has never tried to be. It's implementing RFC 3986, and it probably predates the WHATWG URL spec by at least a year.

The encoding of queryParameters is intended to match the behavior of a browser submitting a form with those values. It uses "www-form-urlencoded" representation of the query components deliberately to do that. It's not the only possible option, but it was the one chosen at the time.

If that's not what users are expecting, then that is a usability problem, but it is not a bug.

We can add a flag to the constructor to make it use percent normal encoding instead of +-for-space. That's non-breaking as long as it chooses www-form-urlencoded as the default. It's easy for the Uri constructor, harder for Uri.http/Uri.https because they already have an optional positional parameter, so adding another boolean parameter would need to make it positional too. For those, a breaking change might actually be easier, and then we might as well break the default for every constructor taking a queryParameters argument.

In the longer run, we should do a full breaking change to become WHATWG URL compatible. It would help our JS compilers and probably confuse users less. (Again, I'd have preferred to do that at the Dart 2.0 release, but it didn't make the cut). Maybe we should introduce a Url class for that, to follow the WHATWG naming as well. That will likely still be a breaking change if we change the signature of methods expecting Uri to expect Url, even if we keep Uri and makes it implement Url. Not an easy change, no matter how we cut it (maybe if we can get extension types with constructors).

Hixie commented 3 years ago

RFC 3986 doesn't specify any sort of + encoding for 0x20. It's actually less compliant if the target is that RFC. Indeed, that RFC defines a normalization scheme that considers + and %20 in the query component to be distinct. There is a nod to the concept of "scheme-based normalization" which could have (though wasn't) used to define this +-for-space encoding convention for http and https in section 6.2.3, but that's really the closest that the RFC comes to this concept and it's not sufficient to claim that our behaviour is conforming to that RFC.

That's non-breaking as long as

The current behaviour is already broken. This isn't a situation where we are doing something valid but want to change to something else valid and we have to worry about compatibility. We are literally misencoding ASCII space and causing user data corruption as a result. It's just a bug.

stuartmorgan commented 3 years ago

We can add a flag to the constructor to make it use percent normal encoding instead of +-for-space. That's non-breaking as long as it chooses www-form-urlencoded as the default. It's easy for the Uri constructor, harder for Uri.http/Uri.https because they already have an optional positional parameter, so adding another boolean parameter would need to make it positional too. For those, a breaking change might actually be easier, and then we might as well break the default for every constructor taking a queryParameters argument.

What's the next step here? E.g., is there a formal process for deciding whether to make the breaking change here?

stuartmorgan commented 3 years ago

What's the next step here? E.g., is there a formal process for deciding whether to make the breaking change here?

@lrhn Ping on this question? We're continuing to have users bitten by this issue, and I need to know if this is going to be fixed in Dart, or a workaround added to url_launcher.

lrhn commented 3 years ago

I guess the next step is either:

  1. A breaking change request (followed up by implementing a breaking change to the current behavior), or
  2. Deciding on a usable non-breaking workaround and implementing that.

The breaking change can be either:

A non-breaking change could be:

The easiest to implement is just a clean break switching to new behavior. If you want + in your query, you need to put it there yourself. The least-breaking-but-still-breaking approach would be to change the behavior for non-http-like URIs, but keep the current behavior for http-like ones.

So the next step is to decide which solution to go for, then we can file a breaking change request for that (if necessary, which is probably likely, the non-breaking approaches are not great).

armandojimenez commented 3 years ago

Any updates on this? having this problem with the url_launcher plugin

stuartmorgan commented 3 years ago
  • Just changing the behavior to not do form-encoding (spaces become %20 instead of + when encoding the query).
  • Do that, but add an extra flag to trigger the old behavior, so users who depend on it (if any) can keep using it.

Either of these options sounds good to me. Who are the other stakeholders who need to weigh in prior to making the breaking change request?

vincevargadev commented 1 year ago

Any updates on this? having this problem with the url_launcher plugin

Just as a side note, I created the mailto package that hopefully is "good enough" for most use-cases where you would want to launch an email client with a pre-filled message, subject, and recipients.

laszukdawid commented 7 months ago

Ping. :eyes: