Open Delgan opened 3 years ago
The Dart Uri
class does not recognize most schemes specially. The mailto:
scheme is treated the same as any other scheme, so the query part is encoded using URL encoding, which is why it uses +
for spaces.
Same here. How to fix it?
The lazy approach would be:
print(uri.toString().replaceAll("+", "%20"));
A more intricate approach could be:
final uri = Uri.parse('mailto:smith@example.com?subject=' +
'Example Subject & Symbols are allowed!'.replaceAll(" ", "%20"));
print(uri.toString());
or
final uri = Uri(
scheme: 'mailto',
path: 'smith@example.com',
query:
'subject=Example Subject & Symbols are allowed!'.replaceAll(" ", "%20")
);
print(uri.toString());
There currently is no way to use the queryParameters
argument and get what you want. There is no way to configure the URI to not use standard URL encoding for query parameters.
(If we do a complete rewrite of the Uri
class at some point, which we want to for several reasons, mainly compatibility with the browser Url
class, then we will likely start recognizing "web" URIs - http and web sockets - specifically, and not treat all other URIs as if they are necessarily the same as HTTP. Until such time, we are unlikely to make any breaking changes to the Uri
class.)
I'm personally constructing the uri
string myself by calling Uri.encodeComponent()
for each parameter. It encodes " "
to "%20"
while Uri.encodeQueryComponent()
(which I guess is used internally by Uri.toString()
) encodes it to "+"
.
final subject = Uri.encodeComponent("Example Subject & Symbols are allowed!")
final uri = "mailto:smith@example.com?subject=$subject";
It seems to work fine.
The Dart
Uri
class does not recognize most schemes specially. Themailto:
scheme is treated the same as any other scheme, so the query part is encoded using URL encoding, which is why it uses+
for spaces.
This is backwards though, as the generic form of URL encoding is percent encoding (click on your link there, then click on "URL encoding" in that text, which leads to https://en.wikipedia.org/wiki/Percent-encoding). The issue isn't that mailto
isn't getting special handling, it's that all schemes are being treated as if they are HTML form query strings, which is where the use of +
comes from.
There is no way to configure the URI to not use standard URL encoding for query parameters.
The correct statement is that there is no way to configure Uri
to use standard URL encoding for query parameters, rather than applying HTML form encoding to every scheme.
A more intricate approach could be:
final uri = Uri.parse('mailto:smith@example.com?subject=' + 'Example Subject & Symbols are allowed!'.replaceAll(" ", "%20")); print(uri.toString());
or
final uri = Uri( scheme: 'mailto', path: 'smith@example.com', query: 'subject=Example Subject & Symbols are allowed!'.replaceAll(" ", "%20") ); print(uri.toString());
These suggestions are both incorrect, as they will not encode anything else. Notably in this case, the subject will just be "Example Subject ", because the unencoded &
will end that query argument.
Until such time, we are unlikely to make any breaking changes to the
Uri
class.
Perhaps an alternate constructor that does the right thing as an interim solution? This issue makes it very hard for people to correctly use the Uri
class for anything that's not an http*
URI. (Exhibit A: the fact that your example workarounds are not actually right, and will break things in other ways.)
Or, alternately, switch on the scheme and keep the existing behavior for http/https, and do a breaking change for any others? While it is a breaking change, it's hard to imagine many scenarios where anyone would prefer the current behavior for other schemes.
(Also, perhaps the encodeComponent docs could be updated in the short term; currently they say "For encoding the query part consider using encodeQueryComponent." which is actively misleading advice for any non-http URI. It could say "For encoding the query part of an HTTP or HTTPS URI, consider ...")
The simplest safe solution I can see at this point to use Uri for anything that has query parameters but isn't an http* URL is to manually re-implement what queryParameters
should do inline, using encodedComponent
. E.g.:
final parameters = <String, String>{
'subject': 'Example Subject & Symbols are allowed!',
'body': 'Here too! &+=',
};
final uri = Uri(
scheme: 'mailto',
path: 'smith@example.com',
query: parameters.entries.map((e) => '${Uri.encodeComponent(e.key)}=${Uri.encodeComponent(e.value)}').join('&'),
);
print(uri.toString());
For url_launcher
I think we'll probably need to provide a utility method to do that query encoding, since that's a lot to put in a recommended code samples.
https://url.spec.whatwg.org/ is the controlling standard here and it describes our current behaviour as invalid. Since we are not following the standard, we should fix our behaviour accordingly.
Changing our behavior to match the whatwg spec was one of our plans for Dart 2.0, but it didn't pan out (we significantly over-estimated how much change we could do in the time available). See https://github.com/dart-lang/sdk/issues/29420
It's still a worthy goal, and would make web compilers save a lot of code, but also a highly breaking change to anyone using the existing Uri
class.
One strategy (which we have used before with varying success) could be to copy the existing Uri
class to a package, and re-implement the API with whatwg-URL compatible semantics in the platform libraries. People depending heavily on the existing behavior can migrate to the package then.
It's going to be a big undertaking. The Uri
class is the largest class in the platform libraries, by a significant margin. The uri.dart
file is larger than all other .dart
files in the SDK except the auto-generated html_dart2js.dart
. It's bigger than http_impl.dart
!
(Reducing the complexity was one of the reasons for rewriting Uri
, to not have to do copious amounts of up-front normalization by default).
Can you comment on the possiblity of doing one of the two small changes I suggested above (a new named constructor for non-web Uris, or a breaking change for the constructor just for non-web schemes)?
I understand there are other issues that make a major rewrite desirable, but if we could avoid letting the perfect being the enemy of the good in fixing this particular issue in the short term it would make it feasible to use Uri
directly for common cases like mailto:
, which currently is not the case.
a highly breaking change to anyone using the existing
Uri
class.
I don't see how making our behaviour less broken could be considered a breaking change. Could you elaborate on the flow that you're concerned would break?
If we don't normalize on creation (and the whatwg spec doesn't), then people relying on creating two Uri
s and checking whether they are equal will start seeing different results for things that were equivalent, but not equal.
That is, unless we make ==
normalize first, or do a comparison which recognize equivalences of characters, which is probably as expensive as normalizing (except for not allocating the result, but therefore also having to do it every time you call ==
).
A new constructor which does only rudimentary normalization (and the normalize()
would have to do full normalization) could be an option.
Perhaps Uri.raw(String scheme, String content)
where we behave "normally" for http
, https
, ws
, wss
, and file
URIs, perhaps also package
ones (with normal being Uri.parse("$scheme;$content")
). For the rest, we create a URI with scheme
as scheme and content
as path
, no matter what the content
contains. I guess we need to make %
escapes valid, and escape non-printable ASCII characters, possibly also encode non-ASCII letters.
Something like:
/// Creates a URI with the given scheme and content.
///
/// The [scheme] must be a valid, non-empty scheme and
/// the [content] is everything which occurs
/// after the colon following the scheme.
///
/// If [scheme] is a *recognized scheme*,
/// one of `http`, `https`, `ws`, `wss`, `file` or `package`,
/// the URI is normalized as normal for `Uri.parse("$scheme:$content")`.
/// Otherwise the URI is assumed to be non-hierarchical and
/// the only normalization which happens is scheme normalization,
/// percent-escape normalization of existing escapes,
/// and percent-escaping of non-visible-ASCII characters of the content,
/// where non-ASCII characters are UTF-8 encoded first.
///
/// The entirety of the normalized content will be accessible as the [path]
/// of the created URI.
///
/// Example:
/// ```dart
/// var uri = Uri.raw("Unknown", "#!%%0a?/\xa0::\n[]");
/// print(uri.path); // #!%25%0A?/%C2%A0::%0A[]
/// ```
factory Uri.raw(String scheme, String content) => ...
That should cover mailto
URIs and other unknown URI schemes. WDYT?
A new constructor which does only rudimentary normalization (and the
normalize()
would have to do full normalization) could be an option.
I'm confused; I don't think this issue has anything to do with normalization. AFAICT it's purely a question of whether queryParameters
provided in the constructor are encoded using percent escaping (standard URI encoding; what encodeComponent
currently does) or HTML form encoding (what encodeQueryComponent
currently does).
Consider this code:
// Case a.
final parameters = <String, String>{
'subject': 'a & b',
'body': 'c & d',
};
var uri = Uri(
scheme: 'mailto',
path: 'smith@example.com',
queryParameters: parameters,
);
print(uri.toString());
// Case b.
uri = Uri(
scheme: 'mailto',
path: 'smith@example.com',
query: parameters.entries.map((e) => '${Uri.encodeQueryComponent(e.key)}=${Uri.encodeQueryComponent(e.value)}').join('&'),
);
print(uri.toString());
// Case c.
uri = Uri(
scheme: 'mailto',
path: 'smith@example.com',
query: parameters.entries.map((e) => '${Uri.encodeComponent(e.key)}=${Uri.encodeComponent(e.value)}').join('&'),
);
print(uri.toString());
It prints:
mailto:smith@example.com?subject=a+%26+b&body=c+%26+d
mailto:smith@example.com?subject=a+%26+b&body=c+%26+d
mailto:smith@example.com?subject=a%20%26%20b&body=c%20%26%20d
The first two (a and b) are identical (presumably the first is actually using encodeQueryCompontent
internally?), and wrong. The third (c) is the correct encoding—if it's being normalized internally on construction, that normalization is harmless for the purposes of this issue.
I'm simply suggesting a new named constructor that is identical to the Uri
constructor except that it does percent encoding—not HTML form encoding—for queryParameters
.
Perhaps
Uri.raw(String scheme, String content)
[...] That should covermailto
URIs and other unknown URI schemes. WDYT?
That seems equivalent to Uri.parse(Uri.encodeFull('$scheme://$content'))
, so doesn't really solve this problem; the developer is still entirely on their own in correctly handling query arguments. (It also doesn't seem to improve at all over simply telling people to use query
rather than queryParameters
in the current Uri
constructor.)
To elaborate on the background here, as an illustration of the use case I'm trying to solve: the url_launcher
plugin for Flutter takes a URL (currently as a String
, but ideally a Uri
in the future), and hands it off to the OS for handling. These can be anything, but are commonly http(s)
, mailto
, or sms
. For mailto
(and to a lesser extend sms
) we frequently see issues because many, many developers—unsurprisingly, as it's non-trivial—do not understand how to correctly construct non-trivial URLs. The two most common failures are:
var myURL = 'mailto:$emailAddress?subject=$subject&body=$body';
- Usually there are spaces in subject
or body
so this isn't even remotely valid as a URL; on iOS at least this will simply fail to launch.var myURL = Url.encodeFull([Insert string from above here]);
- Works a lot of the time, but fails for certain special characters, most notably &
as discussed above (it will launch, but won't have either the subject or body they intended).To help users avoid these pitfalls, we added an example to the plugin README that demonstrates how to correctly (we thought; I didn't realize that Uri
used the wrong encoding method for queryParameters
) construct a URL with query parameters using Uri
without having to understand intricacies of how individual components of URLs are encoded. That example is the code block that's in the OP here.
My goal here is to have a similarly-high-level example code to replace that incorrect example. Given that the issue here is with Uri
, my preference would be for that similarly-high-level code to be part of Uri
rather than the plugin providing its own URL construction helper.
Anything that involves the developer already having had to correctly construct a properly encoded query
string from individual, unencoded components doesn't meet that goal.
I don't see why we need a new constructor. The current one is just behaving incorrectly. I could see an argument for accepting an optional named boolean parameter, which is basically what the standard does, that enables the +
-for-space thing, but that's about it IMHO.
I could see an argument for accepting an optional named boolean parameter, which is basically what the standard does, that enables the
+
-for-space thing
That works for me as well. I'm just trying to find an option that doesn't involve waiting an indefinitely long time to rewrite everything, given that it sounds like changing behavior—even incorrect behavior—before that point is a non-starter. (I'm not saying I agree that we shouldn't just change the behavior to be right, just that if that's a non-negotiable, I would like some other option that can be implemented in the short term.)
Changing broken behaviour should not be a non-starter. What we have today is a bug. We are corrupting user data.
The Uri
class was never WHATWG URL compliant, and has never tried to be. It's implementing RFC 3986, and it probably predates the WHATWG URL spec by at least a year.
The encoding of queryParameters
is intended to match the behavior of a browser submitting a form with those values. It uses "www-form-urlencoded" representation of the query components deliberately to do that. It's not the only possible option, but it was the one chosen at the time.
If that's not what users are expecting, then that is a usability problem, but it is not a bug.
We can add a flag to the constructor to make it use percent normal encoding instead of +
-for-space. That's non-breaking as long as it chooses www-form-urlencoded
as the default.
It's easy for the Uri
constructor, harder for Uri.http
/Uri.https
because they already have an optional positional parameter, so adding another boolean parameter would need to make it positional too.
For those, a breaking change might actually be easier, and then we might as well break the default for every constructor taking a queryParameters
argument.
In the longer run, we should do a full breaking change to become WHATWG URL compatible. It would help our JS compilers and probably confuse users less. (Again, I'd have preferred to do that at the Dart 2.0 release, but it didn't make the cut).
Maybe we should introduce a Url
class for that, to follow the WHATWG naming as well.
That will likely still be a breaking change if we change the signature of methods expecting Uri
to expect Url
, even if we keep Uri
and makes it implement Url
. Not an easy change, no matter how we cut it (maybe if we can get extension types with constructors).
RFC 3986 doesn't specify any sort of +
encoding for 0x20. It's actually less compliant if the target is that RFC. Indeed, that RFC defines a normalization scheme that considers +
and %20
in the query component to be distinct. There is a nod to the concept of "scheme-based normalization" which could have (though wasn't) used to define this +
-for-space encoding convention for http
and https
in section 6.2.3, but that's really the closest that the RFC comes to this concept and it's not sufficient to claim that our behaviour is conforming to that RFC.
That's non-breaking as long as
The current behaviour is already broken. This isn't a situation where we are doing something valid but want to change to something else valid and we have to worry about compatibility. We are literally misencoding ASCII space and causing user data corruption as a result. It's just a bug.
We can add a flag to the constructor to make it use percent normal encoding instead of
+
-for-space. That's non-breaking as long as it chooseswww-form-urlencoded
as the default. It's easy for theUri
constructor, harder forUri.http
/Uri.https
because they already have an optional positional parameter, so adding another boolean parameter would need to make it positional too. For those, a breaking change might actually be easier, and then we might as well break the default for every constructor taking aqueryParameters
argument.
What's the next step here? E.g., is there a formal process for deciding whether to make the breaking change here?
What's the next step here? E.g., is there a formal process for deciding whether to make the breaking change here?
@lrhn Ping on this question? We're continuing to have users bitten by this issue, and I need to know if this is going to be fixed in Dart, or a workaround added to url_launcher
.
I guess the next step is either:
The breaking change can be either:
%20
instead of +
when encoding the query).http
/https
schemes (maybe a small list of other similar schemes). Likely needs the flag.A non-breaking change could be:
replace
).Url
class with different behavior. (Long term goal!)The easiest to implement is just a clean break switching to new behavior. If you want +
in your query, you need to put it there yourself.
The least-breaking-but-still-breaking approach would be to change the behavior for non-http
-like URIs, but keep the current behavior for http
-like ones.
So the next step is to decide which solution to go for, then we can file a breaking change request for that (if necessary, which is probably likely, the non-breaking approaches are not great).
Any updates on this? having this problem with the url_launcher plugin
- Just changing the behavior to not do form-encoding (spaces become
%20
instead of+
when encoding the query).- Do that, but add an extra flag to trigger the old behavior, so users who depend on it (if any) can keep using it.
Either of these options sounds good to me. Who are the other stakeholders who need to weigh in prior to making the breaking change request?
Any updates on this? having this problem with the url_launcher plugin
Just as a side note, I created the mailto
package that hopefully is "good enough" for most use-cases where you would want to launch an email client with a pre-filled message, subject, and recipients.
Ping. :eyes:
Hi.
This ticket was first opened on
flutter
bug tracker: https://github.com/flutter/flutter/issues/68406It seems
Uri.toString()
convert spaces to+
insubject
andbody
formailto
scheme.This seems to be a problem, when the link is opened with
url_launcher
for example then the subject displayed in the mail app is like this:For some reason, the app does not handle
+
symbols in the uri as expected. However, it works fine if%20
is used instead.I tested this with
url_launcher
v5.7.5,flutter
v1.23.0,dart
v2.10.0, Android v8.0.0 and opening the link with Gmail app.Should the spaces be automatically converted to
%20
instead to ensure portability?