Open bortzmeyer opened 5 years ago
Related Twitter thread https://twitter.com/bortzmeyer/status/1093062473475862528
This issue is related to the colon in the URL. When encoded (https://lg.op.example/.well-known/looking-glass/v1/ping/2001%3Adb8%3A%3A35), it works. It probably needs to be patched (by introducing a higher level function) in https://github.com/yosida95/uritemplate (ping @yosida95).
Hi, thanks for notifying me.
Hmm... I (probably) understand your needs, and it seems patching https://github.com/yosida95/uritemplate would be the most efficient way to fulfill them.
But I think `/v1/ping/2001:db8::35` does not actually match `/v1/ping/{host}`. RFC 6570 says that characters outside the unreserved set in variables with the normal operator should be percent-encoded. So the correct expansion of the template is `/v1/ping/2001%3Adb8%3A%3A35`, and that matches the template as expected.
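As a rough illustration of that rule, here is a minimal Python sketch of simple string expansion (the plain `{var}` operator only). The helper name `expand_simple` is hypothetical and not part of any library discussed in this thread. It relies on `urllib.parse.quote` with an empty `safe` set, which on Python 3.7+ leaves only the RFC 3986 unreserved characters untouched.

```python
import re
from urllib.parse import quote


def expand_simple(template: str, variables: dict) -> str:
    """Expand plain {name} expressions only (hypothetical helper)."""
    def replace(match: re.Match) -> str:
        # Assumption of this sketch: every variable is defined.
        value = str(variables[match.group(1)])
        # safe="" percent-encodes everything except unreserved characters
        # (ALPHA / DIGIT / "-" / "." / "_" / "~").
        return quote(value, safe="")

    return re.sub(r"\{(\w+)\}", replace, template)


expanded = expand_simple("/v1/ping/{host}", {"host": "2001:db8::35"})
print(expanded)  # /v1/ping/2001%3Adb8%3A%3A35
```

A real RFC 6570 implementation also handles operators, modifiers, and undefined variables, all of which this sketch ignores.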
If we implement that behavior, I think patching `.Match()` and the matcher VM behind it is not the way. I think we can agree on this point, as @dunglas mentioned "by introducing a higher level function" above. If we patched the matcher, we would no longer be able to differentiate between, for example, `{var}` and `{+var}`, we would have to treat non-Latin characters properly (because a variable takes values consisting of any valid UTF-8 characters), and the matcher VM's stack would grow a lot.
Therefore we should first percent-encode the input ("raw") URL, with some exceptions (e.g. slashes, colons in scheme://host:port, crosshatches, etc.), then construct a `*Template` from the encoded URL and call `.Match()` on it. But which characters have to be encoded is not deterministic. Suppose we have a template `/~user/{path}` and `{path}` has the value `hello/world`. The URL expanded from the template, which is also what `.Match()` accepts, is `/~user/hello%2Fworld`. So we would need a function that percent-encodes only the third slash.
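One way to picture such a higher-level function is to let the template itself decide which characters belong to a variable value. The sketch below is written in Python rather than Go, handles only plain `{var}` expressions, and is not the API of yosida95/uritemplate; `canonicalize` is a hypothetical name. It turns the template into a regular expression, captures each raw value, and percent-encodes only the captured parts.

```python
import re
from typing import Optional
from urllib.parse import quote


def canonicalize(template: str, raw_url: str) -> Optional[str]:
    """Percent-encode only the parts of raw_url that fill {var} slots.

    Hypothetical helper: returns the expanded (canonical) form, or None
    when the literal parts of the template do not line up with raw_url.
    """
    # re.split with a capturing group alternates literals (even indexes)
    # and variable names (odd indexes).
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        "(.*)" if i % 2 else re.escape(part) for i, part in enumerate(parts)
    )
    match = re.fullmatch(pattern, raw_url)
    if match is None:
        return None
    values = iter(match.groups())
    return "".join(
        quote(next(values), safe="") if i % 2 else part
        for i, part in enumerate(parts)
    )


print(canonicalize("/~user/{path}", "/~user/hello/world"))  # /~user/hello%2Fworld
```

The sketch has clear limits: with several variables the greedy captures can be ambiguous, and an input that is already percent-encoded would be encoded twice.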
For the record, URI.js also behaves as described by @yosida95 https://codepen.io/dunglas/pen/OJyGmKj. It would be nice to test if Spring and Microsoft implementations are consistent or not.
I tested the most popular implementations with matching support I found. Here are the results:
Name | Match | Code | Note
---|---|---|---
Addressable (Ruby) | no | code | The most popular implementation across all languages
Rize (PHP) | yes | code | Even has dedicated tests for that
geraintluff/uri-templates (JS) | yes | code |
Tavis.UriTemplates (DotNet) | yes | code | Doesn't strictly follow the RFC, so may not be relevant
RFC 3986 specifies that `:` and `@` are not reserved in URL paths and should not be encoded. So it's not clear to me whether this should match or not.
> RFC 3986 specifies that `:` and `@` are not reserved in URL paths and should not be encoded.
No, it's the opposite, they are reserved:
2.2. Reserved Characters
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
But it does not automatically imply that they should be pct-encoded, the rules are complex, and RFC 3986 and 6570 do not seem fully in agreement.
@bortzmeyer according to section 3.3, `:` and `@` are allowed in (most parts of) paths:
segment = *pchar
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
https://tools.ietf.org/html/rfc3986#section-3.3
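To make the grammar concrete, here is a small Python sketch that checks a single path segment against the `pchar` production quoted above. The function name `is_valid_segment` is made up for this illustration.

```python
import re

UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
# segment = *pchar
# pchar   = unreserved / pct-encoded / sub-delims / ":" / "@"
SEGMENT = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})*"
)


def is_valid_segment(segment: str) -> bool:
    """Return True if the whole string is a valid RFC 3986 path segment."""
    return SEGMENT.fullmatch(segment) is not None


print(is_valid_segment("2001:db8::35"))        # True
print(is_valid_segment("2001%3Adb8%3A%3A35"))  # True
print(is_valid_segment("hello/world"))         # False
```

Both the raw and the percent-encoded form of the address are syntactically valid segments, which is why the grammar alone does not settle which form a template matcher should accept.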
Actually, RFC 3986 implies that colons in paths shouldn't be percent-encoded:
URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component.
On the other hand, as pointed out by @yosida95, RFC 6570 explicitly asks to encode colons when expanding a variable:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
snip
The allowed set for a given expansion depends on the expression type:
reserved ("+") and fragment ("#") expansions allow the set of
characters in the union of ( unreserved / reserved / pct-encoded ) to
be passed through without pct-encoding, whereas all other expression
types allow only unreserved characters to be passed through without
pct-encoding. Note that the percent character ("%") is only allowed
as part of a pct-encoded triplet and only for reserved/fragment
expansion: in all other cases, a value character of "%" MUST be pct-
encoded as "%25" by variable expansion.
https://tools.ietf.org/html/rfc6570#section-3.2.1
(There is no reference to `:` and `@` as in RFC 3986).
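The difference between the two allowed sets can be sketched in Python as follows. The helper names `encode_simple` and `encode_reserved` are hypothetical; they only model the value encoding for `{var}` and `{+var}`, not a full template engine.

```python
import re
from urllib.parse import quote

GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS


def encode_simple(value: str) -> str:
    """{var}: only unreserved characters pass through; "%" becomes "%25"."""
    return quote(value, safe="")


def encode_reserved(value: str) -> str:
    """{+var}: unreserved, reserved and pct-encoded triplets pass through."""
    # Splitting on a capturing group puts valid triplets at odd indexes;
    # they are kept verbatim, everything else is encoded with the wider set.
    parts = re.split(r"(%[0-9A-Fa-f]{2})", value)
    return "".join(
        part if i % 2 else quote(part, safe=RESERVED)
        for i, part in enumerate(parts)
    )


print(encode_simple("2001:db8::35"))    # 2001%3Adb8%3A%3A35
print(encode_reserved("2001:db8::35"))  # 2001:db8::35
```

With the default operator the colons are always encoded, while the same value survives unchanged under reserved expansion.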
For this reason, in the Mercure spec I plan to force applications to percent-encode `:` characters: https://github.com/dunglas/mercure/pull/298/files#diff-cf39eeec4efb7fde29e3720f47313ce3R471-R473
I'm not sure if it's the best theoretical solution, but it has the benefit of being compatible with all the template libraries I tested (they all encode colons), while still being easy to implement using the standard library of virtually all languages (`encodeURIComponent` and the like).
URI Template: https://lg.op.example/.well-known/looking-glass/v1/ping/{host}
URI: https://lg.op.example/.well-known/looking-glass/v1/ping/2001:db8::35
Match? The answer is "Doesn't match 😐"
(Example taken from RFC 8522)