dunglas / uri-template-tester

Test if a URI matches a given URI template (RFC6570)
https://uri-template-tester.mercure.rocks
GNU Affero General Public License v3.0

The online tester doesn't work for me #3

Open bortzmeyer opened 5 years ago

bortzmeyer commented 5 years ago

URI template: https://lg.op.example/.well-known/looking-glass/v1/ping/{host}

URI: https://lg.op.example/.well-known/looking-glass/v1/ping/2001:db8::35

The answer is "Doesn't match 😐".

(Example taken from RFC 8522)

dunglas commented 5 years ago

Related Twitter thread https://twitter.com/bortzmeyer/status/1093062473475862528

This issue is related to the colons in the URL: when they are percent-encoded (https://lg.op.example/.well-known/looking-glass/v1/ping/2001%3Adb8%3A%3A35), the URI matches. It probably needs to be patched (by introducing a higher-level function) in https://github.com/yosida95/uritemplate (ping @yosida95).
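To see why the encoded form matches and the raw one does not, here is a minimal Python sketch of a matcher for templates that use only simple string expansion (`{var}`). It is not the yosida95/uritemplate implementation: `compile_simple` and `SIMPLE_VALUE` are hypothetical names, and the sketch assumes variable names are plain identifiers. It relies on RFC 6570's rule that `{var}` only lets unreserved characters through unencoded.

```python
import re

# Hypothetical helper, not the yosida95/uritemplate API.
# A value produced by simple string expansion ({var}) can only contain
# unreserved characters (A-Z a-z 0-9 - . _ ~) and %XX triplets.
SIMPLE_VALUE = r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})*"


def compile_simple(template: str) -> "re.Pattern[str]":
    """Turn a template using only {var} expressions into a regex."""
    # With a capturing group, re.split alternates literal text and names.
    parts = re.split(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", template)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text between expressions
        else:
            regex += f"(?P<{part}>{SIMPLE_VALUE})"  # one variable
    return re.compile(regex)


pattern = compile_simple(
    "https://lg.op.example/.well-known/looking-glass/v1/ping/{host}")
raw = "https://lg.op.example/.well-known/looking-glass/v1/ping/2001:db8::35"
encoded = ("https://lg.op.example/.well-known/looking-glass/v1/ping/"
           "2001%3Adb8%3A%3A35")

print(pattern.fullmatch(raw))              # None: ":" is not unreserved
print(pattern.fullmatch(encoded)["host"])  # 2001%3Adb8%3A%3A35
```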

yosida95 commented 5 years ago

Hi, thanks for notifying me.

Hmm... I (probably) understand your needs, and it seems patching https://github.com/yosida95/uritemplate would be the most efficient way to fulfill them.

But I think /v1/ping/2001:db8::35 does not actually match /v1/ping/{host}. RFC 6570 says that, in expressions without an operator (simple string expansion), characters outside the unreserved set must be percent-encoded. So the correct expansion of the template is /v1/ping/2001%3Adb8%3A%3A35, and that matches the template as expected.
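A minimal Python sketch of the expansion side, assuming Python 3.7+ (where `urllib.parse.quote` treats exactly `A-Z a-z 0-9 _ . - ~` as always safe, i.e. RFC 6570's unreserved set). `expand_simple` is a hypothetical helper that handles a single `{name}` expression by string replacement; it is not a full RFC 6570 expander.

```python
from urllib.parse import quote, unquote


def expand_simple(template: str, name: str, value: str) -> str:
    """Hypothetical helper: expand one {name} using simple string expansion."""
    # safe="" means only the always-safe set (A-Z a-z 0-9 _ . - ~, which is
    # RFC 6570's unreserved set in Python 3.7+) is left as-is; everything
    # else, including ":" and "/", is UTF-8 percent-encoded.
    return template.replace("{" + name + "}", quote(value, safe=""))


uri = expand_simple("/v1/ping/{host}", "host", "2001:db8::35")
print(uri)                             # /v1/ping/2001%3Adb8%3A%3A35
print(unquote(uri.rsplit("/", 1)[1]))  # 2001:db8::35
```

Decoding the matched segment gives back the original IPv6 address, so nothing is lost by requiring the encoded form.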

If we implement that behavior, I don't think patching .Match() and the matcher VM behind it is the way to do it. I think we can agree on this point, since @dunglas mentioned "by introducing a higher level function" above. If we did patch it, we would no longer be able to differentiate between, for example, {var} and {+var}; we would have to handle non-Latin characters properly (because a variable can take values consisting of any valid UTF-8 characters); and the matcher VM's stack would grow a lot.

Therefore, we should first percent-encode the input ("raw") URL, with some exceptions (e.g. slashes, the colon in scheme://host:port, crosshatches, etc.), then construct the *Template and call .Match() with the encoded URL. But the set of characters to encode is not deterministic. Suppose we have the template /~user/{path} and {path} holds hello/world. The URL expanded from the template, which is what .Match() accepts, is /~user/hello%2Fworld, so we would need a function that URL-encodes only the third slash.
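The `{var}` versus `{+var}` problem can be made concrete with a small Python sketch. It uses hypothetical names, supports only plain-identifier variable names, and builds one character class per operator from RFC 6570 section 3.2.1. The same raw URL matches the `{+path}` template but not the `{path}` one, so a pre-encoding "higher level function" would have to know each expression's operator before deciding which slashes to encode.

```python
import re

# Hypothetical sketch of operator-aware matching (RFC 6570 section 3.2.1).
UNRESERVED = r"A-Za-z0-9\-._~"
RESERVED = r":/?#\[\]@!$&'()*+,;="  # gen-delims and sub-delims
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

VALUE_PATTERNS = {
    "": rf"(?:[{UNRESERVED}]|{PCT_ENCODED})*",             # {var}
    "+": rf"(?:[{UNRESERVED}{RESERVED}]|{PCT_ENCODED})*",  # {+var}
}


def compile_template(template: str) -> "re.Pattern[str]":
    """Compile a template with {var} and {+var} expressions into a regex."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\+?)([A-Za-z_][A-Za-z0-9_]*)\}", template):
        regex += re.escape(template[pos:m.start()])  # literal text
        regex += f"(?P<{m.group(2)}>{VALUE_PATTERNS[m.group(1)]})"
        pos = m.end()
    regex += re.escape(template[pos:])
    return re.compile(regex)


simple = compile_template("/~user/{path}")
reserved = compile_template("/~user/{+path}")

print(simple.fullmatch("/~user/hello/world"))            # None
print(simple.fullmatch("/~user/hello%2Fworld")["path"])  # hello%2Fworld
print(reserved.fullmatch("/~user/hello/world")["path"])  # hello/world
```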

dunglas commented 4 years ago

For the record, URI.js also behaves as described by @yosida95: https://codepen.io/dunglas/pen/OJyGmKj. It would be nice to test whether the Spring and Microsoft implementations are consistent with it.

dunglas commented 4 years ago

I tested the most popular implementations with matching support I found. Here are the results:

| Name | Matches the unencoded URI? | Note |
| --- | --- | --- |
| Addressable (Ruby) | no | The most popular implementation across all languages |
| Rize (PHP) | yes | Even has dedicated tests for that |
| geraintluff/uri-templates (JS) | yes | |
| Tavis.UriTemplates (.NET) | yes | |
| Spring (Java) | yes | Doesn't strictly follow the RFC, so may not be relevant |

RFC 3986 specifies that : and @ are not reserved in URL paths and should not be encoded. So to me, it's not clear whether this should match or not.

bortzmeyer commented 4 years ago

RFC 3986 specifies that : and @ are not reserved in URL paths and should not be encoded.

No, it's the opposite, they are reserved:

2.2. Reserved Characters

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

But that does not automatically imply that they should be pct-encoded: the rules are complex, and RFCs 3986 and 6570 do not seem to be fully in agreement.

dunglas commented 4 years ago

@bortzmeyer according to section 3.3, : and @ are allowed in (most parts of) paths:

segment       = *pchar
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

https://tools.ietf.org/html/rfc3986#section-3.3
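Section 3.3's grammar can be checked directly. In this minimal Python sketch (the `is_segment` name is hypothetical), both the raw and the percent-encoded IPv6 address are syntactically valid path segments, so RFC 3986 on its own accepts either form. A `/`, by contrast, can never appear inside a segment.

```python
import re

# RFC 3986 section 3.3:
#   segment = *pchar
#   pchar   = unreserved / pct-encoded / sub-delims / ":" / "@"
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
SEGMENT = re.compile(rf"{PCHAR}*")


def is_segment(text: str) -> bool:
    """Hypothetical helper: does text match the RFC 3986 segment rule?"""
    return SEGMENT.fullmatch(text) is not None


print(is_segment("2001:db8::35"))        # True
print(is_segment("2001%3Adb8%3A%3A35"))  # True
print(is_segment("hello/world"))         # False
```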

Actually, RFC 3986 implies that colons in paths shouldn't be percent-encoded:

   URI producing applications should percent-encode data octets that
   correspond to characters in the reserved set unless these characters
   are specifically allowed by the URI scheme to represent data in that
   component.

On the other hand, as pointed out by @yosida95, RFC 6570 explicitly asks for colons to be encoded when expanding a variable:

     reserved       =  gen-delims / sub-delims
     gen-delims     =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

snip

   The allowed set for a given expansion depends on the expression type:
   reserved ("+") and fragment ("#") expansions allow the set of
   characters in the union of ( unreserved / reserved / pct-encoded ) to
   be passed through without pct-encoding, whereas all other expression
   types allow only unreserved characters to be passed through without
   pct-encoding.  Note that the percent character ("%") is only allowed
   as part of a pct-encoded triplet and only for reserved/fragment
   expansion: in all other cases, a value character of "%" MUST be pct-
   encoded as "%25" by variable expansion.

https://tools.ietf.org/html/rfc6570#section-3.2.1
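A minimal Python sketch of the two allowed sets described in section 3.2.1, assuming Python 3.7+ for `urllib.parse.quote`'s always-safe set. `expand_value` is a hypothetical helper that encodes a single string value; it ignores prefix modifiers, lists, and the separators that other operators add.

```python
import re
from urllib.parse import quote

RESERVED = ":/?#[]@!$&'()*+,;="  # gen-delims / sub-delims
PCT_TRIPLET = re.compile(r"(%[0-9A-Fa-f]{2})")


def expand_value(value: str, operator: str = "") -> str:
    """Hypothetical helper: encode one string value per RFC 6570 3.2.1."""
    if operator in ("+", "#"):
        # Reserved and fragment expansion: unreserved, reserved and existing
        # pct-encoded triplets pass through; anything else, including a
        # lone "%", is percent-encoded.
        pieces = PCT_TRIPLET.split(value)  # odd indices are triplets
        return "".join(
            piece if i % 2 else quote(piece, safe=RESERVED)
            for i, piece in enumerate(pieces)
        )
    # Every other expression type: only unreserved characters pass through,
    # so "%" always becomes "%25".
    return quote(value, safe="")


print(expand_value("2001:db8::35"))       # 2001%3Adb8%3A%3A35
print(expand_value("2001:db8::35", "+"))  # 2001:db8::35
print(expand_value("50%25 off", "+"))     # 50%25%20off
print(expand_value("50%25 off"))          # 50%2525%20off
```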

(Unlike RFC 3986, it makes no special case for : and @.)

For this reason, in the Mercure spec I plan to require applications to percent-encode : characters: https://github.com/dunglas/mercure/pull/298/files#diff-cf39eeec4efb7fde29e3720f47313ce3R471-R473

I'm not sure it's the best solution in theory, but it has the benefit of being compatible with all the template libraries I tested (they all encode colons), while still being easy to implement using the standard library of virtually any language (encodeURIComponent and the like).
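For comparison, here is a Python sketch of the two encoders (hypothetical helper names, Python 3.7+ assumed). JavaScript's `encodeURIComponent` leaves `A-Z a-z 0-9 - _ . ! ~ * ' ( )` unencoded, so `quote(s, safe="!*'()")` approximates it for well-formed strings. Both encoders encode `:`, which is what the Mercure rule relies on. They disagree only on `!*'()`: these are sub-delims, so strict RFC 6570 simple expansion encodes them.

```python
from urllib.parse import quote


def encode_uri_component(text: str) -> str:
    """Approximation of JavaScript's encodeURIComponent (well-formed text)."""
    # encodeURIComponent keeps A-Z a-z 0-9 - _ . ! ~ * ' ( ); quote() already
    # keeps the alphanumerics and "-_.~", so only "!*'()" must be added.
    return quote(text, safe="!*'()")


def rfc6570_simple(text: str) -> str:
    """RFC 6570 simple expansion: only unreserved characters are kept."""
    return quote(text, safe="")


print(encode_uri_component("2001:db8::35"))  # 2001%3Adb8%3A%3A35
print(rfc6570_simple("2001:db8::35"))        # 2001%3Adb8%3A%3A35
print(encode_uri_component("it's(1)"))       # it's(1)
print(rfc6570_simple("it's(1)"))             # it%27s%281%29
```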