Existing pct-encoding in values given to {+} or {#} templates should NOT be re-encoded

kmashint commented 8 years ago

First, thank you for the URI Template library for Java!

In using it at Adobe (my other email is kmashint@adobe.com) we found an issue with URI {+} templates and values that already had pct-encoded values (e.g. %20 in the query string) where according to RFC-6570 (we asked Roy Fielding who works at Adobe) the values with pct-encoded triplets given to {+} or {#} templates should NOT be re-encoded. Only % characters that are not part of a pct-encoded triplets should be encoded.

More details are below, and I'll try to dig into the Java source to see where to adjust.

Question to RFC-6570 author: Should applying a URI template such as {+uri} to uri="http://foo/bar?message=hello%20world" result in "http://foo/bar?message=hello%2520world"?

Answer from Roy Fielding at Adobe, an author of RFC-6570 URI Templates: """ No, it shouldn't. The + operator means the URI is already encoded, so it should be

http://foo/bar?message=hello%20world/status

assuming the implementation isn't buggy. See the definition in

https://tools.ietf.org/html/rfc6570#section-3.2.3

The %20 is not re-encoded because it is already a pct-encoded triplet. """

From http://tools.ietf.org/html/rfc6570#section-3.2.1 :

The allowed set for a given expansion depends on the expression type: reserved ("+") and fragment ("#") expansions allow the set of characters in the union of ( unreserved / reserved / pct-encoded ) to be passed through without pct-encoding, whereas all other expression types allow only unreserved characters to be passed through without pct-encoding. Note that the percent character ("%") is only allowed as part of a pct-encoded triplet and only for reserved/fragment expansion: in all other cases, a value character of "%" MUST be pct- encoded as "%25" by variable expansion.

(meaning leave it pct-encoded if it matches that %[0-9A-Fa-f]{2} pattern, otherwise pct-encode it)

From http://tools.ietf.org/html/rfc6570#section-3.2.3 Reserved Expansion: {+var}:

For each defined variable in the variable-list, perform variable expansion, as defined in Section 3.2.1, with the allowed characters being those in the set (unreserved / reserved / pct-encoded).

(gives example for half="50%", {+half} gives "50%25", but that % is encoded since it is at the end of a string and not already pct-encoded)

From http://tools.ietf.org/html/rfc6570#appendix-A Implementation Notes:

... (U+R) means any character not in the union of (unreserved / reserved / pct- encoding) will be encoded.

   .------------------------------------------------------------------.
   |          NUL     +      .       /       ;      ?      &      #   |
   |------------------------------------------------------------------|
   | first |  ""     ""     "."     "/"     ";"    "?"    "&"    "#"  |
   | sep   |  ","    ","    "."     "/"     ";"    "&"    "&"    ","  |
   | named | false  false  false   false   true   true   true   false |
   | ifemp |  ""     ""     ""      ""      ""     "="    "="    ""   |
   | allow |   U     U+R     U       U       U      U      U     U+R  |
   `------------------------------------------------------------------'

damnhandy commented 8 years ago

Thanks @kmashint ! I'll have a look at this over the weekend. I think this also fell through the cracks with the uritemplate-test suite as well. That should have catch issues like this.

damnhandy commented 8 years ago

@kmashint do you a more specific set of test cases for this issue? What I'm looking for is specific templates and variables beyond just something like {+uri}. I've been mulling over the uritemplate-test and nothing seems to cover this particular issue, but I think this something that is lacking in the test suite. Some real-world test cases would be helpful.

damnhandy commented 8 years ago

The problem likely lies here in the pUriUtil class. That is called from the expandStringValue method in the UriTemplate class. What's supposed to happen is that when you git the {+} or {#} operator, it invokes different encoding rules, but clearly it's not doing everything correctly. This class was submitted a contributor to fix other issues with URI encoding, so i have a spend sometime looking at how to refactor it to look for already encoded values.

kmashint commented 8 years ago

The standard gives the example: if half="50%", then "{+half}" => "50%25" since it's not already encoded

A counter example would be: if halfEnc="50%25", then "{+halfEnc}" => "50%25", no change since it's already encoded

Another unchanging example since it's already encoded: if helloEnc="http://localhost/hello%20world", then "{+hellEnc}/goodbye" => "http://localhost/hello%20world/goodbye"

In other implementations I've seen a RegExp check that "if it looks like pct-encoding, leave it as such", i.e. searching for "%.." but only pct-encoding if not already matching "%[0-9A-Fa-f]{2}". Or some kind of 2-char lookahead if you're processing char-by-char.

damnhandy commented 8 years ago

This is fixed in 2.1.6

kmashint commented 8 years ago

Confirmed with a {+} template in our testing, thank you!

damnhandy / Handy-URI-Templates

Existing pct-encoding in values given to {+} or {#} templates should NOT be re-encoded #51