eclipse-ee4j / jersey

Eclipse Jersey Project - Read our Wiki:
https://github.com/eclipse-ee4j/jersey/wiki
Other
691 stars 352 forks source link

UriBuilder.fragment always encodes the complete fragment, but must only encode parameter values #5269

Closed mkarg closed 1 year ago

mkarg commented 1 year ago

According to Wikipedia, the following is a valid URI: http://www.example.org/foo.xml#xpointer(//Rube)

(I understand that Wikipedia is non-normative, but I assume we agree that it is correct in this particular case.)

In particular this example proofs that the fragment my contain unencoded slashes, just as the path.

When trying to produce this exact URI using UriBuilder, then the result depends of the way how the URI builder is used:

    final URI uri = new URI("http://www.example.org/foo.xml#xpointer(//Rube)").normalize();
    System.out.println(uri); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).build()); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).fragment("xpointer(//{type})").build("Rube")); // prints "http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29"

If the UriBuilder is initialized from a URI, then it keeps the correct syntax.

If the UriBuilder is overwriting the fragment part, then it not only encodes the parameter values (which is correct), but in fact also encodes the static parts of the contains (which is incorrect).

Apparently there is no way to create the correct syntax (forward slash instead of %2F) using the .fragment() method!

jansupol commented 1 year ago

Trying to decode http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29, I get http://www.example.org/foo.xml#xpointer(//Rube). Both URIs are equivalent, one percent-encoded and one not.

Why do you think the behaviour is incorrect?

mkarg commented 1 year ago

Because not everything right of the hash character is data. According to the URI spec, the same syntax rules do apply on the right just as on the left. It is incorrect to "arbitrarily" encode URIs. This is used by several specifications and applications. For example: Gitpod. If my JAX-RS server wants to return a URI pointing to my-URI within GitPod, then GitPod expects to find a URI as the fragment part: https://gitpod.io#my-uri. It is a difference if my-uri contains a slash in the syntax (not encoded), or a slash in data (encoded). A browser will not automatically assume that it must decode my-uri before it can call it, and always decoding fragments would be just wrong (in the Gitpod example, unencoding would produce invalid my-uri, as data MUST stay encoded). Neither will an application know. It is essential for the browser / application to know, if it must decode the URI. Due to that, no decoding happens. Due to that, it is essential to have control over encoding, i. e. that UriBuilder only encodes data, not templates.

So the correct implementation is that only data is encoded, but not the template.

jansupol commented 1 year ago

The HTML 5 spec says:

  1. Let fragment be document's URL's fragment.

....

  1. Let fragmentBytes be the result of percent-decoding fragment.

So fragment should be percent-encoded.

This seems to conflict with RFC7303, which refers to XPointerFramework which does not percent-encode for certain media types (XMLs).

If I understand that well, the UriBuilder would need to know what media type the encoding is for, or a boolean argument to encode, or not to encode.

mkarg commented 1 year ago

Yes and no. What I take from this conflicting definitions is: It is up to the application's current use case, whether the fragment is to be encoded or not. JAX-RS runtime should need to know the difference between HTML and XPointer use case, but should be neutral (not support "both" cases, but actually not supporting "any" particular case).

As a result, I repeat by claim: The application must have sole control if or if not to percent-encode the fragment.

This is easily to achive if you follow my original proposal at the top of this issue: Always encode the parameters of .build(), but never encode the parameter of .fragment(). As a result, HTML5 and XPointer is working by full discretion of the calling application:

jansupol commented 1 year ago

This looks hacky. Why fragment should not encode and build should and not otherwise? Why for XML the user cannot use fragment({e}).build("someFancyMethod()")?

The RFC 6570 seems more of a solution here; while fragment("{e}").build("Hello World!") encodes, fragment("{+e}").build("Hello World!") would not.

mkarg commented 1 year ago

JAX-RS 3.1's JavaDocs of URIBuilder (which are normative) unambiguously mandates the following:

Builder methods perform contextual encoding of characters not permitted in the corresponding URI component following ... RFC 3986 .... Note that only characters not permitted in a particular component are subject to encoding.

RFC 3986 explicitly allows slashes to be part of the fragment component.

As a consequence, this proofs that Jersey has a bug in the following case, as Jersey does encode characters permitted in fragments according to RFC 3986, but MUST NOT encode them: System.out.println(UriBuilder.fromUri(uri).fragment("xpointer(//{type})").build("Rube")); // prints "http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29"

Regarding "hacky": The proposal was an attempt to let users of JAX-RS 3.1 unambiguously decide which is to be encoded and which is not, without the need to change the JAX-RS specification. It might not be perfect, but it is much less "hacky" than the corrently inconsequent outcome of Jersey, which sometimes does encode and sometimes does not:

    final URI uri = new URI("http://www.example.org/foo.xml#xpointer(//Rube)").normalize();
    System.out.println(uri); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).build()); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).fragment("xpointer(//{type})").build("Rube")); // prints "http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29"

So I would propose to separate this issue into two issues:

jansupol commented 1 year ago

RFC 3986:

From this, I read that a slash is a delimiter and should be percent-encoded.


The support for RFC 6570 does not seem to bring any backward incompatibility (unlike stopping the encoding for fragment) and can be introduced in any version of Jersey as a new feature.

mkarg commented 1 year ago

IMHO you incorrectly assume that. Section 3.5 tells the explicitly allowed characters for fragment, which says that it allows any number of pchars, slashes and questionmarks. That is done to override the limitations given by 1.2.3.

mkarg commented 1 year ago

I have proposed support for RFC 6570 for Jakarta REST 4.0. Thank you for this great idea! :-)

Nevertheless, RFC 6570 explicitly supports my claim that you misunderstood section 3.5 of RFC 3986 (hence that Jersey does it wrong currently), as it cleary proofs in section 3.2.4 "Fragment Expansion: {#var}" that in slashes in fragments MUST NOT get encoded:

{#path:6}/here     #/foo/b/here
jansupol commented 1 year ago

Yes, we would need to revisit UriBuilder.