eclipse-ee4j / jersey

Eclipse Jersey Project - Read our Wiki:
https://github.com/eclipse-ee4j/jersey/wiki
Other
690 stars 351 forks source link

JerseyUriBuilder incorrectly encodes fragment using application/x-www-form-urlencoded rules #4018

Open glerup opened 5 years ago

glerup commented 5 years ago

The Javadoc for javax.ws.rx.core.UriBuilder:

Builder methods perform contextual encoding of characters not permitted in the corresponding URI component following the rules of the application/x-www-form-urlencoded media type for query parameters and RFC 3986 for all other components.

https://docs.oracle.com/javaee/7/api/javax/ws/rs/core/UriBuilder.html

But org.glassfish.jersey.uri.internal.JerseyUriBuilder applies the rules for application/x-www-form-urlencoded for the fragment component rather than RFC 3986.

RFC 3986 defines the following syntax:

fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

https://tools.ietf.org/html/rfc3986#section-3.5

The RFC 3986 syntax permits most of the delimiters that are percent encoded by the application/x-www-form-urlencoded rules.

Example:

System.out.println( UriBuilder.fromPath("http://test.com").fragment("/?:@valid42-._~%20!$&'()*+,;=").build() );

Expected:

http://test.com#/?:@valid42-._~%20!$&'()*+,;=

Actual:

http://test.com#%2F%3F%3A%40valid42-._~%20%21%24&%27%28%29%2A+%2C%3B=
jansupol commented 5 years ago

By RFC 3986 the query starts with first "?" character. Your fragment contains the question mark, so the UriBuilder applies the rules for application/x-www-form-urlencoded for your fragment correctly, I assume?

glerup commented 5 years ago

The fragment component is NOT part of the query component. From RFC 3986 Section 3.4:

The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

The issue is that the application/x-www-form-urlencoded rules are applied both for the query component AND the fragment component. The JAX-RS interface spec requires that only the query component must be encoded as per application/x-www-form-urlencoded while the fragment component must be encoded as per RFC 3986.

glerup commented 5 years ago

Or do you mean that because my example of a URI with a fragment component contains a question mark, that question mark indicates the start of a query component?

That is not the case.

From RFC Section 3.5:

A fragment identifier component is indicated by the presence of a number sign ("#") character and terminated by the end of the URI.

And:

The characters slash ("/") and question mark ("?") are allowed to represent data within the fragment identifier.

The fragment component is only terminated by the end of the URI. So even if a question mark is present within the fragment component it does not indicate the start of the query component.

This is also correctly reflected by URI.create().

glerup commented 5 years ago

The core issue here is that Jersey's implementation of UriBuilder is doing a lot of unnecessary encoding of characters in the fragment component in violation of the specification in the JAX-RS Javadoc. By mistake the implementors have applied the wrong set of reserved characters.

This leads to correct, but ugly URLs. The main impact is that the URLs can become unsuitable to present in output, e.g. to an end-user.

rmcdouga commented 4 years ago

This issue is having a larger impact on our project. We're using UriBuilder to construct an URL that appears in an HTML link. The URL has a query parameter that may have reserved characters in it (for example https://example.com/foo?mail=foo@example.com). There's a Spring app at the other end of the link with a WebSecurityConfigurerAdapterthat doesn't like the escaped reserved characters. It works fine when the characters are not escaped.

While I agree that the WebSecurityConfigurerAdapteris misbehaving, It is being triggered by the unnecessary over-encoding of characters in the URIBuilder.