akka / akka-http

The Streaming-first HTTP server/module of Akka
https://doc.akka.io/libraries/akka-http/current/
Other
1.34k stars 594 forks source link

Problem with query parameter separator #2854

Open rtfpessoa opened 4 years ago

rtfpessoa commented 4 years ago

After moving Play Framework from version 2.4 to 2.7 and starting to use akka-http as the server we encountered one problem with some weird URLs that we were mistakenly generating.

This url http://localhost:9000/issue/1300068591272/_lines?projectId=22877&commitId=15565520 reached the server with query parameter called projectId and commitId while this url http://localhost:9000/issue/1300068591272/_lines?projectId=22877&commitId=15565520 does not get the second.

Seems like this might be related to the way akka-http parses URLs. Apparently the second URL is also valid because at least ; is a supported separator.

I would gladly fix this if you point me in the right direction.

References:

raboof commented 4 years ago

Interesting, I wasn't aware of that recommendation but indeed it seems somewhat widespread: from your message I understand Netty does this, it is mentioned on https://en.wikipedia.org/wiki/Query_string#Web_forms and for example the Apache http client code seems to do it as well:

@ import $ivy.`org.apache.httpcomponents:httpclient:4.5.9`
@ import java.net.URI
@ import java.nio.charset.Charset
@ val params = URLEncodedUtils.parse(new URI("https://akka.io?x=1&y=2"), Charset.forName("UTF-8"))
params: java.util.List[org.apache.http.NameValuePair] = [x=1, amp, y=2]

I think it might be reasonable to support this in Akka HTTP as well. There are 2 possible consequences:

I think the first consequence might be OK: they should have been encoding those as %3B anyway, so this behavior change might be acceptable for the upcoming 10.2.0 release.

The second consequence is trickier: it might be good to make sure to benchmark the parsing performance before and after the change. You can have a look at the akka-http-bench-jmh subproject.

For the actual code change akka.http.impl.model.parser.UrlParser is probably where you should look.

Thanks for bringing this up, I'm looking forward to seeing what you come up with!

rtfpessoa commented 4 years ago

Just as a note, the class seems to be akka.http.impl.model.parser.UriParser

rtfpessoa commented 4 years ago

@raboof, imagine that I want to run the benchmarks to check the code, how would be the command? I tried sbt "akka-http-bench-jmh/runMain akka.BenchRunner quick" I could not find docs about it, maybe I am looking in the wrong place.

Also, how would you suggest for me to compare both implementations? Should I leave the old version somehow, or should I compare against the previous baseline?

raboof commented 4 years ago

imagine that I want to run the benchmarks to check the code, how would be the command?

There are some short notes in akka-http-bench-jmh/README.md

Also, how would you suggest for me to compare both implementations? Should I leave the old version somehow, or should I compare against the previous baseline?

Perhaps you could split this in 2 commits: 1 commit adding the benchmark (so we can run it against the old implementation) and 1 commit changing the implementation.

jrudolph commented 4 years ago

html401 is superseded by now. Are there more recent docs which would still recommend supporting ;?

I had a quick look in https://url.spec.whatwg.org/#concept-url-parser and https://html.spec.whatwg.org/multipage/urls-and-fetching.html#resolving-urls but didn't find anything.

nkmittal commented 4 years ago

Is this available in Play 2.6 ? I recently migrated to 2.6 from 2.5. For legacy reasons I need ';' separator support.

raboof commented 4 years ago

Is this available in Play 2.6?

This issue is still open, so AFAICS the behavior has not yet changed.

jrudolph commented 4 years ago

This is the latest version of the spec:

https://url.spec.whatwg.org/#application/x-www-form-urlencoded

It doesn't support this syntax any more. Browsers don't generate this syntax any more. It took almost a decade until someone even noticed. So, I'd say, we shouldn't do anything here. The workaround is to use the Uri.rawQueryString to do your own parsing if you need to support a niche and/or archaic query syntax.

I guess you could try to convince us otherwise by providing a PR that proves the low impact of the change (or bring more evidence that the legacy syntax is more relevant than I think it to be).

I hope that's reasonable.