AsyncHttpClient / async-http-client

Asynchronous Http and WebSocket Client library for Java
Other
6.29k stars 1.59k forks source link

Decoding of response header "location" is incorrect #1750

Open bblack opened 3 years ago

bblack commented 3 years ago

Unfortunately I'm using this library from JRuby, so either that layer will turn out to be responsible for my issue, or it will add a bit of difficulty in tracking things down.

Problem: I'm making a request. The response is 301 with a location header that contains non-latin characters (specifically, Thai). Something (either AHC, or some lib it uses, or something in the JRuby-java translation) is returning the header as an improperly decoded string.

If on the RequestBuilder instance I've called setFollowRedirect(false), I can see the improperly-decoded Location header:

java_import 'org.asynchttpclient.AsyncCompletionHandler'
java_import 'org.asynchttpclient.DefaultAsyncHttpClient'
java_import 'org.asynchttpclient.DefaultAsyncHttpClientConfig'
java_import 'org.asynchttpclient.DefaultAsyncHttpClientConfig$Builder'

class Handler < AsyncCompletionHandler
  def initialize(options={})
    super()

    @on_throwable = options[:on_throwable]
    @on_headers_received = options[:on_headers_received]
    @on_completed = options[:on_completed]
  end

  def onThrowable(exception)
    @on_throwable.call(exception)
  end

  def onHeadersReceived(headers)
    @on_headers_received.call(headers)
  end

  def onCompleted(netty_response)
    @on_completed.call(netty_response)
  end
end

method = 'GET'
url = 'https://www.facebook.com/100005359001746'
request_builder = RequestBuilder.new(method).
  setUrl(url).
  setFollowRedirect(false)
request = request_builder.build
handler = Handler.new(
  {
    on_throwable: -> (e) { puts "throwable: #{e}" },
    on_headers_received: -> (h) {
      puts "headers_received: #{h}"
      location = h.get('Location')
      puts "Location: #{location} (encoding: #{location.encoding})"
    },
    on_completed: -> (res) { puts "completed: #{res.inspect}" }
  }
)
builder = DefaultAsyncHttpClientConfig::Builder.new()
client = DefaultAsyncHttpClient.new(builder.build())

client.executeRequest(request, handler)

which outputs:

headers_received: DefaultHttpHeaders[Location: https://www.facebook.com/people/प्रवेश-कुमार-सनातनी/100005359001746, Strict-Transport-Security: max-age=15552000; preload, Content-Type: text/html; charset="utf-8", X-FB-Debug: o8uQCGiu/Fq/ixVADK5b7YmL4laDuTKy4hT8CkC5SEc0UE5/shmYrFIsoySCMppLvJfrEtOPbjpDTOziteOyJw==, Date: Tue, 01 Dec 2020 17:15:08 GMT, Alt-Svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600, Connection: keep-alive, Content-Length: 0]
Location: https://www.facebook.com/people/प्रवेश-कुमार-सनातनी/100005359001746 (encoding: UTF-8)
completed: #<Java::OrgAsynchttpclientNetty::NettyResponse:0x10cbb284>

If instead, I've called the RequestBuilder's setFollowRedirect(true), I get onThrowable called with a MaxRedirectException. The reason is that in the case of this URL, when the request for the poorly-encoded URL is received by the server, it responds with a 301 redirect to the properly-encoded URL, whereupon the same misinterpetation of the encoding is made, and an infinite loop of "request wrong URL; get redirected to right URL; misinterpret as wrong URL" occurs.

I can use curl to inspect the behavior of the server in fulfilling this request, to see that the redirect is taken successfully there:

$ curl -I -L --include "https://www.facebook.com/100016673803484"
HTTP/2 301
location: https://www.facebook.com/people/अभिषेक-सिंह/100016673803484
strict-transport-security: max-age=15552000; preload
content-type: text/html; charset="utf-8"
x-fb-debug: tcsOzDNjy2Y/SoHPvIsDfI1eEumptXWlOXlFHbxspemPIl+0B7YS7hzHg02uq+/+fFuVCNpIX6t/QGOmUprDxw==
content-length: 0
date: Tue, 01 Dec 2020 17:04:52 GMT
alt-svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600

HTTP/2 200

And likewise in Chrome.

I'm open to the possibility that the server at www.facebook.com isn't respecting some restriction on charsets allowed for the Location header (I'm not familiar with those RFCs), in which case:

TomGranot commented 3 years ago

@bblack Interestingly enough, I've just worked on a different problem regarding the setFollowRedirect(true) here, in that case regarding the method used for downstream requests. Not strictly relevant here, though, it appears.

What I think from a (very) initial observation is that there are actually quite a few different failure points (even more than the ones you mentioned) that we should isolate here. From what I see, the first step should be to identify the full flow from a request made in JRuby all the way down to how Netty interprets thins.

Will investigate and get back to you with some better answers. I think I'll try and replicate the behaviour locally, but since I've never used JRuby (or Ruby, for that matter), I'd appreciate a bit of direction in getting set up. A bit of info about the exact setup you're running on (Java version, OS, containerised?, etc...) will be highly appreciated too.

BTW - please excuse any delay in getting back to you on this - long work week + I'm a new maintainer of this library, which has its own bottlenecks:) Please ping if I forget to respond!

TomGranot commented 3 years ago

@bblack - I played around quite a bit with JRuby & Maven. It might be just me, but I can't seem to get the dependencies of the library on the same classpath in order for JRuby to load them.

Well, to be more precise, I can get the AHC library itself under the same classpath, by compiling and then dropping the .class files next to the ruby script before running jruby.

But anything more complicated than that requires working with Maven & JRuby (for example, when you're downloading from Maven Central), which I haven't yet been able to get quite right.

If you can share how your setup looks like that would shave off quite a bit of config time so I can take a look at the underlying problem.

slandelle commented 3 years ago

Here's what happens https://github.com/netty/netty/issues/10831 I'm quite pessimistic that such crap (yes, it's crap, even if it's FaceBook) will be supported in Netty anytime soon.

TomGranot commented 3 years ago

@slandelle Thanks! I assume this, unfortunately, means that AHC's at fault here (well, transitively so) @bblack.

slandelle commented 3 years ago

The real fault is that browser vendors decided to silently violate the spec instead of pushing the change into a new version. Google is at the origin of HTTP/2 and HTTP/3 and still they have pushed such change into the new versions of the spec.

bblack commented 3 years ago

I suspected that might be the case. I can provide details of my setup, but it would be a bunch of extra layers (e.g. jruby itself, jbundle which is a java dependency manager for ruby which honestly I have problems with anyway). It sounds like you could more directly reproduce this by an analogous piece of Java code.

Does AHC (or netty, I guess?) offer an interface for providing my own definition of how to decode response headers? That would at least let me decode such nonconforming headers in a way similar to common browsers.