Kong / unirest-java

Unirest in Java: Simplified, lightweight HTTP client library.
http://kong.github.io/unirest-java/
MIT License

Content-Encoding Header omitted #295

Closed by maxemann96 4 years ago

maxemann96 commented 5 years ago

The Content-Encoding response header is missing here (the call returns an empty list): Unirest.get("https://google.de/").header("Accept-Encoding", "gzip").asString().headers.get("Content-Encoding")

The following also prints nothing, so the header is missing there as well:

Unirest.get("https://google.de/").header("Accept-Encoding", "gzip").thenConsume {
    println(it.encoding)
}

ryber commented 5 years ago

Hi @maxemann96, in both cases you are looking at the response headers. If the server does not return that header, it will not be present. Adding Accept-Encoding to the request does not oblige the server to send Content-Encoding in the response.

see this test: https://github.com/Kong/unirest-java/blob/master/unirest/src/test/java/BehaviorTests/ResponseHeaderTest.java#L38-L58
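
To illustrate the point about request versus response headers, here is a minimal sketch of how a server might decide whether to send Content-Encoding at all. The class `EncodingNegotiationSketch`, the method `responseEncoding`, and the `serverCanGzip` flag are hypothetical names invented for this example; the sketch only looks at the `gzip` token and its `q` value and ignores wildcards, so it is not a full implementation of HTTP content negotiation.

```java
import java.util.Locale;
import java.util.Optional;

final class EncodingNegotiationSketch {

    /**
     * Returns the Content-Encoding a hypothetical server would send, or empty
     * when it answers uncompressed and therefore omits the header.
     */
    static Optional<String> responseEncoding(String acceptEncoding, boolean serverCanGzip) {
        if (acceptEncoding == null || !serverCanGzip) {
            return Optional.empty(); // the server is free to ignore the request header
        }
        for (String part : acceptEncoding.split(",")) {
            String[] pieces = part.trim().split(";");
            if (!pieces[0].trim().equalsIgnoreCase("gzip")) {
                continue;
            }
            // "gzip;q=0" means the client explicitly refuses gzip.
            boolean refused = false;
            for (int i = 1; i < pieces.length; i++) {
                String p = pieces[i].trim().toLowerCase(Locale.ROOT);
                if (p.startsWith("q=")) {
                    try {
                        refused = Double.parseDouble(p.substring(2)) == 0.0;
                    } catch (NumberFormatException e) {
                        refused = true; // assumption: treat a malformed weight as a refusal
                    }
                }
            }
            return refused ? Optional.empty() : Optional.of("gzip");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(responseEncoding("gzip", true).isPresent());  // header sent
        System.out.println(responseEncoding("gzip", false).isPresent()); // header omitted
    }
}
```

In this sketch the request header is only an offer: the response carries Content-Encoding only when the server actually chose to compress.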

maxemann96 commented 5 years ago

@ryber Thanks for the fast answer. Google's response does contain the Content-Encoding header:

curl --header "Accept-Encoding: gzip" -I https://www.google.de/
HTTP/2 200 
date: Sun, 15 Sep 2019 18:13:37 GMT
expires: -1
cache-control: private, max-age=0
content-type: text/html; charset=ISO-8859-1
p3p: CP="This is not a P3P policy! See g.co/p3phelp for more info."
content-encoding: gzip
server: gws
x-xss-protection: 0
x-frame-options: SAMEORIGIN
set-cookie: 1P_JAR=2019-09-15-18; expires=Tue, 15-Oct-2019 18:13:37 GMT; path=/; domain=.google.de; SameSite=none
set-cookie: NID=188=Evn78VrHCbYg9WHERmpXNH5CkXmDSZzqoDqJZNkxRJgY2CbmLuc36Ul-tgsFOBi1TCFH0dKeLpe28BN3_CjSVJRGFEd8MfgtqyYmJLuwHvTcdPYzThblytOIzhSKd7tt4b5Oq138ScTBsDKPG4T96hYmPa4YmVXe5YbaIV2qfLA; expires=Mon, 16-Mar-2020 18:13:37 GMT; path=/; domain=.google.de; HttpOnly
alt-svc: quic=":443"; ma=2592000; v="46,43,39"

ryber commented 5 years ago

The header appears to be removed by Apache HttpClient, which is the engine that Unirest uses. My theory is that it does this because the client automatically decompresses gzip-encoded responses, so it removes the header so that consumers are not misled into believing that the body is still encoded.

I'm not sure there is much I can do about this short of documenting it.

ryber commented 4 years ago

I was able to confirm that this is expected behavior. Apache HttpClient removes the Content-Encoding header because it decompresses the entity, so the body is no longer gzipped. There are several tests to this effect in the Apache project.
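
The behavior can be pictured with a small JDK-only sketch. This is not Apache HttpClient's code; the `Response` holder and the `decodeGzip` and `gzip` helpers are hypothetical names made up for the illustration, and dropping Content-Length alongside Content-Encoding is an assumption of the sketch (the length would otherwise describe the compressed bytes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ContentEncodingSketch {

    /** Hypothetical response holder: case-insensitive headers plus the body bytes. */
    static final class Response {
        final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byte[] body;
    }

    /** Decompresses a gzip body in place and drops the headers that no longer describe it. */
    static void decodeGzip(Response r) {
        String enc = r.headers.get("Content-Encoding");
        if (enc == null || !enc.trim().equalsIgnoreCase("gzip")) {
            return; // not gzip: body and headers stay untouched
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(r.body))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        r.body = out.toByteArray();
        r.headers.remove("Content-Encoding"); // the body is plain now
        r.headers.remove("Content-Length");   // assumption: length referred to the compressed bytes
    }

    /** Test helper: gzip-compresses raw bytes. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Response r = new Response();
        r.headers.put("content-encoding", "gzip");
        r.body = gzip("<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        decodeGzip(r);
        System.out.println(r.headers.containsKey("Content-Encoding")); // prints false
        System.out.println(new String(r.body, StandardCharsets.UTF_8)); // prints <html>hi</html>
    }
}
```

After `decodeGzip` runs, a caller asking for Content-Encoding gets nothing back, which matches what the original report observed with the synchronous client.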

This is not the case with the async client, where Apache doesn't automatically decompress the body. In that case Unirest does the decompression but doesn't remove the header. I'm going to switch this issue to track removing the header in that case, to match the behavior of the synchronous client.
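
A short JDK-only sketch of why a leftover header can mislead consumers: code that trusts a stale `Content-Encoding: gzip` and tries to decompress an already-decoded body will fail. `StaleHeaderSketch` and `canGunzip` are hypothetical names for this illustration and are not part of Unirest or Apache HttpClient.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class StaleHeaderSketch {

    /** Returns true if the bytes start with a gzip header that GZIPInputStream accepts. */
    static boolean canGunzip(byte[] body) {
        // The GZIPInputStream constructor reads and validates the gzip header.
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return true;
        } catch (IOException e) {
            // Plain bytes are rejected here (for example with a ZipException).
            return false;
        }
    }

    public static void main(String[] args) {
        // Body as the client library hands it over: already decompressed.
        byte[] alreadyDecoded = "<html>hi</html>".getBytes(StandardCharsets.UTF_8);
        // Header left over from the wire format.
        String staleHeader = "gzip";

        if (staleHeader.equals("gzip") && !canGunzip(alreadyDecoded)) {
            System.out.println("Header says gzip, but the body is not gzip data");
        }
    }
}
```

Removing the header once the body has been decoded avoids this mismatch, which is the consistency the follow-up issue is meant to track.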

ryber commented 4 years ago

Switched, and opened a new issue: https://github.com/Kong/unirest-java/issues/299