Open syeopite opened 3 years ago
I reported a similar issue to @straight-shoota yesterday where HTTP:Client response's body is binary. I have yet to determine whether it's gibberish or if it didn't deflate the gzip encoding. It's very intermittent, and happens for small data (1kb) for a specific HTTP domain (api.airtable.com). Other domains used are fine.
The issue here is that you write the content to the HTTP response before setting the Content-Encoding
header. That means the content is sent without that header and the client has no idea that it is supposed to decompress it. You can test that with any HTTP client.
It works with smaller content sizes because the data is buffered.
You have to move the configuration lines setting content encoding and content type above the IO copy.
@ysbaddaden This appears to be unrelated to your problem.
This is not a bug, but I believe we can improve the developer experience by trying to raise an error on incorrect usage.
Response#content_type=
could easily raise an error if headers have already been written (we already have check_headers
for that).
This is not easy to implement with setting arbitrary headers (such as in response.headers["content-encoding"] = "gzip"
) because the mutation happens in the Headers
object which has no reference to the request and thus can't know whether it still accepts headers.
Maybe we could consider adding set_header
and get_header
methods to Response
which could check for headers written like content_type
.
The issue here is that you write the content to the HTTP response before setting the Content-Encoding header. That means the content is sent without that header and the client has no idea that it is supposed to decompress it. You can test that with any HTTP client.
Oh thanks! But, the server code above was actually an attempt to replicate the issue I was getting locally, and it looks like I messed up on that. Oops 😅. From your explanation, the actual problem I'm getting might be more related to what @ysbaddaden is experiencing.
But the gist of it is the same; HTTP::Client
doesn't seem to decompress correctly. Here's a example source code that relies on the website I was querying:
require "http"
require "json"
require "uri"
client = HTTP::Client.new(URI.parse("https://www.youtube.com"))
client.compress = true
post_data = {"context" => {"client" => {"hl" => "en", "gl" => "US", "clientName" => "WEB",
"clientVersion" => "2.20210721.00.00", "clientScreen" => "WATCH_FULL_SCREEN"}},
"continuation" => "4qmFsgJIEhhVQ1h1cVNCbEhBRTZYdy15ZUpBMFR1bncaLEVnWjJhV1JsYjNNd0FqZ0JZQUZxQUxnQkFDQUE2Z01JUTJkU1JGRlZSVGs9",
}.to_json
yt_headers = HTTP::Headers{
"Content-Type" => "application/json",
"Accept-Encoding" => "gzip",
}
results = client.post("/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8", body: post_data,
headers: yt_headers)
# Should be string but instead it's some random binary data
results.body
This returns gibberish binary data, but replicating the same query on curl does result in the correct (and decompressed) JSON response.
You're actively opting out of the automatic decompression by setting the Accept-Encoding
header.
This is documented here: https://crystal-lang.org/api/1.1.1/HTTP/Client.html#compression
You don't need to assign client.compress = true
because that's the default. But you must not set a custom Accept-Encoding
header or the client won't automatically decompress.
You're actively opting out of the automatic decompression by setting the
Accept-Encoding
header. This is documented here: https://crystal-lang.org/api/1.1.1/HTTP/Client.html#compression
Though, it is not explained that passing the Accept-Encoding
header disables the automatic decompression!
If [...] no ˋAccept-Encodingˋ header is explicitly specified, ...
I am hitting this or #11354, In the body the first bytes in the returned string are reported as "\u001F\x8B". I'm a lttle confused about whether this is reporting an extra zero byte or not. The Conttent-Encoding header says "gzip". It's intermittent - does not always occurr on the same server response.
Bug Report
If I have this server:
And a client requesting it like so:
I'd expect Crystal to decompress the response automatically, and give me the resulting data. While this is the behavior for smaller responses, it seems like anything larger than ~13.8kb causes this step to be skipped. The above code would just return random gibberish binary data, instead of the long hex string.