crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.33k stars 1.61k forks source link

Error parse DEFLATE compression format #5221

Open mamantoha opened 6 years ago

mamantoha commented 6 years ago

Just a simple example:

require "http"

response = HTTP::Client.get("http://httpbin.org/deflate")
puts response.body
flate: invalid stored block lengths DATA_ERROR (Flate::Error)
  from /opt/crystal/src/flate/reader.cr:104:9 in 'read'
  from /opt/crystal/src/io.cr:560:29 in 'gets_to_end'
  from /opt/crystal/src/http/client/response.cr:73:15 in 'consume_body_io'
  from /opt/crystal/src/http/client/response.cr:97:9 in 'from_io?'
  from /opt/crystal/src/http/client.cr:502:5 in 'exec_internal_single'
  from /opt/crystal/src/http/client.cr:486:5 in 'exec_internal'
  from /opt/crystal/src/http/client.cr:482:5 in 'exec'
  from /opt/crystal/src/http/client.cr:585:5 in 'exec'
  from /opt/crystal/src/http/client.cr:612:7 in 'exec'
  from /opt/crystal/src/http/client.cr:329:3 in 'get'
  from get_request.cr:3:1 in '__crystal_main'
  from /opt/crystal/src/crystal/main.cr:11:3 in '_crystal_main'
  from /opt/crystal/src/crystal/main.cr:112:5 in 'main_user_code'
  from /opt/crystal/src/crystal/main.cr:101:7 in 'main'
  from /opt/crystal/src/crystal/main.cr:135:3 in 'main'
  from __libc_start_main
  from _start
  from ???

The only thing I can say is broken after this commit https://github.com/crystal-lang/crystal/commit/bda40f805ace44df932571b4de93c3d79400993b#diff-72e61251c7081b61beddd7e51e55dae3.

On Crystal 0.20.5 everything works fine.

I don't know how critical this bug is because I've never seen a website which used Deflate Compression :)

z64 commented 6 years ago

It would appear that the stdlib isn't detecting the zlib header? :thinking:

client = HTTP::Client.new("httpbin.org")
client.compress = false

response = client.get("/deflate")

p response.headers
io = IO::Memory.new(response.body)
io.skip(2)
reader = Flate::Reader.new(io, sync_close: true)
puts reader.gets_to_end

This will work (in this specific case), skipping over the two byte header of 78 9c (hexdump with headers). Alternatively instead of skipping you can use Zlib::Reader.

Changing compress back to true throws the exception in the OP.

So, unless I'm missing something, the header isn't handled anywhere and Flate dies expecting data; https://github.com/crystal-lang/crystal/blob/master/src/http/common.cr#L46-L50 https://github.com/crystal-lang/crystal/blob/master/src/http/client/response.cr#L73

Whether it should, I don't know - I would think so; it seems like HTTP::Client.get would always die on Content-Encoding: deflate this way

asterite commented 6 years ago

It works if we change Flate::Reader to Zlib::Reader it works. But flate and zlib are two different things!

Then I found this:

https://en.wikipedia.org/wiki/HTTP_compression#Problems_preventing_the_use_of_HTTP_compression

Another problem found while deploying HTTP compression on large scale is due to the deflate encoding definition: while HTTP 1.1 defines the deflate encoding as data compressed with deflate (RFC 1951) inside a zlib formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a "raw" deflated stream,[19] making its deployment unreliable.[20][21] For this reason, some software, including the Apache HTTP Server, only implement gzip encoding.

Maybe we should fix it to check the zlib header, I don't know.

asterite commented 6 years ago

For starters, maybe we should change this:

https://github.com/crystal-lang/crystal/blob/99c3d2b624db0d0c95eeb97497ed04a9a78d6291/src/http/common.cr#L48-L49

to use Zlib::Reader because of what it says above.

mamantoha commented 6 years ago

Yep, Zlib::Reader should work.

But if you change to Zlib::Reader in a client, you should also change to Zlib::Writer in server implementation (HTTP::CompressHandler). I don't know.

z64 commented 6 years ago

@asterite if you see that reference in the Wiki article for "raw" deflate, it's a comment from Mark Adler on SO:

It is apparently due to a misunderstanding resulting from the choice of the name "Deflate". The http standard clearly states that "deflate" really means the zlib format: [ ... ] However early Microsoft servers would incorrectly deliver raw deflate for "Deflate" (i.e. just RFC 1951 data without the zlib RFC 1950 wrapper). This caused problems, browsers had to try it both ways, and in the end it was simply more reliable to only use gzip.

https://stackoverflow.com/a/9186091/2717676

Sounds to me like using a higher level Flate is the way to go for sure. To me this says if something says encoding deflate, it should be readable with Zlib::Reader. If this would fail, I don't think there's a way we can safely know what we were given and this is an issue with the server.

If it did fail, it would likely be the Zlib invalid header exception which I think is easier to digest ("this isn't zlib wrapped!") than invalid stored block lengths DATA_ERROR ("something is corrupt!").

The user can always set #compress = false and parse the body themselves correctly, however the server is shipping them compressed data.

I agree with @mamantoha ; if Flate::Writer would not be emitting the zlib wrapper, we should use Zlib::Writer as well

RX14 commented 6 years ago

Why not just detect the header and do the right thing, if it's that easy

asterite commented 6 years ago

@RX14 Sounds good. The main difficulty is where to put that check. But yeah, PRs fixing this will be welcome.

HCLarsen commented 2 months ago

I'm getting this error in one of my apps, so I'm guessing this issue hasn't been fixed yet?