elixir-mint / mint

Functional HTTP client for Elixir with support for HTTP/1 and HTTP/2 🌱
Apache License 2.0

Update Decompression.md - drop deflate, mention zstd #405

Closed: wojtekmach closed this 1 year ago

wojtekmach commented 1 year ago

First of all, we can't use :zlib.unzip/1 for deflate. If anything, it's :zlib.uncompress/1:

iex> Req.get!("http://httpbin.org/deflate", raw: true).body |> :zlib.unzip()
** (ErlangError) Erlang error: :data_error
    :zlib.inflate_nif(#Reference<0.3067689971.3529113605.174556>, 8192, 16384, 0)
    :zlib.dequeue_all_chunks_1/3
    :zlib.inflate/3
    :zlib.unzip/1
    iex:3: (file)
iex> Req.get!("http://httpbin.org/deflate", raw: true).body |> :zlib.uncompress()
"{\n  \"deflated\": true, \n  \"headers\": {\n    \"Accept-Encoding\": \"zstd, br, gzip\", \n    \"Host\": \"httpbin.org\", \n    \"User-Agent\": \"req/0.3.11\", \n    \"X-Amzn-Trace-Id\": \"Root=1-64d6b50d-4c634ab23ae6d5fe3dab6694\"\n  }, \n  \"method\": \"GET\", \n  \"origin\": \"89.73.45.237\"\n}\n"

But I don't think it's quite that simple.
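For reference, here is a minimal sketch of how the three Erlang :zlib one-shot decoders pair up with the three wire formats, and why unzip/1 fails on the httpbin body above. The module and function names (ZlibFormatsSketch, streams/1, try_unzip/1) are made up for illustration and are not part of Mint, Req, or Bandit; only the :zlib calls are real.

```elixir
defmodule ZlibFormatsSketch do
  # Hypothetical helper: compress the same payload into the three formats.
  def streams(payload) do
    %{
      # zlib format (RFC 1950): this is what HTTP "deflate" is specified to be
      zlib: :zlib.compress(payload),
      # raw deflate (RFC 1951): no header, no checksum
      raw: :zlib.zip(payload),
      # gzip format (RFC 1952): this is HTTP "gzip"
      gzip: :zlib.gzip(payload)
    }
  end

  # Hypothetical helper: run unzip/1 (the raw-deflate decoder) and turn the
  # Erlang error into a tagged tuple instead of crashing.
  def try_unzip(data) do
    {:ok, :zlib.unzip(data)}
  rescue
    e in ErlangError -> {:error, e.original}
  end
end

streams = ZlibFormatsSketch.streams("hello")

# Each decoder only accepts its own format.
"hello" = :zlib.uncompress(streams.zlib)
"hello" = :zlib.unzip(streams.raw)
"hello" = :zlib.gunzip(streams.gzip)

# Feeding zlib-format data to the raw-deflate decoder fails, as in the
# httpbin session above.
{:error, :data_error} = ZlibFormatsSketch.try_unzip(streams.zlib)
```

So unzip/1 is the inverse of zip/1 (raw deflate), while uncompress/1 is the inverse of compress/1 (zlib format).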

Bandit supports deflate (https://github.com/mtrudel/bandit/blob/1.0.0-pre.11/lib/bandit/compression.ex#L15), but running its output through :zlib.uncompress/1 crashes:

iex> Bandit.Compression.compress("hello", "deflate", []) |> :zlib.uncompress()
** (ErlangError) Erlang error: :data_error
    (erts 14.0.2) :zlib.inflateEnd_nif(#Reference<0.2420980051.2993029125.66777>)
    (erts 14.0.2) :zlib.uncompress/1
    iex:2: (file)

I trust @mtrudel to have followed the RFCs and implemented this correctly. :) So if we want to mention deflate, we should follow his implementation and do the inverse when decompressing.

But, and maybe I'm just a bit burned out on this, I don't think it's worth it. None of the big sites seem to support deflate. And then there's the zlib FAQ, https://zlib.net/zlib_faq.html#faq39:

What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?

"gzip" is the gzip format, and "deflate" is the zlib format. They should probably have called the second one "zlib" instead to avoid confusion with the raw deflate compressed data format. While the HTTP 1.1 RFC 2616 correctly points to the zlib specification in RFC 1950 for the "deflate" transfer encoding, there have been reports of servers and browsers that incorrectly produce or expect raw deflate data per the deflate specification in RFC 1951, most notably Microsoft. So even though the "deflate" transfer encoding using the zlib format would be the more efficient approach (and in fact exactly what the zlib format was designed for), using the "gzip" transfer encoding is probably more reliable due to an unfortunate choice of name on the part of the HTTP 1.1 authors.

Bottom line: use the gzip format for HTTP 1.1 encoding.
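To make the ambiguity in that FAQ entry concrete: a client that really wanted to accept "deflate" would have to guess which of the two formats the server sent. A rough sketch of such a guess is below, trying the zlib format first and falling back to raw deflate. DeflateFallbackSketch and decode/1 are hypothetical names, and this is only an illustration of the problem, not something Mint or Req does.

```elixir
defmodule DeflateFallbackSketch do
  # Hypothetical decoder for a body labelled "Content-Encoding: deflate".
  # Assumption: the whole body is available as a single binary.
  def decode(body) when is_binary(body) do
    # What the HTTP spec asks for: zlib format (RFC 1950).
    {:ok, :zlib.uncompress(body)}
  rescue
    ErlangError -> decode_raw(body)
  end

  # What some servers reportedly send instead: raw deflate (RFC 1951).
  defp decode_raw(body) do
    {:ok, :zlib.unzip(body)}
  rescue
    ErlangError -> {:error, :data_error}
  end
end
```

For example, DeflateFallbackSketch.decode(:zlib.compress("hello")) and DeflateFallbackSketch.decode(:zlib.zip("hello")) both return {:ok, "hello"}. Having to guess like this is a large part of why gzip is the safer recommendation.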

And then there's this excerpt from the documentation for zlib:deflateInit/6:

WindowBits - The base two logarithm of the window size (the size of the history buffer). It is to be in the range 8 through 15. Larger values result in better compression at the expense of memory usage. Defaults to 15 if deflateInit/2 is used. A negative WindowBits value suppresses the zlib header (and checksum) from the stream. Notice that the zlib source mentions this only as a undocumented feature.

Again, I think the implementation is correct, because I've seen Python and Ruby code set the window bits to -MAX_WBITS too.

FWIW, I had (wrong) support for it in Req and decided to just drop it (https://github.com/wojtekmach/req/commit/e19d89f6c6eb1dd21a373c192f50afa6026fdbac, https://github.com/wojtekmach/req/issues/215).
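As an illustration of the negative WindowBits behaviour from the excerpt, here is a small sketch that writes and reads a header-less stream with the streaming :zlib API. RawDeflateSketch and its two functions are hypothetical names, and the parameters (-15 window bits, memory level 8, default level and strategy) are my assumptions for the example, not a copy of what Bandit does.

```elixir
defmodule RawDeflateSketch do
  # Hypothetical helper: compress without the zlib header and checksum.
  def compress_raw(data) do
    z = :zlib.open()
    # Negative window bits suppress the zlib header and checksum.
    :ok = :zlib.deflateInit(z, :default, :deflated, -15, 8, :default)
    compressed = :zlib.deflate(z, data, :finish)
    :ok = :zlib.deflateEnd(z)
    :ok = :zlib.close(z)
    IO.iodata_to_binary(compressed)
  end

  # Hypothetical helper: the inverse, again with negative window bits.
  def decompress_raw(data) do
    z = :zlib.open()
    :ok = :zlib.inflateInit(z, -15)
    decompressed = :zlib.inflate(z, data)
    :ok = :zlib.inflateEnd(z)
    :ok = :zlib.close(z)
    IO.iodata_to_binary(decompressed)
  end
end

raw = RawDeflateSketch.compress_raw("hello")

# Round-trips through the matching decoder...
"hello" = RawDeflateSketch.decompress_raw(raw)
# ...and through :zlib.unzip/1, which also expects a header-less stream.
"hello" = :zlib.unzip(raw)
```

Note that the stream is finished with :finish before deflateEnd/1 is called; a stream that is only flushed and never finished is a different case from what this sketch shows.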

mtrudel commented 1 year ago

Thoughts in no particular order.

wojtekmach commented 1 year ago

Thank you for the cache/deflate link, very interesting read.

Regarding brotli, I think the brotli package looks pretty good. I wouldn't include it by default but I'd consider optionally supporting it.

I think something like http_content_encoding sounds appealing.