clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License
2.54k stars 241 forks source link

Add support for `wrap-decompression` middleware #683

Open KingMob opened 1 year ago

KingMob commented 1 year ago

clj-http has middleware support for automatically decoding compressed body data via wrap-decompression. As aleph's HTTP client has the explicit goal of clj-http compatibility, we should add this too.

Right now, if you send the header "accept-encoding: gzip, deflate" and get a compressed body back, you have to manually decompress it yourself.

gzip and deflate are the minimum necessary; brotli support should also be added, if available.

KingMob commented 1 year ago

~HTTP2 may complicate this, as it appears the primary way to decompress is to wrap the codec in the Netty pipeline with a decompression version.~ Nah, that's solved.

rajcspsg commented 1 year ago

I'm interested in working on this ticket.

KingMob commented 1 year ago

Great! Welcome to aleph.

First, we need a plan before writing code.

Not all Aleph middleware is implemented as actual middleware. It's meant to have API parity with clj-http, but the underlying architectures are different. Pretty sure the codec code will be like that.

I'll share some more ideas when I'm back at my computer.

On Sun Nov 5, 2023, 11:09 PM GMT, RAJKUMAR NATARAJAN @.***> wrote:

I'm interested in working on this ticket.

— Reply to this email directly, view it on GitHub https://github.com/clj-commons/aleph/issues/683#issuecomment-1793876359, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHHB5LXQDIP2GQ4QPAFRFTYDAFBJAVCNFSM6AAAAAAZK2RIUSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJTHA3TMMZVHE . You are receiving this because you authored the thread.Message ID: @.***>

KingMob commented 1 year ago

Preparation

First, I would do your work against the feature/http2-server branch. Once #687 merges, you might have to rearrange some code, and there's a new compression ns in that PR that you might need to use or add to.

HTTP/1

For the H1 client code, what we'll probably want to do is, look for the wrap-decompression symbol in the middleware list, and if present, add an instance of the HttpContentDecompressor handler after the HttpClientCodec in setup-http1-pipeline.

HTTP/2

For the H2 client, I believe AlephHttp2FrameCodecBuilder will already decorate the decompression codec properly. You might not have to do anything extra as long as :compression? is true.

Though on a side note, we probably shouldn't add h2-compression-handler to client pipelines, just servers.

Zstd issues

One catch is, Netty doesn't support Zstd decoding yet. We have several options:

  1. Don't support Zstd at all. It's not too popular yet, and most browsers don't support it (https://caniuse.com/zstd). Should probably parse and strip zstd from the accept-encoding header.
  2. Wait for https://github.com/netty/netty/pull/13531 to add it. The PR has been in-progress since August, and it's unclear when it will land.
  3. Update the byte-transforms library to support Zstd and Brotli, and use that to transform the InputStream in Aleph code before handing off the body InputStream to users. This will require extra work for raw mode, since we'd then have to convert back to ByteBufs.
  4. Add Apache commons-codec support. I think this is what clj-http uses. Will require another dep, and probably hsa the same issue as (3) with raw mode.

Might be simplest to do (1), and add full support when (2) is ready.

rajcspsg commented 11 months ago

@KingMob Thanks for your suggestions. Now #687 is merged, I looked into the code and couldn't find wrap-decompression middleware. Am I missing anything here?

Also, I didn't get this what do you mean regarding the note -

we probably shouldn't add h2-compression-handler to client pipelines, just servers.

KingMob commented 11 months ago

@rajcspsg wrap-decompression doesn't exist yet. It's in clj-http, and needs to be added to Aleph. Or to be more explicit, support for it needs to be added, but not all of it will be proper middleware.

Part of this will have to be done with Netty. Compression in Netty isn't done with a pipeline handler, but by injecting codec classes into the Http2FrameCodec handler. So you'll need to look at AlephHttp2FrameCodecBuilder. Some of the decompression code you need is already in place, but I haven't looked at which pieces are missing for H2, if any.

If you call (aleph.http.AlephHttp2FrameCodecBuilder/setCompression builder true some-options-here) it will add the necessary DelegatingDecompressorFrameListener which should already look for the "content-encoding" header and decompress correctly.

So for H2, I think this PR will need to

  1. Look for the wrap-decompression symbol in the user's client middleware list
  2. Call .setCompression(true, ...) on AlephHttp2FrameCodecBuilder
  3. Add a real middleware fn that mimics the update-in of clj-http's wrap-decompression to update the "accept-encoding" header's list of supported codecs.
  4. Potentially remove the "content-encoding" header after decompression, if it's not already done by Netty. (This is to discourage anyone downstream from trying to decompress a second time.) Or better still, remove it and add an "original-content-encoding" header for debugging purposes.

Unlike clj-http, we probably want to add Brotli to the default list. clj-http defaults to adding "gzip, deflate" to whatever's already in the header (if anything), which is not quite correct, but good enough 99.99% of the time. We should leave out zstd and snappy from the defaults, I think. Zstd is promising but still only has partial support. Snappy is niche, and not supported by browsers.

we probably shouldn't add h2-compression-handler to client pipelines, just servers.

This is a little out-of-date. Right now, we don't add it to clients, but the docs and key names used aren't very clear that it's server-only.

The h2-compression-handler is about parsing the client's "accept-encoding" header to determine what encoding to return. In theory, this could be symmetrical, and clients could encode their bodies, but for pragmatic and security reasons, client request body encoding isn't a thing, and we should change the code to make it clearer that it's server-only, and prevent misuse. (E.g., rename the keys, update AlephHttp2FrameCodecBuilder to compress or decompress, but not both simultaneously, etc.) You don't have to do that in this PR, though I think maybe I should update the key names before we do a full release of 0.7.0.

KingMob commented 11 months ago

BTW, did you vote on https://clojurians.slack.com/archives/C0G922PCH/p1702723929287379 ?

rajcspsg commented 11 months ago

BTW, did you vote on https://clojurians.slack.com/archives/C0G922PCH/p1702723929287379 ?

yes I did :)

KingMob commented 10 months ago

Hmm, I don't see your name on the vote emojis...