caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
55.71k stars 3.92k forks

Shared dictionary compression support #6204

Open nickchomey opened 3 months ago

nickchomey commented 3 months ago

Shared Dictionary Compression is a new technique that is starting to become available in browsers and can reduce data transfer by up to 98%. It does this by leveraging the ability of brotli (not relevant to Caddy) and zstd to use custom compression dictionaries when compressing a file: for example, the previous version of a script serves as the dictionary for the new version, so only the difference is sent.

More on it all here https://developer.chrome.com/blog/shared-dictionary-compression

It is still in an Origin Trial in Chrome until April 30, 2024, so perhaps the Caddy team could inquire about it such that support could be implemented when appropriate?

The zstd package used by caddy already supports custom dictionaries https://github.com/klauspost/compress/blob/de4073a3abdd00a2a95e608f9fcaf6ebf9141cc0/zstd/README.md?plain=1#L324

So, it would be a matter of reading the relevant request headers from the browser and adding the matching headers to the response.

Thanks for the consideration!

mholt commented 3 months ago

We could do that. (I'm curious how they solved the privacy problems associated with shared dictionaries. I haven't read up on it.)

Doesn't surprise me that zstd supports it, but that's the least common encoding browsers accept, currently. We'd need support in gzip and brotli libs as well.

Will keep an eye on it!

nickchomey commented 3 months ago

Brotli does support shared dictionary compression, but caddy doesn't (and seemingly won't, for performance reasons) support brotli. I doubt gzip would ever support this - I don't think it has any such custom compression dictionary mechanism.

It seems to me that zstd will continue to become more ubiquitous in browsers, cdns etc... (chromium 123 stable has it now https://caniuse.com/zstd, so edge etc have it too), so shared dictionary compression can just be a sort of "progressive enhancement" when a browser supports it (which, it seems to me, is how much of the browser header stuff works - be it content-encoding, languages, etc)

I'm sure the chrome team would be happy to explain if you join the origin trial! https://developer.chrome.com/origintrials/#/view_trial/2583940286203822081

Here's some more detailed resources

ottenhoff commented 3 months ago

This is exciting especially for those of us that pay egress bills to the cloud providers.

The custom dictionary is public with no ability for authentication. Ensuring privacy of information inside the dictionary is completely up to the implementor. Command-line training on a few dozen files looks like the way most would implement it.

francislavoie commented 3 months ago

Brotli is supported via https://github.com/dunglas/caddy-cbrotli if you're willing to build Caddy with CGO. We can't add it to the website though because we disable CGO for our build server.

nickchomey commented 3 months ago

> The custom dictionary is public with no ability for authentication. Command-line training on a few dozen files looks like the way most would implement it.

The main point of shared dictionaries is that there's no singular custom dictionary. Literally every file/script can be its own custom dictionary, allowing up to like 99% compression on file changes.

The various links I shared above go into plenty more detail about this.

ottenhoff commented 3 months ago

> The main point of shared dictionaries is that there's no singular custom dictionary. Literally every file/script can be its own custom dictionary, allowing up to like 99% compression on file changes.

Agreed, but implementing this in a load balancer means holding state of former files and that sounds like a lot of work. Implementing one single dictionary as an argument to Caddy's existing "encode zstd" sounds relatively straightforward?

nickchomey commented 3 months ago

Yes, I suppose a singular custom dictionary mechanism could be made available (as well as other zstd config options, such as compression level, which doesn't appear to be configurable currently).

But that's not really the point of this issue. It's to implement whatever might be necessary to make full use of browsers' Shared Dictionary mechanisms.

I don't really see why it should be done at the load balancer level - the application servers that are being load balanced should have sufficient state and can return the relevant files.

Moreover, it doesn't seem relevant to me whether this might be more difficult to implement with some architectures - if it doesn't work for some, so be it.