WICG / compression-dictionary-transport

expected implementation location #58

Open MarshallOfSound opened 1 week ago

MarshallOfSound commented 1 week ago

Hey folks, tracking this proposal as it seems like a huge opportunity to cut down on our CDN traffic and get people loading updated JS bundles faster.

From an implementation perspective, is there a recommended / expected location for the implementation in standard flows? I can see two places to do this:

- at build time, generating the deltas ahead of time and uploading them alongside the full assets
- dynamically at the CDN / edge layer, generating deltas on demand when a request arrives with a dictionary available

There are pros and cons to both approaches, but I'm wondering whether from a spec perspective there is an "ideal" approach here, or specifically what was envisaged while writing the spec.

For full context, we ship ~15-20 builds a day, which means that solving "how do we generate these files" is not trivial in either case (dynamic vs build time). But we're looking to go down the path most trodden.

pmeenan commented 1 week ago

Both were envisioned as likely popular options during spec work with neither really "favored".

My thoughts were that generating diffs at build time would probably be best, since you can target Brotli 11 and get the best possible compression, and you can pick ahead of time how far back to go in generating deltas (and that is how everyone has done it during the Chrome origin trials, because there is no CDN support yet).

The CDN path makes adoption much easier, and it's likely devs will eventually be able to just specify the match pattern rules and have the middleware (CDN or worker) take care of generating deltas on-demand if it has the original file available. This would likely be async for static files, so the first few requests wouldn't benefit, and it may or may not be at the highest compression level (several CDNs only compress at Brotli level 1 dynamically, for example).
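
For illustration, here is a minimal sketch of that on-demand flow in a generic edge-worker style: serve a pre-generated delta if one is cached for this (URL, dictionary) pair, otherwise fall back to the full response and kick off delta generation in the background. `queueDeltaJob` is a placeholder for whatever background-job mechanism the platform provides.

```ts
// Sketch only: assumes a worker runtime with the standard fetch event and Cache APIs.
declare function queueDeltaJob(url: string, dictionaryHash: string): void;

async function handleRequest(request: Request): Promise<Response> {
  const dictHash = request.headers.get("Available-Dictionary");
  if (!dictHash) return fetch(request); // client has no dictionary; serve normally

  // Key the delta cache on both the URL and the advertised dictionary hash.
  const cache = await caches.open("dictionary-deltas");
  const keyUrl = new URL(request.url);
  keyUrl.searchParams.set("dict", dictHash);

  const cached = await cache.match(keyUrl.toString());
  if (cached) return cached; // a delta was already generated for this pair

  // No delta yet: serve the full (Brotli) response now and generate the delta
  // asynchronously so later requests benefit.
  queueDeltaJob(request.url, dictHash);
  return fetch(request);
}

addEventListener("fetch", (event: any) => {
  event.respondWith(handleRequest(event.request));
});
```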

pmeenan commented 1 week ago

btw, one thing that can help with doing the compression on the edge is to include the original file URL as the "ID" of the dictionary. When the client announces "Available-Dictionary" it will echo the dictionary URL back in the "Dictionary-ID" field, so the CDN can fetch it from cache (or the origin) to do the offline compression. There's still some work to make sure the compression job isn't duplicated, but you don't have to build a hash->URL mapping.
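
As a rough sketch of that trick (the header names come from the spec; using the URL as the ID is just a server-side convention, and the paths here are made up):

```ts
// 1) When serving a bundle, register it as a dictionary and use its own URL as the
//    ID so clients echo it back on later requests.
function useAsDictionaryHeader(selfUrl: string): Record<string, string> {
  return {
    "Use-As-Dictionary": `match="/js/app-*.js", id="${selfUrl}"`,
  };
}

// 2) On a later request, Dictionary-ID carries that URL back, so the edge can pull
//    the original dictionary bytes from its cache or the origin without keeping its
//    own hash -> URL mapping.
async function fetchDictionaryFor(request: Request): Promise<ArrayBuffer | null> {
  const id = request.headers.get("Dictionary-ID");
  if (!id) return null;
  const dictUrl = id.replace(/^"|"$/g, ""); // strip the structured-field quotes (simplified)
  const res = await fetch(dictUrl);
  return res.ok ? await res.arrayBuffer() : null;
}
```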

MarshallOfSound commented 1 week ago

btw, one thing that can help with doing the compression on the edge is to include the original file URL as the "ID" of the dictionary

Yup, I've been toying with a local proxy to see how this works and came up with the same Dictionary-ID trick to make things easier. Is there a world where the spec just codifies that, not as the ID but maybe as a Dictionary-Source-Url or something that indicates where the dictionary was originally fetched from? That information appears to be stored already, so it should be easy enough to re-transmit without everyone having to do this trick.

There are implementation issues though as you've noted:

This would likely be async for static files, so the first few requests wouldn't benefit, and it may or may not be at the highest compression level (several CDNs only compress at Brotli level 1 dynamically, for example).

You kind of have to serve the whole file back to CloudFront but say "only cache this for 5 minutes" or something, which conflicts with things like global min_ttl settings on CloudFront distributions that enforce caching everything for at least an hour.

It will probably take some at-scale experimentation to figure out how best to handle these scenarios; maybe it's better to lock the first few requests for the same file and actually wait for the compression to complete, so the CDN always has the "right" thing instead of temporarily having the wrong thing.

We at Slack also have the problem that some of our JS bundles exceed the "automatic" compression thresholds for services like CloudFront, so we actually upload multiple variants of our assets pre-compressed to Brotli 11 and serve different asset manifests depending on Accept-Encoding. That system meant we didn't need anything dynamic between CDN <-> storage. I understand why that isn't possible here, but it's a shame that in order to support this you need some kind of dynamic routing at a minimum, and dynamic compression as well in the easy-to-adopt case 😆

pmeenan commented 1 week ago

The dictionary ID was the generic solution we came to during spec discussions for echoing the URL back. The server can choose to use the URL or a different cache key and each can decide how they want to reference it.

For the custom logic, we're hoping that an early CDN support item would be to rewrite the request to include the dictionary hash in the file name and automatically serve a diff that ends with the matching hash. Much like they serve .gz or .br variants automatically based on content encoding.
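
A sketch of what that rewrite might look like; the `.<hash>.dcb` naming scheme here is purely hypothetical, the point is just mapping (URL, Available-Dictionary) onto a pre-generated object:

```ts
// Hypothetical edge rewrite: if the client advertises a dictionary, point the origin
// request at a pre-generated delta object named after the dictionary hash.
function rewriteForDelta(request: Request): Request {
  const avail = request.headers.get("Available-Dictionary"); // e.g. :pZGm1Av0IE...=:
  if (!avail) return request;

  // Reduce the structured-field byte sequence to a token that is safe in a path.
  const hashToken = avail.replace(/[^A-Za-z0-9]/g, "");

  const url = new URL(request.url);
  url.pathname = `${url.pathname}.${hashToken}.dcb`; // made-up naming convention
  return new Request(url.toString(), { method: request.method, headers: request.headers });
}
```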

pmeenan commented 1 week ago

For CloudFront specifically, I have some notes here. Depending on how it is done, you should only need to run the Lambda pretty infrequently and let the response be cached with Vary: Available-Dictionary, Accept-Encoding so the cached version would be sent for future requests.
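
For reference, the headers on such a delta response would look roughly like this (Content-Encoding `dcb` is the dictionary-compressed Brotli encoding; the lifetime is just an example):

```ts
// Example delta response headers; Vary keys the cache on both negotiation headers so
// clients holding different dictionaries get different cache entries.
const deltaResponseHeaders = new Headers({
  "Content-Encoding": "dcb",
  "Vary": "Accept-Encoding, Available-Dictionary",
  "Cache-Control": "public, max-age=604800", // e.g. one week
});
```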

As far as sending the "wrong" thing goes, hopefully that would never happen. If you don't have a delta available for a given dictionary you just serve the full resource (with Brotli) and that will also get cached.

One risk is the number of permutations in cache based on the dictionaries that clients have available, so cache lifetimes become more important (so a resource can expire as a dictionary and you don't get clients advertising weeks-old dictionaries).

A reasonable strategy would be to target the frequent users that presumably come back at least once a week and generate deltas going back a week (and set the cache lifetimes for the static resources to a week as well). That's still a lot of builds but it's bounded (and you can tighten it up if the typical pattern is daily or every few days).
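
A rough build-step sketch of that strategy; `compressWithDictionary`, the metadata shape, and the delta file naming are placeholders for whatever tooling and layout the build actually uses:

```ts
import { readFileSync, writeFileSync } from "node:fs";

// Placeholder: stands in for invoking Brotli with an external dictionary
// (via the CLI or a library binding).
declare function compressWithDictionary(data: Buffer, dictionary: Buffer): Buffer;

const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

interface PreviousBuild {
  path: string; // path to the same bundle from an older release
  hash: string; // hash of that older bundle, used here to name the delta
  date: Date;
}

// Generate one delta per (current bundle, old build) pair, but only for builds young
// enough to still be advertised as dictionaries given a one-week cache lifetime.
function buildDeltas(bundlePath: string, previousBuilds: PreviousBuild[]): void {
  const current = readFileSync(bundlePath);
  for (const prev of previousBuilds) {
    if (Date.now() - prev.date.getTime() > ONE_WEEK_MS) continue;
    const dictionary = readFileSync(prev.path);
    const delta = compressWithDictionary(current, dictionary);
    writeFileSync(`${bundlePath}.${prev.hash}.dcb`, delta); // naming is arbitrary
  }
}
```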

Another strategy that might work better would be to generate a stand-alone dictionary based on samples of the code from, say, the last 4 weeks (1-2 releases per week should be enough to get the common bits). You generate a single dictionary that includes all of the code across bundles. You reference the dictionary with <link rel="compression-dictionary" href="..."> and have it specify a match pattern that covers all of the code.

That way you can use the same dictionary for all of your builds, assuming the vast majority of the code doesn't actually change over the course of a month and you only have one delta per bundle. You can refresh the dictionary as it starts to get stale (say, monthly) to keep the compression rates up.

Bonus, if you serve the dictionary from the same path as the bundles (or at least covered by the same match pattern) then you can use the dictionary as a dictionary for the next version and just send dictionary deltas.
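
Put together, the shared dictionary's own response headers might look something like this (paths and lifetimes are made up); because the match pattern also covers the dictionary's own path, next month's dictionary can itself be delta-encoded against it:

```ts
// Headers served with the stand-alone dictionary file itself.
const sharedDictionaryHeaders = new Headers({
  // Matches both the bundles and the dictionary's own location.
  "Use-As-Dictionary": 'match="/static/js/*", id="/static/js/shared-dictionary-v1.dat"',
  "Cache-Control": "public, max-age=2592000", // roughly a month; refresh as it goes stale
  "Content-Type": "application/octet-stream",
});
```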

The main downside with this strategy is that the first time a client downloads the dictionary they will be downloading the extra bytes (while a regular delta reuses the code they were downloading anyway), but it makes managing the build and deltas MUCH easier (and improves the cache hit rate for deltas).

In pretty much all cases, the deltas work best if the minification and function/variable renaming are relatively consistent from build to build. If everything gets renamed when a function is added then the deltas will not be as effective as they could be.
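
As one concrete example (assuming a webpack build; other bundlers have equivalent switches), deterministic ids are the kind of setting that keeps unrelated code from being renumbered between builds:

```ts
// webpack.config.ts (illustrative): deterministic module/chunk ids avoid cascading
// renames when a single module is added or removed, which keeps the deltas small.
const config = {
  optimization: {
    moduleIds: "deterministic" as const,
    chunkIds: "deterministic" as const,
  },
};

export default config;
```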

MarshallOfSound commented 1 week ago

A reasonable strategy would be to target the frequent users that presumably come back at least once a week and generate deltas going back a week (and set the cache lifetimes for the static resources to a week as well). That's still a lot of builds but it's bounded (and you can tighten it up if the typical pattern is daily or every few days).

We already kind of do this: the assets live in the HTTP cache for a long time, but we have our own caching logic via a service worker which kicks you to newer assets if you're more than N days old. I think N is around a week but I haven't checked in a while. Codifying this into the HTTP cache lifetime wouldn't be a bad idea but would require some thinking.

That is an astonishing number of builds (unfortunately), and generating Brotli 11 compressed deltas of a few hundred JS files against probably ~100+ previous builds is, uh, probably gonna be quite slow if you do it upfront on the build machine.

In pretty much all cases, the deltas work best if the minification and function/variable renaming are relatively consistent from build to build. If everything gets renamed when a function is added then the deltas will not be as effective as they could be.

Based on my testing this is true; even our builds from a few weeks apart form very small "diffs" when compressed using the two-week-old build as the dictionary.

Another strategy that might work better would be to generate a stand-alone dictionary based on samples of the code from, say, the last 4 weeks (1-2 releases per week should be enough to get the common bits). You generate a single dictionary that includes all of the code across bundles. You reference the dictionary with <link rel="compression-dictionary" href="..."> and have it specify a match pattern that covers all of the code.

Is there a part of the spec that dictates when the compression dictionary would get downloaded 🤔 I don't want to front-load folks with basically "all the JS but in dictionary form" as that would negatively impact first-load performance. If we could do the first load and then, at some point seconds or minutes after, download the dictionary, that could work if we could figure out a good system for generating a dictionary from a single pile of assets that doesn't result in, like, a 10MB dictionary file 🤔

Might have to go talk to some of our infra / build folks and see if they have any good ideas. But for sheer size-of-matrix reasons I think doing this all dynamically is probably the easiest path forward.

pmeenan commented 1 week ago

Is there a part of the spec that dictates when the compression dictionary would get downloaded 🤔 I don't want to front-load folks with basically "all the JS but in dictionary form" as that would negatively impact first-load performance. If we could do the first load and then, at some point seconds or minutes after, download the dictionary, that could work if we could figure out a good system for generating a dictionary from a single pile of assets that doesn't result in, like, a 10MB dictionary file 🤔

The spec isn't explicit about when it is downloaded; the goal is to have it download at "idle" and at the lowest priority, but "idle" for a SPA can be a complicated topic. You can trigger it manually by either fetching it directly or inserting the link tag when you think it would be a good time (the link tag isn't necessary, it's just a way to trigger the fetch - the important part is the response headers that make it a dictionary).
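
A minimal sketch of triggering it manually at a quiet moment (the URL is a placeholder; whether you use a link tag or a plain fetch, it's the dictionary's response headers that matter):

```ts
// Defer the dictionary fetch until the browser reports idle time (with a timer
// fallback), then inject the link tag so it downloads at low priority.
function loadDictionaryWhenIdle(dictionaryUrl: string): void {
  const schedule = (cb: () => void) =>
    "requestIdleCallback" in window
      ? (window as any).requestIdleCallback(cb)
      : setTimeout(cb, 5000);

  schedule(() => {
    const link = document.createElement("link");
    link.rel = "compression-dictionary";
    link.href = dictionaryUrl;
    document.head.appendChild(link);
  });
}
```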

There is a dictionary_generator that is part of the brotli repository that will generate a dictionary of whatever size you specify given a collection of files. You just drop all of the files in one place and test different dictionary sizes to see where you start to get diminishing returns.

I have a hosted version of it that runs here; you can pass it a list of URLs and have it generate a dictionary. It can also run in a mode where it iterates through all of the files, testing how each compresses against a dictionary generated from the other files (but that can be slow since it needs to generate N dictionaries).

pmeenan commented 1 week ago

You could also have several external dictionaries as long as the match patterns are separate so it doesn't necessarily all have to be in one single file.
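
For example (paths illustrative), two dictionaries with non-overlapping match patterns would each only ever apply to their own slice of the assets:

```ts
// Each dictionary is referenced separately and only matches its own path prefix.
const dictionaries = [
  { href: "/static/js/vendor-dict.dat", useAsDictionary: 'match="/static/js/vendor/*"' },
  { href: "/static/js/app-dict.dat", useAsDictionary: 'match="/static/js/app/*"' },
];
```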