This explainer outlines the benefits of compression dictionaries, details the different use cases for them, and then proposes a way to deliver such dictionaries to browsers to enable these use cases.
The HTTP headers and negotiation are specified in the IETF Draft document for Compression Dictionary Transport.
This proposal adds support for using designated previous responses as an external dictionary for HTTP responses for compression schemes that support external dictionaries (e.g. Brotli and Zstandard).
HTTP Content-Encoding
is extended with new encoding types and support for allowing responses to be used as dictionaries for future requests. All actual header values and names still TBD:
Use-As-Dictionary: <options>
response header.match
URL pattern for the resource with the cached response to identify it as a dictionary.match
URL patterns. If multiple patterns are matched, the most-specific match is used. If a dictionary is available for a given request, the client will add an appropriate compression scheme (e.g. br-d
for shared brotli) to the Accept-Encoding
request header as well as an Available-Dictionary: <sf-binary SHA-256>
header with the hash of the best available dictionary. The hash is sent as a Structured Field Byte Sequence (base64-encoded, enclosed by colons). e.g. Available-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
.Content-Encoding:
(e.g. br-d
) and Vary: Accept-Encoding,Available-Dictionary
.For interop reasons, dictionary-based compression is only supported on secure contexts (similar to brotli compression).
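The hash format described above can be sketched as follows. This is a minimal illustration, assuming the hash is taken over the dictionary's decoded bytes; the function name and sample bytes are hypothetical and not part of the proposal.

```python
import base64
import hashlib


def available_dictionary_value(dictionary_bytes: bytes) -> str:
    """Format a SHA-256 digest as a Structured Field Byte Sequence.

    A Byte Sequence is the base64 encoding of the raw bytes, enclosed
    by colons.
    """
    digest = hashlib.sha256(dictionary_bytes).digest()
    return ":" + base64.b64encode(digest).decode("ascii") + ":"


# Hypothetical dictionary contents, for illustration only.
header_value = available_dictionary_value(b"example dictionary contents")
print("Available-Dictionary:", header_value)
```

The value is always 46 characters long: 44 base64 characters for the 32-byte digest plus the two enclosing colons.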
There are also some browser-specific features independent of the transport compression:
<link rel=dictionary href=[dictionary_url]>
.
Compression dictionaries are bits of compressible content known ahead of time. Compression engines use them to reduce the size of compressed content.
Because they are known ahead of time, the compression engine can refer to the content in the dictionary when representing the compressed content, reducing the size of the compressed payload. The decompression engine can then interpret the content based on that pre-defined knowledge.
Taken to the extreme, if the content to be compressed is identical to the dictionary, the entire delivered content can be a few bytes referring to the dictionary.
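As a rough illustration of the effect described above, the sketch below uses Python's `zlib`, whose DEFLATE implementation accepts a preset dictionary (`zdict`). This is not the Brotli or Zstandard scheme that the proposal targets, and the sample data and function names are made up for the example.

```python
import hashlib
import zlib


def compress(data: bytes, dictionary: bytes = b"") -> bytes:
    """Compress with DEFLATE, optionally priming it with a preset dictionary."""
    if dictionary:
        compressor = zlib.compressobj(level=9, zdict=dictionary)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(data) + compressor.flush()


def decompress(payload: bytes, dictionary: bytes = b"") -> bytes:
    """Reverse compress(); the same dictionary must be supplied."""
    if dictionary:
        decompressor = zlib.decompressobj(zdict=dictionary)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(payload) + decompressor.flush()


# Hypothetical "v1" and "v2" of a resource: v2 is v1 plus a small change.
v1 = "\n".join(hashlib.sha256(str(i).encode()).hexdigest() for i in range(200)).encode()
v2 = v1 + b"\nconsole.log('added in v2');"

plain = compress(v2)
with_dictionary = compress(v2, dictionary=v1)

# Both sides need the same dictionary to round-trip the content.
assert decompress(with_dictionary, dictionary=v1) == v2
print(len(v2), len(plain), len(with_dictionary))
```

Because v2 is almost entirely contained in v1, the dictionary-compressed payload consists mostly of back-references into the dictionary and comes out much smaller than the payload compressed without it.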
Now, you may ask, if dictionaries are so awesome, then...
To some extent, they are. The brotli compression scheme includes a built-in dictionary that was built to work reasonably well for HTML, CSS and JavaScript. Custom (shared) dictionaries have a more complicated history.
Chrome did at one point support shared compression dictionaries. When it was first released, it supported a dictionary compression method called SDCH (Shared Dictionary Compression over HTTP). That support was unshipped in 2016 due to complexities around the protocol’s implementation and specification, and the lack of an interoperability story.
SDCH enabled Chrome and Chromium-based browsers to create origin-specific dictionaries that were downloaded once for the origin and enabled multiple pages to be compressed at significantly higher ratios. That's one use case for compression dictionaries, which we will call the "Shared dictionary" use case.
There's another major use case for shared dictionaries that was never supported by browsers - delta compression.
That use-case would enable the browser to reuse past resources (e.g. your site's main JS v1.2) in order to compress future ones (e.g. main JS v1.3). But traditionally, this use-case raised complexities around the abilities of the browser to coordinate its cache state with the server, and agree on what the dictionary would be. It also raised issues with both sides having to store all past versions of each resource in order to successfully be able to compress and decompress it.
The common thread is that the use of compression dictionaries has run into various complexities over the years, which resulted in deployment issues.
A few things about this current proposal are different from past attempts, in ways we're hoping are meaningful:
There are two primary models for using shared dictionaries that are similar but differ in how the dictionary is fetched:
In both cases the client advertises the best-available dictionary that it has for a given request. If the server has a delta-compressed version of the resource, compressed with the advertised dictionary, it can just send that delta-compressed diff. It can also use that advertised dictionary (if available) to dynamically compress that resource.
With the Delta compression
use case, a previously-downloaded version of the resource is available to use for future requests as a dictionary. For example, with a JavaScript file, v1 of the file may be in the browser's cache and available for use as a dictionary to use when fetching v2 so only the difference between the two needs to be transmitted.
In the Shared dictionary
use case, the dictionary is a purpose-built dictionary that is fetched using a <link>
tag and can be used for future requests that match the match
URL pattern covered by the dictionary. For example, on a first visit to a site, the HTML response references a custom dictionary that should be used for document
fetches for that origin. The dictionary is downloaded at some point by the browser and, on future navigations through the site, is advertised as being available for document requests that match the URL pattern that the dictionary applies to.
The Shared Brotli draft does a good job describing the security risks. In summary:
Dictionaries will need to be cached using a triple key (top-level site, nested context site, URL) similar to other cached resources (or any other partitioning scheme that’s good enough for cached resources and cookies from a privacy and security perspective). That’s not an issue for the delta compression use case, but can become a burden fast for the out-of-band dictionaries, as multiple nested contexts may need to download the same dictionary multiple times.
Note: Common payload caching may be useful in such cases.
There’s also the issue of users advertising resource versions in their cache to servers as part of the request. This already has a precedent in terms of cache validators (ETags, If-Modified-Since), so maybe that’s fine, given that the cache is partitioned.
Downloading an out-of-band dictionary means that the site owner is making a certain bet regarding the number of visits that would enable the user to amortize that dictionary’s cost.
At worst, if the user never visits the site again until the dictionary’s lifetime expires, the user has paid the cost of downloading the dictionary with no benefits.
For some large and heavily trafficked sites, that case is rare. For others, it’s extremely common, and we should be wary of both the tools we’d be putting in developers’ hands, as well as the messaging we’re providing them regarding when to use them.
In this flow, we’re reusing static resources themselves as dictionaries that would be used to compress future updates of themselves, or similar resources.
Use-As-Dictionary: <options>
response header. The options are a structured field dictionary that includes the ability to set a URL-matching pattern, matching fetch destination, and an opaque identifier. More details here.Access-Control-Allow-Origin:
response header that makes the response readable by the document.Available-Dictionary:
request header, which lists a single hash (encoded as a Structured Field Byte Sequence).
Available-Dictionary:
request header (to limit variations in the Vary
caches).Dictionary-ID
request header.Available-Dictionary
header in it:
sec-fetch-mode: cors
request header then the dictionary should be ignored unless the response will have an Access-Control-Allow-Origin:
response header that includes the origin of the page the request was issued from (*
or matched against the origin:
or referer:
).Content-Encoding
header as well as a Content-Dictionary
response header with the hash of the dictionary that was used (must match the hash from the Available-Dictionary
request header).
Accept-Encoding: deflate, gzip, br, br-d
request would respond with Content-Encoding: br-d
.Access-Control-Allow-Origin:
response header that makes the response readable by the document.Link:
header on the document response or <link>
HTML tag with a rel=dictionary
type.
Cache-Control: private
headers).Use-As-Dictionary: <options>
header, appropriate cache lifetime headers and will be used for future requests using the same process as the Static resources flow.
Access-Control-Allow-Origin:
response header that makes the response readable by the document.The Use-As-Dictionary:
response header is a structured field dictionary that allows for setting multiple options and for future expansion. The supported options and defaults are:
URLPattern(patternString, baseURL)
constructor where the baseURL is the URL of the request and where support for regexp tokens is disabled. URLPattern allows for absolute or relative URLs. e.g. /app1/main*
will match https://www.example.com/app1/main_12345.js
and main*
in response to https://www.example.com/app1/main_1.js
will match https://www.example.com/app1/main.xyz.js
. Dictionaries will only match requests from the same origin as the dictionary.()
) which will match all request destinations.Dictionary-ID
request header when the dictionary matches an outbound request. The default value is an empty string (""
).For example: use-as-dictionary: match="/app1/main*", match-dest=("script"), id="xxx"
would specify matching on a path prefix of /app1/main
for script requests and to send Dictionary-ID: "xxx"
for any requests that match the dictionary.
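The matching rules above can be sketched roughly as follows. This is only an approximation: it treats `*` as the sole wildcard instead of implementing URLPattern, and it assumes that "most specific" means the longest resolved pattern string. The `StoredDictionary` class and helper names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple
from urllib.parse import urljoin, urlsplit


@dataclass
class StoredDictionary:
    source_url: str                   # URL the dictionary response came from
    match: str                        # "match" option (absolute or relative)
    match_dest: Tuple[str, ...] = ()  # "match-dest"; empty matches every destination
    dictionary_id: str = ""           # "id" option, echoed back as Dictionary-ID

    def pattern_url(self) -> str:
        # Relative patterns resolve against the URL of the dictionary response.
        return urljoin(self.source_url, self.match)


def _matches(d: StoredDictionary, request_url: str, destination: str) -> bool:
    # Dictionaries only match requests from the same origin as the dictionary.
    same_origin = urlsplit(d.source_url)[:2] == urlsplit(request_url)[:2]
    dest_ok = not d.match_dest or destination in d.match_dest
    # Simplification: translate "*" to ".*" rather than using URLPattern.
    regex = "^" + ".*".join(re.escape(p) for p in d.pattern_url().split("*")) + "$"
    return same_origin and dest_ok and re.match(regex, request_url) is not None


def best_dictionary(dictionaries: Iterable[StoredDictionary],
                    request_url: str, destination: str) -> Optional[StoredDictionary]:
    candidates = [d for d in dictionaries if _matches(d, request_url, destination)]
    # Assumption: the longest resolved pattern is the most specific one.
    return max(candidates, key=lambda d: len(d.pattern_url()), default=None)


scripts = StoredDictionary("https://www.example.com/app1/main_1.js", "main*", ("script",), "xxx")
fallback = StoredDictionary("https://www.example.com/dictionaries/all.dat", "/*")
chosen = best_dictionary([scripts, fallback], "https://www.example.com/app1/main.xyz.js", "script")
print(chosen.dictionary_id if chosen else None)
```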
The dictionary negotiation is independent of the compression algorithm that is used for compressing the HTTP response and is designed to support any compression scheme that supports using external compression dictionaries. Currently that includes Brotli and Zstandard but it is not limited to those (and depends on what the client and server both support). It is likely that, in the future, content-specific compression schemes that handle delta-compression better may be built (e.g. code-aware Wasm compression).
The compression algorithm negotiation uses the regular Accept-Encoding:
/Content-Encoding:
negotiation that is used for non-dictionary compression. It is important that new names are registered with the HTTP Content Coding Registry for algorithms that use an external dictionary to prevent situations where processing along the request flow may attempt to decode a response using just the algorithm without being dictionary-aware. That way, if anything in the request flow needs to operate on the decoded content, it can either be made aware of the dictionary-based compression or it can modify the Accept-Encoding:
request header to only support schemes that it is aware of (already common practice).
The examples in this document will use br-d
for dictionary-based Brotli compression but the actual algorithm(s) negotiated could be anything that the client supports.
The compression API can also expose support for using caller-supplied dictionaries but that is out-of-scope for this proposal.
WebSocket support is out-of-scope for this proposal but there is nothing in the current dictionary negotiation that precludes WebSockets from being able to build dictionary-based compression (either by leveraging parts of what is provided here or building something separate).
Since the contents of the dictionary and compressed resource are both effectively readable through side-channel attacks, this proposal makes it explicit and requires that both be CORS-readable from the document origin. The origin for the URL the dictionary was served from and the origin of the match
pattern for URLs MUST be the same (i.e. the dictionary and compressed resource must both be from the same origin).
For dictionaries and resources that are same-origin as the document, no additional requirements exist as both are CORS-readable from the document context. For navigation requests, their resource is by definition same-origin as the document their response will eventually commit. As a result, the dictionaries that match their URL pattern are similarly same-origin.
For dictionaries and resources served from a different origin than the document, they must be CORS-readable from the document origin. e.g. Access-Control-Allow-Origin: <document origin or *>
. This means that any cross-origin content that is fetched in no-cors
mode by default must enable CORS-fetching (usually with the crossorigin
attribute).
When sending a CORS request with an available dictionary, a browser should only include the Available-Dictionary:
header if it is also sending the sec-fetch-mode:
header so a CORS-readable decision can be made on the server before responding.
In order to prevent sending dictionary-compressed responses that the client will not be able to process, when a server receives a request with sec-fetch-mode: cors
as well as an Available-Dictionary:
request header, it should only use the dictionary if the response includes an Access-Control-Allow-Origin:
response header that includes the origin of the page the request was made from, either by virtue of Access-Control-Allow-Origin: *
covering all origins or because Access-Control-Allow-Origin:
includes the origin in the origin:
or referer:
request header. If there is no origin:
or referer:
request header and Access-Control-Allow-Origin:
is not *
then the dictionary should not be used.
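The server-side check described above could look roughly like this. It is a sketch under stated assumptions: request header names are already lower-cased, the function is only called for requests carrying `sec-fetch-mode: cors`, and `acao` is the `Access-Control-Allow-Origin` value the server intends to send. The function names are hypothetical.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def request_origin(headers: Mapping[str, str]) -> Optional[str]:
    """Take the origin from origin:, falling back to the origin of referer:."""
    if headers.get("origin"):
        return headers["origin"]
    referer = headers.get("referer")
    if referer:
        parts = urlsplit(referer)
        if parts.scheme and parts.netloc:
            return f"{parts.scheme}://{parts.netloc}"
    return None


def may_use_dictionary_for_cors(headers: Mapping[str, str], acao: Optional[str]) -> bool:
    """Decide whether a dictionary-compressed response is safe to send."""
    if "available-dictionary" not in headers or acao is None:
        return False
    if acao.strip() == "*":
        return True  # the response is readable from every origin
    origin = request_origin(headers)
    # Without origin: or referer:, a non-wildcard value cannot be verified.
    return origin is not None and acao.strip() == origin


request = {
    "sec-fetch-mode": "cors",
    "available-dictionary": ":pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:",
    "origin": "https://www.example.com",
}
print(may_use_dictionary_for_cors(request, "https://www.example.com"))
```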
To discourage encoding user-specific private information into the dictionaries, any out-of-band dictionaries fetched using a <link>
will be uncredentialed fetches.
These protections against compressing opaque resources make CORB and ORB considerations unnecessary as they are specific to protecting opaque resources.
The existence of a dictionary is effectively a cookie for any requests that match it and should be treated as such:
The existence of support for dictionary-based Accept-Encoding:
has the potential to leak client state information if not applied consistently. If the browser supports dictionary-based compression algorithms then that support should always be advertised, independent of the current state of the feature. Specifically, this means that in any private browsing mode (Incognito in Chrome), dictionary-based algorithm support should still be advertised even if the dictionaries will not persist so that the state of the private browsing mode is not exposed.
The explicit fetching of a dictionary through a <link rel=dictionary>
tag or Link:
header is functionally equivalent to <link rel=preload>
with different priority and should be treated as such. This means that the Link:
header is only effective for document navigation responses and cannot be used for subresource loads.
This prevents passive resources, like images, from using the dictionary fetch as a side-channel for sending information.
Any caches between the server and the client will need to be able to support Vary
on both Accept-Encoding
and Available-Dictionary
, otherwise the responses will be either corrupt (in the case of serving a dictionary-compressed resource with the wrong dictionary) or ineffective (serving a non-dictionary-compressed resource when dictionary compression was possible).
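To make the caching requirement concrete, here is a minimal sketch of how a cache might derive a key from the `Vary` response header. It assumes lower-cased request header names and ignores the header-value normalization a real cache may apply; the function name is hypothetical.

```python
from typing import Mapping, Tuple


def vary_cache_key(url: str, request_headers: Mapping[str, str], vary: str) -> Tuple:
    """Build a cache key from the URL plus every request header named in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    return (url,) + tuple((name, request_headers.get(name, "")) for name in names)


vary = "Accept-Encoding,Available-Dictionary"
url = "https://static.example.com/app/main.js/125"

with_dictionary = {
    "accept-encoding": "br-d,br,gzip",
    "available-dictionary": ":pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:",
}
without_dictionary = {"accept-encoding": "br,gzip"}

# Different dictionary state yields different keys, so a dictionary-compressed
# response is never served to a client that lacks that dictionary.
print(vary_cache_key(url, with_dictionary, vary) != vary_cache_key(url, without_dictionary, vary))
```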
Any middle-boxes in the request flow will also need to support the dictionary-compressed content-encoding, either by passing it through unmodified or by managing the appropriate dictionaries and compressed resources.
In this example, www.example.com will use a bundle of application JavaScript that they serve from a separate static domain (static.example.com). The JavaScript files are versioned and have a long cache time, with the URL changing when a new version of the code is shipped.
On the initial visit to the site:
<script src="//static.example.com/app/main.js/123" crossorigin>
(where 123 is the build number of the code).Accept-Encoding: br-d,br,gzip
.Use-As-Dictionary: match="/app/main.js*"
, Access-Control-Allow-Origin: https://www.example.com
and Vary: Accept-Encoding,Available-Dictionary
.https://static.example.com/app/main.js*
URL pattern.
sequenceDiagram
Browser->>www.example.com: GET /
www.example.com->>Browser: ...<script src="//static.example.com/app/main.js/123" crossorigin>...
Browser->>static.example.com: GET /app/main.js/123<br/>Accept-Encoding: br,gzip
static.example.com->>Browser: Use-As-Dictionary: match="/app/main.js*"<br/>Access-Control-Allow-Origin: https://www.example.com<br/>Vary: Accept-Encoding,Available-Dictionary
At build time, the site developer creates delta-compressed versions of main.js using previous builds as dictionaries, storing the delta-compressed version along with the SHA-256 hash of the dictionary used (e.g. as main.js.<hash>.br-d
).
On a future visit to the site after the application code has changed:
<script src="//static.example.com/app/main.js/125" crossorigin>
.https://static.example.com/app/main.js/125
request with the https://static.example.com/app/main.js*
URL pattern of the previous dictionary response that is in cache and requests https://static.example.com/app/main.js/125 with Accept-Encoding: br-d,br,gzip
, sec-fetch-mode: cors
and Available-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
. For this example, the hash value from the header would need to be re-encoded as a filesystem-safe version of the hash before looking for the file (base64-decode the header value and then hex-encode the hash).Content-Encoding: br-d
, Access-Control-Allow-Origin: https://www.example.com
, Vary: Accept-Encoding,Available-Dictionary
, and Content-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
response headers.
It could also have included a new Use-As-Dictionary: match="/app/main.js*"
response header to have the new version of the file replace the old one as the dictionary for future requests for the path, but that is not required for the existing dictionary to have been used.
sequenceDiagram
Browser->>www.example.com: GET /
www.example.com->>Browser: ...<script src="//static.example.com/app/main.js/125" crossorigin>...
Browser->>static.example.com: GET /app/main.js/125<br/>Accept-Encoding: br-d,br,gzip<br/>sec-fetch-mode: cors<br/>Available-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
static.example.com->>Browser: Content-Encoding: br-d<br/>Content-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:<br/>Access-Control-Allow-Origin: https://www.example.com<br/>Vary: Accept-Encoding,Available-Dictionary
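The build-time naming scheme and the hash re-encoding described in this example can be sketched as below. The helper is hypothetical; it assumes the delta files are stored as `main.js.<hex hash>.br-d`, following the naming suggested above, and returns `None` for a malformed header so the server can fall back to a non-dictionary response.

```python
import base64
import binascii
import hashlib
from typing import Optional


def delta_file_name(resource_name: str, available_dictionary: str) -> Optional[str]:
    """Map an Available-Dictionary value to the name of a pre-built delta file."""
    value = available_dictionary.strip()
    if len(value) < 2 or not (value.startswith(":") and value.endswith(":")):
        return None
    try:
        # base64-decode the Structured Field Byte Sequence ...
        digest = base64.b64decode(value[1:-1], validate=True)
    except (binascii.Error, ValueError):
        return None
    if len(digest) != 32:  # a SHA-256 digest is always 32 bytes
        return None
    # ... then hex-encode the hash so it is safe to use in a file name.
    return f"{resource_name}.{digest.hex()}.br-d"


# Hypothetical previous build, standing in for the cached dictionary.
previous_build = b"contents of the previous main.js build"
header = ":" + base64.b64encode(hashlib.sha256(previous_build).digest()).decode("ascii") + ":"
print(delta_file_name("main.js", header))
```

If the named file exists, the server can send it with `Content-Encoding: br-d`; otherwise it serves the resource without dictionary compression.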
In this example, www.example.com has a custom-built dictionary that should be used for all navigation requests to /product.
On the initial visit to the site:
<link rel=dictionary href="/dictionaries/product_v1.dat">
.use-as-dictionary: match="/product/*", match-dest=("document"), id="product_v1"
and appropriate caching headers.https://www.example.com/product/*
URL pattern, the document
destination and the product_v1
dictionary ID.
sequenceDiagram
Browser->>www.example.com: GET /
www.example.com->>Browser: ...<link rel=dictionary href="/dictionaries/product_v1.dat">...
Browser->>www.example.com: GET /dictionaries/product_v1.dat<br/>Accept-Encoding: br,gzip
www.example.com->>Browser: use-as-dictionary: match="/product/*", match-dest=("document"), id="product_v1"
At some point after the dictionary has been fetched, the user clicks on a link to https://www.example.com/product/myproduct:
/product/myproduct
request with the https://www.example.com/product/*
URL pattern of the previous dictionary request as well as the document
request destination and requests https://www.example.com/product/myproduct with Accept-Encoding: br-d,br,gzip
, Available-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
and Dictionary-ID: "product_v1"
request headers.Content-Encoding: br-d
and Content-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
response headers.
sequenceDiagram
Browser->>www.example.com: GET /product/myproduct<br/>Accept-Encoding: br-d,br,gzip<br/>Available-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:<br/>Dictionary-ID: "product_v1"
www.example.com->>Browser: Content-Encoding: br-d<br/>Content-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
These are the changes that have been made to the spec as it has progressed through various standards organizations, based on developer feedback during browser experiments.
Sec-Available-Dictionary
request header changed to Available-Dictionary
.Available-Dictionary
request header changed to be a Structured Field Byte Sequence (base64 encoding of the dictionary hash, surrounded by colons) instead of a hex-encoded string.sbr
to br-d
.match
field of the Use-As-Dictionary
response header is now a URLPattern.expires
.id
in the Use-As-Dictionary
response header which is echoed in the Dictionary-ID
request header by the client in future requests.Content-Dictionary
response header with the hash of the dictionary used when compressing a response with a dictionary (must match the Available-Dictionary
from the request).match-dest
was added to the Use-As-Dictionary
response header to allow for matching on fetch destinations (e.g. match-dest=("document")
to have the dictionary used only for document requests).