A "perfect match" scenario

yoavweiss commented 5 months ago

Hey!

Imagine an edge-based deployment of compression dictionaries, where the resources themselves live in cloud storage. Every time the CI runs, it adds a new resource to the pile and calculates the diffs between it and the N previous versions of that same resource. All of these diffs are stored in the same cloud bucket.

Now, whenever a resource is served, it carries a use-as-dictionary value whose match pattern covers the various resource versions. What happens when that same resource gets reloaded?

Its match pattern definitely matches its own URL, so the reload request carries the resource's own SHA-256 hash in sec-available-dictionary. That kind of zero-sized diff does not exist in the cloud storage, because the CI didn't create a diff from the resource to itself. That means the request either fails or is retried without the dictionary, which adds delay.
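
To make the miss concrete, here is a minimal Python sketch of which diffs such a CI run would store. It assumes diffs are keyed by a (dictionary hash, target hash) pair; `diff_keys`, the key layout, and the sample contents are hypothetical and only illustrate the scenario.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a resource body."""
    return hashlib.sha256(data).hexdigest()


def diff_keys(versions: list[bytes], n: int) -> set[tuple[str, str]]:
    """Return the (dictionary_hash, target_hash) pairs the CI would store.

    versions is ordered oldest to newest. Each version is diffed against
    up to n versions that precede it, and never against itself.
    """
    keys = set()
    for i, target in enumerate(versions):
        for previous in versions[max(0, i - n):i]:
            keys.add((sha256_hex(previous), sha256_hex(target)))
    return keys


# Hypothetical contents standing in for three builds of one resource.
versions = [b"app v1", b"app v2", b"app v3"]
stored = diff_keys(versions, n=2)

latest = sha256_hex(versions[-1])
previous = sha256_hex(versions[-2])

# A client holding the previous version finds a stored diff.
has_delta_from_previous = (previous, latest) in stored
# A client reloading the latest version advertises the latest version's
# own hash, and no diff was stored for that pair.
has_delta_from_self = (latest, latest) in stored
```

With these inputs the lookup for the previous version hits and the self lookup misses, which is the case that triggers the failure or retry.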

What's the right way to tackle such a scenario?

One option would be to provide some signal on the request that the SHA in sec-available-dictionary is an exact match for the requested URL. That would enable the edge to do something smarter about this than to fail and retry.
Another option would be for such deployments to store a "diff" from the file to itself, and unify these flows without retries. At the same time, it feels odd to add such diffs.

I'd love thoughts on the right thing here for the protocol (and developer advice that will be derived from it).

pmeenan commented 5 months ago

I wouldn't ever expect it to fail. Hopefully the edge would fall back to the uncompressed version (like it would with .br or .gz versions of a file) and potentially edge-cache that version based on the vary so it only has to do it once.
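
A rough Python sketch of that fallback, under stated assumptions: the edge keeps diffs in a mapping keyed by (path, dictionary hash) and full bodies keyed by path. `select_body`, both mappings, the `DELTA_ENCODING` placeholder, and the exact Vary value are hypothetical and not taken from the draft.

```python
from typing import Optional

# Placeholder token for whatever dictionary content-encoding the deployment
# uses. It is not a real registered encoding name.
DELTA_ENCODING = "dictionary-delta"


def select_body(
    diffs: dict[tuple[str, str], bytes],
    full: dict[str, bytes],
    path: str,
    available_dictionary: Optional[str],
) -> tuple[bytes, dict[str, str]]:
    """Serve a stored diff when one exists, otherwise the full resource."""
    # Assumed Vary value so the edge cache keeps the variants apart.
    headers = {"vary": "accept-encoding, sec-available-dictionary"}
    delta = None
    if available_dictionary is not None:
        delta = diffs.get((path, available_dictionary))
    if delta is not None:
        headers["content-encoding"] = DELTA_ENCODING
        return delta, headers
    # No diff for this dictionary (including the self-match case):
    # fall back to the full body instead of failing the request.
    return full[path], headers


diffs = {("/app.v2.js", "hash-of-v1"): b"delta bytes"}
full = {"/app.v2.js": b"full body"}

delta_body, delta_headers = select_body(diffs, full, "/app.v2.js", "hash-of-v1")
self_body, self_headers = select_body(diffs, full, "/app.v2.js", "hash-of-v2")
```

In this sketch the self-match request gets the full body with no dictionary content-encoding, so no retry is needed.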

That said, the latest IETF draft has an id field that is echoed in Dictionary-Id which would let you store the URL of the original dictionary (or whatever identifier you'd like) and at rewrite time you could detect they were the same.
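
As a sketch of that rewrite-time check, assuming the dictionary was served with its own URL path as the id, that Dictionary-Id arrives as a quoted string, and that header names are already lower-cased: `is_self_match`, `rewrite_target`, and the `.diff` key layout below are hypothetical.

```python
def is_self_match(request_path: str, headers: dict[str, str]) -> bool:
    """True when the echoed dictionary id names the requested resource."""
    raw = headers.get("dictionary-id")
    if raw is None:
        return False
    # Assumption: the value is a quoted string, so strip the quotes.
    dictionary_id = raw.strip().strip('"')
    return dictionary_id == request_path


def rewrite_target(request_path: str, headers: dict[str, str]) -> str:
    """Pick the storage key to fetch for this request."""
    dictionary_hash = headers.get("sec-available-dictionary")
    if dictionary_hash is None or is_self_match(request_path, headers):
        # No dictionary, or the dictionary is the resource itself:
        # skip the rewrite and serve the resource as stored.
        return request_path
    # Hypothetical naming scheme for stored diffs.
    return f"{request_path}.{dictionary_hash}.diff"


reload_headers = {
    "sec-available-dictionary": "abc123",
    "dictionary-id": '"/app.v2.js"',
}
same_version = rewrite_target("/app.v2.js", reload_headers)
newer_version = rewrite_target("/app.v3.js", reload_headers)
```

Here the reload of `/app.v2.js` is left untouched, while a request for `/app.v3.js` with the same dictionary is rewritten to the stored diff key.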

You also have the no-cache request header in the request flow that you could potentially use as a signal to not rewrite the URL, but that has the potential to put the full resource in the edge cache for whatever dictionary happened to be requested (presumably it would only be the same file but it's an edge case to be careful of).

The current origin trial for Chrome ends in 122 and we're hoping to have a new OT with the spec changes ready for 123 (at which point I'll update the explainer, just didn't want to confuse people relative to the current Chrome OT).

yoavweiss commented 5 months ago

> That said, the latest IETF draft has an id field that is echoed in Dictionary-Id which would let you store the URL of the original dictionary (or whatever identifier you'd like) and at rewrite time you could detect they were the same.

Ooh, id definitely solves this issue. I missed it somehow.

WICG / compression-dictionary-transport

A "perfect match" scenario #53