LPardue closed this 1 year ago
Also, the server could send an equivalent of an Integrity preference field (https://httpwg.org/http-extensions/draft-ietf-httpbis-digest-headers.html#section-4) to signal to the client which hash algorithms it supports, which could help the client pick the most suitable compatible algorithm.
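As a rough sketch of how such a preference field could drive algorithm selection (the field semantics here are hypothetical, modelled loosely on the Want-Content-Digest style in the digest-headers draft, assuming higher integer values mean stronger preference and 0 means not acceptable):

```python
def pick_hash_algorithm(server_prefs, client_supported):
    """Pick the server's most-preferred algorithm that the client also supports.

    server_prefs: a parsed preference field, e.g. "sha-512=3, sha-256=10"
    parsed to {"sha-512": 3, "sha-256": 10}; assumes higher integers mean
    stronger preference and 0 means the algorithm is not acceptable.
    client_supported: set of algorithm names the client can compute.
    """
    candidates = [(weight, algo) for algo, weight in server_prefs.items()
                  if weight > 0 and algo in client_supported]
    if not candidates:
        return None
    return max(candidates)[1]

# e.g. the server prefers sha-256 but also accepts sha-512; the client
# only supports sha-256 and sha-384, so sha-256 is chosen.
pick_hash_algorithm({"sha-512": 3, "sha-256": 10}, {"sha-256", "sha-384"})  # "sha-256"
```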
Allowing different algorithms makes sense. I assume the hash algorithm negotiation would happen at the time of setting the dictionary as available, so that could also be done cleanly.
I'm worried about allowing multiple values, though, and the impact on Vary cardinality. If the delta-compressed asset is stored in an edge cache, varied on the requested available dictionary, then the combinations could get out of hand unless there is some way to signal the specific value of the request header that the response matched for the Vary.
The caching aspects seem like a valid consideration; it might be worth covering in the explainer why only a single algorithm is currently picked.
Then we can work in parallel to figure out whether agility can be implemented, versus the tradeoffs.
Allowing different algorithms makes sense. I assume the hash algorithm negotiation would happen at the time of setting the dictionary as available, so that could also be done cleanly.
+1 to that.
Is there a specific reason why algorithm agility is not built into the protocol? In simple terms, the ability to migrate to other algorithms as the security environment evolves.
I thought it's not strictly necessary, as we're not relying on the hash for cryptographic purposes, so we aren't really concerned with collisions. That said, the cost of negotiation seems low enough.
I'm worried about allowing multiple values, though, and the impact on Vary cardinality. If the delta-compressed asset is stored in an edge cache, varied on the requested available dictionary, then the combinations could get out of hand unless there is some way to signal the specific value of the request header that the response matched for the Vary.
Yeah, I wouldn't be supportive of multiple values. It would add a lot of complexity for no apparent reason.
Just to double-check my understanding and to clarify things: when I said multiple values, I meant multiple different hashes of the same content using strictly different algorithms. I didn't mean sending hashes of different content using the same algorithm (which HTTP digests don't permit, by virtue of the SF dictionary type).
SRI does allow both of these modes, but I don't see a strong reason for the latter.
And while I'm mentioning SRI, there are potentially some things we could borrow. Note that it prohibits MD5 and other weak algorithms, while requiring user agents to support sha-256, sha-384, and sha-512.
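For illustration, SRI's behaviour when metadata carries hashes under several algorithms can be sketched roughly like this (a simplified model, not the spec's exact algorithm): prohibited algorithms are ignored entirely, and the strongest of the required set wins.

```python
# Relative strength of the algorithms SRI requires user agents to support;
# prohibited/weak algorithms (e.g. md5) are deliberately absent.
STRENGTH = {"sha-256": 1, "sha-384": 2, "sha-512": 3}

def strongest_usable(hashes):
    """Given a mapping of algorithm name -> digest value, drop anything
    outside the supported set and return the strongest remaining entry."""
    usable = {algo: d for algo, d in hashes.items() if algo in STRENGTH}
    if not usable:
        return None
    algo = max(usable, key=STRENGTH.get)
    return algo, usable[algo]
```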
I think this is fundamentally different from SRI.
With SRI, we are trying to cryptographically protect a resource, so cryptographic strength matters, and collisions put users at risk (attackers could switch files on them and replace an innocuous payload with a malicious one).
Here, collisions are significantly less likely (the collision space is the dictionaries on that particular origin/scope), and if they happen, the result would most likely be a corrupted resource rather than a malicious one. More importantly, no one would be incentivized to find and "exploit" such collisions. (If you've pwned the delivery server, there are easier ways to DoS the victim site.)
So I don't think we'd have a constant need of upgrading the hash strength as new ways to manufacture collisions arise.
Thanks for the explanation. I tend to agree. It might be good to capture some of this threat modelling in the doc so others can assess it.
The hypothetical threat I was thinking of is where a collision occurs and can somehow manipulate the outcome of the decompression by affecting what was retrieved. But yeah, the origin scoping probably makes this fine, because by the point a server can be manipulated like that, there are even more trivial attacks.
The explainer describes that the client and server generate SHA-256 hashes and then use those to coordinate. Is there a specific reason why algorithm agility is not built into the protocol? In simple terms, the ability to migrate to other algorithms as the security environment evolves.
The more I look at this aspect, the more it gets me thinking about whether the design has some overlap with the HTTP digests specification https://httpwg.org/http-extensions/draft-ietf-httpbis-digest-headers.html
The explainer hints at wanting to constrain the size of the `sec-bikeshed-available-dictionary` field value, but I wonder how much this really matters in practice.
If we adopted an approach similar to the one digests use, you could make `sec-bikeshed-available-dictionary` a Structured Fields dictionary that conveys one or more hash values alongside their indicated algorithm. Even if you restrict it to only one hash, you can still benefit from agility via sending the algorithm.