WICG / webpackage

Web packaging format

How are duplicate response headers handled in the same-origin response? #183

Open twifkak opened 6 years ago

twifkak commented 6 years ago

Google App Engine apparently has a small (undocumented) per-header-line size limit [1]. I imagine other such containers might have similar limits. This will be a problem with Signature (esp. if there are multiple sigs) and Link, unless a signer is allowed to split them across multiple header lines.

https://wicg.github.io/webpackage/draft-yasskin-http-origin-signed-responses.html#cbor-representation doesn't specify a canonical way to join duplicate response headers; could/should it?

[1] Was 497, per stackoverflow, but may have increased to 1k or so.

davidben commented 6 years ago

This sort of issue is part of why we advocated changing the style of signature in issue #148. You don't need to worry as much about being able to round-trip what you got to and from the signed representation. Not that it's impossible to solve, but why solve a problem when you can just walk around it? :-)

When what you get is the signed representation, it's more obvious how to verify it and that what you will process is exactly what you verified.

twifkak commented 6 years ago

IIUC, #148 changes the packaged application/signed-exchange format but not the same-origin response (https://wicg.github.io/webpackage/draft-yasskin-http-origin-signed-responses.html#same-origin-response). I think there is value in publishers being able to deliver a same-origin response containing Signature/Signed-Headers headers that are ignored by browsers and consumed by crawlers:

The application/signed-exchange format is one that not all browsers understand (and it is likely suboptimal for same-origin delivery even for those that do). Therefore, there would need to be a discovery mechanism, by which the crawler knows a signed variant of a resource exists and how to get it. There are approximately two ways to do this:

  1. By URL - the original page specifies via link tag where the htxg lives.
  2. By header - the server Varies its response based on an as-yet-undefined request header

The first is bad for crawlers because it requires that they maintain mappings between non-htxg and htxg, and handle all sorts of edge cases there. It's bad for publishers and search users because crawlers respect hostload maximums, and this would affect the rate at which updates could be crawled (and, in turn, search quality).

The second is bad for publishers because varying by request header is something that many frontends don't do well, or need special configuration for. (For instance, I don't know of any that let you specify a custom cache-key for request coalescing, so it would need to be disabled.) In the same-origin model, if a publisher is unable to configure Varying, that means they'll just be signing more responses than they need to. In the application/signed-exchange model, it means they'll be unable to serve signed exchanges.

(In the end, there may be a need for a "by URL" mechanism, but I'd like to minimize the reasons publishers would want to take it, because of its downsides.)

jyasskin commented 6 years ago

On the original question, I'm relying on https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-04#section-3 to allow senders to split each signature into its own header line. A single RSA signature might still go over the limit.
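To illustrate the splitting idea, here is a rough sketch of packing already-serialized signature items into header lines under a per-line size limit. The function name `pack_header_lines`, the sample items, and the choice to count the "Name: " prefix toward the limit are all assumptions for illustration, not anything the draft specifies. A single item that is over the limit by itself is still emitted on its own line, since splitting can't help there.

```python
from typing import List


def pack_header_lines(name: str, items: List[str], limit: int) -> List[str]:
    """Greedily pack list items into header lines of at most `limit` bytes.

    Assumption: the limit applies to the whole "Name: value" line.
    An item that alone exceeds the limit still gets its own line.
    """
    lines: List[str] = []
    current = ""
    for item in items:
        # Try appending this item to the line being built.
        candidate = item if not current else current + ", " + item
        fits = len((name + ": " + candidate).encode("utf-8")) <= limit
        if fits or not current:
            current = candidate
        else:
            # Flush the current line and start a new one with this item.
            lines.append(name + ": " + current)
            current = item
    if current:
        lines.append(name + ": " + current)
    return lines


if __name__ == "__main__":
    # Hypothetical, shortened signature items.
    sigs = ["sig1;sig=*AAAA*", "sig2;sig=*BBBB*", "sig3;sig=*CCCC*"]
    for line in pack_header_lines("Signature", sigs, 50):
        print(line)
```

A receiver would then need to recombine the lines into one list before parsing, which is where the joining question below comes in.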

It's possible we'll eventually decide against implementing the header field, but I currently think it's useful for same-origin responses. It's necessary for signature-based SRI, although if we find that @twifkak's use cases for signed-exchanges don't actually need it, we could simplify the SRI header.

twifkak commented 6 years ago

@jyasskin That covers Signature, but not the headers that would be signed over, which are not httpbis structured headers. e.g. What is the signed message given this response?

Signed-Headers: "link"
Link: <foo.js>;rel=preload
Link: <foo.css>;rel=preload

The section I linked to above says "the header field’s value as a byte string". Is this enough to imply that the "MAY" in https://tools.ietf.org/html/rfc7230#section-3.2.2 becomes a "MUST"? I'd suggest this be explicit (if the header field remains).

jyasskin commented 6 years ago

'k, I'll make that more explicit. Yes, in all of the signing situations, I'm expecting any duplicated headers to be joined with commas.
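As a minimal sketch of what that joining could look like for the Link example above: the function name `combine_duplicate_headers` is hypothetical, and the separator is an assumption. RFC 7230 section 3.2.2 only says the values are appended in order, separated by a comma; the sketch uses ", " by default, and the exact bytes would need to be pinned down in the spec since they feed into the signature.

```python
from typing import Dict, List, Tuple


def combine_duplicate_headers(
    fields: List[Tuple[str, str]], separator: str = ", "
) -> Dict[str, str]:
    """Combine repeated header fields into one value per lowercased name.

    Values are appended in the order received. The default separator
    (comma plus space) is an assumption, not taken from the draft.
    """
    combined: Dict[str, str] = {}
    for name, value in fields:
        key = name.lower()      # header field names are case-insensitive
        value = value.strip()   # drop optional surrounding whitespace
        if key in combined:
            combined[key] = combined[key] + separator + value
        else:
            combined[key] = value
    return combined


if __name__ == "__main__":
    response_fields = [
        ("Signed-Headers", '"link"'),
        ("Link", "<foo.js>;rel=preload"),
        ("Link", "<foo.css>;rel=preload"),
    ]
    print(combine_duplicate_headers(response_fields)["link"])
```

Under these assumptions, the byte string signed for "link" would be the two Link values in response order, joined into one.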