WICG / webpackage

Web packaging format
Other
1.23k stars 116 forks source link

When to fallback redirect? #397

Open irori opened 5 years ago

irori commented 5 years ago

In the current loading spec, redirect to the fallback URL happens only when the signed exchange version is not supported.

However, Chromium implementation performs fallback redirect in the following cases (corresponding to the error codes here):

Chromium doesn't fallbeck on the following errors:

We should clarify that which error should cause a fallback redirect, and spec them.

kinu commented 5 years ago

/cc @twifkak for consumer side feedback

twifkak commented 5 years ago

Thanks for opening the issue. Sorry for the delay; I've been sick and this slipped my radar.

As a consumer & distributor, my product preference is to make as many things as possible a fallback. That said, I understand there may be reasons not to prefer that, and I would like to know more about them:

From your list above, I think there are some that I would feel safe removing fallback for today:

Essentially these are all the "outer exchange" things that are well-trodden and I feel confident we'll do correctly; also, they require that we generate the correct response rather than that we validate (but don't modify) a given response.

I'm also okay with not falling back for Merkle integrity (this one was simple enough to check) or parse error (fallback seems dangerous).

For a distributor with the privacy-preserving-prefetch use case, the downsides of having a network error page are:

The potential reasons that our cache may serve an invalid SXG include:

At this point, I'm mostly worried about the first two. But I'm mostly guessing. After #374 (and after we implement server-side monitoring of those reports), I'll be able to respond more usefully.

twifkak commented 5 years ago

(I'm presuming that the fallback for SXG version mismatch happens before the network error for MICE decode failure. Otherwise, future Chromium versions will need to continue supporting old MICE versions.)

jyasskin commented 5 years ago

Thank you for collating all the error cases so we can just go through them. My suggestion is:

Situation Behavior Notes
SXG was served from non-secure origin. 🛑Error
Unsupported version of SXG (could extract fallback URL). ↪️Fallback
SXG parse error (could extract fallback URL). 🛑Error
Network error while streaming SXG header or body. 🛑Error
Failed to fetch certificate chain. 🛑Error
Failed to parse certificate chain. 🛑Error
Signature parsing failed (This includes fetching and parsing the certificate) 🛑Error
Signature isn't valid ↪️Fallback This could error instead on the last step, which actually runs the validation algorithm?
Certificate doesn't have the CanSignHttpExchanges OID 🛑Error Maybe we shouldn't make this exception from the next step.
Certificate chain does not have a trusted leaf for the origin for another reason ↪️Fallback This includes cert path building, OCSP checking, and CT checking.
Certificate chain does not establish cross-origin trust for another reason 🛑Error The non-certificate reasons can be checked on the distributor.
SXG was served without X-Content-Type-Options: nosniff header. 🛑Error
SXG parse error (couldn't extract fallback URL). 🛑Error
Merkle integrity error. 🛑Error

I basically want anything the distributor could catch to be an Error, while anything that clients might vary on (mostly time & crypto validity) should fall back. @sleevi, how do you feel about this list?

sleevi commented 5 years ago

I basically want anything the distributor could catch to be an Error, while anything that clients might vary on (mostly time & crypto validity) should fall back.

At first glance, that sounds like a great principle and I think your analysis is reasonable and spot-on. Signature isn’t valid seems a reasonable error?

jyasskin commented 5 years ago

@sleevi https://wicg.github.io/webpackage/loading.html#validating-signature includes the check against the client's clock (which I think should definitely fall back) and the check that it's a secp256r1 key (where I want to have clients fall back when they remove broken crypto, but maybe we should do that with a version increment instead).

We should also make sure that the reporting API distinguishes between the cases that error vs fall back. Right now there are distinctions in the table above that aren't captured in the reporting API results.

sleevi commented 5 years ago

Yeah, the approach for changing crypto primitives was definitely seen as a version-bumpable thing. Perhaps it's worth splitting out "validate a signature" and "validate the timestamp" as two separate things, the former being an error, the latter being a fallback?

twifkak commented 5 years ago

Initial NEL analysis from AMP Cache is available to Googlers in http://b/126787670. Overall, error rate is relatively low, but I'd say not low enough to serve network errors in all cases.

The two most prevalent errors are cert_fetch_error & mi_error. My understanding is that these can be due to a bad connection, which is beyond the control of the distributor (e.g. may be due to ISP issues or spotty cell coverage). I'm thinking that these cases should continue to fall back. Otherwise, spotty connnections would disproportionately affect SXG, because browsers are willing to render unsigned HTML that was truncated or had failed subresources. Are there downsides to falling back in these cases?

sleevi commented 5 years ago

Cert Fetch Errors are more similar to failures to fetch AIA (which causes hardfail) or OCSP (for which behaviour varies between browsers), right?

As to whether it’s safe, the analysis would be:

irori commented 5 years ago

Reg. mi_error:

From the implementation's POV, browser sends SXG's inner response header to the renderer as soon as the SXG header is validated. Merkle integrity error is detected after that, so it's too late to chagne the response to 303 redirect at that pont.

jyasskin commented 5 years ago

cert_fetch_error: My instinct is that a failure to fetch the certificate from the same origin that's already transferring the signed exchange is similarly likely to a failure to fetch the signed exchange itself or the content from the publisher. Since we don't auto-reload on a network error, we shouldn't auto-redirect on one either.

mi_error:

  1. It seems confusing to have a network error that truncates the SXG be expressed as mi_error. I think I'd want to see one of https://w3c.github.io/network-error-logging/#transmission-of-request-and-response-errors instead? @irori/@horo-t Is that plausible?

  2. I hope the browser can render the parts of the content that have already been verified, even if the connection is broken which truncates and invalidates the last chunk.

irori commented 5 years ago

mi_error:

  1. It seems confusing to have a network error that truncates the SXG be expressed as mi_error. I think I'd want to see one of https://w3c.github.io/network-error-logging/#transmission-of-request-and-response-errors instead? @irori/@horo-t Is that plausible?

IIUC, currently both a mi_error and a transmission error are reported. I think it's possible to stop sending mi_error for network error cases, but I'll defer to @horo-t.

  1. I hope the browser can render the parts of the content that have already been verified, even if the connection is broken which truncates and invalidates the last chunk.

Chrome works in this way, although it's not guaranteed that response is loaded to the last validated chunk (internal buffer is discarded upon network error).

horo-t commented 5 years ago

mi_error:

  1. It seems confusing to have a network error that truncates the SXG be expressed as mi_error. I think I'd want to see one of https://w3c.github.io/network-error-logging/#transmission-of-request-and-response-errors instead? @irori/@horo-t Is that plausible?

IIUC, currently both a mi_error and a transmission error are reported. I think it's possible to stop sending mi_error for network error cases, but I'll defer to @horo-t.

I agree. We should not send mi_error when caused by a transmission error. We need to update both the loading spec and Chromium implementation.

twifkak commented 5 years ago

Thanks for the clarifications; looks like I misunderstood what mi_error meant for the user. Re: cert_fetch_error; I'll defer to you all, unless I learn that these errors are disproportionately large and can't be mitigated at implementation side, and I can come up with a good reason they're different from AIA fetch. (The only difference I can think of is that a secure fallback seems possible in this case.)