All 2xx Responses Succeed, 304 fails

MildMax commented 2 months ago

We are seeing an issue with the Speculation API in Chrome where all 2xx responses are considered successful in the Speculation window, but 304s are considered failures. We are also seeing a message in the Speculation window for 304 responses on prefetched resources that show this is likely intentional:

Failure reason: The prefetch failed because of a non-2xx HTTP response status code.

Also, when we return a 2xx response that is not a 200, like a 204 with the an empty response body, for instance, the browser will attempt to load the resource from the prefetch cache, and in the instance of the 204, will attempt to render from the prefetch cache causing the browser to remain on the same page. For something like an HTTP response with a 206 status code and a partial body, the browser will attempt to render whatever is in the response body from the prefetch cache on navigation to the resource.

The problem with this behavior is that the documentation states that only 200 and 304 are considered successful responses, and that any other status code will not be prerendered and any prefetch page will be discarded:

Could we get some clarity on why the browser is attempting to render all 2xx responses and failing 304s seemingly contrary to the documentation?

SulemanAhmadd commented 2 months ago

https://github.com/WICG/nav-speculation/issues/138 mentions using 204 as error code for disabling prefetching/prerendering but not seeing that from Chrome at the moment.

jeremyroman commented 2 months ago

@tunetheweb fyi

Apologies for the confusion here. The spec and Chromium implementation have both been consistent in accepting only "ok statuses" (200-299) as of https://github.com/WICG/nav-speculation/pull/144, aside from following redirects.

Ok statuses are generally understood by Fetch to indicate a successful final response (e.g., in the CORS algorithm), though of course 200 is by far the most common, which is probably how it ended up explicitly in the documentation you linked. 204 and 205 additionally fail for prerendering because they do not enclose a response body which could be prerendered, but generating a 204 when one isn't semantically correct still isn't what I would recommmend.

Any request with an error status (400-599) will not be stored in the prefetch cache (because it's more likely that conditions surrounding the prefetch will have changed by the time of the navigation, and possibly avoiding an error if it was a result of the prefetch is likely to be a better user outcome than being shown an error quickly).

304 Not Modified is interesting. We don't currently handle it (and I don't remember how it ended up mentioned there) so it should end up being rejected as well. I believe this should be safe (insofar as the request will be retried), but of course means that revalidation will likely be necessary when the user navigates, losing some of the speed benefit of prefetch (and oddly making it slower than if it weren't in cache because a conditional request would have succeeded).

There might be scope to improve that if you're having issues with 304 responses in practice (likely by making the prefetch algorithm explicitly aware of conditional requests). At the moment, our telemetry shows 304 responses as extremely rare on prefetch requests, and occurring only about 2% of the time on requests generally).

It should be possible to temporarily work around that by sending a full response even to a conditional request, if the request has Sec-Purpose: prefetch (or the similar prerender header), e.g. by internally stripping If-None-Match, If-Modified, etc. This would, though, mean sending data the client could have served from the HTTP cache, which I recognize isn't optimal.

Happy to go and fix any documentation which is misleading or outdated, if that all makes sense to you.

SulemanAhmadd commented 2 months ago

Thanks @jeremyroman, this is very helpful.

So when approaching this problem from the server-side, It seems that using 304 status code will reject the prefetch but it can cause clients to incur the unnecessary cache revalidation cost upon navigation. Do you have any recommendation on what HTTP response status code should be used for speculation rejection? (based on our experience, customers complained about seeing 404 status code on browser console error logs due to rejected speculation rule requests).

Also, it might be worth highlighting in the documentation (and the specification) that 204 and 205 case only rejects prerendering request, and not prefetches. Otherwise, all 200-299 are accepted for prefetching and prerendering.

jeremyroman commented 2 months ago

Do you have any recommendation on what HTTP response status code should be used for speculation rejection? (based on our experience, customers complained about seeing 404 status code on browser console error logs due to rejected speculation rule requests).

Yeah, it's certainly awkward that nothing is a precisely good fit (in #138 we talked about creating a "309 Not Ready") or similar status, but didn't end up doing it because it wouldn't behave much differently from error statuses.

Aside from avoiding the ones that have a specific meaning in HTTP, like 416 Range Not Satisfiable and whatnot, I don't have a strong technical reason to prefer a particular status code. 503 Service Unavailable strikes me as one of the closest to being semantically meaningful aside from 404 Not Found, depending on the underlying reason why the request is being rejected.

I'm not sure what the best way to mitigate customer complaints is; you probably know your customers better than I do. Maybe particular response codes are less alarming, but some other ideas (just brainstorming):

a response header explaining why the request was rejected might deal with confusion, if customers typically look at such things (Cf-Prefetch-Refused: [human-readable reason])
perhaps the developer tools frontend could in some way make errors in response to prefetch and preload requests look less visually scary (by changing the icons shown or similar)

Also, it might be worth highlighting in the documentation (and the specification) that 204 and 205 case only rejects prerendering request, and not prefetches. Otherwise, all 200-299 are accepted for prefetching and prerendering.

I think the spec is technically correct here (insofar as those are only mentioned in the prerendering spec), but perhaps adding a non-normative note to the in this section of the prefetch spec would be helpful?

We'll certainly try to clarify the developer.chrome.com and MDN docs though I'm not certain exactly how to describe it concisely enough.

tunetheweb commented 2 months ago

Chrome docs corrected: https://github.com/mdn/content/pull/35624

And I've raised this for MDN: https://github.com/mdn/content/pull/35624

SulemanAhmadd commented 2 months ago

Thanks, both. Those changes look good!

Also, we are planning to test out both 421 and 503 for rejecting prefetching. Based on our experience, we can follow up back on this thread on what worked for our deployment. Though it would be good to later standardize on a single HTTP status code for uniformity across deployment. We can close out the issue.

jeremyroman commented 2 months ago

I'd watch out for the possibility of software misconstruing 421 Misdirected Request, at least based on the RFC 9110 wording that suggests clients retry and proxies never emit it.

I believe Chromium's network stack will see that as a signal to retry a request over a new connection (without alt services like QUIC and IP pooling), which probably isn't desirable here.

If we do find that existing response codes really do all not do what we want, we can talk to HTTPWG about a new one; it's just not trivial.

SulemanAhmadd commented 2 months ago

Good catch! We will update on 503 in that case and report back. But, yes, having a dedicated response code would be helpful. As you clarified, having the suggestion to use any status code apart from 200-299 to discard a speculation request can have side-effects that may not be known to many.

aseure commented 2 months ago

As discussed before, we started making use of 503 response codes for speculation requests which we rejected. This is less confusing via the DevTools and requests are indeed correctly marked as failed. We are now discussing the discussing the possibility to report the reason of the rejection. All options are still on the table at the moment, including yours @jeremyroman using a custom response header. We are also considering populating the response body or passing an extra cookie in the response, however, since there is no direct benefit compared to response headers, and because response headers are a bit more accessible from the DevTools, we are currently leaning towards this option.

Before implementing anything custom, we wanted to double-check with you to know if you were considering any option to report the cause of rejection that would end up in the standard?

jeremyroman commented 2 months ago

I'm not aware of concrete plans to specify anything more specific. I'm willing to consider something, but mainly if it would add value over a custom solution (like, if user agents need to do something differently, or because there's some other software that would process info from different vendors uniformly). At first blush I don't think what you're likely to do would preclude also adding a common solution in the future (because multiple response headers can always be sent), but if you do anticipate that it would also be good to know. Absent that, a custom solution probably gives you the most flexibility and agility right now.

domenic commented 2 months ago

Hey folks, I wanted to join the conversation and ask for us to step back a bit, so that we can get an idea of where to best design the solution.

Am I understanding the problem statement correctly that this is about customers looking at DevTools and not understanding what's going on?

If so, what would the ideal DevTools display look like to you?

My guess based on some of the above is (a) some clear signal that this is about the server deciding not to do a speculative load, and not a different sort of error; (b) maybe some additional information, delivered somewhere, that could point to server-specific docs or configuration points.

If that's the case, then we'd welcome thoughts on concrete ideas for and places in DevTools for (a) and (b) to be signaled. If there's more beyond that which would form your ideal developer experience, then let us know that too!

aseure commented 1 month ago

@domenic The DevTools panel is already already able to let users inspect individual speculation requests via the Application tab and click on each individual requests:

IMHO what I would like to have is a Rejection cause under the Detailed information in case of failures, which are currently reported in red in table when they exist. At Cloudflare, we agreed to start with a custom response header to report the cause, but we would be willing to align with whatever solution is preferred on the long term. We picked this solution over response cookie or body in our 503 because we thought that response headers were more discoverable in the current UX of the DevTools.

tunetheweb commented 1 month ago

Just to be clear it DOES give a Failure reason currently:

But it is not possible for servers to populate this with additional information.

So currently this is more useful for other reasons, such as when the user disables this:

What you're asking for is for DevTools to be able to pick up a reason server-side, and enhancing the non-2XX response in DevTools to also show that. Is that correct?

MildMax commented 1 month ago

Yes, we want a formal method that allows the server to provide a reason for the speculation prefetch/prerender failure beyond what the browser can determine from an HTTP status code. Enhancing the non-2xx response in DevTools would be a good way to accomplish this for us.

If possible, would we be able to do something like populating the response body of a 503 (or other error response code) with some error message corresponding to a custom tag like speculation_err? According to the IETF RFC-7231, 4xx and 5xx responses should display any included representation of the error to the user. RFC 2616 also states that any entity returned with the response should be presented to the user since it is likely to contain human-readable information that explains the unusual status. Entity Bodies in section 7 of the same document shows that the body may be present in the message body. It seems like including a response body with an error message still follows the protocols laid out in the RFCs for HTTP error responses, but does this seem like a solution that is possible to implement for speculation at the browser?

domenic commented 1 month ago

Got it, thanks for clarifying! I also want to check: do you think red is an appropriate color still, or would something else make sense? Similarly, we can change strings like "Failure" or "Speculative load failed."

Looking a bit more toward the solution space, I think basically the server will communicate two things to us:

Is this a "failure" or a "rejection"? (By default, assume a failure.)
Some additional information for display in DevTools, at least in the rejection case. (Although we could also display it in the failure case.)

You can imagine multiple ways of signaling this. For example:

A new custom status code, or maybe response MIME type, with the entire body shown in DevTools as the additional information.
An existing status code, with the entire body shown in DevTools as additional information.
An existing status code, with a structured data like JSON.

Adding a new status code is a lot of work, and I don't think it's necessary: either 422 or 503 seem appropriate. (Probably 503 is best.)

Here is one idea: we could use 503 + Problem Details for HTTP. This gives us clear guidance on MIME types (application/problem+json), format (title + detail), and extensibility. /cc @mnot

The only downside of that is that the HTTP problems spec suggests the problem ID be a resolvable URL, and I don't feel like we have a great namespace for that right now. WICG is supposed to be a temporary home for specifications before they graduate to a full standards body, so using something like https://wicg.github.io/nav-speculation/speculation-rules.html#server-rejection-problem-type would feel a bit sad. I guess we could do something like about:speculation-rules-server-rejection, and just lose a bit of the self-documenting benefit that would come from using a resolvable URL.

MildMax commented 1 month ago

Hmm, so the status code + Problem Details sounds like a viable solution for the problem, though we probably want to spend a little time talking it out about what this might look like from our side, but would it be possible to use both 503 and 422 with problem details for HTTP, or would it be a great deal of work to handle problem details for both? Like, would it be extensible such that we could try several different error codes in the future, but by nature of being a failed speculation request (either prefetch or prerender) with the application/problem+json content type in the response header fields, still report the nature of the failure independent of the status code?

Also, regarding the resolvable URL, is this generally intended to be a static URL, or is this something that the server could potentially provide? That way finding the namespace to support the failure type is primarily on the server and does not rely on Chrome to provide this information. Maybe if the field is missing, there could be some default page that explains speculation failures in general, but would ideally opt for a server-provided URI to explain the failure. The RFC states that the URI provided should correlate to the document's URI which supports this kind of pattern: https://www.rfc-editor.org/rfc/rfc9457.html#section-3.1.1-4

domenic commented 1 month ago

Personally, I think making it status code agnostic is reasonable. But https://www.rfc-editor.org/rfc/rfc9457.html#section-4-7 seems to imply a problem detail format should only match with one status code, so I'm not sure it would be the best practice. Is there anything specifically bad about 503 that makes you want to experiment with different options?

Also, regarding the resolvable URL, is this generally intended to be a static URL, or is this something that the server could potentially provide?

The problem details object contains two URL-valued fields:

type, which tells you what type of problem this is. It doesn't have to be a resolvable URL; it's just used as a unique identifier. We want this to be the same for all of these server-rejects-speculation-rules problems. We're just trying to figure out whether something like about:seculation-rules-rejection or https://wicg.github.io/nav-speculation/speculation-rules.html#server-rejection-problem-type would be better. We would write this down in the speculation rules spec and everyone would know about it.

If it's resolvable, I'm not sure we'd want to link to it in DevTools, since it'll be a generic cross-browser URL. Chrome DevTools would probably want to link to Chrome-maintained documentation with more details.
instance, which contains information about this specific instance of the problem. This is where you could put a URL provided by the server.

We could possibly surface this URL in the DevTools interface as well. I'd have to double-check with the security folks, as there might be some concern about showing server-provided URLs in semi-trusted UI like DevTools? But DevTools already shows tons of server-provided URLs, e.g. in the network panel, so my guess is that this won't be a big issue. They might just want to control the wording, e.g. instead of "Click here to learn more", they'd prefer something like "The server provided the following additional information: https://..."

jeremyroman commented 1 month ago

According to the IETF RFC-7231, 4xx and 5xx responses should display any included representation of the error to the user.

I believe that applies for typical cases where a successful response would have been displayed to a user. For example, when an XHR/fetch, stylesheet, or other subresource produces an error, its representation is not automatically shown.

instance, which contains information about this specific instance of the problem. This is where you could put a URL provided by the server.

It seems to me that instance is meant to be "here is a unique identifier, like a crash ID, that you can use to report/debug this problem", rather than a vendor-provided description (though it might include one).

I assume if we did use problem details, we'd either:

let a vendor-specific one be used, and display/link the URL if it's resolvable, or
define a "not ready for preloading" type which has an extension field that links to a vendor-specific reason

though in either case it would purely reflect in developer tooling and not affect web-facing behavior, I think.

domenic commented 1 month ago

In Matrix, annevk (not tagging) brought up that the two existing problem type registrations in https://www.iana.org/assignments/http-problem-types/http-problem-types.xhtml just refer to that document itself. We could follow that pattern to sidestep the WICG issue while still getting a resolvable URL. The resolvable URL could then point to WICG, or to any future destination, by using the Reference column of the table.

aseure commented 1 month ago

Just shared some updates since we started rolling out Speculation Rules for all Free plans at Cloudflare since last week. The main complaint some users have at the moment is the fact that 503 responses we currently send (to report ineligible prefetch requests) are somewhat scary (i.e. they see red / extra "failed" requests from the Network tab of DevTools). Last time we checked, it was the case for all 4xx/5xx in the Network tab.

Currently, the only way to have prefetch requests being rejected and having non-red/failed requests in the Network tab would be to use the 3xx. However, our team already discussed and withdrew some of those errors in the past because we were about to use those 3xx in non-standard ways, or this were triggering some infinite redirect loops.

So besides what we want to change in the Speculation network tab, and how we want to report the reason for failure, could we also discuss the possibility of using 2xx status code to report failures, or even register a new 1xx informal status code (or repurpose 103 Early Hints maybe?) so that end users do not see failed requests in their Network tab, but rejected requests instead?

domenic commented 1 month ago

We can't change 2xx or 1xx to mean something they don't, but we can talk about modifying the network tab to be less scary if there's specific response body formats for 503, of the type we discussed above!

mnot commented 1 month ago

@domenic well said. Just to remind folks, these responses will be seen by other generic HTTP implementations (think: corporate firewalls with caches, virus scanners, CDNs, etc.), and will handle them according to the semantics of the status codes used. 5xx is correct because it tells them that the response might be different if the same request is repeated.

SulemanAhmadd commented 1 month ago

do you think red is an appropriate color still, or would something else make sense? Similarly, we can change strings like "Failure" or "Speculative load failed."

We currently have a broad rule for speculation rules applications, which causes a lot of 503 to show up, causing concerns of "failures" which in reality are just prefetch rejections due to safety guardrails.

While we work towards coming up with a standardized approach for speculation rejection semantics, an immediate step we can take is to change how prefetch rejection or failures are displayed in the DevTools. Instead of showing them as red, alarming 503 failures, we could use a 'warning' icon. For example, when prefetch or prerender requests result in a non-2XX status, we can show a caution symbol in the Network tab instead of a red failure cross. This would help reduce confusion for developers at least that are misinterpreting speculation rejections as hard failures (and we have been getting such complaints).

WICG / nav-speculation

All 2xx Responses Succeed, 304 fails #330