Allow identification of which speculation rules triggered a speculation

domenic commented 2 weeks ago

A strategy we're seeing deployed more frequently is for platforms to add very broad speculation rules, and then use server responses to block speculations in cases where the platform is unsure whether they are safe.

However, in some cases the website knows more than the platform it is running on, and is able to guarantee that speculations are safe. In this case the platform's server responses can interfere with the website's own speculation rules.

A possible solution to this would be to add an identifier to the speculation rules, which is sent along with any speculative load HTTP request. The platform can then only reject speculations which come from the platform's speculation rules, but let through any that come from the website.

As one possible API, we could add a top-level key "tag": "any-string" to the speculation rules, and include this information in the HTTP request, e.g. as Sec-Speculation-Rules-Tags: "any-string".

(I thought at first it would be more natural to include this with Sec-Purpose, e.g. as Sec-Purpose: prefetch;tag="any-string". But structured headers don't do nested lists very well.)

Spelling out the whole scenario in that case, we would have something like:

Platform speculation rules

{
  "tag": "awesome-platform",
  "prefetch": [
    {
      "eagerness": "conservative",
      "source": "document",
      "where": { "href_matches": "/*", "relative_to": "document" }
    }
  ]
}

Site speculation rules

(The site doesn't add a tag)

{
  "prefetch": [
    {
      "eagerness": "moderate",
      "source": "document",
      "where": { "href_matches": "/*", "relative_to": "document" }
    }
  ]
}

Flow before this proposal

The user hovers their mouse over a link to /somewhere. This sends the server the following request header:

Sec-Purpose: prefetch

The platform running the server doesn't know whether /somewhere is safe to prefetch, so it responds with an HTTP error code, e.g. a 503. No speculative loading happens. Sad.

Flow after this proposal

The user hovers their mouse over a link to /somewhere. This sends the server the following request headers:

Sec-Purpose: prefetch
Sec-Speculation-Rules-Tags: null

The platform running the server doesn't know whether /somewhere is safe to prefetch. But it notices that in the Sec-Speculation-Rules-Tags header, a value that is not "awesome-platform" is present: in other words, this speculation was initiated by something besides Awesome Platform's speculation rules feature. So, it lets the speculation through. Yay!
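The platform-side check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the header name and the `null` placeholder follow the example in this comment, the parser is deliberately simplified (it assumes tags contain no commas, quotes, or escapes, so it is not a full structured-field parser), and `PLATFORM_TAG`, `parse_tags`, `should_reject`, and the `known_safe` flag are hypothetical names.

```python
from typing import Optional

PLATFORM_TAG = "awesome-platform"  # hypothetical tag, taken from the example ruleset


def parse_tags(header_value: str) -> list[Optional[str]]:
    """Parse a comma-separated list of quoted strings or the bare token null.

    Simplified on purpose: assumes tags contain no commas, quotes, or escapes.
    """
    tags: list[Optional[str]] = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        # The token null stands for "the triggering rule had no tag".
        tags.append(None if item == "null" else item.strip('"'))
    return tags


def should_reject(sec_purpose: Optional[str],
                  tags_header: Optional[str],
                  known_safe: bool) -> bool:
    """Return True if the platform should answer with an error such as a 503."""
    is_speculative = (
        sec_purpose is not None
        and sec_purpose.split(";")[0].strip() == "prefetch"
    )
    if not is_speculative or known_safe:
        return False
    if tags_header is None:
        # No tag information (e.g. an older browser): stay conservative.
        return True
    # Reject only when every triggering rule belongs to the platform itself.
    return all(tag == PLATFORM_TAG for tag in parse_tags(tags_header))
```

With this sketch, a request carrying `Sec-Speculation-Rules-Tags: null` is let through, while one carrying only `"awesome-platform"` is still rejected unless the platform already knows the URL is safe.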

tunetheweb commented 2 weeks ago

See also #298 for other use cases for this.

aseure commented 2 weeks ago

We would indeed be interested in such a feature. We currently have to rely on detecting the Sec-Purpose: prefetch header, as well as many other checks, to confirm that the prefetch request should be rejected with a 503. If we had such a tag in the speculation rules our platform is injecting, we could replace those checks with a single check of a second header.

SulemanAhmadd commented 2 weeks ago

I would like to support this use case from the Cloudflare side (as mentioned by @aseure). We want to allow customers to override our rules if they believe the speculative request (either prefetch or prerender) is safe and should reach their origin server. This approach helps ensure we respect their preferences on a per-page basis. We should consider a solution that can differentiate speculative requests for both inline and header-based speculation rules. The suggested additional tag addresses this need, but we must ensure that it doesn’t contribute to client fingerprinting due to reflecting the arbitrary string value from the client (especially for cross-origin requests).

jeremyroman commented 2 weeks ago

My thinking was very similar to this, though I think this should be a property of the individual rule, as this allows you to distinguish rules which are differently eager or differently aggressive, even if they're in the same ruleset.

{
  "prefetch": [
    {
      "eagerness": "conservative",
      "source": "document",
      "where": { "href_matches": "/*", "relative_to": "document" },
      "tag": "awesome-platform"
    }
  ]
}

There's also an interesting question about the case where a speculation is possible due to multiple different rules (or rule sets) with different tags (e.g., a CDN and the site both have rules that permit a speculation). One option would be to look for all of the candidates' tags that are possible and send all of them (including a placeholder for no tag); another would be to establish some kind of priority system among them.
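To make the first option concrete, here is a rough sketch of how a user agent might gather the tags of every rule that permits a speculation and serialize them. It is an assumption-laden illustration, not proposed spec text: `collect_tags` and `serialize_tags` are hypothetical helpers, `null` is borrowed from the example earlier in this thread as the placeholder for "no tag", and tags are assumed to contain no quotes or backslashes.

```python
from typing import Optional


def collect_tags(matching_rules: list[dict]) -> list[Optional[str]]:
    """Gather the distinct tags of every rule that permits the speculation.

    A rule without a "tag" key contributes None, the "no tag" placeholder.
    """
    tags = {rule.get("tag") for rule in matching_rules}
    # Sort for a deterministic header value; None (no tag) sorts first.
    return sorted(tags, key=lambda tag: (tag is not None, tag or ""))


def serialize_tags(tags: list[Optional[str]]) -> str:
    """Serialize as a comma-separated list, using the token null for no tag."""
    return ", ".join("null" if tag is None else f'"{tag}"' for tag in tags)


# A CDN rule and an untagged site rule both permit the same speculation.
rules = [{"tag": "awesome-platform"}, {"eagerness": "moderate"}]
header_value = serialize_tags(collect_tags(rules))
```

A priority system, the second option, would instead reduce the collected tags to a single value before serializing.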

I don't follow what you mean by "[not doing] nested lists very well". I don't feel strongly but Sec-Purpose: prefetch; speculation-rules-tags=("awesome-platform") doesn't seem bonkers. Fine with a separate header field, too, though.

Would these tags only be sent same-origin (or same-site), or would we send them to all origins (since speculations can, in general, be to any origin)? I suspect we'd lean toward not sending them cross-site to reduce the possibility of their use as a tracking vector, even though URL decoration exists. I'm sure that's unfortunate for some uses, but it seems fine for the immediate ones.
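As a rough illustration of the stricter variant, a same-origin gate could look like the sketch below. It only covers the same-origin case, since a same-site check would additionally need the Public Suffix List. `origin` and `tags_header_for` are hypothetical helpers, and the header name is the one floated earlier in this thread.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) tuple of a URL, filling in default ports."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def tags_header_for(document_url: str, target_url: str,
                    tags_value: str) -> dict[str, str]:
    """Attach the tags header only when the speculation target is same-origin."""
    if origin(document_url) == origin(target_url):
        return {"Sec-Speculation-Rules-Tags": tags_value}
    # Cross-origin target: omit the tags so they cannot act as a tracking channel.
    return {}
```

Under this sketch, a prefetch from `https://a.example/page` to `https://b.example/` would carry no tags at all.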

domenic commented 2 weeks ago

Great to hear that there's interest in this!!

> The suggested additional tag addresses this need, but we must ensure that it doesn’t contribute to client fingerprinting due to reflecting the arbitrary string value from the client (especially for cross-origin requests).

Could you say more about the threat model here? Is it along the lines of what @jeremyroman mentioned, in that it doesn't contribute to fingerprinting in itself, but is a possible additional cross-site communications channel which could be used to pass along fingerprint information gathered elsewhere?

> My thinking was very similar to this, though I think this should be a property of the individual rule, as this allows you to distinguish rules which are differently eager or differently aggressive, even if they're in the same ruleset.

Good point! (Although maybe we could cascade from the top level for convenience?)
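One way the cascade could work, sketched under assumptions: a rule-level "tag" wins, and otherwise the rule inherits the top-level "tag" of its ruleset, if any. `effective_tags` is a hypothetical helper, and only the "prefetch" and "prerender" rule lists are considered.

```python
from typing import Optional

RULE_LISTS = ("prefetch", "prerender")


def effective_tags(ruleset: dict) -> list[Optional[str]]:
    """Return the effective tag of each rule, in document order.

    A rule's own "tag" takes precedence; otherwise the top-level "tag"
    applies; None means the rule ends up untagged.
    """
    default_tag = ruleset.get("tag")
    tags: list[Optional[str]] = []
    for action in RULE_LISTS:
        for rule in ruleset.get(action, []):
            tags.append(rule.get("tag", default_tag))
    return tags


ruleset = {
    "tag": "awesome-platform",
    "prefetch": [
        {"eagerness": "conservative", "source": "document"},
        {"eagerness": "eager", "source": "document", "tag": "awesome-platform-eager"},
    ],
}
tags = effective_tags(ruleset)
```

Here the first rule inherits "awesome-platform" and the second keeps its own "awesome-platform-eager" (a made-up tag for this example).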

> One option would be to look for all of the candidates' tags that are possible and send all of them (including a placeholder for no tag);

That was my original thinking.

> I don't follow what you mean by "[not doing] nested lists very well". I don't feel strongly but Sec-Purpose: prefetch; speculation-rules-tags=("awesome-platform") doesn't seem bonkers.

I believe parameter values must be "bare items", i.e., integers, decimals, strings, tokens, binary, or booleans. They cannot be inner lists.

SulemanAhmadd commented 1 week ago

> in that it doesn't contribute to fingerprinting in itself, but is a possible additional cross-site communications channel which could be used to pass along fingerprint information gathered elsewhere?

Exactly, yes. Since the server is allowed to specify any arbitrary value in the tag, reflecting that value in cross-site requests can enable a tracking vector. Imagine an adversary controlling A.example and B.example. Assume that, when the client connects to A.example, the returned speculation ruleset has a tag based on a client fingerprint. As the client interacts with the page, prefetch/prerender requests for B.example can be generated from links pre-embedded on the page, and those requests will carry the same fingerprint tag in the Sec-Purpose header when they land on B.example. This can help the adversary confirm that the speculative requests landing on B.example are from the same visitor.

WICG / nav-speculation

Allow identification of which speculation rules triggered a speculation #336