WICG / pending-beacon

A better beaconing API

Request Limit #87

Open mingyc opened 10 months ago

mingyc commented 10 months ago

Open questions from last meeting:

Some concerns:

  1. Total limit should not be easily consumed/abused by a single origin
  2. Origin-specific info should not be leaked via quota consumption
  3. A single request can easily take up 50k (e.g. Boomerang)

Relevant Discussions: https://github.com/w3c/beacon/issues/38#issuecomment-1861006131

@noamr @annevk @nicjansma @yoavweiss

fergald commented 10 months ago

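How about taking time into consideration? I'm not going to try to spec precisely here but at the high level:

  • a page that has been open for just a few seconds should have a small quota
  • a page that has been open for a longer time should have a larger quota
  • iframe quotas could grow more slowly than top-level frames
  • a permission-policy could "upgrade" an iframe to use the top-level quota

The goals I'm trying to reach with this are:

  • recognize that if we don't make this usable, pages will just send eagerly with fetch() and/or use keep-alive
  • enabled by default for iframes to at least some extent
  • ability for the top-level frame to "promote" iframes to same status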

noamr commented 10 months ago

How about taking time into consideration? I'm not going to try to spec precisely here but at the high level:

  • a page that has been open for just a few seconds should have a small quota
  • a page that has been open for a longer time should have a larger quota
  • iframe quotas could grow more slowly than top-level frames
  • a permission-policy could "upgrade" an iframe to use the top-level quota
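
A minimal sketch of how such a time-based quota could be computed. Every constant and the function name `timeBasedQuota` are hypothetical, since the proposal above deliberately gives no numbers, and `promoted` is only one possible reading of the permission-policy "upgrade":

```javascript
// Hypothetical numbers: none of these constants come from the proposal.
const BASE_QUOTA_BYTES = 8 * 1024;      // quota available right after load
const MAX_QUOTA_BYTES = 64 * 1024;      // ceiling the quota grows toward
const TOP_LEVEL_GROWTH_PER_SEC = 1024;  // bytes earned per second by a top-level frame
const IFRAME_GROWTH_PER_SEC = 256;      // iframes grow more slowly

// Returns the quota (in bytes) for a frame that has been open for `ageMs`.
// `promoted` models an iframe that was "upgraded" to the top-level growth rate.
function timeBasedQuota(ageMs, { isTopLevel, promoted = false }) {
  const rate = isTopLevel || promoted ? TOP_LEVEL_GROWTH_PER_SEC : IFRAME_GROWTH_PER_SEC;
  const earned = Math.floor((ageMs / 1000) * rate);
  return Math.min(MAX_QUOTA_BYTES, BASE_QUOTA_BYTES + earned);
}
```

With these made-up rates, a freshly opened page starts small, an iframe lags behind a top-level frame of the same age, and both stop growing at the ceiling.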

The goals I'm trying to reach with this are:

  • recognize that if we don't make this usable, pages will just send eagerly with fetch() and/or use keep-alive
  • enabled by default for iframes to at least some extent
  • ability for the top-level frame to "promote" iframes to same status

Maybe start with shipping something simpler and extend later?

mingyc commented 7 months ago

Pinging @yoavweiss @noamr

As the OT is started, we might want to continue this discussion.

yoavweiss commented 7 months ago

I'd love to hear @annevk's opinion on the options in the OP.

Let me try to clarify (for my own sake) the threat model this quota is protecting against. Both top-level frames and iframes can use up network resources without any limitations while they are alive. IIRC, the claim was that users can notice this (unclear how - maybe by seeing that they are network constrained elsewhere) and close these tabs to stop that use, while they won't be able to do the same with fetchLater calls.

Hence, we're trying to protect against network abuse by any individual party on a particular site.

If our quotas were to span across frames, they are likely to end up leaking cross-origin state or enable interference with cross-origin activity, and may cause more harm than their presumed protection.

But if our quotas don't span across frames, abusers can easily circumvent them by creating more frames that send more data.

Given the above, my preference would be:

annevk commented 7 months ago

I haven't really changed my thinking on this. We'd give a top-level origin/site a quota and it can share that quota using Permissions Policy. That still seems like the most robust approach. That also puts the blame squarely with the top-level origin/site, which is generally what you want if you were to surface any of this to end users.

noamr commented 7 months ago

I haven't really changed my thinking on this. We'd give a top-level origin/site a quota and it can share that quota using Permissions Policy. That still seems like the most robust approach. That also puts the blame squarely with the top-level origin/site, which is generally what you want if you were to surface any of this to end users.

I also don't see a way around this. It does mean that fetchLater would only work reliably for cross-origin iframes if permitted by permissions-policy.

mingyc commented 7 months ago

I am going to add Permissions Policy to the OT implementation, but how about its default? From our side, we still want to minimize the friction of making the API available to 3P frames, and setting the default to 'self' means that the API won't be available in most existing sites at all.

annevk commented 7 months ago

That is the correct default though. Otherwise you make the top-level site responsible for something it didn't even know.

noamr commented 7 months ago

That is the correct default though. Otherwise you make the top-level site responsible for something it didn't even know.

I tend to agree with @annevk on this, though perhaps this is not a spec concern. I can't find any place in the spec that defines default values for these policies but perhaps I'm missing something.

I can see how hypothetically a UA can e.g. decide to enable this by default but disable it when there are signs of abuse.

mingyc commented 7 months ago

We'd give a top-level origin/site a quota and it can share that quota using Permissions Policy.

Combining the comment from Yoav

64KB limit per reporting origin. 10 reporting origins per frame

Does that mean a top-level frame has a 640KB quota by default, and it allows at most 10 reporting origins within this frame? What happens when an 11th origin makes a fetchLater call? Do all iframes share the same 640KB, or does each have its own independent quota?

And a quota for origin A in one top-level frame is not shared with origin A in another top-level frame, right?

noamr commented 7 months ago

We'd give a top-level origin/site a quota and it can share that quota using Permissions Policy.

Combining the comment from Yoav

64KB limit per reporting origin. 10 reporting origins per frame

Does that mean a top-level frame has a 640KB quota by default, and it allows at most 10 reporting origins within this frame? What happens when an 11th origin makes a fetchLater call? Do all iframes share the same 640KB, or does each have its own independent quota? And a quota for origin A in one top-level frame is not shared with origin A in another top-level frame, right?

There is no limit to the number of reporting-sink origins. The proposal is:

yoavweiss commented 7 months ago

Each reporting-sink origin nested within that top-level document has a 64k quota. you can have more than 10 of those, but if you reach the total of 640k you're out of luck.

This seems fine as long as we think that there won't be more than 10 reporting sinks in the "typical" case. If that's the case, this policy would be fine, and won't result in reporters "ensuring their quota" by registering and using it early.

@nicjansma - WDYT?

noamr commented 7 months ago

OK, I'm working on the PR, the main thing I need fleshed-out is how to deal with workers, since they:

I see three options here:

Thoughts?

mingyc commented 7 months ago

We are fine with supporting only documents for now. There is an infra limitation in Chromium such that fetchLater can't be supported in workers.

mingyc commented 6 months ago

Filed https://github.com/w3c/webappsec-permissions-policy/issues/544

mingyc commented 6 months ago

@noamr @yoavweiss

  • The top-level document has 640k quota for deferred fetches
  • Each reporting-sink origin nested within that top-level document has a 64k quota. you can have more than 10 of those, but if you reach the total of 640k you're out of luck.

Although this is an implementation-specific detail, I just realized this requires a mechanism that lets cross-process iframes synchronously update each other on the size of their pending requests (given the example), which is difficult to achieve and may not pass security review.

The above is necessary, as the steps introduced by https://github.com/whatwg/fetch/pull/1647/commits/8d237d00f7f984e37e823d9d937c7f538fa01290 assume that child cross-site iframes can obtain all body sizes of pending fetchLater requests from their parent documents in real time when JS fetchLater() is called (such that QuotaExceededError can be thrown). However, traversing cross-site ancestors in the renderer process won't provide anything useful.

Is there any alternative approach for it?

arturjanc commented 6 months ago

I might be misunderstanding something, but I worry that a design based on a shared limit for a top-level document and its descendants outlined above in https://github.com/WICG/pending-beacon/issues/87#issuecomment-1985358609 would result in an XS-Leak.

The top-level document could simply delegate the deferred-fetch permission to a cross-origin iframe and would be able to get visibility into that frame's fetchLater use based on sharing the 640KB quota for the top-level document. In this case, the Permissions Policy requirement only protects the top-level document from its iframes, but doesn't protect the iframe from revealing information to its parent/ancestors (and sibling iframes). I imagine we could fix this with a separate opt-in switch for the iframe, but that also seems a bit like a footgun - it would be hard for developers to understand that opting into using fetchLater when embedded in an iframe can reveal information to the parent.

annevk commented 6 months ago

The opt-in is using fetchLater() as by default that would throw. This is generally how Permissions Policy works, no? To avoid the need for the nested document to adopt some postMessage() protocol that would reveal the same information.

arturjanc commented 6 months ago

The opt-in is using fetchLater() as by default that would throw.

  1. An application might use fetchLater in a top-level context, which will be fine - it won't throw and won't leak information if the document doesn't delegate permission to its iframes. But in the proposed design another site will by default be able to iframe that document and reveal information about its use of the API, making the design inherently leaky.
  2. Providers of embedded content might want to use fetchLater but not reveal information about this to the embedding page. Saying "sorry, this API always reveals information about its use to the parent" is a problematic pattern - especially given that fetchLater isn't explicitly meant to communicate with the parent (unlike e.g. postMessage where the document intends to send a message to another window).

This is generally how Permissions Policy works, no?

I don't think so :) Permissions Policy generally shouldn't give the embedder information about the use of the API gated behind the permission by the embeddee - you can delegate the permission, but you don't know if it's actually exercised. In the design we're discussing for fetchLater the problem is the per-top-level-document limit which serves as a side channel to gain information about the use of the API (which IIUC is special here and we don't have it for other permissions).

annevk commented 6 months ago

I guess that's fair. If we can solve for that as well that would be better. It's unfortunate we still haven't made more progress on "X-Frame-Options by default".

noamr commented 6 months ago

Perhaps something along the lines of specifying the top-level origin in the options dictionary? fetchLater(url, {activateAfter: 3000, quotaOrigin: "https://toplevel.example.com"})

arturjanc commented 6 months ago

Perhaps something along the lines of specifying the top-level origin in the options dictionary?

I see two problems with this. First, it doesn't protect you from sibling frames: if A embeds and delegates permission to both B and C, either one exposes information to the other frame even though it only trusted A when calling the API. Second, it's not obvious that the quotaOrigin origin can learn information about an embeddee's calls to fetchLater, and there's no way to prevent the embedder from doing so if a document wants to be embeddable.

I guess that's fair. If we can solve for that as well that would be better. It's unfortunate we still haven't made more progress on "X-Frame-Options by default".

I agree, but this is a constraint we need to deal with for all web APIs. I have some hope that we can get similar benefits from third-party cookie restrictions (i.e. you will be able to embed cross-site content, but it won't be credentialed by default, so won't leak authenticated data), with the caveat that that's site-scoped and subject to relaxations due to browser exceptions, the Storage Access API, etc.

Practically, I don't think it makes sense to block this API until embedding on the web is either opt-in or safe so we should figure out what the best approach here is.

Looking at @yoavweiss's https://github.com/WICG/pending-beacon/issues/87#issuecomment-1955964723, I'm fairly unconvinced that a per-top-level window limit provides significant protections against abuse. My guess is that targeted interventions by browsers against certain sites/origins if this becomes a problem are probably preferable here, especially if the alternative is making fetchLater leak data cross-origin.

annevk commented 6 months ago

This feature very much needs some kind of limit and making the top-level site responsible still seems like a good model to me. The alternative where the limit is per origin/site will instead result in cross-site tracking, which is a much worse problem.

Requiring some kind of unsafe opt-in token for nested documents might be reasonable though.

arturjanc commented 6 months ago

Can you clarify what the tracking vector is here? Specifically:

I'm sure you have a point here, but I don't immediately see how the limit addresses tracking problems, i.e. I don't think the threat model has been documented well enough to explain the concern. Especially if we're trying to do something that goes counter to the same-origin policy in response, it seems important to understand this better.

Also, the limit doesn't need to be per-origin/site, but could be per-document / storage partition. Is that any better from your perspective?

yoavweiss commented 6 months ago

Just to clarify, when referring to "quota per reporting origin", I was considering that quota to be partitioned on the top level site (and maybe also the iframe site), but may not have actually said that. Apologies!

@annevk - is that the part that you consider as a cross-site tracking vector? Or were you referring to something else?

annevk commented 6 months ago

@arturjanc this API temporarily persists data after the current tab has been closed. Space for that is limited and doesn't naturally fall under any existing quota. I don't see a no-quota solution as tenable.

arturjanc commented 6 months ago

Now I'm confused... You can persist data after the current tab is closed in many ways (set localStorage, etc.) and you can also use ServiceWorkers or SharedWorkers to execute code and send network requests after the tab has been closed. How does this allow cross-site tracking, and why does this come into play only after the tab has been closed (since you presumably could achieve the same while the tab is open)?

Basically, I'd really like to understand what threat a per-top-level-document quota is addressing here and why a 640K quota would help, compared to not having any quota.

noamr commented 6 months ago

Now I'm confused... You can persist data after the current tab is closed in many ways (set localStorage, etc.) and you can also use ServiceWorkers or SharedWorkers to execute code and send network requests after the tab has been closed. How does this allow cross-site tracking, and why does this come into play only after the tab has been closed (since you presumably could achieve the same while the tab is open)?

The way I understand it, requests from Shared/service workers can be cancelled once all their owner windows are gone. The top-level quota is not about cross-site tracking, but rather about having pages quietly abuse the network bandwidth after they've been closed, without an ability for the user to do anything about it.

annevk commented 6 months ago

Right, and the tracking comes into play depending on how you do the quotas.

arturjanc commented 6 months ago

Alright, I think I understand this now and have a couple of possible approaches that should address the concerns that have come up here.

As a summary, fetchLater itself (without quotas) does not allow cross-site tracking, but creates a network abuse concern where sites can send requests after the user has closed the tab. If we want to solve this problem with quotas to limit the amount of data sent with fetchLater, and we misalign the scope of the quota with the web's privacy boundary (e.g. have a per-origin global quota, so that same-origin iframes embedded in different third-party contexts share one quota), then this would allow cross-site tracking.
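
To make the scoping point concrete, here is a toy model (not the spec algorithm) that compares a global per-origin quota with one keyed on (top-level site, document site). The class `QuotaStore`, the helper names, the example sites, and the 64 KB limit used here are all assumptions chosen for illustration:

```javascript
// Toy quota store. `keyFor` decides which documents share a bucket.
class QuotaStore {
  constructor(keyFor, limitBytes) {
    this.keyFor = keyFor;
    this.limitBytes = limitBytes;
    this.used = new Map();
  }

  // Returns true if the bytes fit in the caller's bucket, false otherwise.
  tryReserve(ctx, bytes) {
    const key = this.keyFor(ctx);
    const current = this.used.get(key) ?? 0;
    if (current + bytes > this.limitBytes) return false;
    this.used.set(key, current + bytes);
    return true;
  }
}

// Misaligned scope: one bucket per document site, regardless of embedder.
const globalPerOrigin = (ctx) => ctx.documentSite;
// Aligned scope: one bucket per (top-level site, document site) pair.
const partitioned = (ctx) => `${ctx.topLevelSite}|${ctx.documentSite}`;

const LIMIT = 64 * 1024; // illustrative limit
const onNews = { topLevelSite: "https://news.example", documentSite: "https://tracker.example" };
const onShop = { topLevelSite: "https://shop.example", documentSite: "https://tracker.example" };

// The same embedded site fills its quota under one top-level site and then
// checks, under a different top-level site, whether a 1-byte request fails.
function leaksAcrossSites(keyFor) {
  const store = new QuotaStore(keyFor, LIMIT);
  store.tryReserve(onNews, LIMIT);
  return !store.tryReserve(onShop, 1);
}
```

In this model `leaksAcrossSites(globalPerOrigin)` is true, because the failure observed on the second site carries a bit written on the first, while `leaksAcrossSites(partitioned)` is false.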

So overall we have three security/privacy/abuse-related risks here:

  1. Cross-site tracking, if there's a quota whose scope spans state partitions.
  2. Cross-site leak, if there's a quota that's shared between any cross-origin/cross-site documents or frames.
  3. Abusive consumption of network resources which can't be prevented by the user closing a given tab.

In my view, (1) and (2) need to be solved, i.e. we shouldn't design an API that introduces either risk. The impact of (3) is much less clear to me given that IIUC we already have a number of other behaviors on the web that permit the same behavior (e.g. Service/SharedWorkers which remain active after closing a tab and whose network requests aren't always cancelled when that happens) and AFAIK there's no compelling evidence that this has become a significant problem. But taking @annevk's "no-quota solution [isn't] tenable" at face value, I think it's reasonable to try to address this if possible, so let's try.

Here's a quick summary of the approaches I think we could take here:

  1. No quota after all :) We solve problems (1) and (2) if we don't introduce a quota. We don't solve (3), but this seems like a problem that could be addressed with @yoavweiss's https://github.com/WICG/pending-beacon/issues/87#issuecomment-1955964723 proposal above, i.e. targeted browser interventions in cases of abuse.
    • We should also make sure to cancel all pending requests when the user clears state for a given origin/site. This would give users the ability to stop any abuse by clearing cookies/state, either for a specific domain or globally.
  2. A quota scoped to (top-level site, document site).
    • Roughly corresponding to triple-keyed partitioning some browsers have for HTTP cache and some other state. It prevents (1) and (2) (with the caveat that this would still leak information between same-site-but-cross-origin documents, but that's arguably less severe than other existing leaks in this scenario). It wouldn't fully prevent (3), but it would create a small cost because it requires the abuser to purchase separate top-level domains (or use a domain on the PSL) to execute code in separate partitions and get additional quota.
  3. A per-document quota with a restriction on the API to top-level documents only.
    • If we only permit fetchLater to be used in top-level contexts without being exposed to iframes, then we address all concerns, but likely significantly reduce the value of the API.
    • Alternatively, we could permit the use of the API in same-origin iframes and have a quota scoped to the top-level origin, which is a bit more permissive, but likely still bad from a utility perspective.
  4. A per-document quota with a restriction on the API to top-level documents only and allowed permission delegation to a small fixed number of iframes.
    • We could permit the delegation of the permission to a chosen number of iframes (e.g. 10) for a given top-level window. In that model the first 10 frames which get delegated permission could use the API and further delegations would silently fail. This limits the amount of network data that could be sent in a given top-level window to 10x the per-document quota.
    • This is similar to the current proposal, except that the quota would not be per-top-level-document, but rather per-document, addressing the cross-site leak problem.
    • In this model there still exists a (smaller) cross-site leak because the shared limit on permission delegation within a given top-level window is something that can be detected by iframes which have been granted the permission. So e.g. with a limit of 10, if A delegates permission to B and C, and then B delegates permission to 9 of its own iframes and the last delegation fails, B will know that A delegated permission to at least one other frame.
      • This is a less interesting information leak than what we were discussing above. We could fully fix it by preventing re-delegation of the permission, so fetchLater would only be available to the top-level document and its direct descendants, but not to any more deeply nested frames.
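
The smaller leak in option (4) can be reproduced with a toy model of the shared delegation cap. `TopLevelWindow` and `slotsAlreadyTaken` are hypothetical names, and the cap of 10 is the example number from the text, not a specified value:

```javascript
// Toy model of option (4): a top-level window hands out at most
// MAX_DELEGATIONS delegated permissions; later delegations silently fail.
const MAX_DELEGATIONS = 10; // the example number used above

class TopLevelWindow {
  constructor() {
    this.delegated = [];
  }

  // Returns true if `frame` received the permission.
  delegate(frame) {
    if (this.delegated.length >= MAX_DELEGATIONS) return false;
    this.delegated.push(frame);
    return true;
  }
}

// A frame keeps re-delegating to its own children and counts the successes.
// The shortfall equals the number of slots taken before it started,
// including the frame's own slot.
function slotsAlreadyTaken(win, prefix) {
  let succeeded = 0;
  for (let i = 0; i < MAX_DELEGATIONS; i++) {
    if (win.delegate(`${prefix}-child-${i}`)) succeeded++;
  }
  return MAX_DELEGATIONS - succeeded;
}
```

If A delegates to B and C and B then runs `slotsAlreadyTaken`, the result is 2: one slot is B's own, so B learns that A delegated to one other frame. Disallowing re-delegation removes this probe, as described above.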

My gut feeling is that any of these options would be okay-ish; I think (4) has an advantage of addressing the bulk of the concerns, and especially if we can limit it to the top-level document and its direct iframes, it would work. The main cost in this model is that it adds a lot of potentially unnecessary complexity. Because of this I have a slight preference for (1) i.e. no quota unless we have some evidence that the consumption of network resources from closed tabs is a real issue.

mingyc commented 6 months ago

(3) probably contradicts the reason why this API is developed. (4) sounds good to me.

@noamr @yoavweiss @annevk WDYT?

yoavweiss commented 6 months ago

(4) seems fine as long as we also introduce further per-reporting-origin restrictions, to avoid introducing negative dynamics between multiple 3Ps operating on a page.

To be clear, these per-reporting-origin restrictions would be on top of the per-document ones, and will not share a quota across documents.

As an example, we can say that each reporting origin can report 64KB, each top-level document can include 10 reporting origins to a total quota of 640KB.(*)

Then that top-level page can further explicitly delegate e.g. 10 other iframes to report to 1 reporting origin with 64KB quota each. (**)

(*) A script running in the context of the document would be able to know when 10 reporting origins were already exhausted, but such a script can already run arbitrary code in the context of the top-level page, and e.g. override the fetchLater API entirely.

(**) This assumes that for iframes, a single reporting origin would be enough. If not, we can extend that to 2-3. Ideally, the top-level origin would provide parameters that define what quota each iframe gets. I don't think we have such a mechanism in place though, and I don't think this alone merits the complexity.

noamr commented 6 months ago

Then that top-level page can further explicitly delegate e.g. 10 other iframes to report to 1 reporting origin with 64KB quota each. (**)

What's the point in the 1 reporting origin restriction here? If the quota for the iframe is 64kb anyway, it doesn't matter how it's divided between reporting origins.

yoavweiss commented 6 months ago

What's the point in the 1 reporting origin restriction here? If the quota for the iframe is 64kb anyway, it doesn't matter how it's divided between reporting origins.

I guess you're right and the reporting origin restriction here would just increase competitive pressure to "grab" the reporting origin, if such pressure exists in cross-origin iframes.

noamr commented 6 months ago

Also I think sharing the quota freely with same-origin iframes is OK. So to summarize 4 into a proposal:

Does this sound OK to everyone?

mingyc commented 6 months ago

@arturjanc Are you okay with us proceeding with this approach?

arturjanc commented 6 months ago

@noamr Do you mean that only 1 cross-origin iframe can be allowed to use fetchLater, or that up to 10 cross-origin iframes could get the delegated permission via the Permissions Policy (in which case they would use 10x64kb i.e. the full quota of the top-level document)?

I think both are okay, but in the second case the quota accounting gets a bit trickier; also, would an iframe be permitted to delegate the permission to its own frames? (It would be easier if we didn't allow that, but I'm not sure if that would affect any use cases you have in mind for this API.)

Sharing the quota with same-origin subframes, the per-reporting-origin limit and keeping the API synchronous sound reasonable to me.

noamr commented 6 months ago

@noamr Do you mean that only 1 cross-origin iframe can be allowed to use fetchLater, or that up to 10 cross-origin iframes could get the delegated permission via the Permissions Policy (in which case they would use 10x64kb i.e. the full quota of the top-level document)?

I think both are okay, but in the second case the quota accounting gets a bit trickier; also, would an iframe be permitted to delegate the permission to its own frames? (It would be easier if we didn't allow that, but I'm not sure if that would affect any use cases you have in mind for this API.)

I see the accounting like this - a cross-origin iframe in the document that allows deferred-fetching is equivalent to a single live 64kb fetchLater. So this allows delegating 10 frames from the root, and a frame can delegate.

noamr commented 6 months ago

Envisioning the algorithm as such:

mingyc commented 6 months ago

@arturjanc Could we proceed with this proposal? If so, @noamr could you help update the spec? I will update the Chromium implementation afterwards.

noamr commented 6 months ago

@arturjanc Could we proceed with this proposal? If so, @noamr could you help update the spec? I will update the Chromium implementation afterwards.

Waiting also for @annevk to see if the direction in https://github.com/WICG/pending-beacon/issues/87#issuecomment-2043122221 is acceptable.

annevk commented 6 months ago

What happens with a chain like A -> B -> C with A and B delegating? B would not have quota? What if B had used the API before it created C, C would not have quota?

What happens with A1 -> B -> A2? Assume that A1 and B don't delegate. Does A2 have quota?

What happens with A -> B -> C1 and a parallel A -> C2? Assuming A delegates to C2, does C1 have quota?

I think this proposal works, but we have to be fairly specific as to how this quota delegation works and when it succeeds and when it doesn't. And same-origin with the top-level seems reasonable (modulo ABA), but not sure about same-origin for any cross-origin documents.

It does seem like it could have the properties that make everyone happy, including addressing @arturjanc's smaller cross-site leak if I understand it correctly.

noamr commented 6 months ago

What happens with a chain like A -> B -> C with A and B delegating? B would not have quota? What if B had used the API before it created C, C would not have quota?

Right. A cross-origin iframe is considered a live 64kb fetchLater. So A -> B -> C would mean that B won't have quota. If B has a live fetchLater when loading the iframe, C will not be permitted to use the API (as if allow wasn't set).

What happens with A1 -> B -> A2? Assume that A1 and B don't delegate. Does A2 have quota?

No. For simplicity, delegation is direct only.

What happens with A -> B -> C1 and a parallel A -> C2? Assuming A delegates to C2, does C1 have quota?

It does not. Quota only works directly to your cross-origin iframes. C1 can use C2's quota by communicating to it via BroadcastChannel or shared workers or whatever.

I think this proposal works, but we have to be fairly specific as to how this quota delegation works and when it succeeds and when it doesn't.

Yes, will turn this into a very explicit algorithm in the fetch PR.

... not sure about same-origin for any cross-origin documents.

I don't understand this comment, can you rephrase please?

annevk commented 6 months ago

@noamr I think that's reasonable, but note that A2 and A1 have direct script access to each other and so A2 could use A1's fetchLater() as well and have it work (as things stand today).

(You addressed the comment you didn't understand earlier, in my question about C1 and C2. Although it's unclear if they can communicate directly long term as they will probably end up having separate partitions.)

noamr commented 6 months ago

@noamr I think that's reasonable, but note that A2 and A1 have direct script access to each other and so A2 could use A1's fetchLater() as well and have it work (as things stand today).

A1 -> A2 would share the same quota. It's shared between the top-level root or a top-level-of-its-origin iframe with all its same-origin iframes and their same-origin iframes and so on. In other words, if you can access the contentDocument by script, you're sharing the fetchLater quota.

annevk commented 6 months ago

Right, but A1 and A2 can access each other's document in A1 -> B -> A2, but given your definition A2 would not have quota. (Again, I think that's reasonable as we are slowly moving toward considering that to be a boundary.)


Somewhat unrelated, but given that @arturjanc mentioned it again above: I feel like I should point out (again?) that not all browsers feel that script execution should be possible beyond the lifetime of a tab.

noamr commented 6 months ago

Right, but A1 and A2 can access each other's document in A1 -> B -> A2, but given your definition A2 would not have quota. (Again, I think that's reasonable as we are slowly moving toward considering that to be a boundary.)

Right, forgot about window[name]. Happy to keep that as a stricter boundary.
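
A minimal sketch of the sharing rule as settled in the last few comments: quota is shared along an unbroken chain of same-origin parents, so A1 -> A2 share while A1 -> B -> A2 do not. The frame shape `{ origin, parent }` and both function names are assumptions for illustration. The sketch covers only grouping, not whether a group has been delegated any quota:

```javascript
// A frame is modelled as { origin, parent }, with parent === null at the top level.
// Walks up while the direct parent is same-origin; the frame where the walk
// stops is the owner of the shared quota.
function quotaOwner(frame) {
  let owner = frame;
  while (owner.parent !== null && owner.parent.origin === owner.origin) {
    owner = owner.parent;
  }
  return owner;
}

// Two frames share a quota only if they resolve to the same owner.
function sharesQuota(a, b) {
  return quotaOwner(a) === quotaOwner(b);
}
```

Because the walk stops at the first cross-origin parent, an A2 nested under B never reaches A1, which matches the stricter boundary agreed on above.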

noamr commented 6 months ago

@arturjanc Could we proceed with this proposal? If so, @noamr could you help update the spec? I will update the Chromium implementation afterwards.

Done, check out https://github.com/whatwg/fetch/pull/1647

nicjansma commented 6 months ago

Sorry I know this discussion is long and drawn out, but I just want to verify a few assumptions I have, and to think out loud how a consumer of this API (e.g. as a RUM provider, like I contribute to) might be affected by these quotas.

Assumptions, are these correct?

Questions:

Here's a scenario I'm worried about:

If there's a hard limit on 10 reporting-origins specifically, that would encourage a script to "pre-register" their fetchLater("https://myorigin.com/nop") in case they want to use it, before other competing-for-10-reporting-origins are able to. This could be wasteful if that script doesn't actually need the request, and/or isn't able to register a real payload before the page unloads.

If it's not a hard limit of 10, but just sum(all reporting-origins)<640kb, I can imagine that fetchLater()-dependent scripts might then be (perversely) encouraged to pre-register a 64 KB dummy payload at startup in case they might ever need to use it and/or grow to it over time (e.g. the RUM event-log use-case above).

noamr commented 6 months ago

Sorry I know this discussion is long and drawn out, but I just want to verify a few assumptions I have, and to think out loud how a consumer of this API (e.g. as a RUM provider, like I contribute to) might be affected by these quotas.

Assumptions, are these correct?

  • If a fetchLater() is .abort()ed, I'm expecting that it would immediately "give" the quota back to the document and reporting-origin (to be used by the same script/reporting-origin immediately if desired, or utilized by a different script/reporting-origin later)

Correct, the quota is calculated based on the current pending deferred fetch requests at the time you call fetchLater. When a deferred fetch is aborted, it's immediately removed from that list.

  • I can imagine .abort()s might be used by e.g. RUM providers if they're "building" a payload over time (think: a log of events), and want the "latest" payload to be ready to be sent (once) in case the page exits. So the script frequently calls fetchLater(), then when a new event happens, call .abort(), append to the new payload, and issue a new fetchLater().
    • Similarly, fetchLater() with an activateAfter that is exceeded, when that request is sent by the browser, the quota drops by that payload amount?

Also correct, when a deferred fetch is activated, it's no longer in the list of current pending deferred fetch requests.
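
The abort-and-reissue pattern described above could look roughly like this. `createEventLog` is a hypothetical helper, and `fetchLaterFn` is injected so the sketch also runs outside a browser; in a supporting browser one would pass something like `(url, init) => fetchLater(url, init)`. The 60-second `activateAfter` default is an arbitrary assumption:

```javascript
// Keeps exactly one pending deferred request that always carries the latest log.
function createEventLog(fetchLaterFn, url, activateAfter = 60000) {
  const events = [];
  let pending = null; // { controller, result } of the current deferred request

  return {
    add(event) {
      if (pending && pending.result.activated) {
        // The previous batch was already sent; start a fresh one.
        events.length = 0;
      } else if (pending) {
        // Dropping the pending request returns its bytes to the quota.
        pending.controller.abort();
      }
      events.push(event);
      const controller = new AbortController();
      const result = fetchLaterFn(url, {
        method: "POST",
        body: JSON.stringify(events),
        signal: controller.signal,
        activateAfter,
      });
      pending = { controller, result };
    },
  };
}
```

Since the old request is aborted before the new one is issued, only one payload per log counts against the quota at any time.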

  • If a request to fetchLater() would either exceed the per-document or per-reporting-origin quotas, there's an immediate QuotaExceeded exception that is thrown so the caller can decide to use another method (e.g. immediate sendBeacon() instead)

Correct, you'd get a QuotaExceeded DOM exception. The exception is thrown immediately when you call fetchLater, not as a promise rejection.

  • I can imagine a RUM provider would detect this, and possibly switch its runtime mode from "use fetchLater and build data over time" to instead a compatibility mode where they send data immediately or in the pagehide/vischange events (e.g. via sendBeacon())
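
A hedged sketch of that fallback. `sendDeferred` is a hypothetical helper, `api` bundles the two browser functions so fakes can be passed in, and the check assumes the synchronous exception is a DOMException named "QuotaExceededError", the name used earlier in this thread:

```javascript
// Tries to defer the request; falls back to an immediate beacon when the
// quota is exhausted. Returns which path was taken.
function sendDeferred(api, url, body) {
  try {
    api.fetchLater(url, { method: "POST", body });
    return "deferred";
  } catch (err) {
    if (err && err.name === "QuotaExceededError") {
      // No room left in the quota: send right away instead.
      api.sendBeacon(url, body);
      return "immediate";
    }
    throw err; // unrelated failure, e.g. an invalid URL
  }
}
```

In a browser, `api` could be `{ fetchLater: (u, i) => fetchLater(u, i), sendBeacon: (u, b) => navigator.sendBeacon(u, b) }`.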

Questions:

  • Is the number of report-origins capped to 10 specifically? Or just as a byproduct of max 640KB per document?

You can have as many report origins as you'd like. The per-origin quota is 64kb, the per-top-level-document quota is 640kb, and the per-cross-origin-iframe quota is 64kb.
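
As a rough model of this accounting (not the algorithm from the fetch PR), the check at call time can be sketched as below. `fitsQuota` and the `{ origin, bytes }` record shape are assumptions, the size is simplified to a single byte count per request, and the separate 64kb limit for cross-origin iframes is left out:

```javascript
const PER_ORIGIN_LIMIT = 64 * 1024;    // per reporting origin
const PER_DOCUMENT_LIMIT = 640 * 1024; // per top-level document

// `pending` lists { origin, bytes } for deferred fetches that are neither
// sent nor aborted. Returns true if a new request of `bytes` to `origin` fits.
function fitsQuota(pending, origin, bytes) {
  let total = bytes;
  let forOrigin = bytes;
  for (const p of pending) {
    total += p.bytes;
    if (p.origin === origin) forOrigin += p.bytes;
  }
  return total <= PER_DOCUMENT_LIMIT && forOrigin <= PER_ORIGIN_LIMIT;
}
```

Ten origins that each hold a full 64kb use up the whole 640kb, so an eleventh origin is rejected until one of the pending requests is sent or aborted.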

  • Could a script pre-detect whether they can send up to X bytes without actually sending the request? fetchLater(64KB) then immediate .abort() seems like it might work?

Yea, that would work.
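
A sketch of that probe, under the same assumptions as before: `canDefer` is a hypothetical helper, `fetchLaterFn` is injected, and quota failures are assumed to surface as an exception named "QuotaExceededError":

```javascript
// Returns true if a deferred request with `bytes` of body would currently
// be accepted. Nothing is sent: the probe request is aborted right away.
function canDefer(fetchLaterFn, url, bytes) {
  const controller = new AbortController();
  try {
    fetchLaterFn(url, {
      method: "POST",
      body: new Uint8Array(bytes),
      signal: controller.signal,
    });
  } catch (err) {
    if (err && err.name === "QuotaExceededError") return false;
    throw err;
  }
  controller.abort(); // release the reserved quota without sending anything
  return true;
}
```

The answer is only a snapshot, since another script can consume the quota between the probe and the real call.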

Here's a scenario I'm worried about:

If it's not a hard limit of 10, but just sum(all reporting-origins)<640kb, I can imagine that fetchLater()-dependent scripts might then be (perversely) encouraged to pre-register a 64 KB dummy payload at startup in case they might ever need to use it and/or grow to it over time (e.g. the RUM event-log use-case above).

I understand the risk, but it seems more mitigated than other alternatives. I want to be careful not to create a whole elaborate economy around this quota in the web platform. In the end the embedding site chooses which providers to embed, and this problem would arise only if they include more than 10 without auditing any of this. We might want to find ways to deal with it outside of a web platform API... e.g. dev-tools that tell the embedding page that this kind of thing is happening. Open to suggestions!