WICG / transfer-size

How do we do data accounting for ServiceWorker requests? #2

Open csharrison opened 7 years ago

csharrison commented 7 years ago

This came up in the prototype review (https://codereview.chromium.org/2180933002) and in the technical design doc. It seems like per-frame accounting won't work here.

kinu commented 7 years ago

To be clear (putting aside our preference about whether we want to take care of SW cases in the initial spec or not), I think per-frame accounting could make sense for typical resource fetching via SW. If a frame fetches some portion of data, we could account for the amount that was fetched regardless of whether it comes from the network, from the HTTP cache, from the network via SW (after fallback), or from the SW's cache storage. However, it feels like there could be several subtle or unclear cases, e.g.:

How should pre-caching by the SW (i.e. network fetches that are not directly associated with a particular fetch from a frame) be accounted for or limited?

If the SW fetches data from both the network and the cache but ends up returning only one of them, should the limit be applied only to the portion that is actually returned to the page, or to both?

If the SW somehow transcodes the data fetched from the network and the resulting data ends up much larger (or smaller), should the limit be applied to a) the size of the data fetched from the network, b) the size of the data returned to the page, or c) both?

Etc.

/cc @jakearchibald

jakearchibald commented 7 years ago

Other data-using things that aren't linked to a specific page:

As long as a non-limited context exists (a top level window or an iframe without a max-), all bets are off, as I can use SharedWorker and BroadcastChannel to farm off all the bad stuff to the non-limited context.

The above limits intend to enforce the total amount of resources used by the page, regardless whether the resources comes from cache or not

I'm confused by this bit in the README. while(true); is very small but uses a lot of CPU.

jkarlin commented 7 years ago

Dusting off this explainer and issue. Here is my (draconian) proposal:

1) Constrain network bytes, not disk bytes. This way we don't need to worry about indexeddb/cachestorage/localstorage etc., usage.

2) If the frame is a client of a ServiceWorker, then all network bytes attributed to that ServiceWorker also count against the frame.

3) Likewise, any Worker/SharedWorker of the same origin as the frame has its network bytes count against the frame.

4) So what types of network bytes are we counting? Fetch(), XHR(), DOM elements, WebSockets, WebRTC. Anything else?
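For illustration only, rules 1 to 3 above can be sketched as a toy ledger. This is a minimal model under stated assumptions, not the prototype's implementation: the `Frame` and `NetworkLedger` names, the `service_worker` ID field, and the byte counts are all hypothetical, and rule 4 (which request types count) is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    name: str
    origin: str
    service_worker: Optional[str] = None  # hypothetical ID of the controlling SW


class NetworkLedger:
    """Toy per-frame accounting that charges network bytes only (rule 1)."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.used = defaultdict(int)  # frame name -> network bytes charged

    def frame_load(self, frame, nbytes, from_network=True):
        # Rule 1: bytes served from cache or disk are not charged.
        if from_network:
            self.used[frame.name] += nbytes

    def service_worker_load(self, sw_id, nbytes):
        # Rule 2: charge every frame that is a client of this ServiceWorker.
        for frame in self.frames:
            if frame.service_worker == sw_id:
                self.used[frame.name] += nbytes

    def worker_load(self, origin, nbytes):
        # Rule 3: charge every frame that shares the worker's origin.
        for frame in self.frames:
            if frame.origin == origin:
                self.used[frame.name] += nbytes


tab_a = Frame("tab_a", "https://example.test", service_worker="sw1")
tab_b = Frame("tab_b", "https://example.test", service_worker="sw1")
ledger = NetworkLedger([tab_a, tab_b])

ledger.frame_load(tab_a, 300)                      # charged to tab_a only
ledger.frame_load(tab_a, 500, from_network=False)  # cache hit, not charged
ledger.service_worker_load("sw1", 1000)            # charged to both clients
print(dict(ledger.used))
```

In this model a single ServiceWorker fetch is charged in full to every client frame, which is what makes the proposal strict.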

igrigorik commented 7 years ago

So.. if I have multiple tabs, each with an active size policy, controlled by same SW.. then one tab can drain another's budget? =/

jkarlin commented 7 years ago

Unfortunately, hence the draconian bit. I don't see a better way unless we're willing to let malicious actors abuse the system.

jkarlin commented 7 years ago

Since it's so unpredictable, perhaps we should instead just forbid frames with size policies from talking to Service Workers?

jkarlin commented 7 years ago

@ojanvafai would be interested in this discussion.

ojanvafai commented 7 years ago

My intuition is that ServiceWorker/SharedWorker should get a different limit.

The problem we're dealing with at the moment is lack of incentive to make third party content small. If you can only load a small amount of data in the frame you show the user, there's little incentive to load a ton of data in a ServiceWorker. But there are legitimate cases for precaching content for future loads (e.g. cache content while the user is on wifi to avoid charging their metered cell plan).

So, throwing an idea out there, what if ServiceWorker/SharedWorker get a budget that's 5x the per-frame limit by default, but there's an API for a publisher to tweak it? If there are multiple different limits, I guess we'd choose the max of the limits of the currently connected clients?
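As a rough sketch of the budget idea above, one reading is: take the largest per-frame limit among the connected clients and multiply it by the worker multiplier. The function name, the `multiplier` parameter standing in for the publisher-facing tweak, and the choice to return `None` when no client is connected are assumptions made for illustration.

```python
from typing import Iterable, Optional

DEFAULT_MULTIPLIER = 5  # the "5x" default floated in the comment


def worker_budget(client_frame_limits: Iterable[int],
                  multiplier: int = DEFAULT_MULTIPLIER) -> Optional[int]:
    """Return a worker byte budget derived from its clients' per-frame limits.

    Assumption: with several clients, the largest per-frame limit wins.
    Returns None when no client is connected (behavior left open in the thread).
    """
    limits = list(client_frame_limits)
    if not limits:
        return None
    return multiplier * max(limits)


# Two connected clients with limits of 100 kB and 300 kB.
print(worker_budget([100_000, 300_000]))
```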

That's not totally satisfying, but it's the best I can think of at the moment that isn't too limiting.

jkarlin commented 6 years ago

So how about this: Resources fetched via the SW fetch event are attributed to the corresponding frame that initiated the fetch event. All other fetch() calls from the SW are attributed to all client frames.
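The attribution rule above can be sketched as follows. This is a toy model, assuming frames are identified by hypothetical string names and that a fetch made outside a fetch event is charged in full to each client; neither detail is specified in the thread.

```python
from typing import Dict, Iterable, Optional


def attribute_sw_fetch(used: Dict[str, int],
                       clients: Iterable[str],
                       nbytes: int,
                       initiating_frame: Optional[str] = None) -> Dict[str, int]:
    """Charge a ServiceWorker network fetch to frames.

    If the fetch was made while handling a fetch event, only the frame that
    initiated that event is charged. Otherwise every client frame is charged.
    """
    clients = list(clients)
    targets = [initiating_frame] if initiating_frame is not None else clients
    for frame in targets:
        used[frame] = used.get(frame, 0) + nbytes
    return used


used: Dict[str, int] = {}
clients = ["frame_a", "frame_b"]
# Fetch event triggered by frame_a: only frame_a is charged.
attribute_sw_fetch(used, clients, 200, initiating_frame="frame_a")
# Background fetch() from the SW: both clients are charged.
attribute_sw_fetch(used, clients, 1000)
print(used)
```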

It still punishes SWs for doing a bunch of loading in the background, but I'm not sure of a good way around that.

I don't want to provide workers with a higher limit than the frame itself, as that's effectively just making the actual limit the higher worker limit.

igrigorik commented 6 years ago

We could ignore SW initiated fetches, as a v1? Yes, it allows SW to do background activity, but presumably some frame will want to use said data at some point, which is when we'll apply the quota.

jkarlin commented 6 years ago

I'm not following the "which is when we'll apply the quota" bit since we only apply the quota to network bytes.

jkarlin commented 6 years ago

I'm leaning towards ignoring Service Workers, Shared Workers, BroadcastChannel, etc. for V1. For V1 we'll focus on data used directly by the frame. For V2 we'll focus on data used indirectly by the frame.

This fits in nicely with the fact that the API guarantees that the frame has used at least the threshold of bytes. It doesn't guarantee that it hasn't used more. This fuzziness comes from the fact that cross-origin resources have some random padding subtracted from their size.
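To see why subtracting padding yields an "at least" guarantee, here is a small sketch. The `MAX_PADDING` bound, the uniform distribution, and the function names are assumptions for illustration; the thread only says that some random padding is subtracted from cross-origin resource sizes.

```python
import random

MAX_PADDING = 1024  # hypothetical upper bound on the random padding, in bytes


def counted_size(actual_bytes, cross_origin, rng):
    """Size charged against the threshold for one resource."""
    if not cross_origin:
        return actual_bytes
    # Cross-origin: subtract random padding, so the counted size never
    # exceeds the actual size.
    padding = rng.randint(0, MAX_PADDING)
    return max(0, actual_bytes - padding)


def threshold_crossed(resources, threshold, rng):
    """resources is a list of (actual_bytes, cross_origin) pairs."""
    counted = sum(counted_size(size, xo, rng) for size, xo in resources)
    return counted >= threshold


resources = [(5000, True), (2000, False), (800, True)]
actual_total = sum(size for size, _ in resources)
rng = random.Random(0)
if threshold_crossed(resources, 6000, rng):
    # Counted bytes never exceed actual bytes, so this always holds.
    assert actual_total >= 6000
```

Because each counted size is at most the actual size, crossing the threshold implies the frame really used at least that many bytes, while a frame that used more may still go unreported.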