WICG / shared-storage

Explainer for proposed web platform Shared Storage API

Proposal to schedule a shared storage worklet to run in the future, particularly from contexts that normally can't run worklets #174

Open jkarlin opened 3 months ago

jkarlin commented 3 months ago

There are two places where one can write to shared storage but can't run a worklet, which means missed Private Aggregation reporting opportunities if the writer doesn't have script access on the page. The two places are 1) when writing to shared storage via a response header and 2) when writing to shared storage from Protected Audience worklets.

It would be nice if the writer could schedule a worklet to run in the future. The browser could coalesce such requests and rate-limit them by origin to reduce performance issues. It might look something like:

scheduleWorklet(scriptURL, {operation: "opName", data: {<enter your contextual data here>}});

And the response header mechanism would likely be similar to that used for writing to shared storage via response headers, and require a similar opt-in from the publisher. Something like:

The worklet script would be fetched and executed sometime in the future, likely rate limited to preserve resources. This would allow folks to write data in buyer and seller PA worklets, or via response headers, and feel comfortable that sometime soon they'd get to process the data in a worklet and generate a private aggregation report via shared storage.

What do you all think? Would this be useful? Please let us know about your use cases and if this fits the need or how it might be adjusted.

MattMenke2 commented 3 months ago

What's to prevent leaking information in the script URL, like can currently be done via event-level reporting?

e.g., from generateBid(), use a scriptURL of "https://tracker.com/tracked-user-Matt-from-site1.com-visited-site2.com-and-his-id-there-is-1234.js"?

jkarlin commented 3 months ago

Not entirely ironed out, but the idea was roughly that the data origin of the shared storage worklet would match that of the calling PA worklet, and that the script url would need to be predeclared (e.g., in a short list of script urls hosted at https://buyerorigin/.well-known/shared-storage/X address or via some other header/js).
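
To make the predeclaration idea concrete, here is a minimal sketch of the check a browser might apply before accepting a scheduled worklet. Everything in it is an assumption for illustration: the function name `resolveScheduledWorklet`, the allowlist being a plain array of absolute URL strings (as might be published at the .well-known address), and the exact-match rule. None of this is a settled API.

```javascript
// Hypothetical sketch, not a specified algorithm.
// `predeclaredUrls` stands in for the short list of script URLs the calling
// origin has published in advance; its format here is an assumption, and the
// entries are assumed to be valid absolute URLs.
function resolveScheduledWorklet(callerOrigin, scriptURL, predeclaredUrls) {
  let script;
  try {
    script = new URL(scriptURL);
  } catch {
    return null; // Not a parseable absolute URL.
  }
  // Only exact matches against the predeclared list are accepted, so
  // per-user information cannot be encoded into the script URL.
  const allowed = new Set(predeclaredUrls.map((u) => new URL(u).href));
  if (!allowed.has(script.href)) {
    return null;
  }
  // The worklet's data origin is pinned to the calling PA worklet's origin.
  return { dataOrigin: callerOrigin, scriptURL: script.href };
}
```

Under these assumptions, a URL such as the tracker example above would be rejected because it could not have been predeclared for every user, while a fixed, published script URL would be accepted with the caller's origin as its data origin.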

alois-bissuel commented 1 month ago

Hi,

Landing a bit late on this issue. This would be very useful for quite a few use cases in Protected Audience (for instance WICG/turtledove#1182).

I am not sure I completely understand the specifics though, especially the coalescing part. Would the worklet be called with a concatenation of all the arguments of all the calls to scheduleWorklet in a given timeframe?

jkarlin commented 1 month ago

Sorry, I missed your question! I'm curious what your ideal answer to your own question would be. What I was thinking so far is that we'd create a single worklet, but then either:

1) call the operation once per call to scheduleWorklet, feeding the operation the parameters passed in the call to scheduleWorklet

or

2) call the operation once, passing an array of the parameters passed to scheduleWorklet

I don't have a strong preference, though the latter seems to allow for more room to optimize?
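
The two options can be sketched side by side. This is only an illustration of the difference in calling convention: `runPerCall`, `runBatched`, and the `CountBids` operation are hypothetical names, and `pendingData` is assumed to hold the `data` argument from each coalesced scheduleWorklet call.

```javascript
// Hypothetical sketch: two ways a browser might hand coalesced
// scheduleWorklet() calls to a single worklet operation.
// A real worklet operation's run() is asynchronous; this stays
// synchronous for brevity.

// Option 1: invoke the operation once per scheduled call.
function runPerCall(operation, pendingData) {
  return pendingData.map((data) => operation.run(data));
}

// Option 2: invoke the operation once with an array of every call's data.
function runBatched(operation, pendingData) {
  return pendingData.length > 0 ? [operation.run([...pendingData])] : [];
}

// Illustrative operation that reports how many scheduled calls it was given
// and tracks how many times it was invoked.
class CountBids {
  constructor() {
    this.invocations = 0;
  }
  run(data) {
    this.invocations += 1;
    return Array.isArray(data) ? data.length : 1;
  }
}
```

With three pending calls, option 1 invokes the operation three times with one data object each, while option 2 invokes it once with all three, which is what leaves room for the operation to aggregate or deduplicate before reporting.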

alois-bissuel commented 1 month ago

I think the second option would be better (more room for complex processing).

jkarlin commented 1 month ago

Another question is then how frequently scheduled worklets for a <calling_origin, script_url, operation_name> should be able to run. There is a device resource cost to running these worklets, so I'd like to keep the frequency relatively low. Right now I'm thinking no more than once every 10 minutes. If it were twice a day or something then there wouldn't be enough Private Agg budget available. Since the Private Agg budget includes a 10 minute window, 10 minutes seems like an obvious place to start to me.
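
A rough sketch of that rate limit, keyed on the <calling_origin, script_url, operation_name> tuple, might look like the following. The class name `ScheduledWorkletLimiter`, the `tryRun` method, and passing the current time in explicitly are all assumptions made for illustration; the 10 minute interval is just the starting point floated above.

```javascript
// Hypothetical sketch of a per-key minimum interval between scheduled runs.
const MIN_INTERVAL_MS = 10 * 60 * 1000; // 10 minutes, the proposed starting point.

class ScheduledWorkletLimiter {
  constructor(minIntervalMs = MIN_INTERVAL_MS) {
    this.minIntervalMs = minIntervalMs;
    this.lastRunMs = new Map(); // key -> timestamp of the last permitted run
  }

  // Build one map key from the <calling_origin, script_url, operation_name> tuple.
  static key(callingOrigin, scriptURL, operationName) {
    return JSON.stringify([callingOrigin, scriptURL, operationName]);
  }

  // Returns true (and records the run) if the tuple may run at `nowMs`.
  tryRun(callingOrigin, scriptURL, operationName, nowMs) {
    const k = ScheduledWorkletLimiter.key(callingOrigin, scriptURL, operationName);
    const last = this.lastRunMs.get(k);
    if (last !== undefined && nowMs - last < this.minIntervalMs) {
      return false; // Too soon; the pending calls stay queued for a later run.
    }
    this.lastRunMs.set(k, nowMs);
    return true;
  }
}
```

In this sketch, a second request for the same tuple five minutes after a run is refused, while a different operation name on the same script is tracked independently.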

Some other constraints: It would only run if the browser is already running; that is, we wouldn't open the browser for scheduled worklets. And ideally we'd find a quiescent time to run.