jkarlin opened this issue 3 months ago
What's to prevent leaking information in the script URL, like can currently be done via event-level reporting?
e.g., from generateBid(), use a scriptURL of "https://tracker.com/tracked-user-Matt-from-site1.com-visited-site2.com-and-his-id-there-is-1234.js"?
Not entirely ironed out, but the idea was roughly that the data origin of the shared storage worklet would match that of the calling PA worklet, and that the script URL would need to be predeclared (e.g., in a short list of script URLs hosted at https://buyerorigin/.well-known/shared-storage/X, or via some other header or JS mechanism).
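The predeclaration list could, for example, be a small JSON file fetched from the `.well-known` path. The key name and script paths below are assumptions, sketched only to illustrate the idea:

```json
{
  "allowed_worklet_scripts": [
    "https://buyerorigin/worklets/reporting.js",
    "https://buyerorigin/worklets/aggregation.js"
  ]
}
```

The browser would then refuse to schedule any worklet whose script URL is not on the list, which closes the URL-based leak described above since the URLs are fixed ahead of time rather than minted per user.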
Hi,
Landing a bit late on this issue. This would be very useful for quite a few use cases in Protected Audience (for instance WICG/turtledove#1182).
I am not sure I completely understand the specifics though, especially the coalescing part. Would the worklet be called with a concatenation of all the arguments from all the calls to scheduleWorklet in a given timeframe?
Sorry, I missed your question! I'm curious what your ideal answer to your own question would be. What I was thinking so far is that we'd create a single worklet, but then either:
1) call the operation once per call to scheduleWorklet, feeding the operation the parameters passed in the call to scheduleWorklet
or
2) call the operation once, passing an array of the parameters passed to scheduleWorklet
I don't have a strong preference, though the latter seems to allow for more room to optimize?
I think the second option would be better (more room for complex processing).
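As a rough illustration of option 2 (the class, operation name, and argument shape here are assumptions rather than a finalized API), the scheduled operation's run() would be invoked once with an array collecting the data from each coalesced scheduleWorklet call:

```javascript
// Hypothetical worklet script, e.g. https://buyer.example/report-worklet.js.
// Under option 2, run() is called once with an array of the per-call data.
class ReportStoredEvents {
  async run(dataList) {
    for (const data of dataList) {
      // Read whatever was written earlier (via response header or PA worklet)...
      const value = await sharedStorage.get(data.key);
      // ...and fold it into Private Aggregation contributions.
      privateAggregation.contributeToHistogram({
        bucket: BigInt(data.bucket),
        value: value ? 1 : 0,
      });
    }
  }
}
register('report-stored-events', ReportStoredEvents);
```

Batching like this gives the worklet a global view of all pending requests, which is what opens the "room to optimize" (and for more complex processing) mentioned above.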
Another question is how frequently scheduled worklets for a given <calling_origin, script_url, operation_name> tuple should be able to run. There is a device resource cost to running these worklets, so I'd like to keep the frequency relatively low. Right now I'm thinking no more than once every 10 minutes. If it were twice a day or something, there wouldn't be enough Private Aggregation budget available. Since the Private Aggregation budget is scoped to a 10-minute window, 10 minutes seems like an obvious place to start to me.
Some other constraints: it would only run if the browser is already running; that is, we wouldn't launch the browser just for scheduled worklets. And ideally we'd find a quiescent time to run them.
There are two places where one can write to shared storage but can't run a worklet, meaning missed Private Aggregation reporting opportunities if the writer doesn't have script access on the page: 1) when writing to shared storage via a response header, and 2) when writing to shared storage from Protected Audience worklets.
It would be nice if the writer could schedule a worklet to run in the future. The browser could coalesce such requests and rate-limit them by origin to reduce performance issues. It might look something like:
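A sketch of what the JS surface could look like from inside a Protected Audience worklet (scheduleWorklet() and its exact parameters are assumptions about the proposed shape, not a shipped API):

```javascript
// Inside e.g. a buyer's generateBid(): record data now, process it later.
sharedStorage.append('bid-events', JSON.stringify({ ts: Date.now() }));

// Ask the browser to run a (predeclared, same-origin) worklet operation
// at some later, rate-limited point to report on the stored data.
sharedStorage.scheduleWorklet(
    'https://buyer.example/report-worklet.js',   // script URL
    'report-stored-events',                      // operation name
    { data: { key: 'bid-events', bucket: 42 } }  // forwarded to the operation
);
```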
And the response header mechanism would likely be similar to that used for writing to shared storage via response headers, and require a similar opt-in from the publisher. Something like:
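For example (the Shared-Storage-Write response header exists today for writes; the scheduling header name and its fields below are purely hypothetical):

```http
Shared-Storage-Write: append;key="events";value="view"
Shared-Storage-Schedule-Worklet: script="https://writer.example/report-worklet.js";operation="report-stored-events"
```

The request-side opt-in might mirror the existing mechanism for header-based shared storage writes, where the embedding page marks the request as shared-storage-writable.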
The worklet script would be fetched and executed sometime in the future, likely rate-limited to preserve resources. This would allow folks to write data in buyer and seller PA worklets, or via response headers, and feel comfortable that sometime soon they'd get to process that data in a worklet and generate a Private Aggregation report via shared storage.
What do you all think? Would this be useful? Please let us know about your use cases and if this fits the need or how it might be adjusted.