WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Limit on Number of Interest Groups / Web Bundles #46

Closed appascoe closed 1 year ago

appascoe commented 4 years ago

In the Aggregate Reporting API repo, it's written:

Pending reports take up storage on the client’s device, so there should be some limits on the total storage this API can use per origin.

I presume we would want similar per-origin limits on interest groups and web bundles in TURTLEDOVE. I think this is worth explicitly stating in the explainer.

This leads to a follow-up question: I would also presume that we would want some global limit on the amount of storage occupied on the client's device. Is this accurate?

I would argue in favor of a global limit, but such a limit does open up some attack vectors. For example, a malevolent actor could, in principle, pose as many origins and reach the quota for each one, so that the data for earlier, legitimate origins gets evicted. Perhaps there could be a mechanism such that only origins that are .well-known can add data to the browser. Maybe this is the intent, but I don't see it specified.
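To make the eviction concern concrete, here is a minimal Python sketch of a toy store with a per-origin quota and a global cap. Everything in it is an assumption for illustration: the class name `InterestGroupStore`, the limits, and the oldest-first eviction policy are hypothetical and not taken from any explainer or browser implementation.

```python
from collections import OrderedDict


class InterestGroupStore:
    """Toy model: per-origin quota plus a global cap with oldest-first eviction."""

    def __init__(self, per_origin_limit, global_limit):
        self.per_origin_limit = per_origin_limit
        self.global_limit = global_limit
        # Keys are (origin, group_name); insertion order doubles as age.
        self.groups = OrderedDict()

    def count_for(self, origin):
        return sum(1 for (o, _) in self.groups if o == origin)

    def join(self, origin, name):
        key = (origin, name)
        if key in self.groups:
            return True  # already stored
        if self.count_for(origin) >= self.per_origin_limit:
            return False  # this origin has used up its quota
        self.groups[key] = None
        # Enforce the global cap by dropping the oldest entries (assumed policy).
        while len(self.groups) > self.global_limit:
            self.groups.popitem(last=False)
        return True


store = InterestGroupStore(per_origin_limit=2, global_limit=4)
store.join("https://shop.example", "shoes")
store.join("https://shop.example", "hats")

# An attacker controlling several origins fills each origin's quota.
for i in range(2):
    for name in ("a", "b"):
        store.join(f"https://spoof{i}.example", name)

legit_left = store.count_for("https://shop.example")
print(legit_left)
```

Under these assumed limits, the two attacker origins stay within their own quotas yet push out both of the legitimate origin's groups, which is the scenario described above.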

Beyond this, there doesn't appear to be a specification that would prevent a malevolent actor from writing their own interest groups into the browser under a legitimate origin's name. This could result in eviction of legitimate data, or result in nonsense requests during the interest group request.

michaelkleber commented 4 years ago

All browsers have to manage resources — for example, every time an HTTP response comes with a Cache-Control header, it's asking the browser to store it for some amount of time — and the details of how to handle resource constraints are up to the browser's discretion.

In the TURTLEDOVE case, browsers could track how old things are and how often they've actually won an auction, so could make informed decisions about what to evict if space constraints forced them to do so. And web bundles could always be deleted and then re-downloaded later, if the browser chose.
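As a rough illustration of that kind of informed eviction, the Python sketch below ranks stored groups by auction wins and age. The `StoredGroup` fields and the ordering rule (fewest wins first, then oldest first) are assumptions made up for this example; the thread leaves the actual policy to the browser's discretion.

```python
from dataclasses import dataclass


@dataclass
class StoredGroup:
    name: str
    age_days: float
    auction_wins: int


def eviction_order(groups):
    """Return groups sorted so that the first entries are evicted first.

    Hypothetical heuristic: fewest auction wins first, then oldest first.
    """
    return sorted(groups, key=lambda g: (g.auction_wins, -g.age_days))


groups = [
    StoredGroup("recent-winner", age_days=2, auction_wins=5),
    StoredGroup("old-never-won", age_days=25, auction_wins=0),
    StoredGroup("new-never-won", age_days=1, auction_wins=0),
]
order = [g.name for g in eviction_order(groups)]
print(order)
```

With this rule, a group that has never won and has been stored for a long time is the first candidate for eviction, while a group with recent wins is kept longest.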

"Beyond this, there doesn't appear to be a specification that would prevent a malevolent actor from writing their own interest groups into the browser under a legitimate origin's name."

The joinAdInterestGroup API must be called from a window (top-level or iframe) whose origin matches the group's owner.
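The origin check can be modeled with a short Python sketch. This is not browser code: `origin_of` and `may_join` are hypothetical helpers, the owner is assumed to be given as an origin URL, and default ports are not normalized.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Reduce a URL to scheme://host[:port] (simplified; default ports not normalized)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{host}{port}"


def may_join(caller_url, group_owner):
    """Allow the join only if the calling window's origin matches the group's owner."""
    return origin_of(caller_url) == origin_of(group_owner)


# A page on the owner's own origin may add the group.
print(may_join("https://shop.example/product/1", "https://shop.example"))
# A page on another origin may not write groups under the owner's name.
print(may_join("https://evil.example/page", "https://shop.example"))
```

Under this model, a script running on a different origin cannot add interest groups on the owner's behalf; it would have to run inside a window (top-level or iframe) served from the owner's origin.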

ablanchard1138 commented 4 years ago

Hello,

We also believe that requiring the browser to handle a large number of ad bundles could lead to a degraded web experience for users (heavy data consumption and memory usage to load all the logic, as well as pictures, videos, etc.). That's one of the reasons for the existence of SPARROW's gatekeeper, which hosts the auction outside of the browser and sends only the creatives that will be displayed (not all those that could eventually be displayed).

However, solving this issue by reducing the number of interest groups is going to strongly impact the performance of advertising. If this is to happen, then indeed it needs to be stated.

@michaelkleber, could you elaborate on how this mechanism of filtering based on winning rate / size would work?

Does it require some server-side actions to happen? Since the average winning rate can be quite low for many campaigns on many users, doing this entirely in the browser is probably not a realistic way to achieve good enough user-level filtering performance.

Also, how would advertisers be notified that a particular bundle has been dropped? This is important information that should influence marketing strategies.

michaelkleber commented 4 years ago

The web bundle to render each ad is just something downloaded from a URL. So even if the browser needs to conserve space and deletes some stored web bundles to do so, it could always download the bundle resource again, at render time. (It would be a privacy problem if the browser usually made a network request at ad render time, but doing so occasionally seems fine.)
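The fallback described here can be sketched in Python as a cache that re-fetches an evicted bundle only when it is needed for rendering. The `BundleCache` class, its method names, and the injected `fetch` callable are all hypothetical stand-ins, not part of any proposal.

```python
class BundleCache:
    """Toy model of stored web bundles with re-download at render time."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable url -> bytes; stands in for a network request
        self.stored = {}
        self.render_time_fetches = 0

    def prefetch(self, url):
        # Normal path: the bundle is downloaded ahead of time, not at render time.
        self.stored[url] = self.fetch(url)

    def evict(self, url):
        # The browser may drop a bundle to conserve space.
        self.stored.pop(url, None)

    def get_for_render(self, url):
        # Occasional path: the bundle was evicted, so download it again now.
        if url not in self.stored:
            self.render_time_fetches += 1
            self.stored[url] = self.fetch(url)
        return self.stored[url]


def fake_fetch(url):
    return f"bundle for {url}".encode()


cache = BundleCache(fake_fetch)
cache.prefetch("https://ads.example/shoes.wbn")
cache.get_for_render("https://ads.example/shoes.wbn")  # served from storage
cache.evict("https://ads.example/shoes.wbn")
cache.get_for_render("https://ads.example/shoes.wbn")  # re-downloaded
print(cache.render_time_fetches)
```

The counter of render-time fetches is the quantity the privacy caveat is about: it should stay small relative to the number of renders.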

There's no reason that I can see to evict the in-browser bidding JS functions, so all ads should get to participate in the auction. And there would be no server-side action or notice of re-download.

ablanchard1138 commented 4 years ago

Thanks for the clarification and sorry for my delayed answer. If I understand correctly:

Is this correct?

If so, it raises a few concerns about privacy versus performance, which stem directly from the delay you expect between the dropping of the creative assets and their subsequent reloading.

Either you let several auctions pass before noticing that a trashed bundle should have won, and then reload it for future potential impression opportunities. The drawback is that you deliberately let a bundle win that was not supposed to win, resulting in an unfair auction (which could bias advertiser strategies) and lower publisher spend. Or you reload the bundle very frequently, and fairly accurately, to avoid losing too many opportunities, but then the bundle URL call has a timestamp very close to the impression itself, and you miss out on the TURTLEDOVE promise of asynchronicity.

In any case, this seems like an important part of the whole TURTLEDOVE pipeline, and I think it would be of value to mention it in the main proposal. A more detailed explanation would be extremely useful to better understand the consequences of it.

JensenPaul commented 1 year ago

Closing this issue as it represents past design discussion that predates more recent proposals. I believe some of this feedback was incorporated into the Protected Audience (formerly known as FLEDGE) proposal. If you feel further discussion is needed, please feel free to reopen this issue or file a new issue.