WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Elaborate on what TURTLEDOVE needs from web bundles #75

Open jyasskin opened 3 years ago

jyasskin commented 3 years ago

Web Bundles are still under development, and we (e.g. @littledan) are currently trying to figure out what MVP non-Chromium groups might be interested in implementing. Could y'all write down what you expect a browser to be able to do with a web bundle in order for it to be useful to this API? For example:

Dan may have more details he wants to draw out.

michaelkleber commented 3 years ago

The guiding principle in all of these answers is that we're looking for a privacy-preserving way to download an ad at one point in time, store it for a while, and then render the ad later, inside a Fenced Frame. At both download time and render time we want to avoid leaking information about the user: in particular, no fingerprinting at download time, and no timing attack at rendering time.

  • Does the browser need to be able to navigate a top-level page to a bundle?
  • Does the browser need to be able to navigate an iframe to a bundle?

We certainly need one of these, but which one depends on how you think about Fenced Frames. Paging @shivanigithub and @jkarlin for their thoughts.

Just to be clear, by "navigate... to" here, you're considering the creation of a new frame whose contents are the bundle, right? I don't think TURTLEDOVE has any use case relating to taking an existing frame and replacing its contents with the bundle's.

  • Does the browser need to be able to navigate an iframe to a URL that was preloaded using a bundle?

I don't think so.

  • Would it be helpful for the fetch that retrieves a bundle to express what's already in the browser's cache, so that only the missing resources are transferred?

No — we would explicitly avoid using this feature if it existed. The contents of the browser cache could be a fingerprinting vector, letting a server distinguish different people who fetch the same bundle. Our goals include a sort of k-anonymity in which everyone fetching the bundle looks the same.
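As a rough illustration of "everyone fetching the bundle looks the same", here is a minimal sketch of request options that carry no per-user state. The function name `uniformBundleFetchOptions` and the interface are hypothetical; the field names mirror standard Fetch API options, and the `Accept` value is an assumption about how a bundle might be requested. It is not how any browser actually implements this.

```typescript
// Hypothetical option set: every field is fixed so that two different users
// produce byte-for-byte identical request settings.
interface UniformFetchOptions {
  credentials: "omit";            // no cookies or HTTP authentication
  cache: "no-store";              // bypass the HTTP cache, so no conditional requests
  referrerPolicy: "no-referrer";  // do not reveal which page triggered the fetch
  headers: { [name: string]: string };
}

function uniformBundleFetchOptions(): UniformFetchOptions {
  return {
    credentials: "omit",
    cache: "no-store",
    referrerPolicy: "no-referrer",
    // Assumed: a fixed Accept header, identical for every user.
    headers: { Accept: "application/webbundle" },
  };
}
```

In a page context these options could be passed as the second argument to `fetch()`. The point is only that nothing in the request depends on what the user already has cached or on who the user is.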

shivanigithub commented 3 years ago

We certainly need one of these, but which one depends on how you think about Fenced Frames. Paging @shivanigithub and @jkarlin for their thoughts.

Fenced frames are embedded documents, similar to iframes, but they are treated as a top-level browsing context because they are not allowed any iframe-like communication with the surrounding page. So we would need to be able to create this new embedded document (fenced frame) from a web bundle. Additionally, the web bundle is passed to the fenced frame as an opaque object, which means the surrounding page cannot learn the "src" web bundle of the fenced frame.

littledan commented 3 years ago

I'm wondering, is it important to negotiate languages, content-type support, etc. within bundles that TURTLEDOVE wants to load in frames?

michaelkleber commented 3 years ago

Are you talking about what happens when the web bundle is fetched in the first place, or at the time it's rendered?

It seems to me that negotiation needs to happen at one of those times, but I'm not sure which.

jyasskin commented 3 years ago

Rendering-time content negotiation requires that the bundle contains all the options that might be negotiated, which means the client needs to download more data. Given that, you probably want to do the negotiation when the bundle is fetched in the first place. If some part of the content negotiation is private for some reason (e.g. use of client hints?), that might be a reason to do it client-side.

michaelkleber commented 3 years ago

Right: negotiation at fetch time seems better for resource usage, and at render time seems better for privacy. I think for TURTLEDOVE we could be content with either of them, if supporting one or the other is better for your other use cases.
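To make the render-time option concrete, here is a minimal sketch of client-side negotiation, assuming the bundle ships several language variants and the browser picks one locally. The `AdVariant` shape and `pickVariantAtRenderTime` are hypothetical names for illustration, not part of any Web Bundles or TURTLEDOVE proposal.

```typescript
// Hypothetical shape: one bundle carrying several language variants of an ad.
interface AdVariant {
  language: string; // language tag, e.g. "en" or "fr-CA"
  entryUrl: string; // resource inside the bundle to render
}

// Choose a variant on the device, so the user's language preferences
// are never sent to the server that delivered the bundle.
function pickVariantAtRenderTime(
  variants: AdVariant[],
  preferredLanguages: string[],
): AdVariant | undefined {
  for (const preferred of preferredLanguages) {
    const wanted = preferred.toLowerCase();
    // Try an exact tag match first.
    const exact = variants.find(v => v.language.toLowerCase() === wanted);
    if (exact) return exact;
    // Otherwise fall back to the primary subtag ("fr-CA" matches "fr").
    const primary = wanted.split("-")[0];
    const loose = variants.find(
      v => v.language.toLowerCase().split("-")[0] === primary,
    );
    if (loose) return loose;
  }
  // Nothing matched: treat the first variant as the bundle's default.
  return variants[0];
}
```

The cost is the one noted above: the bundle has to contain every variant that might be chosen, so each client downloads more data than it ends up rendering.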

littledan commented 3 years ago

Thanks for explaining. I am trying to understand the requirements for loading bundles in general, so this context is really helpful for me.

darobin commented 3 years ago

One thing we would be interested in looking at is whether this could become the format for ad creatives, irrespective of whether they're loaded through Turtledove or not. (This isn't to say that Turtledove is bad or anything, just looking at making this part shared with other options.)

One property we're interested in is some degree of static analysis. It's too easy to change ad resources dynamically and create problems that are hard to defend against. Having ads be bundles that are forbidden from loading any further resource (or in fact from having any network interaction that isn't mediated through a browser API) would be beneficial.
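As a sketch of what such static analysis could look like, the check below scans the resources in a bundle for `src`/`href` attributes that point at URLs the bundle does not contain. The function name and the representation of a bundle as a plain URL-to-body map are assumptions for illustration. It only catches double-quoted absolute URLs in markup, not URLs built by script, so real enforcement would still have to come from the browser.

```typescript
// Hypothetical bundle representation: absolute URL -> resource body (text).
type BundleContents = { [url: string]: string };

// Return the URLs referenced via src/href attributes that are NOT in the bundle.
function findExternalReferences(bundle: BundleContents): string[] {
  const missing: string[] = [];
  for (const bundledUrl of Object.keys(bundle)) {
    // Fresh regex per resource so lastIndex always starts at 0.
    const attr = /\b(?:src|href)\s*=\s*"([^"]+)"/g;
    let match: RegExpExecArray | null;
    while ((match = attr.exec(bundle[bundledUrl])) !== null) {
      const url = match[1];
      const inBundle = Object.prototype.hasOwnProperty.call(bundle, url);
      if (!inBundle && missing.indexOf(url) === -1) missing.push(url);
    }
  }
  return missing.sort();
}
```

An empty result means every statically visible reference resolves inside the bundle; a non-empty result lists the outside resources a reviewer would want to look at.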

This assumes that the bundle needs to have an entry-point resource of some kind (which would not be the case for resource bundles). I wouldn't expect such bundles to work in any (real) top-level context.

This probably does not require SXG-style signing from the origin, but embedding a provenance/chain-of-custody paper trail would be very interesting. You could get things like "I never want to see an ad from X again" or "crappy company Y used up 10% of my mobile bandwidth with ads last month" at the browser level. (Note that this sort of traceability looks like it's going to become the law in Europe soonish. We should do it with standards rather than with ad networks, whose implementations are consistently poor.)

For a variety of reasons (e.g. integrating with frequency capping and the like) I believe it may be interesting to have a dedicated <ad> element to load such bundles into.

michaelkleber commented 3 years ago

Yes, strong agreement: there are a bunch of Privacy Sandbox use cases that would benefit greatly from all ads being Web Bundles, not just the TURTLEDOVE-served ones. Apart from the contextually-targeted ad that's competing in an on-device auction, more remote use cases include on-device frequency capping (across sites) and A/B experimentation (with consistent diversion across sites).

eriktaubeneck commented 3 years ago

One thing we would be interested in looking at is whether this could become the format for ad creatives, irrespective of whether they're loaded through Turtledove or not.

This is a great idea in spirit, but I think it would be very useful to come up with a more generic term for the overarching class of elements which want to utilize cross-site information within the APIs being considered in the Privacy Sandbox. Issue #57 on the Conversion Measurement API discusses a similar naming abstraction from "impression" and "conversion" (which are somewhat overloaded terms wrt ads) to more generic terms like "click", "view", "attribution."

For example:

  1. If publisher.com has a direct contract with advertiser.com to run some form of creative on their site, entirely within their first-party scope (and not leveraging any of these APIs), then I see no way for the browser to even enforce this requirement.
  2. There are some grey areas where the classification of "ad" may or may not apply:
    1. Sponsored content, which may not make sense to render in a web bundle.
    2. Link aggregators, which may or may not be considered "ads" but currently rely on third-party cookies, and I would expect them to use these proposed APIs.