WICG / webpackage

Web packaging format

WebBundles for Ad Serving #624

Open jeffkaufman opened 3 years ago

jeffkaufman commented 3 years ago

Bundles are a promising way of serving ads. Ad JavaScript could fetch multiple ads in a single request and host each on its own origin. This would ensure that malicious ads can't read or modify either the publisher page or other ads, and that malicious parties also can't read or modify the ads. These protections are technically possible today, but only in a convoluted and inefficient way; bundles provide them clearly and naturally, and more efficiently than current web APIs.

What's Possible Today

When publishers load Google ads on their pages, they start by including some JavaScript. This ad JS sends requests to a server, which decides what ads to show. After the ad server makes its decision, it responds with the HTML for each ad, and the ad JS renders that HTML to the page. When deciding which ads to return, the server generally makes the best decision when it can consider the whole page at once. For example, an advertiser may not want their ad to appear next to a competitor's, or the publisher may want to serve a large ad only if the remaining ads are smaller. This means the server should receive a single request covering all the ads on the page.
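A minimal sketch of the single-request idea: the ad JS describes every slot on the page in one request URL, so the server can choose all the ads together. The endpoint `https://ads.example/request`, the `page` and `slot` parameters, and the `AdSlot` shape are hypothetical illustrations, not the actual Google ad request format.

```typescript
// Hypothetical description of one ad slot on the publisher page.
interface AdSlot {
  id: string;      // e.g. the id of the slot's container element
  sizes: string[]; // acceptable creative sizes, e.g. "300x250"
}

// Builds ONE request URL describing every slot, so the server can apply
// whole-page rules (competitive exclusions, large-vs-small trade-offs).
// Endpoint and parameter names are hypothetical.
function buildAdRequestUrl(endpoint: string, pageUrl: string, slots: AdSlot[]): string {
  const url = new URL(endpoint);
  url.searchParams.set("page", pageUrl);
  for (const slot of slots) {
    // One repeated "slot" parameter per slot, in page order: <id>:<size>|<size>
    url.searchParams.append("slot", `${slot.id}:${slot.sizes.join("|")}`);
  }
  return url.toString();
}

const requestUrl = buildAdRequestUrl("https://ads.example/request", "https://publisher.example/article", [
  { id: "top", sizes: ["728x90", "970x250"] },
  { id: "side", sizes: ["300x250"] },
]);
console.log(requestUrl);
```

Because every slot travels in the same request, the server sees the full set of candidate placements before committing to any single ad.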

The best we can do today is use XHR or fetch. The ad JS sends a request, receives a list containing the HTML for each ad, creates an iframe per ad, and finally writes the ad HTML into the iframes. Traditionally, the ad JS created these frames on the same origin as the publisher page. While this is simple and efficient, it offers no security protection: for example, scripts within an ad inserted this way can easily read and modify the publisher page.

[Figure: ads' HTML fetched with XHR and rendered same-origin]
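Why same-origin rendering offers no protection can be shown with a deliberately simplified model of iframe origins. This is an assumption-laden sketch, not the full HTML origin algorithm: it ignores sandboxing, CSP, and navigation history, and the hostnames are hypothetical. An empty `about:blank` frame that the ad JS writes into, or a `srcdoc` frame, takes the origin of the page embedding it, so the ad ends up on the publisher's origin.

```typescript
// How an ad iframe is populated (simplified).
type FrameSource =
  | { kind: "about:blank" }      // empty iframe that the ad JS then writes ad HTML into
  | { kind: "srcdoc" }           // ad HTML passed through the srcdoc attribute
  | { kind: "url"; url: string }; // iframe navigated to a network URL

// Simplified model: about:blank and srcdoc frames take the origin of the
// document that embeds them; URL-navigated frames take the URL's origin.
function frameOrigin(src: FrameSource, parentOrigin: string): string {
  if (src.kind === "url") {
    return new URL(src.url).origin;
  }
  return parentOrigin;
}

// Same-origin documents can script each other, e.g. an ad running
// window.parent.document.body.innerHTML = "...".
function canScriptPublisher(adOrigin: string, publisherOrigin: string): boolean {
  return adOrigin === publisherOrigin;
}

const publisher = "https://publisher.example";
const written = frameOrigin({ kind: "about:blank" }, publisher);
console.log(written, canScriptPublisher(written, publisher)); // prints "https://publisher.example true"
```

Moving the ad to a frame navigated to a URL on a different host is what changes the answer, which is the step the SafeFrame approach takes.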

To protect the publisher from potentially harmful ads, the ad JS generally wraps third-party ad HTML in "SafeFrames", an IAB standard for rendering ads. The ad JS injects an iframe pointing to a container HTML document, hosted by the publisher’s ad server on an isolated subdomain. The ad HTML is passed into the container through the iframe's name attribute; the container reads it and overwrites itself with the ad's HTML. Additionally, preventing the ads from interfering with each other requires every ad on the page to use a container loaded from a unique origin, so that they are all cross-origin from each other. Hosting containers on unique origins, however, adds substantial latency, since none of these containers can be cached. Slower-loading ads mean a worse user experience, primarily due to increased layout shift, and less money for publishers.

[Figure: ads' HTML fetched with XHR and rendered via SafeFrame]
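A rough sketch of the per-ad container setup described above: each ad gets an iframe whose `src` is a container on its own subdomain and whose `name` carries the ad HTML. The host pattern `<pageviewId>-<index>.safeframe.example`, the `container.html` path, and the `pageviewId` parameter are hypothetical; real SafeFrame deployments choose their own host layout.

```typescript
// One SafeFrame-style iframe: a container document plus the ad HTML.
interface ContainerFrame {
  src: string;  // container document, on an origin used by this ad only
  name: string; // ad HTML, delivered through the iframe's name attribute
}

// Hypothetical host layout: <pageviewId>-<index>.safeframe.example. A fresh
// subdomain per ad keeps ads cross-origin from the publisher and from each
// other, but every container URL is new, so none can be served from cache.
function planContainerFrames(adHtml: string[], pageviewId: string): ContainerFrame[] {
  if (!/^[a-z0-9]+$/.test(pageviewId)) {
    throw new Error("pageviewId must be a lowercase alphanumeric DNS label");
  }
  return adHtml.map((html, i) => ({
    src: `https://${pageviewId}-${i}.safeframe.example/container.html`,
    name: html,
  }));
}

// The container document itself is assumed to run something like:
//   const html = window.name; document.open(); document.write(html); document.close();
// replacing itself with the ad HTML it received.

const containerFrames = planContainerFrames(["<p>ad A</p>", "<p>ad B</p>"], "pv7f3a");
for (const f of containerFrames) console.log(new URL(f.src).origin);
```

The cost the text describes is visible here: isolation comes from minting a new origin per ad, and a new origin means a container fetch that cannot be cached.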

While the publisher is now protected, this approach does not protect the ad HTML from the page. The publisher or other third-party scripts on the page can make arbitrary modifications to the HTML before it is passed into the SafeFrame. Similarly, they can read the HTML and extract identifiers that can help bypass click fraud detection. It is possible to protect against this by sending the ad request from a fetching iframe on the ad network's domain and then postMessaging the HTML to receiver iframes. A fetching frame, however, imposes both a network round trip and a wait for each receiver to postMessage that it's ready, which adds substantial latency.

[Figure: ads' HTML fetched with XHR in a fetching frame and rendered in a receiver frame]
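The fetching-frame handshake can be sketched as a small coordinator that runs inside the fetching frame. It assumes each slot's receiver frame lives on its own known origin and posts a "ready" message; the class name, message shape, and hostnames are hypothetical. An ad is handed over only once both its HTML and its receiver's "ready" have arrived, which is where the extra wait comes from.

```typescript
// Message the fetching frame (on the ad network's origin) posts to a receiver.
interface RenderMessage {
  slotId: string;
  html: string;
}

// Whichever event arrives second (ad HTML or receiver "ready") produces the
// render message for that slot.
class FetchingFrameCoordinator {
  private readonly receiverOrigins: ReadonlyMap<string, string>;
  private readonly htmlBySlot = new Map<string, string>();
  private readonly readySlots = new Set<string>();

  // receiverOrigins: slotId -> origin expected for that slot's receiver frame.
  constructor(receiverOrigins: ReadonlyMap<string, string>) {
    this.receiverOrigins = receiverOrigins;
  }

  // Called when the ad server's response has been parsed (once per slot).
  onAdHtml(slotId: string, html: string): RenderMessage | null {
    this.htmlBySlot.set(slotId, html);
    return this.tryRender(slotId);
  }

  // Called from a "message" listener with event.origin; messages from any
  // origin other than the slot's expected receiver are ignored.
  onReceiverReady(origin: string, slotId: string): RenderMessage | null {
    if (this.receiverOrigins.get(slotId) !== origin) return null;
    this.readySlots.add(slotId);
    return this.tryRender(slotId);
  }

  private tryRender(slotId: string): RenderMessage | null {
    const html = this.htmlBySlot.get(slotId);
    if (html === undefined || !this.readySlots.has(slotId)) return null;
    // Clear the slot's state once its message has been produced.
    this.htmlBySlot.delete(slotId);
    this.readySlots.delete(slotId);
    return { slotId, html };
  }
}

const coordinator = new FetchingFrameCoordinator(new Map([["top", "https://r-top.adnet.example"]]));
console.log(coordinator.onAdHtml("top", "<p>ad</p>")); // null: receiver not ready yet
console.log(coordinator.onReceiverReady("https://r-top.adnet.example", "top"));
```

In a browser, the returned message would be sent with `receiverWindow.postMessage(message, receiverOrigin)`, so only the intended receiver can read it.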

What's Possible with Web Bundles

Web Bundles with subresources offer a solution to all of these problems. The ad JS can send a request for a bundle, and the response can contain each ad's HTML as a resource on its own distinct, opaque origin. The ad JS can render an ad by setting the src of an iframe to a resource that's inside the bundle, but nothing on the client can read or modify the ad HTML because the resources are opaque. Similarly, because the origins are distinct, the ads are not able to read or modify each other or the publisher page.

[Figure: ads' HTML fetched as a bundle and rendered via setting the iframe src]
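On the client, the bundle approach reduces to declaring which URLs come from the bundle and pointing each iframe at a `urn:uuid:` resource. This sketch assumes the `<script type="webbundle">` JSON rule (`source` plus `resources`) from a later revision of the subresource-loading explainer; earlier revisions declared bundles with `<link rel="webbundle">`, so treat the exact syntax as an assumption. The bundle URL and UUIDs are made up, and how the ad JS learns the `urn:uuid` values is not specified in this thread.

```typescript
// Matches urn:uuid URLs such as urn:uuid:7c9e6679-7425-40de-944b-e07fc1f90ae7.
const URN_UUID = /^urn:uuid:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Builds markup that declares the bundle's resources and renders each ad by
// pointing an iframe at its urn:uuid resource. The ad HTML itself never
// passes through the page's scripts.
function bundleMarkup(bundleUrl: string, adUrns: string[]): string {
  for (const urn of adUrns) {
    // Validating also guarantees the value is safe inside an attribute.
    if (!URN_UUID.test(urn)) throw new Error(`not a urn:uuid URL: ${urn}`);
  }
  // Normalizing through URL percent-encodes characters such as < and >.
  const source = new URL(bundleUrl).href;
  const rule = JSON.stringify({ source, resources: adUrns });
  const frames = adUrns.map((urn) => `<iframe src="${urn}"></iframe>`);
  return [`<script type="webbundle">${rule}</script>`, ...frames].join("\n");
}

console.log(
  bundleMarkup("https://ads.example/ads.wbn", [
    "urn:uuid:0f0b7d3e-6c1a-4f5e-9a62-3c1d2b8e4f10",
    "urn:uuid:7c9e6679-7425-40de-944b-e07fc1f90ae7",
  ]),
);
```

On a real page the ad JS would create these elements with DOM APIs rather than a string; the markup form just makes the shape of the rule and the iframes easy to see.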

Overall this is more efficient, in terms of network, CPU, and latency, because it removes the need to load iframes that overwrite themselves with the ad HTML. Bundles allow the ad server to, essentially, say what it means: here are multiple resources that should each be rendered on a unique origin.

Specifically, what makes bundles useful for ad serving is that they provide:

littledan commented 3 years ago

@jeffkaufman Thanks for the writeup. Would the mechanism suggested in #623 work for your use case?

jeffkaufman commented 3 years ago

Sort of: either <iframe src="urn:uuid:..."> or <iframe opaque src="https://..."> would work for this use, but the "There's no way to map loading these opaque URLs to an 'underlying' URL for potential verification, as suggested in #551" aspect doesn't. These ads are ephemeral, and there wouldn't really be any sort of 'underlying' URL.

WebReflection commented 3 years ago

Hi @jeffkaufman, I have a question regarding this point:

These ads are ephemeral, and there wouldn't really be any sort of 'underlying' URL.

Could you please expand a bit on how extensions will see these iframes? More specifically:

Thanks in advance for any help clarifying these aspects of the urn: proposal. I see benefits, as you mentioned, and not only ad-related ones, but I'd like to fully understand the idea behind it.

jeffkaufman commented 3 years ago

@WebReflection It sounds like you're asking a broader question about https://github.com/WICG/webpackage/blob/master/explainers/subresource-loading.md, and not about the aspects that make it a good fit for ad loading? I think your questions might make more sense as a top level issue?

jeffkaufman commented 3 years ago

@littledan Actually, I'm not sure <iframe opaque> works, at least on its own, since we would have to trust the publisher not to remove the opaque attribute. Would a response header work?