Clarification on Entities

appascoe commented 4 years ago

We were having a meeting the other day, and it dawned on us that we had some confusion between us on which entities are being called in the different requests on TURTLEDOVE. We'd appreciate some clarification. With the two different interpretations we have, we see different challenges.

Entities

The TURTLEDOVE doc refers to some entities as "ad networks," for example first-ad-network.com and second-ad-network.com. It's a little confusing what exactly these refer to. I see the breakdown as:

Advertiser: An entity that wishes to advertise its offerings.
Publisher: An entity that wishes to sell inventory for ads.
DSP: An entity that represents multiple advertisers. It submits bids to SSPs on the behalf of its advertisers.
SSP: An entity that represents multiple publishers. It receives bids from DSPs and runs auctions, selling its publishers' inventory.

Some entities may be a mix of these responsibilities, but for the sake of argument, let's consider them separately.

On an advertiser's page, a DSP has a pixel that adds the browser to a set of interest groups. In this scenario, first-ad-network.com would be a DSP server. The interest group request needs to go to first-ad-network.com, so the interest group response (containing partial bid data) would come from the DSP's servers as well.

Subsequently, there's the contextual request. According to the TURTLEDOVE docs:

An interest-group request: An additional ad request, of a new and different type, is constructed by the browser and sent to the same publisher ad network.

This implies that the interest group request and contextual request call out to the same entity, first-ad-network.com, already defined to be a DSP. (In addition, in #20 I see in the discussion, "I was imagining that each piece of in-browser JS would receive signals from one ad network — the same ad network that wrote the JS in the first place.") However, also as described in #20:

Here's what happens at the time of a page visit, calling out the things that I glossed over in the explainer:

Person navigates to publisher page

Publisher's ad network has a script on the page which issues the contextual/1p ad request to their ad server, like today. This includes all the normal information about what page the ad would appear on.

Server-side, some exchange sends RTB call-outs to various DSPs, including contextual and 1p signals. In today's world, the responses are bids that go into an auction. In a TURTLEDOVE world: The DSP's response could include more stuff — some signals encoding that DSP's opinion about the topic of the publisher page.

This seems to indicate that the contextual request goes to an SSP instead of a DSP.

DSP/DSP Challenge

Assuming that both requests go to the same DSP, it seems that SSPs have no fundamental role in a TURTLEDOVE world and would instead have to pivot to being solely DSPs. It would be incumbent on DSPs to have their domains included on as many publisher ad-network lists as possible. Having a publisher add a domain to a text file seems relatively low friction, so I can't really see how exchanges provide any significant value in this scenario.

Reading:

In the latter case, a URL like https://first-ad-network.com/.well-known/ad-partners.txt can list the domain names of other ad networks that first-ad-network buys space from, and a public key that the browser can use to encrypt the interest group information while it is passing through other ad networks. (Probably this should be a part of the IAB ads.txt spec, instead of a new .well-known file; it's similar to their "authorized sellers" — and they can come up with a better name than "ad-partners" for the relationship.)

Does this mean that the DSP dsp.com would be able to write interest groups under the SSP's name, ssp.com? Even so, it still feels like a strong incentive for the DSP to create relationships with publishers directly.

If it's supposed to function like this, there's another issue. The DSP dsp.com writes interest_group=www.wereallylikeshoes.com_athletic-shoes into the browser under the SSP's domain, ssp.com. But then later, the browser calls:

GET https://ssp.com/.well-known/fetch-ads?interest_group=www.wereallylikeshoes.com_athletic-shoes

However, dsp.com is the entity generating the response. Are we expecting ssp.com to forward this request to dsp.com? Why? That seems like additional unnecessary traffic.
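As a rough sketch of the request being discussed, the snippet below builds the fetch-ads URL from an interest group object. The path and the owner_name encoding are taken from the example request above; buildFetchAdsUrl is a hypothetical helper for illustration, not a browser API.

```javascript
// Sketch only: the path and the "owner_name" encoding mirror the example
// request above. buildFetchAdsUrl is a hypothetical helper, not a browser API.
function buildFetchAdsUrl(readerDomain, group) {
  const url = new URL("/.well-known/fetch-ads", `https://${readerDomain}`);
  url.searchParams.set("interest_group", `${group.owner}_${group.name}`);
  return url.toString();
}

const shoeGroup = { owner: "www.wereallylikeshoes.com", name: "athletic-shoes" };
console.log(buildFetchAdsUrl("ssp.com", shoeGroup));
```

Whichever domain is passed as readerDomain receives the request, which is why the choice between ssp.com and dsp.com matters for the forwarding question.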

DSP/SSP Challenge

The issue here has to do with the contextual response. Given that the bidding.js function has signature function(adSignals, contextualSignals), it's unclear what the SSP would actually include in the contextualSignals object and how it gets passed around:

Assuming that the SSP has coordinated a bunch of responses from DSPs, does the contextualSignals object contain data from all DSPs or some "winner(s)" that the SSP predetermines? If it contains all data, then this would seem to imply that every DSP's bidding.js would include contextual signals from all DSPs integrated with the SSP. If it only contains a subset, then not all DSP bidding.js functions can effectively execute; this is problematic because no interest group data was available during the SSP's selection, and valuable opportunities (for the DSP, SSP, advertiser, and publisher) are missed.
If the contextualSignals object contains information that is solely derived by the SSP (without DSP input, that is), this would seem to hamper a DSP's ability to control its own bids on contextual opportunities, or really, even have control over its own bids when interest groups are involved. From a DSP's perspective, it's desirable to apply ML techniques to both the contextual and interest group requests, and have the browser combine them consistently.

The documentation seems ambiguous to us. Which of these scenarios is intended, or is it neither?

michaelkleber commented 4 years ago

I think you're reading https://github.com/michaelkleber/turtledove/issues/20#issuecomment-602800377 the way I intended. But the short answer is:

When fetching ads, the browser always talks to the publisher's SSP, just like today. That's the party who knows the publisher's needs, including what buyers are allowed onto their page.
When an interest-group request goes to an SSP, there's an opportunity for RTB-style buying by all the DSPs that it works with. There are provisions for the interest group to be encrypted so that the SSP can't see it, but can only pass it along to the DSPs who buy for that advertiser.
When a contextual request goes to an SSP, there is another opportunity for RTB-style call-outs to DSPs, if those DSPs want to send their own contextual signals back to their in-browser bidding code.

appascoe commented 4 years ago

So if I understand correctly, this means that:

const myGroup = {'owner': 'www.wereallylikeshoes.com',
                 'name': 'athletic-shoes',
                 'readers': ['first-ad-network.com',
                             'second-ad-network.com']
                };
navigator.joinAdInterestGroup(myGroup, 30 * kSecsPerDay);

should probably change to:

const myGroup = { 'owner': 'dsp.com',
                 'advertiser': 'www.wereallylikeshoes.com',
                 'name': 'athletic-shoes',
                 'readers': ['first-ad-network.com',
                             'second-ad-network.com']
                };
navigator.joinAdInterestGroup(myGroup, 30 * kSecsPerDay);

Is that correct?

appascoe commented 4 years ago

Actually, a quick addendum:

If everything goes through an SSP, I would contend that this is not how things work today. Right now, SSPs are completely blind to interest groups anyway. While encrypting can maintain that blindness, it seems like an unnecessary call or even encryption; why not just go to the DSP directly? (To be clear, this is a purely technical argument.)

It doesn't seem to me that SSPs provide much value here. They're no longer able to run auctions themselves (since that all happens in the browser anyway), and there's no significant integration burden on the publisher side to work with DSPs directly. What do you think the SSPs are bringing to the table in this scenario? Why would a DSP be willing to pay an SSP's margins? (To be clear, this is purely a business argument.)

dialtone commented 4 years ago

I would add that going through the SSP for the fetch ads requests also adds multiple additional costs and technical limitations. In terms of costs:

Decryption of payloads is an additional cost that everyone will need to pay for such requests
The SSP and all the DSPs now need to have capacity to process not just web requests to publishers, but each and every interest group fetch-ads request, and every refresh of those ads. This could easily be as big as the volume currently handled by exchanges.
The double forward of the request essentially increases the latency of each such request.

limitations/other notes:

The SSP can now learn about the groups a browser was added to.
The SSP will likely impose limits to the minimum values of caching headers or parameters to avoid getting DDOSed.
Introduces an unexpected single point of failure in the exchanges not just for displaying ads, but also to store information on the browser around interests of the user.

If the original scope of this was to guarantee that you can't just enter bids everywhere, you can still solve this by flipping the validation between buyers and SSPs by having the SSP instead publish a file of known buyers on their network, that will validate the joinGroup call done by the DSP on www.wereallylikeshoes.com. This would also remove the need to have the decryption and leave the SSP in charge of the relationship with publishers.
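A minimal sketch of that flipped validation, assuming each SSP publishes a list of known buyers and the browser checks it when a DSP calls joinAdInterestGroup. The map of known buyers, the function name, and the choice to drop unapproved readers rather than reject the whole call are all assumptions made for illustration.

```javascript
// Hypothetical: parsed contents of the "known buyers" file each SSP would publish.
const knownBuyersBySsp = new Map([
  ["first-ad-network.com", new Set(["dsp.com"])],
  ["second-ad-network.com", new Set(["some-other-dsp.example"])],
]);

// Keep only the readers (SSPs) that list the calling DSP as a known buyer.
// Returns a new group object; the requested group is not mutated.
function filterReadersByKnownBuyers(group, callerDomain, knownBuyers) {
  const readers = group.readers.filter((ssp) => {
    const buyers = knownBuyers.get(ssp);
    return buyers !== undefined && buyers.has(callerDomain);
  });
  return { ...group, readers };
}

const requestedGroup = {
  owner: "dsp.com",
  advertiser: "www.wereallylikeshoes.com",
  name: "athletic-shoes",
  readers: ["first-ad-network.com", "second-ad-network.com"],
};
const acceptedGroup = filterReadersByKnownBuyers(requestedGroup, "dsp.com", knownBuyersBySsp);
console.log(acceptedGroup.readers);
```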

I'd love to understand if I'm reading this need right. As things stand at the moment, it seems to me that this way of fetching ads isn't worth the extra complexity and costs that it brings.

michaelkleber commented 4 years ago

You're doing an excellent job of illustrating why the original explainer just said first-ad-network.

If the ads industry decides that it wants to re-shape the structure of business relationships based on these new technological capabilities, then go right ahead. (Though my recollection is that prior discussion in the web-adv BG concluded that what you're describing isn't much different than header bidding.)

The only thing that matters from TURTLEDOVE's point of view is whether or not the publisher has a direct connection to the buyer; I don't care whether that buyer is a DSP or an SSP today. If the connection between the publisher and the buyer is mediated by some other entity, then we can support it using the flow I described above.

That said, I suspect the SSPs of the world would object to you DSPs claiming they add no value here. To pick one example, surely publishers will still need brand safety controls that affect what kind of ads appear on their sites. Each pub maintaining that sort of configuration on every DSP individually sounds like a substantial burden.

dialtone commented 4 years ago

Hey Michael, I'm not sure if the response is to me or Andrew, but I'm a bit confused by your reply anyway.

The problem here arises from the complexity of the flow intended by TURTLEDOVE, which the document itself describes: encryption of the payload, storage of encryption keys in a well-known file, and so on. It's unclear what value all of that complexity adds, but it's clear that it adds multiple costs and limitations.

Aside from Andrew's claim on SSP/DSP value respectively, it's entirely possible for the SSP to exist and manage the relationship but not be the only entry point for all requests from a client. As mentioned this would increase latency (due to decryption and request forwarding) and decrease reliability (higher request volume and single point of failure for more functionality than for today).

You state "the ads industry decides that it wants to re-shape" but, aside from the obvious remark, TURTLEDOVE is doing the re-shaping here: the SSP today is not in the line of sight of interest group modifications. That data today remains private between buyer and DSP. It's fine to put it in the browser and keep it private there, but I don't see why the SSP should be involved in this mix.

appascoe commented 3 years ago

It's probably my fault for mixing both business and technical concerns; they inform each other, but right now I do want to be more focused on the technical side. There are two main things I would like to understand better:

1) What value an SSP brings during the interest group request call. Just echoing @dialtone, it seems wasteful without any clear purpose.

2) The SSP being called on the contextual request. If we take as a given that SSPs still provide value by partnering with publishers, how would an SSP determine which contextual signals get sent back to the browser? Presumably the SSP would fan out the contextual request to its DSP clients, each of which would return their own contextualSignals to be passed into any bidding.js that may have been supplied previously through an interest group response. So:

i) Does the SSP respond to the browser with all contextualSignals objects from each DSP? If so, is this burdensome for the browser? I'd also contend that the spec needs to be fleshed out more such that the contextualSignals from a given DSP only make their way to that same DSP's bidding.js function.

ii) If the SSP doesn't respond to the browser with all contextualSignals, how would you imagine the SSP selects which single contextualSignals object, or set of them, gets sent to the browser? From what I can tell, there is not enough information available to the SSP to select the "best" choice, due to the uncorrelated nature of the contextual and interest group requests.

michaelkleber commented 3 years ago

Sure, focusing on the two technical questions:

1) The SSP / publisher's ad platform is the one responsible for enforcing the publisher's rules about what ads are allowed to appear on their page. I imagined that the ad creatives needed to flow through the SSP at some point so that they can exercise that kind of discretion: scan the ad for malware, label the ad with metadata like "restricted category #17 (alcohol)" so that their on-device JS can later filter out ads based on that metadata and the publisher's settings, and so on.

2) Yes, your answer (i) here has it right: the SSP passes all the contextualSignals from all the DSPs back to the browser, and each DSP's signals are exposed only to its own on-device bidding JS.
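One way to picture that routing, as a sketch under assumptions: the contextual response is modeled as a map keyed by DSP domain, and each stored ad carries the bidding function its DSP supplied. The data shapes, the function names, and passing null when a DSP sent no contextual signals are all hypothetical; the thread does not settle what happens in that case.

```javascript
// Hypothetical shape: the SSP's contextual response holds one entry per DSP.
const contextualResponse = new Map([
  ["dsp-a.example", { multiplier: 2 }],
  ["dsp-b.example", { multiplier: 4 }],
]);

// Hypothetical bidding function with the function(adSignals, contextualSignals) shape.
const simpleBid = (adSignals, contextualSignals) =>
  adSignals.baseBid * (contextualSignals ? contextualSignals.multiplier : 1);

// Ads previously fetched for interest groups, each tagged with its DSP.
const storedAds = [
  { dsp: "dsp-a.example", adSignals: { baseBid: 1 }, biddingFn: simpleBid },
  { dsp: "dsp-c.example", adSignals: { baseBid: 3 }, biddingFn: simpleBid },
];

// Each bidding function sees only the signals sent by its own DSP.
// Assumption: a DSP with no contextual entry gets null.
function runOnDeviceBids(ads, response) {
  return ads.map((ad) => {
    const ownSignals = response.has(ad.dsp) ? response.get(ad.dsp) : null;
    return { dsp: ad.dsp, bid: ad.biddingFn(ad.adSignals, ownSignals) };
  });
}

console.log(runOnDeviceBids(storedAds, contextualResponse));
```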

dialtone commented 3 years ago

Hey, thanks for the answer.

Your point (1) makes sense. However, for the SSP to properly enforce those measures, the fetch_ads request would need to happen at the same time as the ad rendering, given that for SSPs the filtering rules are a function of the publisher and the fetch_ads request is not correlated with any publisher page. Given how this works currently in the spec, it would be fair to say that, to achieve the same level of functionality, you could have the browser send a third request to the relevant SSPs communicating the web bundle received.

I think this would be more efficient anyway given that in the spec case you would need to send a number of fetch_ads requests anyway, one to each SSP. The SSP can then communicate server-side with the DSP about ads that got disapproved by them in general.

In either case, whether TURTLEDOVE stays in its current shape or the fetch_ads request is moved around, an extra step would be needed for the publisher to choose exactly which ads to display. Today, publishers can choose not to work with certain DSPs or brands, for example, and I don't think TURTLEDOVE covers this at the moment.

Regarding your point (2): how will the browser route the contextualSignals around if the DSP doesn't participate in the contextual bid? Does that mean that it would be mandatory for every DSP to participate in the contextual bid if they want their interest group bid to be evaluated for the given request? That seems like quite a heavy requirement, given that the DSP cannot know at contextual bid time whether it will be able to participate with an interest group bid. It would effectively ask everyone to always participate, and the DSP would need to return a lot of results each time.

michaelkleber commented 3 years ago

For (1), I can picture a flow that doesn't need an extra request:

The publisher's SSP makes up whatever classification of ad content they want, and offers publishers controls based on that classification.
At the time of the interest-group fetch_ads request, the SSP attaches their ad content classifier labels to the creative, as it flows through the SSP on the way to the browser.
At the time of the contextual request, the SSP knows who the publisher is, and so the contextual response can include information about what ad content labels are not allowed.
The SSP's in-browser JS compares the labels on the interest-group-targeted ads with the publisher's rules, and removes ads from the auction if the publisher rules say to do so.

The same flow works for publisher configurations regarding DSPs, brands, etc.
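The last step of that flow could look roughly like the sketch below. The ad and response shapes, the label strings, and the function name are made up for illustration; the thread only says the SSP attaches labels and that its in-browser JS compares them with the publisher's rules.

```javascript
// Hypothetical: ads stored in the browser, carrying labels the SSP attached
// when the creative flowed through it during the interest-group fetch.
const labeledAds = [
  { id: "ad-1", labels: ["alcohol"] },
  { id: "ad-2", labels: ["footwear"] },
  { id: "ad-3", labels: [] },
];

// Hypothetical: the contextual response lists labels this publisher disallows.
const publisherRules = { blockedLabels: ["alcohol", "gambling"] };

// Remove from the auction any ad that carries at least one blocked label.
function filterByPublisherRules(ads, blockedLabels) {
  const blocked = new Set(blockedLabels);
  return ads.filter((ad) => !ad.labels.some((label) => blocked.has(label)));
}

const eligibleAds = filterByPublisherRules(labeledAds, publisherRules.blockedLabels);
console.log(eligibleAds.map((ad) => ad.id));
```

Blocking by DSP or brand would fit the same pattern if those identifiers were attached as labels.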

Regarding (2): I can imagine SSPs and DSPs inventing a variety of approaches for how to deal with contextual signals. A few possibilities are:

DSP participates in the RTB call-out for each contextual call
SSP offers some kind of caching mechanism, so that DSP doesn't need to participate every time on popular pages
SSP makes up its own contextual signals that it offers to all of its DSPs, and the DSP decides that they are good enough that they prefer to just use the SSP's signals instead of coming up with its own

Note that these all differ in what happens when servers talk to each other. In other words, this is all engineering by ads companies, not by browsers. If you'd like browser support for a way that contextual signals can get from the DSP to the browser without going through the SSP, that seems straightforward — but of course the browser wouldn't tell the DSP anything about what interest groups the user is in.

appascoe commented 3 years ago

Another spitball idea for (1):

The DSP syncs its ad units with the SSP along with an ad id and a declared content category for the ad. SSPs can perform regular audits that the DSP is honest and trustworthy.
Through the ad approval process, a DSP receives a token from the SSP indicating the ad is trusted. SSPs can implement their own ad approval policies for this; it's up to them when they issue tokens.
Require DSPs to declare the content of their ads in the web bundle. Without a declaration, the browser rejects the bundle.
Require DSPs to send the SSP's trust token in the web bundle.
On the publisher page, the SSP can cash in the token through their bidding.js and apply content filters as appropriate, using the DSP's own declared category.

I think this implements about as much trust as exists today, without requiring a call to the SSP for every interest group request, only once per ad.
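To make the steps concrete, here is a toy model of the token flow. Everything in it is hypothetical: the class and function names, the bundle shape, and especially the token, which is modeled as a lookup in the SSP's own records rather than a cryptographic signature that in-browser code could verify offline.

```javascript
// Toy model of the SSP's ad approval records. A real token would presumably
// be a signature; here it is just an opaque string the SSP remembers.
class SspApprovals {
  constructor() {
    this.issued = new Map();
    this.counter = 0;
  }
  // Called once per ad during the approval process.
  approve(adId, category) {
    this.counter += 1;
    const token = `token-${this.counter}`;
    this.issued.set(token, { adId, category });
    return token;
  }
  // True only if the token was issued for this exact ad and declared category.
  redeem(token, adId, category) {
    const record = this.issued.get(token);
    return record !== undefined && record.adId === adId && record.category === category;
  }
}

// Browser-side rule from the list above: reject bundles with no declared category.
function browserAcceptsBundle(bundle) {
  return typeof bundle.category === "string" && bundle.category.length > 0;
}

// SSP-side check on the publisher page: valid token and category not blocked.
function sspAllows(bundle, approvals, blockedCategories) {
  return (
    approvals.redeem(bundle.token, bundle.adId, bundle.category) &&
    !blockedCategories.includes(bundle.category)
  );
}

const approvals = new SspApprovals();
const shoeToken = approvals.approve("ad-1", "footwear");
const shoeBundle = { adId: "ad-1", category: "footwear", token: shoeToken };
console.log(browserAcceptsBundle(shoeBundle), sspAllows(shoeBundle, approvals, ["alcohol"]));
```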

As for (2):

Participating in each contextual call seems to be the most preferable of the three to me.
Caching mechanisms can be dangerous due to pacing issues or day-parting. It's not bad in principle, but I'd be concerned about inappropriately ripping through budget with a high bid that is reapplied for a length of time that is at the SSP's discretion.
I'd certainly be interested in SSP contextual signals, but only for augmentation. We use large machine learning models for interest signals and for contextual signals. We were kind of thinking that by supplying our own bidding.js, we'd be able to essentially send over partial predictions in the adSignals and contextualSignals objects, and combine them for a final computation in the bidding.js. For example, you may have that bid = f(a1*x1 + a2*x2 + a3*x3 + a4*x4). adSignals may contain u = a1*x1 + a2*x2 and contextualSignals v = a3*x3 + a4*x4. From there, bidding.js would implement (among other logic) f(u + v). In particular, the contextualSignals object would be very important for us to pace our budgets.
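The partial-prediction idea in the last item can be sketched as follows. The weights and feature values are made-up numbers, and the logistic function is only a stand-in for f, which the comment leaves unspecified.

```javascript
// Stand-in for f; the thread does not say which function f is.
const logistic = (z) => 1 / (1 + Math.exp(-z));

// Server side: each request contributes a partial weighted sum.
function partialSum(weights, features) {
  return weights.reduce((sum, w, i) => sum + w * features[i], 0);
}

// Interest-group response carries u = a1*x1 + a2*x2 (made-up numbers).
const adSignals = { u: partialSum([0.5, 0.25], [2, 4]) };
// Contextual response carries v = a3*x3 + a4*x4 (made-up numbers).
const contextualSignals = { v: partialSum([-1, -0.5], [1, 2]) };

// bidding.js combines the two partial sums on-device: f(u + v).
function biddingFunction(adSignals, contextualSignals) {
  return logistic(adSignals.u + contextualSignals.v);
}

// Same model evaluated in one place, for comparison.
const fullModel = logistic(partialSum([0.5, 0.25, -1, -0.5], [2, 4, 1, 2]));
console.log(biddingFunction(adSignals, contextualSignals), fullModel);
```

Because the model is additive inside f, neither server needs to see the other request's features to produce its half.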

Another idea. It exposes some data, but maybe it's not particularly privacy-violating: During the contextual request, part of the package is the list of ad networks that have interest groups in the browser. From there, the SSP could filter and request contextualSignals objects from only those DSPs that have data in the browser (or whatever logic they want). I wouldn't expect this to be privacy-violating in the current environment, but it is another vector for attack, I suppose.
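On the SSP side, that filtering could be as simple as the sketch below. The field carrying the list of ad networks and the function name are hypothetical; nothing like this exists in the proposal as written.

```javascript
// Hypothetical: the contextual request includes the ad networks that have
// interest groups stored in this browser.
const contextualRequest = {
  pageUrl: "https://publisher.example/article",
  adNetworksInBrowser: ["dsp-a.example", "dsp-z.example"],
};

// DSPs this SSP is integrated with (made-up names).
const integratedDsps = ["dsp-a.example", "dsp-b.example", "dsp-c.example"];

// Call out only to integrated DSPs that have data in the browser.
function selectDspsForCallout(dsps, networksInBrowser) {
  const present = new Set(networksInBrowser);
  return dsps.filter((dsp) => present.has(dsp));
}

console.log(selectDspsForCallout(integratedDsps, contextualRequest.adNetworksInBrowser));
```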

michaelkleber commented 3 years ago

Yup, I think your ideas for (1) and (2) here both sound reasonable.

I agree that caching has risks, but it seems to me like something an SSP and DSP could make work; certainly it seems like the DSP would need to be able to specify cache lifetimes appropriately for their business needs.
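A minimal sketch of such a cache on the SSP side, assuming the DSP supplies a lifetime with each set of contextual signals. The class name, the key format, and the injected clock are illustrative choices, not anything the proposal defines.

```javascript
// Hypothetical SSP-side cache of a DSP's contextual signals, where the DSP
// chooses the lifetime. The clock is injectable so expiry is easy to test.
class ContextualSignalCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }
  put(dsp, pageUrl, signals, ttlMs) {
    this.entries.set(`${dsp}|${pageUrl}`, { signals, expiresAt: this.now() + ttlMs });
  }
  // Returns the cached signals, or undefined if missing or expired.
  get(dsp, pageUrl) {
    const key = `${dsp}|${pageUrl}`;
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.signals;
  }
}

const signalCache = new ContextualSignalCache();
// A DSP worried about pacing might pick a short lifetime, e.g. one minute.
signalCache.put("dsp-a.example", "https://publisher.example/", { v: 0.25 }, 60 * 1000);
console.log(signalCache.get("dsp-a.example", "https://publisher.example/"));
```

A short lifetime limits how long a stale high bid can be reapplied, at the cost of more call-outs.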

I have thought about your "list of ad networks" idea, and of course the worry is that it could end up being a fingerprinting vector — especially since in the current proposal, any domain could declare itself an ad network :-). But some kind of k-anonymity thresholding could help with that risk... so if contextual call-out efficiency turns out to be a hurdle, we can work on it.
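As an illustration of what thresholding might mean here, the sketch below reveals an ad network only if enough browsers also hold interest groups for it. Where the audience counts would come from is not specified in this thread; the map of counts, the value of k, and the function name are all assumptions.

```javascript
// Hypothetical: how many browsers hold interest groups for each ad network.
// The source of these counts is an open question, not part of the proposal.
const audienceSizes = new Map([
  ["big-dsp.example", 50000],
  ["tiny-tracker.example", 3],
]);

// Only reveal ad networks shared by at least k browsers, so that a rare
// combination of networks is less useful as a fingerprint.
function thresholdAdNetworks(networksInBrowser, sizes, k) {
  return networksInBrowser.filter((network) => (sizes.get(network) ?? 0) >= k);
}

const revealed = thresholdAdNetworks(
  ["big-dsp.example", "tiny-tracker.example", "unknown.example"],
  audienceSizes,
  1000
);
console.log(revealed);
```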

dialtone commented 3 years ago

Hey Michael, also in light of the conversation during IWABG on Tuesday, what do you suggest would be the next steps to have this considered as an edit to the spec?

JensenPaul commented 1 year ago

Closing this issue as it represents past design discussion that predates more recent proposals. I believe some of this feedback was incorporated into the Protected Audience (formerly known as FLEDGE) proposal. If you feel further discussion is needed, please feel free to reopen this issue or file a new issue.

WICG / turtledove