WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Keeping marketing strategies private #32

Open Pl-Mrcy opened 4 years ago

Pl-Mrcy commented 4 years ago

In this system, the interest groups are available to the user. That means a competing advertiser B has direct access to specific elements of the marketing strategy of advertiser A. This may already be an issue for many advertisers.

To a much greater extent, having the bidding logic and ad bundles loaded in hundreds of millions of browsers raises serious concern for marketing strategic planning teams, even though this logic may not be directly readable in the clear. Indeed, marketing strategies often correlate with sensitive and proprietary information such as remaining stock, margin levels, specific partnerships, etc. Companies might not want to risk exposing this information, which would result in lower advertising spend, since the performance they get from it would decrease.

What would prevent someone from reverse-engineering certain components, and how can we make sure that the logic remains fully hidden?

michaelkleber commented 4 years ago

That means a competing advertiser B has direct access to specific elements of the marketing strategy of advertiser A.

I'm not sure what you mean here. Are you referring to you being able to see what interest groups you personally are in, in your own browser? We certainly do want the browser to be able to provide that amount of transparency.

But advertiser B has no way to learn anything about advertiser A's group memberships for people in general.

To a much greater extent, having the bidding logic and ad bundles loaded in hundreds of millions of browsers raises serious concern for marketing strategic planning teams, even though this logic may not be directly readable in the clear.

But note that much of the logic still remains in servers! If you're an ad company, then the browser sends you two different requests, and each one is an opportunity to run all the sensitive computations you want.

The result of those computations isn't a single bid, like in today's world, it's two sets of signals that get combined to form a bid later. But the individual values in the "ad signals" and "contextual signals" bundles can be just as inscrutable as the single ad bid value is today.

As I commented in https://github.com/BasileLeparmentier/SPARROW/issues/5#issuecomment-628846717, I understand the benefit of SPARROW revealing less of this information to observers who might be competitors. And personally, I think you can make a better case for this part of SPARROW than for the part that leaks more information about the user.

But TURTLEDOVE offers a huge amount of flexibility in how your bidding logic works. It seems to me that if you want almost everything to happen server-side, and only the simplest possible JS in the final on-device bidding (say a dot product of two vectors), that option is available to you.

Pl-Mrcy commented 4 years ago

We certainly do want the browser to be able to provide that amount of transparency.

I understand that and I agree with you. I was just pointing out that several advertisers have already expressed concerns about this element, which we consider to be at the core of both proposals (TURTLEDOVE and SPARROW).

My point was more directed towards the JavaScript and other proprietary elements in the ad bundles, preloaded in the browser. How can advertisers be sure that all these elements will remain secure and impregnably locked away from the competitors?

Besides, the bid values would also be accessible within the browser and thus potentially usable by the competition, whereas today they are not.

michaelkleber commented 4 years ago

Let me focus on this since it's concrete:

Besides, the bid values would also be accessible within the browser and thus potentially usable by the competition, whereas today they are not.

I'm still not sure which of two things you're asking about:

(a) Inside Michael Kleber's browser, ad-company-1 is going to learn what competitor ad-company-2 is bidding, and use that to do better in this auction.

(b) Someone who works at ad-company-1 is going to browse the web and then inspect their own copy of Chrome to see what competitor ad-company-2 is bidding.

For concern (a), the usual browser same-origin-policy will prevent a competitor from meddling with your data. The browser won't let ad-company-1 see other stuff happening inside my browser. We can include protection for ad-company-2's signals, both ad-specific and contextual.

For concern (b), you're quite right that an ad-company-1 employee could find out what the bids are for showing ads to them personally. But isn't this something you can often learn today as well? In header bidding situations, the bids are available directly in the browser. And for publishers who use any mechanism for including a reserve price in their ad requests, you could have your browser issue ad requests with a variety of reserve prices to learn exactly the bid price for you specifically.
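The reserve-price probing described above is essentially a binary search. The sketch below assumes a hypothetical oracle `fills_at(reserve_cents)` that reports whether an ad request carrying a given reserve price came back with an ad, and assumes the outcome is monotone (the request fills exactly when the hidden bid is at or above the reserve). Under those assumptions, and an assumed probing ceiling of 100 USD CPM, it recovers one user's bid to the cent in at most 16 requests.

```python
from typing import Callable, Optional


def probe_bid_cents(fills_at: Callable[[int], bool], max_cents: int = 10_000) -> Optional[int]:
    """Binary-search the highest reserve price (in cents CPM) that still fills.

    Assumes fills_at(r) is True exactly when the hidden bid is >= r, so the
    result is the hidden bid rounded down to the cent. Returns None if even a
    zero reserve does not fill (no bid at all).
    """
    if not fills_at(0):
        return None
    if fills_at(max_cents):
        return max_cents  # bid is at or above the probing ceiling
    lo, hi = 0, max_cents  # invariant: fills_at(lo) is True, fills_at(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fills_at(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Usage: a simulated publisher endpoint hiding a 4.37 USD CPM bid for this user.
hidden_bid_cents = 437
print(probe_bid_cents(lambda reserve: hidden_bid_cents >= reserve))  # 437
```

Real ad requests are noisier than this oracle (floors, timeouts, competing demand), so in practice each probe might need repeating; the point is only that a per-user bid is learnable today without any browser-side bidding logic.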

Apologies if it's neither of these and I'm still not understanding the nature of your question.

ablanchard1138 commented 4 years ago

Hi Michael

We are indeed talking about (b).

Following your points and the last live W3C IWABG discussion, here are a few clarifications on our points:

Header bidding exposes the winning bids from auctions that have already run on Rubicon, AppNexus, Index, Criteo, Amazon, etc. The actual end client, and all the participants that had already lost those earlier auctions (retailer_A, retailer_B, TravelAgency_C, Brand_X…), are not visible to the browser at the time of the Prebid auction.

On the other hand, TURTLEDOVE would expose the bidding logic of all participants (retailer_A, retailer_B, TravelAgency_C, Brand_X…), e.g. "if audience abc and context xyz then bid x.xx". The information value lies not only in the bid itself, but also in the logic that led to it. De facto, it reveals sensitive, proprietary bidding intelligence at a much greater scale (to give you an order of magnitude, we can arbitrate between dozens of campaigns before placing the winning bid externally).

We are all in favor of transparency for auctions and ad decisioning rules. We have advocated for it many times in meetings with SSPs (AdX, FBX, etc.) over the last 8 years. The danger we are pinpointing here is not transparency on decisioning, which allows bidders to participate in a truthful environment ("rules of the game") that maximizes overall social welfare (total value creation). Rather, it is the fact that TURTLEDOVE (or any browser-side proposal requiring preloaded bidding logic) reveals how each bidder intends to play the game ("individual strategies"). By revealing these strategies before the auctions happen (or by allowing similar auctions to be repeated several times), you lose the truthfulness properties of pure second- or first-price auctions. What we know from experience is that this makes advertisers afraid of revealing too much, and creates friction in reaching an equilibrium (each player's reaction forces the others to react, and so on).

This raises concerns on several levels for many advertisers (our clients, and ourselves). Here are a few of them:

A classic use case is Retail or OTA clients asking for advertising that maximizes the resulting profit. To that end, they share the margin they make on particular goods, brands, locations, etc., and we embed it in the way we bid and recommend items in order to maximize their overall profit (attributed or incremental, depending on what they want). One can easily understand that they are already cautious about sharing such sensitive information even with a few partners. The prospect of these values, or a transformation of them, being potentially readable in millions of browsers is a real concern for them.

Another example is the risk for any auction participant to only rely on the intelligence built by others, just by replicating the bidding logic and adding 1 cent to each bid to outwin others. This particular malevolent bidder would de facto steal other bidders' opportunities, and could resell them to their clients at the same or at a lower price. Since this kind of operation would not require much investment, the malevolent bidder could make a hefty sum without creating any value for anyone, and gain a decent market share after a while. This example might appear convoluted, but it is one we have seen happen in the past in second-price auctions. Again, full transparency on each participant's bidding logic could enable such feats at an even greater and more effective scale.

Similarly to the prior example, individuals could very precisely target a specific advertiser and drain its budget by triggering exactly the conditions that make it bid at the highest cost. The bids we place for our advertisers range from 0.01 to 100 USD CPM: there is a danger that a mechanism revealing when we are about to bid 100 USD could be exploited for harm or profit.

Are there any guarantees we can give our clients that their bidding logic cannot be read by malevolent users tampering with the browser?

PedroAlvarado commented 4 years ago

At Resonate, we've studied some of the challenges you raised with the TURTLEDOVE proposal to some degree. Please find a summary of our conclusions so far below.

  1. Bidding Logic - Advertisers, or proxies acting on their behalf, manage the bidding logic. Bidding logic executes both in the browser and on servers. We anticipate that a substantial portion of bidding logic will run on the server side, as it does today. The bidding logic that runs in the browser uses operands based on contextual and ad signals.

  2. Signals - To a degree, the contextual and ad signals are arbitrary: they may be opaque or transparent. We anticipate that these signals will be highly opaque, to the extent that they are not human-readable (e.g., A3475). Moreover, there are indications that there may be access controls around who can use these signals (e.g., Advertiser-A can't use Advertiser-B's signals). See the GitHub comment below.

@ablanchard1138, regarding your concerns, having access to the browser bidding logic will likely not yield much insight to a competitor/adversary, as the logic's operands will be meaningless. Also, if an actor were to copy the bidding logic verbatim and attempt to use it, they would not be able to execute it successfully, as they would not have access to the signals.

As I alluded to in the thread linked below, we should define the access control mechanisms for these signals early, as TURTLEDOVE moves from proposal to specification. Equally, we need to make the implementation of these controls a requirement, and a critical component of this proposal's initial implementation.

@ablanchard1138 Let me know if this makes sense. It'd be great to hear support from you and others concerning signal access control.

@michaelkleber Please correct me if I'm off the mark with my/our understanding.

I was imagining that each piece of in-browser JS would receive signals from one ad network — the same ad network that wrote the JS in the first place.

It seems reasonable for multiple ad networks to make some sort of agreement with each other to consume one another's signals if they mutually decide to do so. But nobody should be required to share signals if they don't want to... and the browser can deploy encryption to preserve that.

Originally posted by @michaelkleber in https://github.com/michaelkleber/turtledove/issues/20#issuecomment-608068541
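The access rule Michael describes (each piece of in-browser JS receives signals only from the ad network that wrote it, unless networks opt in to sharing) can be modeled as a small policy object. This is a hypothetical sketch: `SignalRouter`, its methods, and the `.example` origins are illustrative names, not part of any TURTLEDOVE specification, and it models only the policy, not the encryption the browser might use to enforce it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SignalRouter:
    """Hypothetical model of per-origin signal isolation for on-device bidding."""

    # origin of the ad network that produced the signals -> the opaque signals
    signals: Dict[str, object] = field(default_factory=dict)
    # owner origin -> origins the owner has explicitly agreed to share with
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def store(self, origin: str, value: object) -> None:
        self.signals[origin] = value

    def share(self, owner: str, consumer: str) -> None:
        """Record an opt-in agreement: `consumer` may read `owner`'s signals."""
        self.grants.setdefault(owner, set()).add(consumer)

    def visible_to(self, consumer: str) -> Dict[str, object]:
        """Signals that a bidding function written by `consumer` may receive."""
        return {
            owner: value
            for owner, value in self.signals.items()
            if owner == consumer or consumer in self.grants.get(owner, set())
        }


router = SignalRouter()
router.store("https://adnet-a.example", [0.514, 0.706])
router.store("https://adnet-b.example", [0.9, 0.1])
print(sorted(router.visible_to("https://adnet-a.example")))  # ['https://adnet-a.example']
router.share("https://adnet-b.example", "https://adnet-a.example")
print(sorted(router.visible_to("https://adnet-a.example")))  # ['https://adnet-a.example', 'https://adnet-b.example']
```

Defaulting to "own signals only" and requiring an explicit `share` call mirrors the point that nobody should be required to share signals they don't want to.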

jwrosewell commented 4 years ago

These are excellent points and speak to the needs of advertisers. There is an unofficial draft of success criteria over at the W3C Improving Web Advertising Business Group that I would like to see discussed further and adopted as a method of assessing TURTLEDOVE, SPARROW and all other proposals before any specifications are formed.

Ultimately if advertisers won't use it for at least the reasons mentioned by @ablanchard1138 then they won't spend money on the open web.

michaelkleber commented 4 years ago

I'm glad to see @PedroAlvarado's comments above; they are indeed in line with what I've been thinking.

@ablanchard1138 I agree that some ways of using TURTLEDOVE would involve putting a lot of custom business logic into the in-browser JS. You make a compelling case for why advertisers or other buyers would choose not to use it in that way.

But there's a lot of flexibility in how to use the proposed design. Remember, the buyer generates signals for their ad, and signals for each page context, and even signals for each user based on a first-party identity, as well as their own JS to combine all those signals. Keeping all the custom business logic in the server-side signal creation is an entirely reasonable way to use TURTLEDOVE, and is a natural way for a buyer to sidestep many of the concerns you describe.

To be concrete, let's take one explicit threat you mentioned:

Another example is the risk for any auction participant to only rely on the intelligence built by others, just by replicating the bidding logic and adding 1 cent to each bid to outwin others. This particular malevolent bidder would de facto steal other bidders' opportunities, and could resell them to their clients at the same or at a lower price.

Advertiser A puts me, a random person, into interest group advertiser-a-1234. When my browser asks for an ad targeting the group, they hand me an ad, along with signals [[0.514, 0.706, 0.586, 0.391, 0.814], 0111101000000011]. Later I visit a page on publisher-p.com, and the advertiser's RTB sends signals back to the browser, [[0.474, 0.425, 0.807, 0.583, 0.142], 1001110111001010]. The in-browser JS creates a bid using a formula that starts by taking the dot product of the two vectors and counting the bits in the xor of the two bitvectors.

What could your malevolent bidder do?

I cannot see how your malevolent bidder could have any hope of always outbidding you by one cent.
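Michael's example can be run with his numbers. The sketch computes the two starting quantities he names (the dot product of the two vectors and the number of set bits in the XOR of the two bitvectors); the way `generate_bid` combines them afterwards is a made-up completion, since the thread only says the formula "starts" with these. What it illustrates is that a copied function is inert on its own: the bid depends entirely on signals that only the buyer's servers produce.

```python
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(a * b for a, b in zip(u, v))


def xor_popcount(a: str, b: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def generate_bid(ad_signals: Tuple[Sequence[float], str],
                 contextual_signals: Tuple[Sequence[float], str]) -> float:
    ad_vec, ad_bits = ad_signals
    ctx_vec, ctx_bits = contextual_signals
    score = dot(ad_vec, ctx_vec)
    mismatch = xor_popcount(ad_bits, ctx_bits)
    # HYPOTHETICAL continuation: scaling by the fraction of matching bits is
    # invented here for illustration; the thread does not specify the rest.
    return score * (1 - mismatch / len(ad_bits))


ad_signals = ([0.514, 0.706, 0.586, 0.391, 0.814], "0111101000000011")
contextual_signals = ([0.474, 0.425, 0.807, 0.583, 0.142], "1001110111001010")
print(round(dot(ad_signals[0], contextual_signals[0]), 6))  # 1.360129
print(xor_popcount(ad_signals[1], contextual_signals[1]))   # 10
print(round(generate_bid(ad_signals, contextual_signals), 6))  # 0.510048
```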

michaelkleber commented 4 years ago

Ultimately if advertisers won't use it for at least the reasons mentioned by @ablanchard1138 then they won't spend money on the open web.

@jwrosewell I completely agree! If we produce an ad serving mechanism that few advertisers choose to buy through, or that few publishers profit from, then this work is a failure.

But of course we need to evaluate that compared to a baseline where third-party cookies are unavailable. Something like TURTLEDOVE obviously requires a lot of new work; the fact that nobody would choose to migrate to it if they could keep the status quo cannot be considered a strike against it.

ablanchard1138 commented 4 years ago

Pedro, Michael, thank you for your inputs.

At Resonate, we've studied some of the challenges you raised with the TURTLEDOVE proposal to some degree. Please find a summary of our conclusions so far below.

@PedroAlvarado, could you please share your studies, so we can understand how these conclusions were reached?

@michaelkleber, reading your answers, it seems that we do not necessarily mean the same things by the (broad) concepts of "advertiser" and "bidding logic". Let us define each, and rephrase our examples so that everyone can better understand what we mean.

Different entities coexist under the "Advertiser" umbrella (this is simplistic and does not necessarily apply to advertising in general, but we hope it helps clarify each of the examples):

"Manufacturer": the party producing the goods that are to be promoted through advertising. Example: shoe brand, hotel chain, restaurant.

"Retailer": the party in charge of selling the good. Example: retail website, Online Travel Agent, Food delivery service.

"Buying Agent": the party in charge of buying advertising, either for the retailer, or the manufacturer, aiming at a certain goal (brand awareness, visits, conversions) for a certain price (budget or ROI).

Each of these parties, often lumped together under the "advertiser" umbrella, has a different, interdependent role in the value chain, providing a different piece of the value creation (production capabilities, selling platform, advertising efficiency). Each of them is affected by the different proposals to some degree.

As for "bidding logic", let us say that it is a function that out of a particular context and a particular interest group returns a bid value. The most essential version of a bidding logic is a table with three columns: interest group, context, and bid.

We argue that it is possible to get access to the "bidding logic" (i.e. the table described above) by tampering with browsers.

Let us detail what a potential attack could look like based on the first example we gave. Retailer A has different margins with its manufacturers 1 and 2, and as such it makes sense to:

Let us consider Retailer B, selling the same products from the same manufacturers as A, who wants to get an understanding of the relative margin made by Retailer A. The attack would unfold this way:

  1. Retailer B browses Retailer A's site (running a fleet of browser instances to do so). By doing so, Retailer B gets added to Retailer A's various IGs and receives the related JS bidding functions.
  2. Retailer B browses the web and gets an auction (including A's IGs) to run in a browser instance, with a particular context. While doing so, Retailer B logs the contextual signal sent by Retailer A as an input to its bidding functions.
  3. Retailer B uses that contextual signal to execute all the JS bidding functions (possibly outside the browser, in any JS execution engine), and gets a bid value for each of Retailer A's IGs for this context.
  4. Retailer B can then rank Retailer A's IGs by bid value, and use the human-readable description of each IG (along with a record of how it was added to that interest group through its browsing behavior) to see, for instance, that Retailer A's "Manufacturer 1" IG is twice as valuable as its "Manufacturer 2" IG.
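The four steps can be sketched as a replay loop. This assumes, as in the scenario above, that each copied bidding function depends only on the logged contextual signal plus ad signals already sitting in Retailer B's browser; `make_copied_bid_fn`, the weights, and the page names are hypothetical stand-ins, not real data. Looping over more logged contexts is the "many contexts, many browser instances" scaling that yields the full IG x Context table.

```python
from typing import Callable, Dict, List, Tuple

# A copied bidding function: takes a logged contextual signal, returns a bid (USD CPM).
BidFn = Callable[[List[float]], float]


def make_copied_bid_fn(ad_weight: float) -> BidFn:
    """HYPOTHETICAL stand-in for one of Retailer A's JS bidding functions,
    closing over the ad signals B's browser received when joining the IG."""
    return lambda ctx_signal: ad_weight * ctx_signal[0]


def build_mapping_table(
    bid_fns: Dict[str, BidFn], logged_contexts: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Steps 2-3: replay every logged contextual signal through every copied function."""
    return {
        (ig, ctx): fn(signal)
        for ig, fn in bid_fns.items()
        for ctx, signal in logged_contexts.items()
    }


def rank_igs(table: Dict[Tuple[str, str], float], ctx: str) -> List[Tuple[str, float]]:
    """Step 4: rank interest groups by bid at one fixed context, highest first."""
    rows = [(ig, bid) for (ig, c), bid in table.items() if c == ctx]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Step 1 (hypothetical values): IGs joined by browsing Retailer A's site.
bid_fns = {
    "Manufacturer 1": make_copied_bid_fn(2.0),
    "Manufacturer 2": make_copied_bid_fn(1.0),
}
# Step 2 (hypothetical values): contextual signals Retailer A sent on two pages.
logged_contexts = {"news-page": [0.8], "sports-page": [0.5]}

table = build_mapping_table(bid_fns, logged_contexts)
print(rank_igs(table, "news-page"))  # [('Manufacturer 1', 1.6), ('Manufacturer 2', 0.8)]
```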

This attack only works at constant context. However, for many retailers' advertising use cases, the 'user' component is relatively independent of the 'context' component (the bid can be seen as a product of the two values: bid = value_user * value_context). In that case, the relative value of IGs can easily be inferred at a fixed context.
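Under that multiplicative assumption, the inference is one line of algebra. Here u(g) is shorthand for value_user of interest group g and v(c) for value_context of context c (the short names are ours):

```latex
\mathrm{bid}(g, c) = u(g)\,v(c)
\quad\Longrightarrow\quad
\frac{\mathrm{bid}(g_1, c)}{\mathrm{bid}(g_2, c)}
  = \frac{u(g_1)\,v(c)}{u(g_2)\,v(c)}
  = \frac{u(g_1)}{u(g_2)}
\qquad \text{for any context } c \text{ with } v(c) \neq 0,\ u(g_2) \neq 0 .
```

The ratio does not depend on c, so probing a single context is enough to recover the relative value of every interest group; probing more contexts only fills in the v(c) side of the table.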

Furthermore, this strategy could easily be scaled to many contexts using many browser instances. This would allow advertiser B to build the full bidding table (IG x Context), all at the expense of advertiser A, who alone pays for the ad placements.

Having access to this "mapping table" would in practice be quite similar to having access to the underlying logic itself, and could be used in the second and third examples we gave. Two Buying Agents X and Y working for the same Retailer A could engage in a contest to emulate each other's interest groups and bidding logic, instead of outsmarting each other through fair competition by bringing intelligence to interest group construction and opportunity valuation.

An individual could also build a bot or a browser extension that identifies the maximum value paid for any display opportunity (up to 100 USD for us, but possibly much higher for video, for example), stores the interest group and context, and artificially reproduces these exact conditions to maximize a buying agent's spend, with no way for the buying agent, the retailer, or the brand to prevent it in a timely fashion.

Again, these examples might appear convoluted or pure fantasy, but they are based either on actual client feedback or on events we have encountered over the last 8 years. We feel that preloading the bidding logic in users' browsers dramatically increases the risk of such events happening. We hope this (long) explanation helps you understand our concerns.

As you seem to suggest, one way to obfuscate would be to use a highly variable contextual signal. But what elements can the advertiser leverage to meaningfully modulate the contextual signals for a given URL/"external context"?

Adding noise to the context to prevent Advertiser B from building an efficient mapping table would come at the expense of campaign performance, and is therefore impractical.

michaelkleber commented 4 years ago

Thank you, Arnaud, that was a very clear explanation.

I don't really see how a buyer can resist a similar attack from their competitors today. I understand a server-side auction means that, when the competitor probes each page, they would see only the winning bid and not the other ones. But if the attacker is willing to scale up the attack (as you said, "to many contexts using many browser instances"), then they could surely start by getting cookies that are on some remarketing lists and not others. At that point, header bidding or publisher reserve mechanisms seem like ways to get at the same underlying mapping table.

But I do agree that TURTLEDOVE would at least speed up the kind of attack you're talking about, and might make it easier in other ways too.

Do you see any changes we could make that would help here? Of course I understand that moving all bidding to a trusted server, as in SPARROW, is your favorite answer.

PedroAlvarado commented 4 years ago

@ablanchard1138 Our work is intermingled with Resonate-specific assets and therefore is not in a form that is easy to share. We'll attempt to separate the two and share it at a later time. That said, to reach these interim conclusions, our model of assumptions takes into account all the feedback on GitHub, the proposal documentation, and the rational goal of minimizing change and protecting business strategies (as much as possible).