WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

FLEDGE: Answering contextual requests with a generate_bid() function #116

Open MarieScibids opened 3 years ago

MarieScibids commented 3 years ago

Hello,

Following FLEDGE call discussions on frequency capping and A/B testing as part of contextual requests, we wanted to come back with a quick proposal on that subject.

We suggest that Ad Networks respond to contextual requests with a generate_bid() function, along with a list of pre-selected eligible ads. As in FLEDGE, the function would take as input a browser_signals object containing information that the browser knows, such as prev_wins to allow on-device frequency capping, or information needed to perform A/B testing. The output of generate_bid() contains the final bid along with the associated ad. We know that something along these lines was, for instance, already proposed in TERN.
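As a rough illustration of the on-device frequency capping described above, here is a minimal Python sketch. The shape of browser_signals (a dict whose prev_wins entry is a list of (seconds_since_win, ad_id) pairs), the ad dictionaries, and the cap parameters are all assumptions made for illustration; none of them come from a FLEDGE specification.

```python
from typing import Dict, List, Optional


def generate_bid(browser_signals: Dict, eligible_ads: List[Dict],
                 max_impressions: int = 3,
                 window_seconds: int = 24 * 3600) -> Optional[Dict]:
    """Return the highest-bidding ad still under its frequency cap, or None.

    Assumed (hypothetical) inputs:
      browser_signals["prev_wins"]: list of (seconds_since_win, ad_id) pairs
      eligible_ads: list of {"id": ..., "bid": ...} sent by the DSP
    """
    # Count how often each ad already won within the capping window.
    recent_wins: Dict[str, int] = {}
    for seconds_ago, ad_id in browser_signals.get("prev_wins", []):
        if seconds_ago <= window_seconds:
            recent_wins[ad_id] = recent_wins.get(ad_id, 0) + 1

    # Keep only the ads that have not reached the cap yet.
    candidates = [ad for ad in eligible_ads
                  if recent_wins.get(ad["id"], 0) < max_impressions]
    if not candidates:
        return None  # every pre-selected ad is capped: no bid

    best = max(candidates, key=lambda ad: ad["bid"])
    return {"ad": best["id"], "bid": best["bid"]}


signals = {"prev_wins": [(60, "ad_a"), (3600, "ad_a"), (7200, "ad_a"),
                         (90000, "ad_b")]}
ads = [{"id": "ad_a", "bid": 2.0}, {"id": "ad_b", "bid": 1.5}]
result = generate_bid(signals, ads)  # ad_a is capped, so ad_b is returned
```

In this sketch the cap is evaluated entirely from data the browser already holds, so nothing about past wins has to be sent back to the DSP.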

Theoretically, a scaling issue could arise: following this procedure, the DSP will be unable to select the “highest bidder” a priori and will have to send multiple ads (among the thousands potentially eligible for this request). However, we could easily imagine heuristics on the DSP side to perform some sort of pre-selection in order to comply with a maximum number of ads to send. A simplistic one would be to select the ads of the top x campaigns assuming generate_bid() is maximum, plus a “fallback” (ad, bid) pair from a campaign whose bid does not depend on generate_bid().
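The simplistic pre-selection heuristic above could look something like the following Python sketch. The campaign fields max_bid (the bid assuming generate_bid() returns its maximum) and uses_generate_bid are hypothetical names introduced only for this example, and reserving one slot for the fallback is one possible reading of the heuristic.

```python
from typing import Dict, List, Optional, Tuple


def preselect_ads(campaigns: List[Dict],
                  max_ads: int) -> Tuple[List[Dict], Optional[Dict]]:
    """Pick the top campaigns by best-case bid, plus one static fallback.

    Assumed (hypothetical) campaign fields:
      "max_bid": bid if generate_bid() returns its maximum value
      "uses_generate_bid": False when the bid ignores on-device signals
    """
    ranked = sorted(campaigns, key=lambda c: c["max_bid"], reverse=True)

    # Fallback: the best campaign whose bid does not depend on generate_bid().
    static = [c for c in ranked if not c["uses_generate_bid"]]
    fallback = static[0] if static else None

    # Fill the remaining slots with the highest best-case bids.
    slots = max_ads - (1 if fallback is not None else 0)
    top = [c for c in ranked if c is not fallback][:slots]
    return top, fallback


campaigns = [
    {"id": "c1", "max_bid": 5.0, "uses_generate_bid": True},
    {"id": "c2", "max_bid": 3.0, "uses_generate_bid": False},
    {"id": "c3", "max_bid": 4.0, "uses_generate_bid": True},
    {"id": "c4", "max_bid": 1.0, "uses_generate_bid": False},
]
top, fallback = preselect_ads(campaigns, max_ads=3)
```

With max_ads=3, two slots go to the campaigns with the highest best-case bids and one to the static fallback, so the response always contains at least one bid that cannot be zeroed out on-device.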

By doing that, advertisers would keep some control over frequency/recency capping and optimization, and could still perform A/B testing without any risk to the user's privacy, as everything is computed on-device.

Do you see any limitation or point that we may have missed in this proposal?

michaelkleber commented 3 years ago

As we discussed during the 2/17 meeting, the key problem here is that if we allow such a generate_bid() function to get access to the on-browser cross-site signals you're asking for, then we cannot allow the publisher page to see the output of the function.

In particular, in FLEDGE we leak the one bit of information of whether or not any ad won the auction. Even that is too much information to leak if the surrounding page gets to create an arbitrary function that determines that bit!

So your proposal seems entirely reasonable as long as the result always renders inside a Fenced Frame. That means that even if the frequency-capped ad decides not to bid because of the cap, and the contextually-targeted fallback ad without any cap ends up the winner, that contextual ad would also be forced to render with all the restrictions imposed by FLEDGE.

Rendering in this special environment — which is going to ultimately require only aggregate reporting — is a new challenge that not all ads will want to undertake. In FLEDGE, only the ads that want to use this particular targeting capability need to do the extra work. Under your proposal, we would need every ad in the auction to be willing to live in this aggregate-reporting world.

lcevans commented 3 years ago

(Speaking as a DSP) This approach appeals to us too (ability to use browser_signals for contextual bids)

But it raises another point:

michaelkleber commented 3 years ago

Yes, absolutely — this would require some new way for a creative-rendering-url in a contextual ad response to be bundled with an on-device bidding function.

What's there now is a runAdAuction() argument with an additional_bids parameter, which could be a way to hook up contextual responses with the rest of the on-device bidding. But there are a bunch of design details that would need to be worked out, as well as the privacy ones from my previous comment.

RLemonnierScibids commented 3 years ago

If I understand correctly, the constraints induced by the Fenced Frame and aggregate reporting are not going to prevent the core adtech use-cases (like getting granular-enough reporting to perform ML optimization), since all interest-based advertising is going to run under these constraints anyway.

Thus I think several (a majority of?) DSPs/advertisers would consider that depriving prospecting campaigns of key use-cases like frequency capping or A/B testing is a much bigger problem. As an example, large consumer-packaged goods advertisers routinely use frequency as a KPI of their branding campaigns, and their contribution to the overall programmatic spend is massive.

This additional_bids parameter is very interesting. Could we for instance imagine that:

Thus, no data would be sent to DSPs.

We would therefore have:

Do you see any privacy issue with this proposal?

michaelkleber commented 3 years ago

@RLemonnierScibids This does seem like a direction that FLEDGE could evolve in. But as I said before, the trade-off in ease of use for buyers who don't want the on-device bid adjustment features seems quite real to me.

RLemonnierScibids commented 3 years ago

Thanks for the feedback!

Just to be clear: our proposal is that for each bid request, the buyer will choose:

So I am not sure I see the trade-off here, since this would basically just give buyers a second way to participate in the contextual auction. The buyers you mention would simply never use this new possibility.

Am I missing something?

michaelkleber commented 3 years ago

As I mentioned back in https://github.com/WICG/turtledove/issues/116#issuecomment-798991125, the key trade-off here is that once it's possible for a contextually-targeted ad to access on-device information (e.g. in a generate_bid() call), we need the winner of the on-device auction to always render inside a Fenced Frame — even if the winning ad is a different contextually-targeted ad that didn't use any on-device bidding information.

RLemonnierScibids commented 3 years ago

Ok thanks for clarifying again.

We were considering this from the buyer’s seat; the buyer in this case wouldn’t have access to this additional bit of information, since the DSP would have to choose between the two ways to participate in the contextual auction.

Regarding the publisher, if I understand correctly, your concern is that it would have access:

Taking an extreme example of a single generate_bid_contextual() function

def generate_bid_contextual([...]):
    if user in interest_group_123:
        return 2
    else:
        return 0

and a highest contextual bid of 1, the publisher could learn whether or not the user is in interest_group_123 by observing whether the on-device auction produces a winner. Is that correct?

If that’s the case, we think this issue could be alleviated, since the publisher does not need to see the generate_bid_contextual() functions in the clear; it is basically just passing them to the browser, which should be the entity that interprets them. For instance, could we imagine the following encryption architecture:
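To make the one-bit leak in this extreme example concrete, here is a small runnable Python sketch. Passing the user's interest groups as a set, and the helper on_device_auction_has_winner(), are assumptions added purely for illustration; the bid values 2, 0 and 1 are the ones from the example above.

```python
def generate_bid_contextual(user_interest_groups):
    """Bid 2 only for members of interest_group_123, otherwise 0."""
    if "interest_group_123" in user_interest_groups:
        return 2
    return 0


def on_device_auction_has_winner(user_interest_groups,
                                 highest_contextual_bid=1):
    """Hypothetical helper: the on-device bid wins only if it beats
    the highest purely contextual bid."""
    return generate_bid_contextual(user_interest_groups) > highest_contextual_bid


# The publisher never sees the interest groups, only the auction outcome,
# yet the outcome matches membership in interest_group_123 exactly.
member_outcome = on_device_auction_has_winner({"interest_group_123"})
non_member_outcome = on_device_auction_has_winner({"some_other_group"})
```

Under these assumptions, the visible outcome of the auction is a perfect proxy for group membership, which is the single bit in question.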

RLemonnierScibids commented 3 years ago

@michaelkleber Since we ran out of time before reaching this agenda item in the call list, do you think you would have time to answer our last comment before the next call in 2 weeks? This feature still seems to be a very important factor in whether DSPs will be able to implement acquisition campaigns. Otherwise, I would love the opportunity to put this item at the top of the list in two weeks :)

michaelkleber commented 3 years ago

Hello @RLemonnierScibids, sorry that we didn't get to this during the call.

I don't think your idea of an encrypted contextual bidding function helps here. It would indeed provide a way for the DSP to modify its bid without the publisher site knowing that it did so, if the DSP wanted to hide that information. But if the DSP wanted to share that information, then offering an encrypted channel wouldn't particularly help; surely the DSP could communicate its intended logic to the publisher in some other way.

In essence, one FLEDGE privacy goal is that the publisher not learn interest group memberships even if the IG owner would be willing to share.

RLemonnierScibids commented 3 years ago

Hi @michaelkleber, thanks for clarifying!

We are currently studying potential solutions, but first we would like to get a precise understanding of exactly which case you are trying to avoid.

Would you be able to give details about your assessment “in FLEDGE we leak the one bit of information of whether or not any ad won the auction. Even that is too much information to leak if the surrounding page gets to create an arbitrary function that determines that bit”, and explain how this setting would leak more information than the current setup?

At the moment we consider the following:

A DSP, DSP_1, whose bidding function would be:

def generate_bid_contextual():
    if website == cnn.com and frequency == 0:
        bid 1
    else:
        bid 0

An SSP whose score_ad function would be:

def score_ad():
    if DSP == DSP_1 and bid > 0:
        DSP_1 wins
    else:
        no winner for the on-device auction (contextual wins)

Assuming DSP, SSP and publisher collude with each other, the publisher will know each time the on-device auction wins:

thus getting a bit of info on past behaviors in addition to the current context of the webpage.
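For reference, the two pseudocode functions above can be turned into a runnable Python sketch along the following lines. The explicit parameters (website, frequency, dsp, bid) and the publisher_observes_win() helper are assumptions made so that the example executes; they are not part of the FLEDGE API.

```python
def dsp_1_generate_bid_contextual(website, frequency):
    """DSP_1 bids 1 only on cnn.com for users it has never shown an ad to."""
    if website == "cnn.com" and frequency == 0:
        return 1
    return 0


def score_ad(dsp, bid):
    """Return the winning DSP, or None when there is no on-device winner
    (in which case the contextual ad wins)."""
    if dsp == "DSP_1" and bid > 0:
        return dsp
    return None


def publisher_observes_win(website, frequency):
    """Hypothetical helper: what a colluding publisher can see."""
    bid = dsp_1_generate_bid_contextual(website, frequency)
    return score_ad("DSP_1", bid) is not None


# On cnn.com, an on-device win implies frequency == 0 for this user.
first_visit = publisher_observes_win("cnn.com", frequency=0)
repeat_visit = publisher_observes_win("cnn.com", frequency=2)
```

In this sketch the publisher already knows the website, so the only new information carried by the outcome is whether the frequency counter was zero.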

Now I am not sure I understand how this is different from what could happen under the current state of the proposal if the DSP defines all its interest group generate_bid() functions exactly as the generate_bid_contextual() function above, and the SSP defines its score_ad() function as above.

In both cases, the attack seems possible only in the unrealistic case of colluding actors ready to waste vast amounts of money to get one additional bit of information on some users.

From our point of view, this raises the question of what the privacy metric should be: should we reject an important use-case for the industry if there is a theoretical possibility that colluding actors might acquire a very expensive bit of information on a given user?

Or could we rather compute a “cost of adversary success” metric (in the philosophy of “Time until adversary’s success” described here) to see whether these attacks would make economic sense or not?

RLemonnierScibids commented 3 years ago

Thanks a lot @michaelkleber for our discussion on the topic on Wednesday!

If I summarize:

The last point is where I still need clarification, since I am not able to see which other privacy attack you are referring to. Could you explain for instance how a publisher would learn “the 7th bit of a given user_id” other than with the procedure explained above?

RLemonnierScibids commented 3 years ago

Hi @michaelkleber! I couldn't attend the last FLEDGE call, but I am still very interested in discussing this topic.

If you could formulate the privacy attack that you believe would be possible with generate_bid_contextual() but not with the current generate_bid() setup for interest groups, we would be keen to see what limitations we could devise to keep the same level of risk between the two options.

Thanks a lot!