WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Questions on Fenced_Frames_Ads_Reporting document #205

Open vincent-grosbois opened 3 years ago

vincent-grosbois commented 3 years ago

Hello, after reading this document (https://github.com/WICG/turtledove/blob/main/Fenced_Frames_Ads_Reporting.md), I have a few questions:

1) Is this document "officially supported", and will it be integrated into Fledge? Or is it just a proposal for now?
2) Is 'eventData' in 'reportEvent' fully arbitrary data, or is it supposed to come from a list of allowed values?
3) What prevents the "eventData" payload from containing PII? Is it that, because of the Fledge "micro-targeting protection", it won't be possible to generate an ad in the fenced frame that contains data that is too specific?
4) Based on the code, it's possible to send buyer-centric data (encoded in eventData) to the seller report, for instance the name of the buyer. Are we sure this isn't a privacy issue?
5) More generally, what's the use case for allowing the reports to be sent to the seller? It seems that sending arbitrary info to both buyer and seller will easily allow fingerprinting of the user. Example: generate a new UUID in eventData and send it in both the buyer and seller reports --> buyer and seller can collude and join data based on this UUID.

Any thoughts on this?

jeffkaufman commented 3 years ago

Some thoughts as someone who's been following this and suggested something similar https://github.com/WICG/turtledove/issues/99#issuecomment-800243564:

  2. Are 'eventData' in 'reportEvent' fully arbitrary data, or is it supposed to be within a list of possible data?

My interpretation is that it is arbitrary data.

  3. What is preventing "eventData" payload from containing PII info? Is it that because of the Fledge "micro targeting protection", it won't be possible to generate an ad in the fenced frame that contains data that is too specific?

That sounds right: eventData comes from reportEvent, which is inside the fenced frame. The fenced frame was created via a renderUrl which:

a) had to pass through k-anonymity filtering, and b) is already available to both buyer and seller reporting through browserSignals
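To make point (a) concrete, here is a minimal simulation of a k-anonymity gate on render URLs. It is only a sketch: the threshold `K`, the `CrowdCounts` table and both helper functions are hypothetical names, and the real check is a browser implementation detail that this thread does not spell out.

```typescript
// Simplified simulation; K and every name below are hypothetical.
const K = 50;

// renderUrl -> number of distinct browsers in which the ad could be shown.
type CrowdCounts = Record<string, number>;

// An ad is k-anonymous when at least k browsers share its renderUrl.
function isKAnonymous(renderUrl: string, counts: CrowdCounts, k: number = K): boolean {
  const seen = counts[renderUrl] || 0;
  return seen >= k;
}

// Only ads whose renderUrl passes the gate may win and be rendered.
function eligibleAds(renderUrls: string[], counts: CrowdCounts): string[] {
  return renderUrls.filter((url) => isKAnonymous(url, counts));
}

const crowdCounts: CrowdCounts = {
  "https://ads.example/shoes": 1200,
  "https://ads.example/user-8f3a": 1, // a per-user creative
};

const allowedAds = eligibleAds(
  ["https://ads.example/shoes", "https://ads.example/user-8f3a"],
  crowdCounts,
);
```

In this model the per-user creative never renders, so a fenced frame that could emit user-specific eventData is never created from it.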

  4. Based on the code, it's possible to send buyer-centric data (encoded in eventData) to the seller report, like for instance sending it the name of the buyer, etc. Are we sure it's not an issue with Privacy?

What privacy issue do you see? For example, the seller already knows the name of the buyer through browserSignals.interestGroupOwner.

  5. More generally, what's the use case for allowing the reports to be sent to the seller? It seems that sending arbitrary info to both buyer and seller will easily allow for fingerprinting of the user. Example: generate a new UUID in event data, sent it in both buyer and seller report --> buyer and seller can collude and join data based on this UUID

The buyer and seller can already join their event-level reports. One way to do this would be for the seller to generate an event id in reportResult and put it in signalsForWinner which would then be available to reportWin in sellerSignals. Alternatively, the buyer or seller could add an event id to perBuyerSignals, which is available to both reportResult and reportWin.
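The first route can be sketched as follows. This is a simplified simulation rather than the real worklet code: the actual reportResult and reportWin take more arguments than shown here, and `recordReport`, `newEventId`, `eventIdOf` and the example URLs are hypothetical stand-ins for illustration.

```typescript
// Simplified stand-ins for the reporting functions; only the roles of
// reportResult, reportWin, signalsForWinner and sellerSignals come from
// the discussion. Everything else is hypothetical.
interface SentReport { to: string; url: string; }
const sentReports: SentReport[] = [];

function recordReport(to: string, url: string): void {
  sentReports.push({ to: to, url: url });
}

let nextEventId = 0;
function newEventId(): string {
  nextEventId += 1;
  return "evt-" + nextEventId;
}

// Seller side: mint an event ID, report it, and hand it to the winner.
function reportResult(): { eventId: string } {
  const eventId = newEventId();
  recordReport("seller", "https://seller.example/report?event=" + eventId);
  return { eventId: eventId }; // plays the role of signalsForWinner
}

// Buyer side: the seller's value arrives as sellerSignals.
function reportWin(sellerSignals: { eventId: string }): void {
  recordReport("buyer", "https://buyer.example/report?event=" + sellerSignals.eventId);
}

// Helper to read the join key back out of a report URL.
function eventIdOf(url: string): string {
  const i = url.indexOf("event=");
  return i < 0 ? "" : url.slice(i + "event=".length);
}

const signalsForWinner = reportResult();
reportWin(signalsForWinner);
```

Both recorded reports now carry the same event ID, so the two servers can join them without any extra channel.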

In general, if the buyer and seller can receive the same information it means you don't have to reconcile diverging interpretations of what happened in the browser and no one needs to take someone else's word for what happened. For example, if a buyer and seller agree to transact on a CPC basis then it's best if they can both trigger their reporting off of the same "click" event.
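As a rough illustration of the CPC case, the sketch below models one "click" event fanning out to both parties with an identical payload. It is loosely modeled on the reportEvent/eventData idea from the document under discussion, but `registerBeacon`, `fireEvent`, the URLs and the payload are all hypothetical, and this is not the browser's actual API.

```typescript
// Hypothetical model: each party registers a URL per event type, and one
// event fired from inside the ad is delivered to every listed destination.
type Party = "buyer" | "seller";
interface Beacon { party: Party; url: string; body: string; }

const beaconUrls: Record<Party, Record<string, string>> = { buyer: {}, seller: {} };
const deliveredBeacons: Beacon[] = [];

function registerBeacon(party: Party, eventType: string, url: string): void {
  beaconUrls[party][eventType] = url;
}

function fireEvent(eventType: string, eventData: string, destinations: Party[]): void {
  destinations.forEach((party) => {
    const url = beaconUrls[party][eventType];
    // Parties that did not register for this event type receive nothing.
    if (url !== undefined) {
      deliveredBeacons.push({ party: party, url: url, body: eventData });
    }
  });
}

registerBeacon("buyer", "click", "https://buyer.example/click");
registerBeacon("seller", "click", "https://seller.example/click");

// A single click inside the ad reaches both parties with the same body.
fireEvent("click", JSON.stringify({ cpcCampaign: "c-123" }), ["buyer", "seller"]);
```

Because both beacons originate from the same event, neither side has to trust the other's click count.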

vincent-grosbois commented 3 years ago

Thank you for the quick answer! Glad we agree on points 1 to 4 :)

Concerning what you're saying on point 5, now this is getting me confused about Fledge... If a seller sends the "user 1st-party ID on the seller site" in perBuyerSignals, the buyer will have access to this info. But as the buyer also has access to it in reportWin, they can just send it back to their server (along with the "buyer 1st-party ID" they already had in userBiddingSignals). Now the buyer's server receives a message containing both 1st-party IDs, and is thus able to "fingerprint" the user.

Am I missing something?

jeffkaufman commented 3 years ago

I think what you're missing is that userBiddingSignals and all other user-level advertiser information is available only to generateBid, and not reportWin or reportResult?

vincent-grosbois commented 3 years ago

Ah! Indeed from the initial interest group, we only get interestGroupOwner and interestGroupName from browserSignals it seems.

Now the question is the following: what's preventing a buyer from generating a new interestGroupName for each user on their website? Basically "interestGroupuserId". From the report sent via reportWin, you retrieve the interest group name (containing the user ID on the buyer side) and some arbitrary payload from the seller (i.e. the user ID on the seller side). I guess this wouldn't work either, due to micro-targeting protection?

jeffkaufman commented 3 years ago

In https://github.com/WICG/turtledove/blob/main/FLEDGE.md#5-event-level-reporting-for-now I see:

The renderUrl can always be included since it has already passed a k-anonymity check, for example, but the winning interestGroupName will only be present if it has exceeded the threshold which gates daily updates.

If you used unique values for interestGroupName they wouldn't meet the threshold, and so would not be available to reportWin or reportResult.
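A small sketch of that gating, under stated assumptions: `NAME_THRESHOLD`, the interfaces and `signalsForReporting` are hypothetical names, and the real threshold is whatever the browser uses to gate daily updates.

```typescript
// Hypothetical threshold; the real value is a browser implementation detail.
const NAME_THRESHOLD = 100;

interface WinningBid {
  interestGroupOwner: string;
  interestGroupName: string;
  renderUrl: string;
  userBiddingSignals: Record<string, string>; // visible to generateBid only
}

interface ReportingSignals {
  interestGroupOwner: string;
  renderUrl: string;
  interestGroupName?: string;
}

// Builds what the reporting functions are allowed to see for a winning bid.
function signalsForReporting(bid: WinningBid, browsersInGroup: number): ReportingSignals {
  const signals: ReportingSignals = {
    interestGroupOwner: bid.interestGroupOwner,
    renderUrl: bid.renderUrl, // already passed k-anonymity, so always included
  };
  if (browsersInGroup >= NAME_THRESHOLD) {
    signals.interestGroupName = bid.interestGroupName;
  }
  return signals; // userBiddingSignals is deliberately never copied
}

const sharedGroup = signalsForReporting({
  interestGroupOwner: "https://buyer.example",
  interestGroupName: "running-shoes",
  renderUrl: "https://ads.example/shoes",
  userBiddingSignals: { firstPartyId: "u-42" },
}, 5000);

const perUserGroup = signalsForReporting({
  interestGroupOwner: "https://buyer.example",
  interestGroupName: "interestGroup-u-42", // unique name encoding a user ID
  renderUrl: "https://ads.example/shoes",
  userBiddingSignals: { firstPartyId: "u-42" },
}, 1);
```

In this model the widely shared group keeps its name in the report, while the per-user group's name (and with it the embedded user ID) is dropped.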

vincent-grosbois commented 3 years ago

Thanks, I see! So if I summarize:

Indeed, it seems that in that case there is no possible "leak" of info :) My only comment is that this is heavily biased towards the seller. You could imagine this reporting mechanism being completely reversed, i.e. in the final report you can add as much buyer-side info as you like, but on the seller side you can only find out the domain where the display occurred.

appascoe commented 3 years ago

I think what you're missing is that userBiddingSignals and all other user-level advertiser information is available only to generateBid, and not reportWin or reportResult?

I would like to comment that this is a bit of a sticking point. In order to do any real machine learning or optimization, buyers need to be able to pass at least some of the userBiddingSignals into reportWin and reportResult. On our side, we were expecting that the Aggregate Reporting API would provide some k-anonymity checks to prevent any PII from leaking.

This issue was raised in https://github.com/WICG/turtledove/issues/145, but we haven't received a concrete response yet.

vincent-grosbois commented 3 years ago

I may be wrong, but my assumption is that the real reports that will allow buyers to do machine learning etc. are the ones that go through the measurement API, using either the aggregate reporting API or the event-level conversion API. So to me those are 2 other reporting APIs that will exist and be compatible with Fledge, in addition to the purely-Fledge reporting system we are discussing here (reportWin and reportResult).

appascoe commented 3 years ago

I've been operating under the assumption that, in the long run, reportWin and reportResult are entry points into the Aggregate Reporting API, not a separate mechanism. Of course, the Aggregate Reporting API isn't ready yet, and so in the interim, these provide more granular data.

jeffkaufman commented 3 years ago

@appascoe you might be interested in https://github.com/WICG/turtledove/issues/164 where we're asking for aggregate reporting during generateBid and scoreAd?

appascoe commented 3 years ago

Yeah, I'm interested, but not sure it solves the problem. Being able to do some logging from those functions is certainly useful, but I would say the most important place to submit feedback for aggregation is reportWin. These other functions are too far up the chain. This is because a win is a strong filter for the user having a chance to interact with the ad, e.g. predicting click performance is predicated on the ad being displayed.