AdRoll / privacy

Collection of privacy specs and discussion

Trail store impression entry information #3

Open Pl-Mrcy opened 3 years ago

Pl-Mrcy commented 3 years ago

Hi,

In SPURFOWL, you give this example of an entry in the trail store for an impression:

+------------+----------------+----------+------------+-----------+
| type       | timestamp      | domain   | user agent | adv       |
+------------+----------------+----------+------------+-----------+
| impression | 15:00:00 25/11 | news.com | chrome     | shoes.com |
+------------+----------------+----------+------------+-----------+

If I understood the proposal correctly, news.com is the one populating the store in this case (same-origin policy).

In the discussions around privacy sandbox proposals (including TURTLEDOVE), we usually consider that displays are rendered in fenced frames and that the publisher (news.com) can't know the content and the redirection domain of the ads on its estate. The opposite is also true.

How can the publisher populate the trail store - the adv field in particular - in this case? Who could do it since the actor would need to have both the publisher and the advertiser information?

Noeda commented 3 years ago

I think in this case the proposal text might have been a bit unclear.

By default, yes, the same-origin policy applies. However, if shoes.com could only fill in the trail store while the user is on shoes.com, the proposal would not have much point: shoes.com can already tell what the user is doing on its own web site using first party data.

In the part that you pointed out, the trail store would still be filled by shoes.com (or most likely, a third party that shoes.com has trusted).

1) The user visits news.com, which has ads. dsp.example.com wins an impression and its ad is shown on the page.
2) The ad is rendered in an <iframe> that comes from dsp.example.com. The trail store is populated from that <iframe>. dsp.example.com knows the advertiser, so it knows how to populate adv in the trail store.
3) Later on, the user visits shoes.com. shoes.com establishes trust with dsp.example.com. dsp.example.com can now fill in the trail store with first party data from shoes.com through a tracking pixel. Later on, when reports are computed, this allows joining first party data with impression data (but only inside the SPURFOWL sandboxed JavaScript functions).

+---------------+
| news.com      |
|               |
| .........  <--+---- news.com content
|               |
| ...  +------+ |
|      |  ad  | | <---- <iframe> ad sourced from dsp.example.com
| ...  +------+ |           |
|               |           v
| ..........    |     Trail store is updated by this iframe, for dsp.example.com
|               |
+---------------+
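To make the flow above concrete, here is a minimal Python sketch of it. It is only a toy model, not the SPURFOWL API: the names `TrailEntry`, `TrailStores`, `write_from_iframe`, `write_first_party` and the report function are all hypothetical, the "visit" event type and its timestamp are made up for illustration, and a plain Python function stands in for a sandboxed report function.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrailEntry:
    # Field names follow the example table; "visit" is a made-up event type.
    type: str
    timestamp: str
    user_agent: str
    adv: str
    domain: Optional[str] = None  # publisher; may be unknown inside the iframe


class TrailStores:
    """Toy model: one trail store per origin (same-origin partitioning)."""

    def __init__(self):
        self._stores = defaultdict(list)  # origin -> list of entries
        self._trust = defaultdict(set)    # site -> origins that site trusts

    def grant_trust(self, site, origin):
        self._trust[site].add(origin)

    def write_from_iframe(self, origin, entry):
        # Step 2: code running in origin's own <iframe> writes to origin's store.
        self._stores[origin].append(entry)

    def write_first_party(self, origin, site, entry):
        # Step 3: origin records first party data from site, but only if
        # site has established trust with it.
        if origin != site and origin not in self._trust[site]:
            raise PermissionError(f"{site} has not trusted {origin}")
        self._stores[origin].append(entry)

    def entries(self, origin):
        return list(self._stores[origin])


def advertisers_with_impression_and_visit(entries):
    # Stand-in for a report function: joins impressions with first party
    # events on the adv field (timestamps are not compared here).
    shown = {e.adv for e in entries if e.type == "impression"}
    return sorted({e.adv for e in entries if e.type == "visit" and e.adv in shown})


dsp = "dsp.example.com"
stores = TrailStores()
stores.write_from_iframe(
    dsp, TrailEntry("impression", "15:00:00 25/11", "chrome", "shoes.com"))
stores.grant_trust("shoes.com", dsp)
stores.write_first_party(
    dsp, "shoes.com",
    TrailEntry("visit", "16:30:00 25/11", "chrome", "shoes.com", domain="shoes.com"))

print(advertisers_with_impression_and_visit(stores.entries(dsp)))  # ['shoes.com']
print(stores.entries("news.com"))  # [] : news.com never writes anything
```

Note that the impression entry leaves `domain` empty in this sketch, since nothing in the flow tells the iframe which publisher it is embedded in.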

So in your example the trail store is really populated by dsp.example.com from within an <iframe>, rather than by news.com. Like you said, news.com wouldn't even know what ads are being shown, considering the privacy sandbox proposals.

Pl-Mrcy commented 3 years ago

Thank you for your answer. It is a bit clearer for me now. Yet, based on what you describe, I am still not sure how the <iframe> can populate the domain field in the trail store this time. It is the opposite side of the same problem I mentioned earlier: dsp.example.com doesn't know the publisher site it is printing an ad on at this point, does it?

Noeda commented 3 years ago

> It is the opposite side of the same problem I mentioned earlier: dsp.example.com doesn't know the publisher site it is printing an ad on at this point, does it?

I think you are right, and I think the proposal is a bit handwavy in this respect. I see two ways forward here:

1) We add a mechanism that allows the knowledge that the publisher is news.com to enter the trail store for dsp.example.com, maybe with some special tags on the fenced iframe and the <a> tag. I need to think a bit about how exactly this would work.
2) We don't populate publisher domains in the trail store at all. The end result is that the trail store will know when impressions happened and what ads were shown, but not where.

I think 1) would be more useful; it would allow building metrics that respect user privacy while still letting advertisers know on which sites their impressions were shown. For example, we could have a metric that gives the message "most of your impressions were shown on news.com, and users click more on news.com than they do on clothes.com", or something in a similar spirit.
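As a rough illustration of the kind of metric this would enable, here is a small Python sketch. The function name, the `(type, domain)` tuple shape and all the counts are made up for the example, and it does not attempt to model any of the privacy protections a real report would need.

```python
from collections import Counter


def per_domain_click_rates(entries):
    """Return {domain: (impressions, click_rate)} from (type, domain) pairs.

    domain may be None, which is what every entry would look like under
    option 2 (publisher domains are never recorded).
    """
    impressions = Counter()
    clicks = Counter()
    for kind, domain in entries:
        key = domain or "(unknown)"
        if kind == "impression":
            impressions[key] += 1
        elif kind == "click":
            clicks[key] += 1
    # Domains that have clicks but no recorded impressions are left out.
    return {d: (n, clicks[d] / n) for d, n in impressions.items()}


# Hypothetical data: 6 impressions / 3 clicks on news.com,
# 4 impressions / 1 click on clothes.com.
sample = (
    [("impression", "news.com")] * 6 + [("click", "news.com")] * 3
    + [("impression", "clothes.com")] * 4 + [("click", "clothes.com")] * 1
)

with_domains = per_domain_click_rates(sample)
without_domains = per_domain_click_rates([(kind, None) for kind, _ in sample])

print(with_domains)     # {'news.com': (6, 0.5), 'clothes.com': (4, 0.25)}
print(without_domains)  # {'(unknown)': (10, 0.4)}
```

With option 1 the report can say where impressions were shown and where users clicked more; with option 2 everything collapses into a single bucket.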

I'm going to try to clarify the proposal text to call out this detail.