WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Storing first party data in trustedBiddingSignalsKeys #1062

Open vanshika0812 opened 6 months ago

vanshika0812 commented 6 months ago

We were exploring the KV server as a way to implement frequency capping across Interest Groups (IGs) in Protected Audience, since browserSignals.prevWinsMs only supports frequency capping at the level of a single IG. Additionally, is it possible to store first-party data in trustedBiddingSignalsKeys, and are there any k-anon restrictions associated with this approach?

michaelkleber commented 6 months ago

There is no k-anon restriction in the keys sent to the KV service. However, do remember that the values it stores can be queried by anyone who knows the key. Quoting Section 3.1,

...so to prevent potentially leaking user information, keys should be either: not individually identifying (e.g. applying to many people, perhaps to all people who visited an advertiser page and that an ad campaign might show to) or unguessable (e.g. using random identifiers that are assigned at interest group join time and known only to the caller of joinAdInterestGroup). They should not be uniquely identifying and use guessable keys (e.g. hashed email address, name, or phone number).

But also, if you're talking about frequency capping across different IGs joined on the same site, then this seems reasonable — but perhaps we can improve the Protected Audience previous wins data so that you don't need the KV server for it.
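To make the "unguessable key" guidance quoted above concrete, here is a minimal sketch of building an interest group whose trusted bidding signals key is a random identifier generated at join time. The helper names (`makeUnguessableKey`, `buildInterestGroup`), the `fcap-` prefix, and the `dsp.example` URLs are assumptions made for illustration; only the interest group field names come from the Protected Audience explainer.

```javascript
// Sketch only: helper names, the key prefix, and all URLs are hypothetical.

// Web Crypto is a global in browsers and recent Node.js; fall back to the
// Node.js module on older runtimes.
const webCrypto = globalThis.crypto ?? require('node:crypto').webcrypto;

// Returns 128 random bits as a 32-character hex string.
function makeUnguessableKey() {
  const bytes = new Uint8Array(16);
  webCrypto.getRandomValues(bytes);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Builds an interest group config whose KV key is random rather than derived
// from anything guessable such as a hashed email address.
function buildInterestGroup(owner, name) {
  const capKey = `fcap-${makeUnguessableKey()}`;
  return {
    owner,
    name,
    biddingLogicURL: `${owner}/bid.js`,
    trustedBiddingSignalsURL: `${owner}/kv`,
    trustedBiddingSignalsKeys: [capKey],
    ads: [],
  };
}

const group = buildInterestGroup('https://dsp.example', 'shoes');
// In a browser that supports the API, the group would then be passed to
// navigator.joinAdInterestGroup (see the explainer for the current signature).
console.log(group.trustedBiddingSignalsKeys);
```

Because the key is known only to the party that called joinAdInterestGroup, nobody else can query the KV service for the value stored under it.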

vanshika0812 commented 6 months ago

@michaelkleber Generalising prevWinsMs to capture wins across all interest groups would involve aggregating wins from the various IGs into a unified dataset. Here is a proposed structure for the modified prevWinsMs:

{
  "allIGs": [
    {
      "IGId": "IG1", // this is just a placeholder for now, can modify the id part.
      "wins": [
        {
          "timeDeltaMs": 12345,
          "ad": {
            "renderURL": "https://example.com/ad1",
            "metadata": { 
              // Relevant metadata fields
            }
          }
        },
        {
          "timeDeltaMs": 23456,
          "ad": {
            "renderURL": "https://example.com/ad2",
            "metadata": { 
              // Relevant metadata fields
            }
          }
        },
        // Additional wins for IG1
      ]
    },
    {
      "IGId": "IG2",
      "wins": [
        {
          "timeDeltaMs": 34567,
          "ad": {
            "renderURL": "https://example.com/ad3",
            "metadata": { 
              // Relevant metadata fields
            }
          }
        },
        {
          "timeDeltaMs": 45678,
          "ad": {
            "renderURL": "https://example.com/ad4",
            "metadata": { 
              // Relevant metadata fields
            }
          }
        },
        // Additional wins for IG2
      ]
    },
    // Wins for other IGs
  ]
}

Also, the official documentation does not mention data retention for prevWinsMs anywhere. Could you tell us how long the win history is persisted in prevWinsMs?
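As a rough illustration of how a bidder might consume the structure proposed above, the sketch below counts wins across all listed IGs within a time window. `allIGs` is only the shape proposed in this comment, not an existing browserSignals field, and `countWinsAcrossIGs` is a hypothetical helper.

```javascript
// Sketch only: `allIGs` mirrors the proposal in this thread and is not part
// of the current Protected Audience API.

// Counts wins, across every listed IG, that happened within the last windowMs.
function countWinsAcrossIGs(allIGs, windowMs) {
  let count = 0;
  for (const ig of allIGs) {
    for (const win of ig.wins) {
      if (win.timeDeltaMs <= windowMs) count += 1;
    }
  }
  return count;
}

const proposed = {
  allIGs: [
    {
      IGId: 'IG1',
      wins: [
        { timeDeltaMs: 12345, ad: { renderURL: 'https://example.com/ad1', metadata: {} } },
        { timeDeltaMs: 23456, ad: { renderURL: 'https://example.com/ad2', metadata: {} } },
      ],
    },
    {
      IGId: 'IG2',
      wins: [
        { timeDeltaMs: 34567, ad: { renderURL: 'https://example.com/ad3', metadata: {} } },
        { timeDeltaMs: 45678, ad: { renderURL: 'https://example.com/ad4', metadata: {} } },
      ],
    },
  ],
};

console.log(countWinsAcrossIGs(proposed.allIGs, 30000)); // prints 2
```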

MattMenke2 commented 6 months ago

As with everything else in FLEDGE, the lifetime limit is 30 days (even if an IG is re-joined so that it lives longer than 30 days, prevWins, joinCount, and bidCount will forget actions that happened more than 30 days ago).
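For reference, a per-IG frequency cap built on prevWinsMs could look roughly like the sketch below. It assumes prevWinsMs is a list of [millisecondsSinceWin, ad] pairs as described in the explainer, and it clamps the window to the 30-day limit mentioned above, since older wins are never reported. `countRecentWins` and `underCap` are hypothetical helper names.

```javascript
// Sketch only: helper names are hypothetical; the [msSinceWin, ad] pair shape
// is an assumption based on the explainer's description of prevWinsMs.

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Counts this IG's wins within the last windowMs. Windows longer than 30 days
// are clamped, because the browser forgets wins older than that.
function countRecentWins(prevWinsMs, windowMs) {
  const effectiveWindow = Math.min(windowMs, THIRTY_DAYS_MS);
  return prevWinsMs.filter(([msSinceWin]) => msSinceWin <= effectiveWindow).length;
}

// True while the IG has won fewer than maxWins auctions inside the window.
function underCap(browserSignals, maxWins, windowMs) {
  return countRecentWins(browserSignals.prevWinsMs ?? [], windowMs) < maxWins;
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const browserSignals = {
  prevWinsMs: [
    [60000, { renderURL: 'https://example.com/ad1' }],    // 1 minute ago
    [7200000, { renderURL: 'https://example.com/ad2' }],  // 2 hours ago
    [90000000, { renderURL: 'https://example.com/ad1' }], // 25 hours ago
  ],
};

console.log(countRecentWins(browserSignals.prevWinsMs, ONE_DAY_MS)); // prints 2
console.log(underCap(browserSignals, 3, ONE_DAY_MS)); // prints true
```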

michaelkleber commented 6 months ago

Hi @vanshika0812, could you update your GitHub profile with your name and affiliation? If you're going to be proposing contributions to the API, then you will need to be covered by the WICG's contributor license agreement; see https://www.w3.org/community/wicg/ to join.

> The structure of prevWinsMs after generalising it to capture wins across all interest groups would involve aggregating wins from various IGs into a unified dataset.

The privacy model of the Protected Audience API involves each call to generateBid() having access to data from a single site other than the one where the ad is going to appear. If I understand what you're suggesting, that collection of previous wins across all of the bidder's Interest Groups would involve having data about many different sites where the user was added to different IGs. That's not allowed in our privacy model.

vanshika0812 commented 6 months ago

@michaelkleber I meant gathering previous wins data for a specific site across all its Interest Groups under prevWinsMs, focusing on individual site-specific IGs rather than aggregating data across the bidder's IGs from different sites.

MattMenke2 commented 6 months ago

Do you mean sites where an IG was joined, or where an IG bid in an auction? The latter would leak data, as IGs themselves can be joined from multiple sites, so having access to URLs and metadata from other IGs, even with the same owner, is potentially cross-site data.

michaelkleber commented 6 months ago

@vanshika0812 Got it! OK, as long as you're talking about all the IGs that were joined on a single site, you're right that there is no new privacy behavior here.

Let me ask, though: Have you considered just using a single IG, rather than having the same browser join multiple IGs on the same site? Over the past few years of design discussion, many ad techs seem to have decided that the one-IG-per-join-site model is a good way to make the API meet their needs.

nikhilrajpal commented 5 months ago

@michaelkleber one-IG-per-join-site model restricts the DSP capability of

michaelkleber commented 5 months ago

I'm afraid I don't really understand either of the capabilities you described. But note that a single IG could absolutely contain bidding code for multiple different strategies, and could use different strategies for different ads the group might bid on.
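A minimal sketch of what "multiple strategies in a single IG" might look like: each ad carries a strategy name in its metadata, and generateBid() dispatches on it. The `strategy` metadata field, the strategy names, the bid values, and the one-hour cap are all assumptions made for illustration. Because every ad lives in the same IG, prevWinsMs already covers wins from all of them.

```javascript
// Sketch only: the `strategy` metadata field, the strategy names, and the bid
// values are hypothetical. prevWinsMs is assumed to hold [msSinceWin, ad] pairs.

const HOUR_MS = 60 * 60 * 1000;

// One bidding function per strategy, all shipped in the same bidding script.
const strategies = new Map([
  ['prospecting', () => 1.0],
  ['retargeting', (ad, browserSignals) => {
    // Stop bidding once this IG has won twice in the last hour.
    const recentWins = (browserSignals.prevWinsMs ?? [])
      .filter(([msSinceWin]) => msSinceWin <= HOUR_MS).length;
    return recentWins >= 2 ? 0 : 2.5;
  }],
]);

// Picks the highest positive bid across the IG's ads, using each ad's strategy.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  let best = null;
  for (const ad of interestGroup.ads) {
    const strategy = strategies.get(ad.metadata?.strategy);
    if (!strategy) continue;
    const bid = strategy(ad, browserSignals);
    if (bid > 0 && (best === null || bid > best.bid)) {
      best = { bid, render: ad.renderURL, ad: ad.metadata };
    }
  }
  // A bid of zero means the IG does not take part in the auction.
  return best ?? { bid: 0 };
}

const interestGroup = {
  ads: [
    { renderURL: 'https://example.com/ad1', metadata: { strategy: 'prospecting' } },
    { renderURL: 'https://example.com/ad2', metadata: { strategy: 'retargeting' } },
  ],
};

const fresh = generateBid(interestGroup, {}, {}, {}, { prevWinsMs: [] });
console.log(fresh.bid, fresh.render); // prints 2.5 https://example.com/ad2
```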