WICG / floc

This proposal has been replaced by the Topics API.
https://github.com/patcg-individual-drafts/topics

Accessing the meaning of a cohort id #27

Closed cgouvernet closed 3 years ago

cgouvernet commented 3 years ago

FLoC proposal gives an example of what could be a cohort id: "43A7".

Will the underlying "meaning" of the cohorts be public ?

michaelkleber commented 3 years ago

This is an interesting question! We're just at the stage of experimenting with how to cluster people into cohorts in the first place. For some ways of building cohorts, it might make sense to talk about the "underlying meaning", and for other ways, that might not be a question the browser has any way to answer.

As far as users of FLoC are concerned, the right way to understand what a particular cohort "means" would be to observe all of the ad requests from people in that flock. This is similar to the way that a particular 3rd-party cookie ID doesn't have an inherent meaning but can give you information by observing behavior over time — except with a flock, you observe the behavior of a collection of thousands of people, instead of one.

dmarti commented 3 years ago

This question would have to be resolved in order to use FLoC on any site where audience discrimination could become a regulatory issue for the site owner. For example, if a FLoC cohort ID turns out to map to users of assistive technologies, or to users with a specific health condition, it would be a risk for a site to use FLoC on any page where employment or housing ads might be shown to users in the USA. Related issues:

michaelkleber commented 3 years ago

@dmarti Unfortunately, a core lesson of AI Fairness research is that any user of an ML signal needs to think about the questions you're asking, even if the raw signals have a "meaning" that seems ostensibly free from bias. This will surely be the case for flock, just as it is the case for e.g. coarse-grained geolocation, which might be correlated with race.

So while I completely agree that we will need to (1) pick a clustering algorithm and (2) make its "meaning" as clear as possible, this won't absolve potential users of the API from thinking about the question as well.

(For more on this subject, do check out the Sensitive Categories and Excluding Sensitive Categories paragraphs in the Explainer.)

dmarti commented 3 years ago

@michaelkleber Very good points. Any site that uses FLoC does need to be aware of a large set of bias-related questions, and possibly provide explanations written to address the concerns of that site's users.

In the case of FLoC, an API user can be a site that calls getInterestCohort, a site on which FLoC training occurs, or a site with both. If FLoC training is opt-in for sites, then the owner of each site can go through this process at their own pace, and only turn on FLoC training in production when they have addressed all relevant AI ethics, transparency, and regulatory questions.

One option might be to require a page under .well-known that redirects or links to a FLoC explainer for that site. The FLoC classifier could check for this page before training, and treat its existence as an assertion that the site has gone through this process.
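To make the .well-known idea concrete, here is a minimal sketch of the check a classifier might run before training on a site. The path `/.well-known/floc-explainer` and the names `explainerUrl`, `siteAllowsTraining`, and `Probe` are all hypothetical; the thread does not specify any of them.

```typescript
// Hypothetical path: the thread proposes a page under .well-known but names none.
const WELL_KNOWN_PATH = "/.well-known/floc-explainer";

// Anything that can report whether a URL responds successfully,
// for example a thin wrapper around fetch.
type Probe = (url: string) => Promise<boolean>;

// Builds the URL to check, tolerating trailing slashes on the origin.
function explainerUrl(origin: string): string {
  return origin.replace(/\/+$/, "") + WELL_KNOWN_PATH;
}

// Treats the existence of the page as the site's assertion that it has
// reviewed FLoC training. Unreachable sites count as not opted in.
async function siteAllowsTraining(origin: string, probe: Probe): Promise<boolean> {
  try {
    return await probe(explainerUrl(origin));
  } catch {
    return false;
  }
}
```

In a browser or recent Node.js, the probe could be something like `(url) => fetch(url, { method: "HEAD" }).then((r) => r.ok)`, which follows a redirect to the explainer by default.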

Simon-J-Harris commented 3 years ago

> This is an interesting question! We're just at the stage of experimenting with how to cluster people into cohorts in the first place. For some ways of building cohorts, it might make sense to talk about the "underlying meaning", and for other ways, that might not be a question the browser has any way to answer.
>
> As far as users of FLoC are concerned, the right way to understand what a particular cohort "means" would be to observe all of the ad requests from people in that flock. This is similar to the way that a particular 3rd-party cookie ID doesn't have an inherent meaning but can give you information by observing behavior over time, except with a flock, you observe the behavior of a collection of thousands of people, instead of one.

This is something I've been thinking about quite a bit, and I'm struggling to make sense of it; hopefully someone can help me here. Buyers need to understand which interest groups they are targeting, because their clients (advertisers) want to reach certain interest groups. As an example, in the Google Ads platform a chain of restaurants might want to target the following interest groups:

/Food & Dining
/Food & Dining/Coffee Shop Regulars
/Food & Dining/Frequently Dines Out/Diners by Meal/Frequently Eats Breakfast Out

Will buying platforms like Google Ads be able to understand the interests of a FLoC cohort, so that buyers can target cohorts per the above? Or will a platform only be able to let buyers target groups that respond well to ads for /Food & Dining?

michaelkleber commented 3 years ago

If an advertiser wants to target "/Food & Dining/Coffee Shop Regulars", then an ad buying platform will need to have some way of deciding which flocks are good enough matches for that intent.

In a 3rd-party cookie world, ad techs observe the browsing behavior of each cookie, and decide which cookies look like the "/Food & Dining/Coffee Shop Regulars" type. The analogous approach is ad techs observing the behavior of each flock, and deciding which of them seem like "/Food & Dining/Coffee Shop Regulars".
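As a rough illustration of that approach, the sketch below counts observed page views per cohort and keeps the cohorts where a given page category makes up a large enough share. It is only a sketch under stated assumptions: the `Observation` shape, the category labels, the cohort "91BC", and the threshold are hypothetical, and a real buying platform would likely use a richer model.

```typescript
// Hypothetical sketch: none of these names come from the FLoC proposal.
interface Observation {
  cohortId: string;     // cohort seen on the ad request, e.g. "43A7"
  pageCategory: string; // the ad tech's own classification of the page
}

// Returns the cohorts whose share of observed page views in `category`
// is at least `minShare` (a fraction between 0 and 1).
function cohortsMatching(
  observations: Observation[],
  category: string,
  minShare: number,
): string[] {
  const totals: Record<string, number> = {};
  const hits: Record<string, number> = {};
  for (const { cohortId, pageCategory } of observations) {
    totals[cohortId] = (totals[cohortId] ?? 0) + 1;
    if (pageCategory === category) {
      hits[cohortId] = (hits[cohortId] ?? 0) + 1;
    }
  }
  return Object.keys(totals).filter(
    (id) => (hits[id] ?? 0) / totals[id] >= minShare,
  );
}

// Made-up sample data for illustration only.
const sample: Observation[] = [
  { cohortId: "43A7", pageCategory: "coffee-shops" },
  { cohortId: "43A7", pageCategory: "coffee-shops" },
  { cohortId: "43A7", pageCategory: "news" },
  { cohortId: "91BC", pageCategory: "news" },
  { cohortId: "91BC", pageCategory: "coffee-shops" },
  { cohortId: "91BC", pageCategory: "sports" },
];

// Cohorts that could be offered under "/Food & Dining/Coffee Shop Regulars".
const coffeeCohorts = cohortsMatching(sample, "coffee-shops", 0.5);
```

With this sample, "43A7" has two of three page views in the coffee category and passes the 0.5 threshold, while "91BC" has one of three and does not.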

Simon-J-Harris commented 3 years ago

> If an advertiser wants to target "/Food & Dining/Coffee Shop Regulars", then an ad buying platform will need to have some way of deciding which flocks are good enough matches for that intent.
>
> In a 3rd-party cookie world, ad techs observe the browsing behavior of each cookie, and decide which cookies look like they are "/Food & Dining/Coffee Shop Regulars" type. The analogous approach is ad techs observing the behavior of each flock, and decide which of them seem like "/Food & Dining/Coffee Shop Regulars".

Thank you for the quick response, much appreciated. Just to double-check that I'm understanding this correctly: you're saying that if a buying platform sees that a FLoC cohort (e.g. "43A7") visits lots of pages it classifies as "dining out & coffee shops", it might decide that cohort "43A7" is a good fit for "/Food & Dining/Coffee Shop Regulars" and offer that as a segment to its buyers? Thanks again for the quick response and for helping improve my understanding of things.

michaelkleber commented 3 years ago

Yup, that's it.

antlauzon commented 3 years ago

How exactly does a user's floc id change? Is there any transparency into how specific signals alter the floc id?

michaelkleber commented 3 years ago

When you ask the browser for a cohort, you get both a number and a "version" string, which indicates the FLoC clustering algorithm used to generate it. The answer to how specific signals alter the floc id depends on that algorithm.
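For illustration, here is a minimal sketch of reading that pair, assuming the origin-trial shape in which `document.interestCohort()` resolved to an object with `id` and `version` fields. That API has since been withdrawn, so the code feature-detects it. The names `CohortSource`, `readCohort`, and `cohortKey` are hypothetical.

```typescript
// Shape described above: a cohort id plus a "version" string that
// identifies the clustering algorithm. Field names are an assumption.
interface InterestCohort {
  id: string;
  version: string;
}

// Minimal structural type so the sketch also runs outside a browser.
interface CohortSource {
  interestCohort?: () => Promise<InterestCohort>;
}

// Returns the cohort, or null when the API is missing or the call is rejected.
async function readCohort(source: CohortSource): Promise<InterestCohort | null> {
  if (typeof source.interestCohort !== "function") return null;
  try {
    return await source.interestCohort();
  } catch {
    return null;
  }
}

// Ids produced by different algorithm versions are not comparable,
// so key any stored observations by both fields.
function cohortKey(cohort: InterestCohort): string {
  return `${cohort.version}/${cohort.id}`;
}
```

In a page, the call would look like `readCohort(document as CohortSource)`. Keying stored data by `cohortKey` keeps observations from one clustering algorithm from being mixed with those from another.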

See this page for a description of the specific clustering approach used in the first Chrome origin trial.