Going deeper: FLoC Index, Governance & Business logic

ascentitall commented 4 years ago

The purpose of this thread is to further define the universe of available FLoCs, how they are assigned based upon user behavior, and the level of transparency made available to websites (and ultimately, the ad ecosystem).

FLoC Index: Given the centralization of FLoC assignment within the browser, there is expected to be a finite, standardized set of FLoC values, i.e. the FLoC Index. It seems reasonable to begin with the IAB Taxonomy, which includes over 600 content/interest categories with up to 4 levels of hierarchy.

FLoC Governance: It is assumed that the FLoC Index will be modified over time; the business community may request new FLoCs, and privacy advocates may request that certain FLoCs be removed (for example, health-related labels). Maintenance of the FLoC Index is expected to be governed by the browser, and could be a new component within browser version updates. Browsers will need to allow for ongoing consumer and business community dialog related to FLoC Index changes.

Business logic: Apart from the FLoC Index, the following specifics need further definition:

how many FLoCs can a user belong to at a moment in time?
how many FLoCs can a user belong to over a period of time (e.g. 30 days)?
how many behaviors or page view events are required for a user to be assigned to a specific FLoC? Are page engagement signals (dwell time, scroll depth) also considered in the assignment of a user to a FLoC?
will the time period or recency of FLoC assignment to a user be exposed?

rodolpheAV commented 4 years ago

As far as I understand the FLoC proposal, the aim is not for the browser to expose multiple characteristics of the browsing history to the ad ecosystem, but rather to expose a single obfuscated ID shared by many other users who have a common browsing history (and interests).

This means that you would not receive any interest information like "Automotive", "Business and Finance" and "News and Politics International News" from a browser, but rather a FLoC like "B14F", meaning that all people with this same FLoC have been browsing the same websites or webpages in the past (and may react the same way to some specific ads displayed to them).

I could be wrong, and I have heard other people from the industry with the same questions as yours, so I think the proposal needs to clarify the number of FLoCs available to a browser and what a FLoC will look like.

eddiec1234 commented 4 years ago

Something has to decide which FLoC a site visit belongs to. I'm assuming the visited site will need to send a FLoC category to the browser so it knows how to log the visit. Otherwise a server component will need to be created to decide, since I don't see that happening at the browser level for the millions of websites someone could visit.

ascentitall commented 4 years ago

Thanks @rodolpheAV for the important clarification. In order for FLoC to be adopted by industry, there needs to be clarity on how they are assigned to users.

@eddiec1234 my understanding is that the browser sets the FLoC, it is not provided by the website. (This convention of website setting a browser value is more akin to Turtledove). I'd also say that the site isn't the driver to assign a user to a FLoC, but rather the URL or specific page(s) a user visits.

@jkarlin Bottom line - FLoC is a black box, and given the absence of business logic, it's unclear why this proposal even exists (in terms of use cases). What use cases is FLoC trying to achieve? Clarity is needed on basic questions like:

how many possible FLoC values are there?
can a user belong to multiple FLoCs at once?
how frequently can a user's FLoC be updated?
will any other variables, such as a time-based value or confidence coefficient, be exposed for the user's FLoC?
if user A visits a new URL they haven't visited before, and the page has rich content for Personal Finance & 401k, should the site assume that the current FLoC represents Personal Finance/401k interest? What if user B consistently visits sports sites and also visits the Personal Finance page - would user B have a different FLoC, unrelated to Personal Finance? (One would expect a different value). In other words, a site cannot assume that the FLoC of a visitor on a page that traditionally signals interest/intent is a FLoC that necessarily represents this interest. Without a coefficient, all FLoC IDs will appear the same and it will not be feasible to run Machine Learning in an effective way to build an interest-based audience...

michaelkleber commented 4 years ago

Hi folks, thank you for the questions! Some things you're asking about are inherent to the idea of FLoC, while some of them are open questions where there are multiple plausible answers.

FLoC is about grouping lots of similar people together, and then letting an advertiser target a whole group (a "flock"). Any person is in one and only one flock at a time, and that flock is the output of some ML whose input is based on what you've browsed. What flock you're in generally stays the same as time goes by, but may change as your browsing behavior changes; details on lookback window and hysteresis need to be worked out.

The question of what should count as "similar" is a giant research problem, and we have some ideas we've been experimenting with, but the right answer will surely take some time to figure out. At least for starters, we were not thinking of any standardization, taxonomy, or inherent meaning to the labels. The way to figure out what a particular flock "means" is to observe what the people in that flock do: gather a bunch of data about the IAB Taxonomy of web pages and what flocks visit those pages, and you can form opinions about the interests of the flock.
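As a rough illustration of the observational approach described above, here is a minimal Python sketch that tallies which page categories each flock is seen on. The log format, the flock IDs, and the category labels are all hypothetical; nothing here is defined by the FLoC proposal itself.

```python
from collections import Counter, defaultdict


def summarize_flocks(observations, top_n=2):
    """Return the most frequently observed page categories per flock.

    `observations` is an iterable of (flock_id, page_category) pairs,
    a hypothetical log format assumed only for this sketch.
    """
    counts = defaultdict(Counter)
    for flock_id, category in observations:
        counts[flock_id][category] += 1
    return {
        flock_id: [category for category, _ in counter.most_common(top_n)]
        for flock_id, counter in counts.items()
    }


# Made-up observations: the flock ID a site saw, and that page's category.
observations = [
    ("43A7", "Automotive"),
    ("43A7", "Automotive"),
    ("43A7", "Personal Finance"),
    ("B14F", "Sports"),
    ("B14F", "Sports"),
    ("B14F", "Sports"),
    ("B14F", "Personal Finance"),
]

summary = summarize_flocks(observations)
```

With this toy data, both flocks appear on the Personal Finance page, but it is the leading category for neither of them, which is the kind of evidence a site would have to weigh when forming an opinion about a flock.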

Other approaches here seem reasonable, though: every page making assertions about its IAB Taxonomy could be a different way to seed the notion of user similarity. Or even users checking off which taxonomic categories they want to proclaim that they are interested in — I've heard some browsers say they like that idea better than something based on observed behavior. In any of these cases, there would still be a FLoC-style clustering afterwards, so that many people send the same FLoC header, to ensure that it's not useful for tracking.

michaelkleber commented 4 years ago

P.S.: @ascentitall and @eddiec1234, would you consider adding names and affiliations to your GitHub profiles? It would be great to make it clear who all is participating in the conversations about these new APIs.

kdeqc commented 4 years ago

It seems like classifying people as being in a single flock is treating them like they're fairly one dimensional. For example, there's what I am interested in for work and then there's what I'm interested in personally. I use the same laptop, and ideally I wouldn't want to see ads that are targeted to my personal interests while I'm working, and vice versa. Would flocks be updated quickly enough that I could be in one flock during my work day, and another one later on? It's a similar type of situation for people who share a computer - so maybe something taking into account the clustering of data?

I also think that exclusionary flocks could be very useful. My example here would be that I like watching videos of people making desserts, but I am also diabetic, and spend a lot of time reading diabetic-friendly recipes. I watch the dessert videos because I sort of live vicariously through them, but I don't want to see ads about buying desserts or the tools to make them. So I'd hope that ads could be targeted to the dessert video watchers flock as long as those people weren't also in the diabetic recipe readers flock.

michaelkleber commented 4 years ago

Hi Kris,

One key design requirement is that your flock cannot be useful for tracking individuals across sites, and the requirement that flocks include a large number of people is an important part of meeting that need. Rapidly updating your flock, or providing the cross-product of two different contexts of interests (home vs work), both dramatically shrink the size of the identified population, and so they would increase the tracking risk.

Another key requirement is that your flock not reveal anything sensitive about you, and medical conditions certainly seem sensitive to me. So a flock that revealed you were likely to be diabetic is the kind of thing we should be actively avoiding, not building for.

All of the targeting types that you described might be reasonable use cases for TURTLEDOVE instead: with that API, your interest groups may affect the ads you see, but they never out you to the server or the surrounding web page. But the design of FLoC doesn't seem like the right fit here.

kdeqc commented 4 years ago

Hi Mike! I do appreciate that it doesn't take a lot of data points to identify a person - but the flip side is that if these flocks are too large, then it's no longer really effective interest-based advertising either (and by effective, I mean it's not only not helping the advertiser but can also be annoying to the user). I can see situations where TURTLEDOVE could be used to mitigate the problem, though - but I do think there are situations where it wouldn't be.

jdwieland8282 commented 4 years ago

Hi @michaelkleber, thanks for your comments. If I understand what you are saying correctly, we should think of FLoCs not the way we think of traditional interest-based audiences like "auto intenders" or "household income of x" but rather as personas or personalities. The members of FLoC "xyz" share more than just the same page visits; they share the same sentiment, outlook, personality, etc. Is that closer to what a true FLoC is? If a lookup of FLoC ID to FLoC meaning is never released, then the advertising community is left to decipher what each FLoC ID is interested in and respond accordingly.

michaelkleber commented 4 years ago

That seems like the right idea, yes — though of course there's a big open question of how similar the people in a flock really would be.

ascentitall commented 4 years ago

Making more sense now.. any indications on what will be the total number of FLoCs - perhaps by order of magnitude - 100, 1,000, 10,000, 100,000?

michaelkleber commented 4 years ago

We certainly want flocks to have thousands of people in them.

When we wrote the explainer, I included Sec-CH-Flock: 43A7 because I was thinking of it as four hex digits long. A 16-bit key would support up to 64 thousand flocks; if each one included a few thousand people, then we'd be talking about a web-using population in the hundreds of millions. There's nothing locked in about those particular numbers, but they generally seem reasonable to me.
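The arithmetic behind those numbers can be checked with a short Python sketch. The function names are made up for illustration, and 3,000 and 5,000 are assumed stand-ins for "a few thousand"; only the four-hex-digit example value 43A7 comes from the comment above.

```python
def flock_key_space(hex_digits: int) -> int:
    """Number of distinct values expressible with the given number of hex digits."""
    return 16 ** hex_digits


def population_covered(hex_digits: int, people_per_flock: int) -> int:
    """Population covered if every possible flock held `people_per_flock` people."""
    return flock_key_space(hex_digits) * people_per_flock


# The example header value read as a 16-bit integer.
example_flock = int("43A7", 16)

total_flocks = flock_key_space(4)            # 65,536, i.e. "64 thousand"
covered_at_3k = population_covered(4, 3000)  # 196,608,000
covered_at_5k = population_covered(4, 5000)  # 327,680,000
```

Both assumed flock sizes give totals in the hundreds of millions, consistent with the estimate in the comment.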

sukria commented 4 years ago

Hi,

"how many possible FLoC values are there?"

If I extrapolate from the FLoC document, I tend to assume 16^4 as the Sec-CH-Flock header looks to be composed of four hexadecimal digits (hence, 65536 flocks max).

But as @michaelkleber said in this thread, this doesn't seem to be decided yet.

rodolpheAV commented 4 years ago

"A 16-bit key would support up to 64 thousand flocks; if each one included a few thousand people, then we'd be talking about a web-using population in the hundreds of millions. There's nothing locked in about those particular numbers, but they generally seem reasonable to me."

Understanding that nothing is decided right now, this seems quite restrictive to me. There are currently 4 billion people browsing the web; with 65k FLoCs, each FLoC would contain roughly 60k people. Given the growing number of connected people, and that they use multiple devices and browsers, 4 or even 5 hex digits seems too restrictive.

@michaelkleber do you have a more precise estimate of the expected number of people per FLoC under this proposal? Should "a few thousand" be understood as 3k, 10k, 50k or 100k?
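To make the estimate above concrete, the following Python sketch computes the average flock size for different ID lengths. It assumes the 4 billion figure quoted in this comment and an even spread of users across flocks, which is a simplification; the function name is hypothetical.

```python
def average_flock_size(population: int, hex_digits: int) -> int:
    """Average people per flock if `population` were spread evenly over all IDs."""
    return population // (16 ** hex_digits)


WEB_POPULATION = 4_000_000_000  # figure quoted in the comment, not a measured value

size_4_digits = average_flock_size(WEB_POPULATION, 4)  # 61,035
size_5_digits = average_flock_size(WEB_POPULATION, 5)  # 3,814
size_6_digits = average_flock_size(WEB_POPULATION, 6)  # 238
```

Under these assumptions, four hex digits give about 61k people per flock, five digits give a few thousand, and six digits fall to a few hundred.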

michaelkleber commented 4 years ago

Right: Not decided yet, but I think we will start with 64K flocks, and I think we will make sure each flock has at least 5000 people in it, just because we need to start somewhere. Everything is subject to change based on further discussion, especially as we get into origin trial and receive feedback about how well it meets needs.

sokn78 commented 4 years ago

Thanks for all the clarifications, @michaelkleber. What would be the general approach to the segmentation method used to gather users into FLoCs? Unsupervised clustering in high dimensions is known to be a challenging problem. It is unlikely that we can build FLoCs that are homogeneous with respect to every aspect of their users' navigation. Would you rather give each FLoC a particular orientation, gathering users who share interests in a FLoC-specific sub-area of the whole navigation space? I do not mean it would have to be explicitly labeled, as discussed in previous posts. I understand it is the advertiser's job to find relevant FLoCs for its own purposes. Nevertheless, a generic navigation-based similarity would probably lead to small variance among FLoCs in interest for a specific campaign or product.

michaelkleber commented 4 years ago

@sokn78 This is an excellent question, and one which @jkarlin and I will be working on along with researchers.

TheMaskMaker commented 3 years ago

Clarification question: to my understanding, the FLEDGE proposal allows advertisers and publishers to create FLoCs, or cohorts, but this thread implies a limited number of pre-defined FLoCs. Is this a difference between the proposals, and could you provide more details on how this works?

michaelkleber commented 3 years ago

@TheMaskMaker FLoC is about the browser creating a highly limited number of cohorts that everyone can use; each browser is in only one of them. FLEDGE is about individual parties (advertisers, ad tech, publishers) creating their own audiences, whose use is controlled by the creator; each browser can be in lots of them.

They are intended to be complementary, with different ad targeting use cases perhaps being better served by one or the other API.

TheMaskMaker commented 3 years ago

@michaelkleber Thank you!

WICG / floc

Going deeper: FLoC Index, Governance & Business logic #10