FLoC Cardinality Use Cases

Sghazzawi commented 3 years ago

On the WICG call on 4/1/2021, Michael Kleber said that Google is not currently planning on exposing any sizing information with regard to FLoCs. This decision directly impacts a fundamental marketing use case. Marketers need to know the approximate size for planning and optimizing an advertising campaign. There is a huge difference for a marketer targeting an audience with 2,000, 200,000, or 2,000,000 devices.

Michael Kleber and others (https://github.com/WICG/floc/issues/87) suggest that you may be able to estimate a FLoC size by the relative number of observations of FLoC. However, there is no privacy-conscious way of differentiating between a browser with “FLoC: 123” refreshing 1000 times vs 1000 unique browsers within “FLoC: 123”.
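
To make the ambiguity concrete, here is a minimal sketch of the two scenarios. The log format and the `browserId` field are assumptions made purely for illustration; `browserId` exists only so the simulation can construct both cases, and a privacy-conscious observer would not have it.

```javascript
// Hypothetical observation log: one entry per page view that carried a cohort ID.
function makeLog(cohortId, browsers, viewsPerBrowser) {
  const log = [];
  for (let b = 0; b < browsers; b++) {
    for (let v = 0; v < viewsPerBrowser; v++) {
      log.push({ cohortId, browserId: `browser-${b}` });
    }
  }
  return log;
}

// What an observer can compute without any identifier: raw observation counts.
function observationCount(log, cohortId) {
  return log.filter((entry) => entry.cohortId === cohortId).length;
}

// What a marketer actually wants: distinct browsers. This needs an identifier.
function distinctBrowsers(log, cohortId) {
  const ids = log
    .filter((entry) => entry.cohortId === cohortId)
    .map((entry) => entry.browserId);
  return new Set(ids).size;
}

const oneBrowserRefreshing = makeLog(123, 1, 1000); // 1 browser, 1000 refreshes
const manyBrowsers = makeLog(123, 1000, 1);         // 1000 browsers, 1 view each

// Both scenarios produce the same observation count...
console.log(observationCount(oneBrowserRefreshing, 123), observationCount(manyBrowsers, 123));
// ...but very different audience sizes.
console.log(distinctBrowsers(oneBrowserRefreshing, 123), distinctBrowsers(manyBrowsers, 123));
```

The first line prints `1000 1000` and the second prints `1 1000`: the observation count alone says nothing about which scenario produced it.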

Given that the FLoC spec requires a certain threshold of browsers to be viable, the actual number of devices in a FLoC must be calculated by some mechanism. What would prevent us from exposing this mechanism to return sizing information? If this sizing information is an estimate or an order of magnitude, this is still useful for the marketing use case.
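
As a rough illustration of what an order-of-magnitude answer could look like, the sketch below turns an exact count into a power-of-ten bucket and returns nothing for cohorts under the threshold. The function name `coarseSizeEstimate`, the parameter `minCohortSize`, and the threshold value of 1,000 are all hypothetical; none of them come from the FLoC spec.

```javascript
// Hypothetical helper: report only a power-of-ten range, never the exact count.
function coarseSizeEstimate(exactCount, minCohortSize) {
  if (!Number.isInteger(exactCount) || exactCount < 0) {
    throw new RangeError("exactCount must be a non-negative integer");
  }
  // Cohorts under the viability threshold expose no size at all.
  if (exactCount < Math.max(1, minCohortSize)) {
    return null;
  }
  // Find the largest power of ten that does not exceed the count.
  let lower = 1;
  while (lower * 10 <= exactCount) {
    lower *= 10;
  }
  return { atLeast: lower, lessThan: lower * 10 };
}

const ILLUSTRATIVE_THRESHOLD = 1000; // assumption, not a value from the spec

for (const size of [500, 2000, 200000, 2000000]) {
  console.log(size, coarseSizeEstimate(size, ILLUSTRATIVE_THRESHOLD));
}
```

Under these assumptions the three audience sizes mentioned above (2,000, 200,000, and 2,000,000) land in three different buckets, which is the distinction the planning use case needs.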

It is unclear what the threat model of exposing exact or approximate sizing would be. We don’t believe that exposing an estimate of FLoC size damages privacy.

Given the lack of viable fulfillment of a valid use case, what would be the rationale for excluding this valuable feature?

In summary:

- Marketers need to know size for planning and optimization
  - There’s a huge difference for a marketer targeting an audience if that FLoC is 2,000 vs 200,000 vs 2,000,000
- There is no way to estimate this number
  - Even with a behavioral stream, there is no way to differentiate between a browser in FLoC 123 refreshing 1000 times or 1000 unique browsers in FLoC 123
- The cardinality of a FLoC need not be exact to be useful
- It is unclear what the threat model of exposing these numbers would be. We don’t believe that exposing an estimate of FLoC size damages privacy
- FLoC currently exposes a threshold mechanism to determine if any particular FLoC is big enough to be exposed. This mechanism could be reused to return an estimated size of the FLoC

michaelkleber commented 3 years ago

> However, there is no privacy-conscious way of differentiating between a browser with “FLoC: 123” refreshing 1000 times vs 1000 unique browsers within “FLoC: 123”.

Ah, counting the number of distinct people you see in some time interval with some flock is the sort of thing we ought to enable, using aggregate measurement.

That is, the tool that Chrome would use to count cohort sizes is the same tool that everyone else should be able to use also. Marketers should absolutely be able to count the people in their audience (or intended audience) that way.
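
As a very rough sketch of the general idea (not the actual aggregate measurement API), a trusted aggregation step could deduplicate contributions and add noise before releasing a count. Everything here is an assumption for illustration: the report shape, the `contributorId` field standing in for whatever the aggregator uses to deduplicate, the choice of Laplace noise, and the `scale` value.

```javascript
// Sample Laplace(0, scale) noise by inverse-CDF from a uniform value in [0, 1).
function laplaceNoise(scale, rng = Math.random) {
  // Clamp away from 0 so the logarithm below never receives 0.
  const u = Math.max(rng(), Number.EPSILON) - 0.5; // uniform in (-0.5, 0.5)
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Count distinct contributors for one cohort, then release only a noisy, rounded value.
function noisyDistinctCount(reports, cohortId, scale, rng = Math.random) {
  const contributors = new Set();
  for (const report of reports) {
    if (report.cohortId === cohortId) {
      contributors.add(report.contributorId);
    }
  }
  return Math.max(0, Math.round(contributors.size + laplaceNoise(scale, rng)));
}

// Made-up reports: contributor "a" appears three times but is counted once.
const reports = [
  { cohortId: 123, contributorId: "a" },
  { cohortId: 123, contributorId: "a" },
  { cohortId: 123, contributorId: "a" },
  { cohortId: 123, contributorId: "b" },
  { cohortId: 456, contributorId: "c" },
];

console.log(noisyDistinctCount(reports, 123, 1.0)); // near 2, varies from run to run
```

The point of the sketch is only that repeated refreshes from one browser collapse to a single contribution inside the aggregation step, so the caller never needs a per-browser identifier.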

Sghazzawi commented 3 years ago

If I understand you correctly, you're suggesting that we use the aggregation API (https://github.com/WICG/conversion-measurement-api/blob/main/AGGREGATE.md) to get the FLoC sizing information. Why wouldn't the FLoC API support getting FLoC sizes directly? Why necessitate using a second technology for fulfilling a major FLoC use case?

PedroAlvarado commented 3 years ago

I wonder if we could have "well known" aggregate reports about FLoCs that are powered by the mechanics available in https://github.com/WICG/conversion-measurement-api/blob/main/SERVICE.md

From there, I can imagine a few possible output reporting paths:

1. Browsers receive FLoC aggregate reports, say once a day, and you can then call document.interestCohort().countDistinctEstimate(). (Preferred)
2. The aggregate service or a derivative up the stack adds support for reads of global FLoC reports via API for interested parties. (pull-based approach)
3. A variant of option number two using a push-based approach, where interested parties register a callback for report delivery. (not great)
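
To show what the preferred option might feel like to a caller, here is a mock of the proposed shape. No browser implements `countDistinctEstimate()`; the mock object, the daily report structure, the version string, and the numbers are all invented for illustration.

```javascript
// Mock of a cohort object backed by the most recent report the browser received.
// Hypothetical throughout: this only imitates the proposed call shape.
function makeMockCohort(id, version, latestReport) {
  return {
    id,
    version,
    // Returns the reported estimate, or null if the report has no entry for this cohort.
    countDistinctEstimate() {
      return latestReport.has(id) ? latestReport.get(id) : null;
    },
  };
}

// Made-up daily report: cohort ID -> estimated number of distinct browsers.
const latestReport = new Map([["123", 200000]]);

const knownCohort = makeMockCohort("123", "example.1", latestReport);
const unknownCohort = makeMockCohort("999", "example.1", latestReport);

console.log(knownCohort.countDistinctEstimate());   // 200000
console.log(unknownCohort.countDistinctEstimate()); // null
```

One design consequence of this option is that the estimate is only as fresh as the last report delivered to the browser, which is why the mock reads from a stored report rather than computing anything itself.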

michaelkleber commented 3 years ago

Can you explain in what way the aggregation API does not meet your needs?

Learning a total count of people in a particular cohort seems like a strange request to me. How is it helpful to know that cohort 1234 contains 5,000 people if only 1,000 of them are people you have an opportunity to show an ad to? Or only 2,500 of them are people in the country your advertiser is interested in?
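
A small sketch of that distinction, using the numbers from the example above. The member records and the `hasAdOpportunity` and `inTargetCountry` fields are synthetic, and the overlap between the two groups is an artifact of how the data is generated, not a claim about real cohorts.

```javascript
// Synthetic cohort 1234: 5,000 members, 1,000 with an ad opportunity, 2,500 in the target country.
const cohort1234 = Array.from({ length: 5000 }, (_, i) => ({
  hasAdOpportunity: i < 1000,
  inTargetCountry: i % 2 === 0,
}));

// Count the members that satisfy every predicate passed in.
function countWhere(members, ...predicates) {
  return members.filter((m) => predicates.every((p) => p(m))).length;
}

const total = countWhere(cohort1234);
const reachable = countWhere(cohort1234, (m) => m.hasAdOpportunity);
const inCountry = countWhere(cohort1234, (m) => m.inTargetCountry);
const reachableInCountry = countWhere(
  cohort1234,
  (m) => m.hasAdOpportunity,
  (m) => m.inTargetCountry
);

console.log({ total, reachable, inCountry, reachableInCountry });
```

With this synthetic data the total is 5,000 but the audience a campaign could actually address is 500, which is the gap between a global cohort size and a campaign-specific reach figure.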

sheakevin commented 3 years ago

For our digital marketing customers, one of the first metrics they will look at prior to targeting an audience is how large it is. Our customers need to know the potential reach of their advertising campaigns to plan them and assess their efficacy.

If FLoCs become the main way marketers target consumers in the future, having an estimated size tightly integrated with the FLoCs is essential.

michaelkleber commented 3 years ago

Yes! @sheakevin I definitely want to support the "potential reach of their advertising campaigns" use case. It just seems like the aggregate measurement API is a better fit for that goal than a report on FLoC cohort sizes that includes people whom their advertising campaign cannot reach.

PedroAlvarado commented 3 years ago

@michaelkleber There seems to be some dissonance between the aggregate measurement API and FLoC in the context of this thread that may contribute unnecessary friction to using FLoC for advertisement purposes.

- Indirect vs. Direct Presence Requirements on a Page: On the one hand, FLoC allows any participant in the rendering of a FLoC-enabled page to learn about a given FLoC. This learning process can happen directly (running JS on the page) or indirectly (HTTP requests/redirects). On the other hand, the aggregate measurement API requires a "direct"/JS presence on the page to use the corresponding APIs and measure. Requiring the execution of JS to learn the size of a cohort can be an unnecessary restriction on access to FLoC information.
- Aggregate Reporting Requires JS Deployment Scale: To estimate a cohort size, the aggregate reporting API not only requires JS execution but also wide deployment of JS code. This is not great for how current AdTech works, where much of the presence/reach of networks is via redirect flows.
- Restrictions on Aggregate Reports: For the most part, it can be said that if FLoC is enabled on a given browser and page, all participants can access FLoC information. The same is unlikely to be true of aggregate reports. In different open forum conversations, it has been said that the delivery of aggregate reports may be restricted by origin/destination rules, effectively limiting access to aggregate report data. It seems fitting that a FLoC cohort size has the same level of access as the FLoC identifier and version.
- Browser Scope vs. Page Scope Information: FLoC sits fundamentally at the browser instance level, whereas the aggregate report API sits at the page/web-property level. It seems fitting that all general FLoC information, including cohort size, should be handled and made available at the browser scope level instead of the "page-level" scope. In this way, it would not be a hard requirement for every page out there to use the aggregate reporting APIs to measure the FLoC cohort size. This can reduce the barrier to entry.

The above points aside, and in the context of advertisement planning, it seems that using FLoC in any sensible way requires the use of the Aggregate Reporting API, as knowing the reach/cohort size is a critical piece of information during the advertisement process. This tight coupling can make a case for document.interestCohort().estimateSize() as a starting point, while also providing a signal to complement other forms of FLoC-based measurement.

WICG / floc

FLoC Cardinality Use Cases #91