WICG / floc

This proposal has been replaced by the Topics API.
https://github.com/patcg-individual-drafts/topics

Unsupervised Learning #12

Open kaprasad opened 4 years ago

kaprasad commented 4 years ago

The Problem

The FLoC proposal speaks to using federated learning to create groups of users with similar interests. An intuitive example is classifying users who view websites selling cars as “in-market auto”. These classification systems are typically built in one of two ways: supervised or unsupervised. The example above is a naïve unsupervised approach: websites are labeled as being related to auto sales, and users are added to the audience if they have enough activity on those sites. A supervised model would take a label set (either people who recently purchased a car or people who visited known auto sites) and find other, similar users based on their browsing behavior. An advantage of the supervised approach is that the model can discover related behavior that would not carry the same classification (e.g., reading news about interest rates).
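To make the contrast concrete, here is a minimal sketch of the two approaches. Everything in it is an assumption for illustration: the domain names, the `min_visits` threshold, and the use of average Jaccard similarity as a stand-in for a trained supervised model.

```python
# Hypothetical set of domains labeled as auto-sales sites.
AUTO_SITES = {"cars.example", "autodealer.example"}


def in_market_auto(visits, min_visits=3):
    """Naive rule: enough visits to labeled auto sites puts the user in the audience."""
    count = sum(1 for domain in visits if domain in AUTO_SITES)
    return count >= min_visits


def lookalike_score(visits, seed_users):
    """Average Jaccard similarity between a user's domains and each seed user's.

    A crude stand-in for a supervised lookalike model: seed_users are the
    labeled examples (e.g. recent car buyers), each given as a list of domains.
    """
    user = set(visits)
    if not seed_users or not user:
        return 0.0
    total = 0.0
    for seed in seed_users:
        seed_set = set(seed)
        total += len(user & seed_set) / len(user | seed_set)
    return total / len(seed_users)


# A user who reads interest-rate news but never visits a labeled auto site.
reader = ["rates.example", "news.example"]
buyers = [["rates.example", "cars.example"]]  # hypothetical seed set
```

With these toy inputs, `in_market_auto(reader)` is `False`, while `lookalike_score(reader, buyers)` is positive because the reader shares `rates.example` with a known buyer. That is the kind of related-but-unlabeled behavior the supervised approach can pick up.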

The current proposal lacks detail about who determines the algorithm and how it is deployed. Based on the available description, we infer that only one unsupervised model will be available: users with ‘similar’ behavior will be placed into cohorts, and a cohort ID will be available during bidding. Without information about how the cohorts were made, we cannot build the naïve model above; specifically, we would not know which websites contribute to each cohort or how recency, frequency, and volume of activity are traded off. Moreover, unless we have conversion reporting at the flock level, we cannot use the supervised approach to discover flocks relevant to a given advertiser. While there are methods by which we can derive the contributing websites, these mappings would need to evolve constantly as the FLoC model changes.
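As an illustration of the recency/frequency tradeoff mentioned above, here is one way an audience builder might weight visits. This is a hypothetical scoring rule, not anything described in the FLoC proposal; the function name and the 7-day half-life are assumptions.

```python
def decayed_visit_score(days_ago_list, half_life_days=7.0):
    """Sum of per-visit weights, where a visit's weight halves every half_life_days."""
    return sum(0.5 ** (days / half_life_days) for days in days_ago_list)


one_visit_today = decayed_visit_score([0])
two_visits_last_week = decayed_visit_score([7, 7])
```

Under this assumed half-life, one visit today scores the same as two visits a week ago. Without knowing which weighting (if any) the cohort algorithm applies, a buyer cannot tell which of these two users a given cohort is more likely to contain.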

Additionally, it is unlikely that a one-FLoC-for-all approach will work for all advertisers. In the auto targeting example, there is no guarantee the algorithm will group users with similar auto-viewing behavior into the same flock. If it does not, those users will likely make up only a small percentage of each flock, rendering that method of targeting useless. Most users in a given flock will not be interested in buying a new car, resulting in wasted advertiser money and a poor user experience. Even with robust conversion reporting and a sophisticated algorithm, advertisers will lose the ability to find their relevant audience.
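A toy calculation shows the dilution concern. The numbers are made up (1,000 users, a 5% auto-shopper rate, 10 flocks), and the flock assignment is deliberately constructed to be unrelated to auto interest; this is a sketch of the worst case, not a measurement of FLoC.

```python
def audience_share_by_flock(flock_ids, in_audience):
    """Fraction of each flock's members who belong to the target audience."""
    totals, hits = {}, {}
    for flock, member in zip(flock_ids, in_audience):
        totals[flock] = totals.get(flock, 0) + 1
        hits[flock] = hits.get(flock, 0) + (1 if member else 0)
    return {flock: hits[flock] / totals[flock] for flock in totals}


users = range(1000)
# Assumption: every 20th user is an auto shopper (5% of the population).
in_audience = [u % 20 == 0 for u in users]
# Assumption: flocks are formed from blocks of 20 users, so each block
# carries exactly one auto shopper regardless of which flock it lands in.
flock_ids = [(u // 20) % 10 for u in users]

shares = audience_share_by_flock(flock_ids, in_audience)
```

Here every flock ends up with 100 users, 5 of whom are auto shoppers, so each flock's share equals the 5% base rate. Bidding on any single flock would reach the target audience no better than bidding at random.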

Publisher & User Impact

The FLoC proposal as it stands today will favor larger publishers. Algorithms used by ad tech companies will start to index higher on publishers with more traffic, resulting in more accurate targeting on their inventory. This will lead to a drop in revenue for smaller publishers. To make up for the lost revenue, those publishers will have to either show more ads per page or erect paywalls, neither of which is an ideal outcome for the end user or the future of the internet.

michaelkleber commented 4 years ago

Hi Kanishk,

I agree that "in-market auto" is probably not a good use case for FLoC. The fact that each person is in only a single flock, and that the clustering is unsupervised, means that any specific audience that you think of is not likely to be concentrated in a small number of flocks.

If you want to build your own audience, then TURTLEDOVE is our proposed API. I see you've opened https://github.com/michaelkleber/turtledove/issues/26 there, so I'll say more over on that issue.

FLoC is really about noticing people who have something roughly in common — what parts of the web they tend to browse — which makes it a signal that you might use in ways similar to using coarse-grained demographics or geo, or might feed to an ML model that could discover useful inferences.

achimschloss commented 4 years ago

@benjaminsavage - added this here https://github.com/w3c/web-advertising/blob/master/support_for_advertising_use_cases.md#lookalike-targeting

I think in the discussion on the call we came to the conclusion that FLoC (in its current form) would not cater for that, given its random initialisation, and neither would TURTLEDOVE (in its current form), given that you'd want to build a prediction model and apply it rather than mark already-known audiences (ones the brand has already interacted with directly).

In terms of making the APIs in general more flexible, it would be good to also conceptually include the notion of publishers using these mechanics to contribute to cohort models.