benjaminsavage opened this issue 4 years ago · Status: Open
Hi Ben,
It's still an open question whether there might be meaning in the individual bits of the ID assigned by FLoC. There is a wide range of possible clustering algorithms to consider, and they don't all have the same properties.
But even setting aside the question of whether logistic regression on FLoCs is semantically meaningful, I would expect the large cohort sizes to leave something too coarse to be useful for most lookalike audience use cases.
I'm not sure if you've seen this proposal: https://github.com/w3c/web-advertising/blob/master/privacy_preserving_lookalike_audience_targeting.md
The key idea there is to use the Aggregated Reporting API to perform logistic regression on embedding vectors with boolean labels. In that proposal, the suggestion was for publishers to provide custom embedding vectors for use in this process. I am wondering whether FLoCs could be used as well.
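To make that training step concrete, here is a minimal in-the-clear sketch of logistic regression on embedding vectors with boolean labels. Everything in it is illustrative: the 64-dimensional embeddings, the labels, and the hyperparameters are made up, and in the actual proposal this computation would happen under the Aggregated Reporting API / MPC rather than on raw per-person data.

```python
# Minimal, in-the-clear sketch of the training step from the
# privacy_preserving_lookalike_audience_targeting proposal. In the real
# proposal this would run under MPC / the Aggregated Reporting API; the
# embeddings, labels, and hyperparameters below are all hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: one 64-dim embedding vector per person, plus a boolean
# label (e.g. "converted" / "is in the seed audience").
n_people, dim = 1_000, 64
embeddings = rng.normal(size=(n_people, dim))
labels = rng.integers(0, 2, size=n_people).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain batch gradient descent on the logistic loss.
weights = np.zeros(dim)
bias = 0.0
learning_rate = 0.1
for _ in range(200):
    preds = sigmoid(embeddings @ weights + bias)
    error = preds - labels
    weights -= learning_rate * (embeddings.T @ error) / n_people
    bias -= learning_rate * error.mean()

# The trained model scores any new embedding; a high score means
# "looks like the positively labeled people".
new_person = rng.normal(size=dim)
print(sigmoid(new_person @ weights + bias))
```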
While the proposal talks about FLoCs as "cohorts", I get the sense that they are not meaningless, arbitrary numbers. Specifically this part:
I assume what this means is:
My question is:
Will step 5 render the FLoC ID useless as anything but a random "cohort ID"? Or will it maintain some kind of meaning like the original embedding vector?
Let me give a concrete example to make my question clearer. Assume:
Are person A and person B "more similar" than person A and person C? The Hamming distance between person A and person B is 1. The Hamming distance between person A and person C is 10.
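In code, the comparison I have in mind is just XOR-and-popcount over the ID bits. The 16-bit IDs below are made-up stand-ins for persons A, B, and C, chosen only so the distances come out to 1 and 10 as above:

```python
# Hamming distance between FLoC IDs, treated as fixed-width bit strings.
# The IDs are hypothetical; only the resulting distances (1 and 10) match
# the example above.
def hamming_distance(floc_a: int, floc_b: int) -> int:
    """Number of bit positions at which the two IDs differ."""
    return bin(floc_a ^ floc_b).count("1")

person_a = 0b0000000000000000
person_b = 0b0000000000000001  # differs from person A in 1 bit
person_c = 0b0000001111111111  # differs from person A in 10 bits

print(hamming_distance(person_a, person_b))  # -> 1
print(hamming_distance(person_a, person_c))  # -> 10
```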
If step 5 preserves some kind of meaning (for example, if the Hamming distance between FLoCs can be used as a measure of similarity), then it seems like one could potentially apply the same "Logistic Regression in MPC" approach to FLoC IDs.
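If that holds, the feature vectors in the logistic-regression sketch above could simply be FLoC IDs expanded bit-by-bit. A hypothetical illustration (again in the clear, with a made-up 16-bit ID width; the real computation would run inside MPC):

```python
# Hypothetical: expand a FLoC ID into a per-bit feature vector so it can be
# fed into the same logistic-regression step sketched earlier. Whether these
# bits carry any semantic meaning is exactly the open question here.
import numpy as np

FLOC_BITS = 16  # assumed ID width, purely illustrative

def floc_to_features(floc_id: int) -> np.ndarray:
    """Map a FLoC ID to a {0.0, 1.0} feature vector, one entry per bit."""
    return np.array([(floc_id >> i) & 1 for i in range(FLOC_BITS)], dtype=float)

# Made-up seed-audience members (label 1) and a non-member (label 0).
floc_ids = [0b0000000000000001, 0b0000000000000011, 0b1111111100000000]
labels = np.array([1.0, 1.0, 0.0])
features = np.stack([floc_to_features(f) for f in floc_ids])

# `features` and `labels` could now replace `embeddings` and `labels`
# in the gradient-descent loop from the earlier sketch.
print(features.shape)  # -> (3, 16)
```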