WICG / floc

This proposal has been replaced by the Topics API.
https://github.com/patcg-individual-drafts/topics

The encoding choice is inconsistent with the experimental evaluation #121

Closed: mjuarezm closed this issue 3 years ago

mjuarezm commented 3 years ago

Version "chrome.2.1" encodes the user profile as a one-hot encoding of eTLD+1 domain names. However, the experiments in the FLoC whitepaper show that the TF-IDF and taxonomy-based ("Vert depth 3" in the paper) encodings perform better than one-hot encoding.

What motivated the decision to use one-hot encoding given the results shown in these figures?

Apologies if I missed something.

michaelkleber commented 3 years ago

Hi @mjuarezm! We are indeed interested in trying a taxonomy-based FLoC clustering mechanism like the one in the whitepaper. But Chrome doesn't have any preexisting on-device mechanism for understanding the contents of pages according to any taxonomy. So the one-hot encoding of domain names is what we know how to do right now, and the more ambitious clustering proposals are candidates for future work.
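
To make the two encodings discussed in this thread concrete, here is a minimal sketch in Python. It is not Chrome's implementation. The function names, the tiny vocabulary, and the sample histories are hypothetical, and the TF-IDF weighting is a textbook formulation assumed for illustration rather than the whitepaper's exact definition.

```python
import math
from collections import Counter


def one_hot_profile(visited_domains, vocabulary):
    """Binary vector: 1 if the domain was visited at least once, else 0."""
    visited = set(visited_domains)
    return [1 if domain in visited else 0 for domain in vocabulary]


def tf_idf_profile(visited_domains, vocabulary, all_histories):
    """Weight each domain by visit frequency, discounted by how many
    histories contain it (textbook TF-IDF, assumed for illustration)."""
    counts = Counter(visited_domains)
    total_visits = len(visited_domains)
    n_users = len(all_histories)
    profile = []
    for domain in vocabulary:
        tf = counts[domain] / total_visits if total_visits else 0.0
        df = sum(1 for history in all_histories if domain in history)
        idf = math.log(n_users / df) if df else 0.0
        profile.append(tf * idf)
    return profile


# Hypothetical eTLD+1 vocabulary and browsing histories.
vocabulary = ["example.com", "news.example", "shop.example"]
histories = [
    ["example.com", "example.com", "news.example"],
    ["example.com", "shop.example"],
]

print(one_hot_profile(histories[0], vocabulary))  # [1, 1, 0]
print(tf_idf_profile(histories[0], vocabulary, histories))
```

The one-hot version needs only the list of visited domains, which fits the point above that it requires no on-device understanding of page contents. In the TF-IDF version, a domain visited by every user (here `example.com`) gets weight zero, so it contributes nothing to distinguishing one profile from another.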

mjuarezm commented 3 years ago

Thanks for the clarification!