Open kdeqc opened 3 years ago
We have been thinking about this. We updated the explainer to this effect recently as well. One option is what we call 'per-site' cohorts, where a site gets its cohort assigned to it when the page first asks for it, and the cohort is computed on the fly. That cohort is then sticky for that site for some amount of time. This way, each site winds up with a different cohort and they're not useful for cross-site identification.
The downside is that it leaks information about the user faster, if multiple sites have the user's PII and can gather the cohorts from the various sites. It also reduces overall utility. So we need to consider that carefully.
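To make the per-site idea above concrete, here is a minimal sketch of "assigned on first request, then sticky for a while". It is only an illustration under stated assumptions: `PerSiteCohortStore`, `computeCohort`, and `stickyMs` are hypothetical names, the cohort computation is a stand-in callback, and nothing here is taken from the explainer or from any browser implementation.

```typescript
// Hypothetical sketch of per-site sticky cohorts; all names are illustrative.
interface StickyCohort {
  cohortId: number;
  expiresAt: number; // timestamp in milliseconds
}

class PerSiteCohortStore {
  private readonly assigned = new Map<string, StickyCohort>();
  private readonly computeCohort: () => number;
  private readonly stickyMs: number;

  // computeCohort stands in for "compute the cohort on the fly" from the
  // user's current browsing history; stickyMs is the sticky period.
  constructor(computeCohort: () => number, stickyMs: number) {
    this.computeCohort = computeCohort;
    this.stickyMs = stickyMs;
  }

  // Returns the cohort for a site, assigning one on the first request and
  // reusing it until the sticky period has elapsed.
  getCohort(site: string, now: number): number {
    const existing = this.assigned.get(site);
    if (existing !== undefined && now < existing.expiresAt) {
      return existing.cohortId;
    }
    const cohortId = this.computeCohort();
    this.assigned.set(site, { cohortId, expiresAt: now + this.stickyMs });
    return cohortId;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
let currentCohort = 1001; // assumed value derived from browsing history
const store = new PerSiteCohortStore(() => currentCohort, 7 * DAY_MS);

const a = store.getCohort("site-a.example", 0);
const b = store.getCohort("site-b.example", 1 * DAY_MS);
currentCohort = 2002; // browsing history has changed
const c = store.getCohort("site-c.example", 2 * DAY_MS);
const aAgain = store.getCohort("site-a.example", 3 * DAY_MS); // still sticky
const aLater = store.getCohort("site-a.example", 8 * DAY_MS); // expired, recomputed
```

In this sketch a site keeps its first value for the sticky period regardless of later changes, and a site that first asks after the underlying cohort has changed receives the new value. How often the underlying value actually changes between first requests is not something the sketch can answer.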
Why would we expect cohort IDs to be different for different sites, even if they were generated on visiting a site and sticky for some period of time? If cohorts are relatively stable to browsing preferences, then it seems like often the cohort ID will be the same when I visit two different sites.
Also, it would be a particularly difficult mitigation to explain to end users. "You are recognizable as the same person on Site A and Site B if you first visited Site A and Site B the same week, but maybe less likely to be recognized if your browsing patterns changed before you subsequently first visited Site C the following month." (If a user clears their browser cookies, which should reset any sticky cohort identifier, then the newly generated identifiers would be the same for all sites they visit soon after, a highly unexpected implication.)
Different cohort IDs on visiting different origins would limit the scope of the additional fingerprinting surface. That is one way of limiting the severity of browser fingerprinting (see the list here), but cohort IDs could still be high-entropy and widely available, and would in that case necessarily be fairly persistent.
The detectability factor is less clear: the request for the cohort identifier is an explicit JavaScript call that is easy for the browser or an extension to detect, but if it's expected to always be requested by advertising, it would be difficult to distinguish between fingerprinters and those just using the cohort ID for non-individualized advertising.
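As a rough illustration of the detectability point, the sketch below wraps a cohort-request method so that every call is recorded. It is a sketch under assumptions only: `CohortApi`, `interestCohort`, and `watchCohortCalls` are hypothetical stand-ins defined here, not a real browser or extension interface, and the script names are made up.

```typescript
// Hypothetical stand-in for whatever object exposes the cohort request.
interface CohortApi {
  interestCohort(): Promise<string>;
}

// Wraps the method so each call records which script made it.
// currentScript is an assumed helper that names the calling script.
function watchCohortCalls(api: CohortApi, currentScript: () => string): string[] {
  const seen: string[] = [];
  const original = api.interestCohort.bind(api);
  api.interestCohort = () => {
    seen.push(currentScript()); // the call itself is trivially observable
    return original();
  };
  return seen;
}

// Fake API and fake callers, for illustration only.
const fakeApi: CohortApi = {
  interestCohort: () => Promise.resolve("cohort-1234"),
};
let activeScript = "ads.example/targeting.js";
const callers = watchCohortCalls(fakeApi, () => activeScript);

void fakeApi.interestCohort();
activeScript = "tracker.example/fp.js";
void fakeApi.interestCohort();
```

The recorded list shows that a call happened and which script made it, but both entries look identical from the API's point of view. Nothing in the call says whether the ID is used for cohort-level advertising or folded into a fingerprint.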
> We have been thinking about this. We updated the explainer to this effect recently as well. One option is what we call 'per-site' cohorts, where a site gets its cohort assigned to it when the page first asks for it, and the cohort is computed on the fly. That cohort is then sticky for that site for some amount of time. This way, each site winds up with a different cohort and they're not useful for cross-site identification.
> The downside is that it leaks information about the user faster, if multiple sites have the user's PII and can gather the cohorts from the various sites. It also reduces overall utility. So we need to consider that carefully.
I'm confused by what you are saying. Wouldn't we expect cohort IDs to remain consistent across sites? Otherwise, how would large publishers with many small domains/sites target a campaign to a particular floc across their network?
You could say domain grouping instead of domain, but now small publishers who want to use analytics systems have no way at all to determine the value of these flocs. The small site cannot detect what the floc is, how to use it, or how it behaves, and it needs the cross-site (but still anonymized and legitimately privacy-preserving) information on the floc from another source.
I think per-site cohorts would discriminate heavily against small publishers in this way, and I don't think they solve the fingerprinting issue either, as login-based sites can do far better without the extra effort. Please let me know if I am misunderstanding something.
From a privacy perspective, one of the bigger concerns about FLoC is it adding a new surface for fingerprinting. The understanding is that this will be mitigated by the introduction of the Privacy Budget, but it doesn't seem like the two proposals are being released at the same time. Any thoughts on adding in some type of fingerprinting mitigation as part of FLoC directly?
Also, any information on how unique an individual cohort would be would also be helpful here.