Open brodrigu opened 3 years ago
Thanks for the question! Please see #74 for a more detailed description of how the algorithm works (at least in Chrome). The initial simhash operation is deterministic, and bots won't have any effect on it.
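To illustrate why the hash step itself is deterministic, here is a minimal toy SimHash over a set of visited domains. This is only a sketch and not Chrome's actual implementation (see #74 for that): the function name `simhash`, the use of SHA-256 as the per-domain hash, and the 8-bit output are all assumptions made for the example.

```python
import hashlib


def simhash(domains, bits=8):
    """Toy SimHash of a set of domain names (hypothetical, not Chrome's code)."""
    totals = [0] * bits
    for domain in set(domains):  # duplicates and visit order are ignored
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each domain votes +1 or -1 on every output bit.
            totals[i] += 1 if (value >> i) & 1 else -1
    result = 0
    for i, total in enumerate(totals):
        if total > 0:
            result |= 1 << i
    return result


# Hypothetical browsing histories: same sites, different order and repeats.
history_a = ["news.example", "shoes.example", "travel.example"]
history_b = ["travel.example", "news.example", "shoes.example", "news.example"]
same_hash = simhash(history_a) == simhash(history_b)
```

The output depends only on the set of domains in one browser's own history, so in this sketch nothing another browser (or bot) does can change the hash a given user gets.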
Bots could affect what the server believes is the size of each cohort. Say only one user is in a given cohort, but bots make it look like there are 1,000 users there. In the worst case, the server could then be tricked into giving a cohort fewer users than it should have. But to what purpose? Chrome will limit the total number of cohorts it allows, so this would not create more cohorts, and general tracking of users still won't be practical.
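The size-inflation scenario can be sketched as follows. This is a simplified model, not the real server logic: the threshold `K`, the function `cohorts_meeting_threshold`, and the cohort labels are hypothetical, and the model assumes the server simply counts one report per browser.

```python
from collections import Counter

K = 1000  # hypothetical minimum cohort size the server tries to enforce


def cohorts_meeting_threshold(reports, k=K):
    """Return the cohorts whose reported size is at least k."""
    counts = Counter(reports)
    return {cohort for cohort, n in counts.items() if n >= k}


# One real user in cohort "c1", 2,000 real users in cohort "c2".
real_reports = ["c1"] * 1 + ["c2"] * 2000
# 999 bots that all report cohort "c1".
bot_reports = ["c1"] * 999

without_bots = cohorts_meeting_threshold(real_reports)
with_bots = cohorts_meeting_threshold(real_reports + bot_reports)
```

In this model, "c1" fails the size check on real traffic alone but passes once the bot reports are added, even though it still contains only one real user.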
It sounds to me like this attack pattern would mainly be used to identify a single user, by joining N cohorts of M members with M-1 bots and 1 user. Where N=K, this could allow a single user to be identified (whether or not that is worth the effort). However, if the number of cohorts is limited and their nature is predetermined, it should not be an issue, as every cohort would have more than K users.
In which case the attack would not work. Is this correct?
A malicious actor creates a botnet of standard Chrome browser installs and programs them to visit a specific set of sites aligned with a behavior they would like to target.
The following script for example:
The botnet runs this script across thousands of unique Chrome installs, presumably all landing in the same curated cohort.
Later, a “real” user navigates across the web including the sites in the above script. Is it likely they will end up in the same cohort?
Could this method be used to hack the FLoC model to create curated cohorts?