cooperq opened 9 years ago
Sorry for the late reply. Could you elaborate on "if any scripts are commonly being used by different attackers" a bit? Do you see us parsing script contents somehow?
I mean, just a SHA sum of the script contents would do the trick. I think it's also worth reverse engineering any popular scripts to figure out how we can build heuristics to detect them.
Absolutely!
Hashing: Ah, cool, that would help us in cases where the same script goes by different filenames or is served from different domains. Perhaps we could also strip comments/whitespace before hashing to allow for trivial differences.
I think stripping comments and whitespace is a great idea. This at least lets us discover whether there are standard FP scripts floating around, which I suspect there are. Many people were using the same script for canvas-based FP.
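For what it's worth, the stripping step could be as simple as a regex pass before hashing. A rough sketch (the regexes are naive and not string-literal aware, and the function name is made up):

```python
import hashlib
import re

def normalized_hash(script_source: str) -> str:
    """Hash a script after stripping comments and collapsing whitespace,
    so trivially different copies of the same script match."""
    # Remove /* ... */ block comments, then // line comments.
    # (Naive: this can misfire on comment-like text inside string
    # literals or regex literals, but it is fine as a first heuristic.)
    no_block = re.sub(r"/\*.*?\*/", "", script_source, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    # Collapse all whitespace runs to a single space.
    normalized = re.sub(r"\s+", " ", no_line).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With this, two copies of the same script that differ only in comments or formatting hash to the same digest, so matching across domains is just a dict lookup.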
In addition to detecting common scripts, this could be very useful for post-crawl analysis. While going through the crawl results, we hit many cases where suspicious scripts had been changed, taken offline, or were simply missing from pages where they had previously been found.
Also, I think simhash and MOSS can be very useful for finding near-duplicate scripts. Besides comments and whitespace, scripts may differ only in unique identifiers, timestamps, or endpoint URLs. As long as the scripts have very similar content, simhash would give the same (or a very close) digest and MOSS would give a very high similarity score.
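To make the simhash idea concrete, here is a toy version (the shingle size and underlying hash are arbitrary choices for illustration, not a reference implementation):

```python
import hashlib
import re

def simhash(text: str, hash_bits: int = 64) -> int:
    """Compute a simhash fingerprint: near-identical inputs yield
    identical or nearly identical digests (small Hamming distance)."""
    # Tokenize into word 3-shingles so local context matters.
    tokens = re.findall(r"\w+", text.lower())
    shingles = [" ".join(tokens[i:i + 3]) for i in range(max(1, len(tokens) - 2))]
    # Each shingle votes on every bit of the final digest.
    weights = [0] * hash_bits
    for sh in shingles:
        h = int.from_bytes(hashlib.md5(sh.encode()).digest()[:8], "big")
        for bit in range(hash_bits):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(hash_bits) if weights[bit] > 0)

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two simhash digests."""
    return bin(a ^ b).count("1")
```

Two scripts that differ only in an embedded identifier or endpoint URL end up a small Hamming distance apart (often zero), while unrelated scripts land around half the bits apart, so a simple distance threshold flags near-duplicates.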
great ideas @gunesacar
Being able to access response bodies through the WebRequest API in Chrome will make this much easier to implement.
It would be really great to be able to store a copy of all the scripts identified as fingerprinting scripts. That way we could see if any scripts are commonly being used by different attackers. This could also help us come up with heuristics if people are using similar tactics across the board.