Closed bstandaert-wustl closed 3 months ago
What is the goal behind measuring the number of APIs used in a script? If this is for detecting fingerprinters, then I think we settled on Disconnect's list, no?
Or is the goal to extend the detections by Disconnect by using the API presence threshold as a heuristic? If that is the case, then we can also look at the combination of APIs to pinpoint the exact fingerprinting method, e.g., by extending the heuristics proposed in this paper: https://www.cs.princeton.edu/~arvindn/publications/OpenWPM_1_million_site_tracking_measurement.pdf
Here are some combinations for prominent fingerprinting methods:
For CanvasFont:
CanvasRenderingContext2D.font & CanvasRenderingContext2D.measureText
For Canvas:
(CanvasRenderingContext2D.fillText | CanvasRenderingContext2D.strokeText | CanvasRenderingContext2D.fillStyle | CanvasRenderingContext2D.strokeStyle) & HTMLCanvasElement.toDataURL
No calls to the following:
CanvasRenderingContext2D.save
CanvasRenderingContext2D.restore
HTMLCanvasElement.addEventListener
For AudioContext:
OfflineAudioContext.createOscillator | OfflineAudioContext.createDynamicsCompressor | OfflineAudioContext.destination | OfflineAudioContext.startRendering | OfflineAudioContext.oncomplete
For WebRTC:
(RTCPeerConnection.createDataChannel | RTCPeerConnection.createOffer) & (RTCPeerConnection.onicecandidate | RTCPeerConnection.localDescription)
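To make the combinations above concrete, here is a minimal sketch in Python of how they could be evaluated against the set of API names observed for one script. The function name `classify_fingerprinting` and the input format (fully qualified `Interface.member` strings) are assumptions for illustration, and reading the "No calls to the following" list as an exclusion on the Canvas rule is my interpretation of the list.

```python
from typing import Iterable, List


def classify_fingerprinting(apis: Iterable[str]) -> List[str]:
    """Return the fingerprinting methods whose API combination is present.

    `apis` holds names such as "HTMLCanvasElement.toDataURL" (assumed format).
    """
    seen = set(apis)

    def any_of(prefix: str, *members: str) -> bool:
        # True if at least one "prefix.member" name was observed.
        return any(f"{prefix}.{m}" in seen for m in members)

    ctx = "CanvasRenderingContext2D"
    rtc = "RTCPeerConnection"
    methods = []

    # CanvasFont: font AND measureText.
    if f"{ctx}.font" in seen and f"{ctx}.measureText" in seen:
        methods.append("CanvasFont")

    # Canvas: any text/style call AND toDataURL, with none of the excluded calls.
    excluded = {f"{ctx}.save", f"{ctx}.restore", "HTMLCanvasElement.addEventListener"}
    if (any_of(ctx, "fillText", "strokeText", "fillStyle", "strokeStyle")
            and "HTMLCanvasElement.toDataURL" in seen
            and not (seen & excluded)):
        methods.append("Canvas")

    # AudioContext: any one of the listed OfflineAudioContext members.
    if any_of("OfflineAudioContext", "createOscillator", "createDynamicsCompressor",
              "destination", "startRendering", "oncomplete"):
        methods.append("AudioContext")

    # WebRTC: (createDataChannel OR createOffer) AND (onicecandidate OR localDescription).
    if (any_of(rtc, "createDataChannel", "createOffer")
            and any_of(rtc, "onicecandidate", "localDescription")):
        methods.append("WebRTC")

    return methods
```

Note that this assumes we know which calls a script actually made; matching strings in a response body only approximates that.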
The other fingerprinting analysis is to look at the prevalence of known FP APIs, for which relying on APIs used by FPJS2 makes sense.
> Or is the goal to extend the detections by Disconnect by using the API presence threshold as a heuristic?

Yes. The Disconnect list only covers known fingerprinters; this would let us find unknown ones as well.
> The other fingerprinting analysis is to look at the prevalence of known FP APIs, for which relying on APIs used by FPJS2 makes sense.

The idea is to extend this and ask which scripts use these APIs for fingerprinting and how prevalent those scripts are, similar to how we identify tracking network prevalence. Many uses of these APIs are for legitimate non-fingerprinting purposes, but a script that uses many of them across different categories is more likely to be fingerprinting, which is where the threshold comes in.
The four you list make sense as a separate analysis, but don't cover the many other methods used by FPJS.
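As a rough illustration of the "many APIs across different categories" idea, here is a small Python sketch. The category mapping, the names `API_CATEGORIES`, `categories_used`, and `likely_fingerprinter`, and the `min_categories` default are all hypothetical; they are not the list or threshold used in this PR.

```python
from typing import Dict, Iterable, Set

# Hypothetical grouping for illustration only; not the list used in this PR.
API_CATEGORIES: Dict[str, str] = {
    "toDataURL": "canvas",
    "measureText": "canvas",
    "createOscillator": "audio",
    "createDynamicsCompressor": "audio",
    "hardwareConcurrency": "hardware",
    "deviceMemory": "hardware",
    "colorDepth": "screen",
}


def categories_used(apis: Iterable[str]) -> Set[str]:
    """Map the observed API names to the distinct categories they fall into."""
    return {API_CATEGORIES[api] for api in apis if api in API_CATEGORIES}


def likely_fingerprinter(apis: Iterable[str], min_categories: int = 3) -> bool:
    """Flag a script only when its API usage spans several categories."""
    return len(categories_used(apis)) >= min_categories
```

With this shape, a script that only touches canvas APIs (for example a charting library) stays below the threshold, while one that combines canvas, audio, and hardware probes is flagged.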
@SaptakS Nurullah mentioned you were managing PRs - are you able to do a review on this?
Can you resolve the merge conflicts, @bstandaert-wustl?
Done.
The latest tests show null for both?
OK, the custom metric is returning null not because of the changes in this PR but because the existing a11y custom metric is failing for https://fingerprintjs.github.io/fingerprintjs/
@pmeenan is working on a fix.
Implements a test that looks for fingerprinting API strings in response bodies. Strings are manually derived from fingerprintJS.
It reports a count of API usages for each site, and a list of "likely fingerprinting scripts", which is anything with >= 5 API usages. @umariqbal do you have any recommendations on what this threshold should be?
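For reviewers, the approach described above can be sketched roughly as follows in Python. This is not the PR's code: the API list is a small illustrative subset, the function name `scan_response_bodies` and the output keys are made up, and it assumes "API usages" means the number of distinct API strings found in a response body.

```python
from typing import Dict, Iterable

# Illustrative subset only; the PR's list is derived manually from FingerprintJS.
FINGERPRINTING_APIS = (
    "toDataURL", "measureText", "createOscillator", "createDynamicsCompressor",
    "hardwareConcurrency", "deviceMemory", "colorDepth",
)
THRESHOLD = 5  # ">= 5 API usages", as described above


def scan_response_bodies(bodies: Dict[str, str],
                         apis: Iterable[str] = FINGERPRINTING_APIS,
                         threshold: int = THRESHOLD) -> dict:
    """Count distinct API strings per response body and flag scripts at the threshold."""
    apis = tuple(apis)
    # Plain substring matching: one hit per API string present in the body.
    per_script = {url: sum(1 for api in apis if api in body)
                  for url, body in bodies.items()}
    return {
        "api_count": sum(per_script.values()),
        "likely_fingerprinting_scripts": sorted(
            url for url, n in per_script.items() if n >= threshold),
    }
```

Substring matching will also count API names that appear in comments or unrelated strings, which is one reason the threshold value matters.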
Test websites: