WICG / trust-token-api

Trust Token API
https://wicg.github.io/trust-token-api/

Trust Tokens are not useful for anti-fraud in general #66

Open · bakkot opened 3 years ago

bakkot commented 3 years ago

I work at Shape Security on our anti-fraud product, which is widely used by banks, retailers, and other organizations to defend against credential stuffing, user account takeovers, and other forms of abuse. For illustration, we block hundreds of millions of malicious login attempts on banks every day, which would have amounted to many thousands of compromised bank accounts.

I saw that trust tokens are being offered as a tool to prevent fraud with less reliance on fingerprinting. I can see how they'd be very useful for click fraud and similar cases, but because trust tokens are opt-in there are other kinds of fraud for which they do not meet that goal.

For example, consider the case of large-scale automated attacks on login systems and other pre-authentication endpoints, typically carried out with tools such as Selenium or Puppeteer and routed through residential proxies. In many cases it is possible to identify individual transactions as automation without further context - if they fail to interact with the page at all before submitting a request, for example, or if their claimed user agent does not match their observable capabilities - but sophisticated attackers may be able to create traffic which will pass as human for a time. Defense in those cases generally requires being able to correlate all visits from the attacker even though those visits originate from different IP addresses, which is done using identifying characteristics that distinguish the attacker's traffic from other users' traffic - in other words, fingerprinting.
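A minimal sketch of the two standalone checks described above: no page interaction before submitting, and a claimed user agent that does not match observable capabilities. The `Transaction` fields, the `EXPECTED_FEATURES` table, and the browser and feature names are hypothetical placeholders, not part of any real product or browser API.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    # Hypothetical per-request telemetry collected by an anti-fraud script.
    claimed_browser: str     # browser family parsed from the User-Agent header
    interaction_events: int  # key presses, pointer moves, etc. seen before submit
    observed_features: set[str] = field(default_factory=set)  # capabilities probed in the page


# Hypothetical mapping from a claimed browser family to capabilities it should expose.
EXPECTED_FEATURES: dict[str, set[str]] = {
    "BrowserX": {"feature_a", "feature_b"},
    "BrowserY": {"feature_c"},
}


def standalone_flags(tx: Transaction) -> list[str]:
    """Reasons this single transaction looks automated, using no visitor history."""
    flags = []
    if tx.interaction_events == 0:
        flags.append("no-interaction")
    expected = EXPECTED_FEATURES.get(tx.claimed_browser)
    # A claimed browser that lacks capabilities it should have is suspicious.
    if expected is not None and not expected <= tx.observed_features:
        flags.append("ua-capability-mismatch")
    return flags


bot = Transaction("BrowserX", interaction_events=0, observed_features={"feature_c"})
human = Transaction("BrowserX", interaction_events=14, observed_features={"feature_a", "feature_b"})
print(standalone_flags(bot))    # ['no-interaction', 'ua-capability-mismatch']
print(standalone_flags(human))  # []
```

Both checks need no history about the visitor, which is what makes them cheap; as the comment notes, a careful attacker can satisfy both, and at that point only correlation across visits helps.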

Note that the case I'm concerned with is when someone is visiting the same site repeatedly. There's no tracking in the sense of following the user around the internet; the "fingerprinting" is strictly first-party.
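A rough sketch of the first-party correlation described above: visits to one site are grouped by a hash of observed client traits rather than by IP address, so a single actor rotating through residential proxies shows up as one fingerprint seen from many IPs. The trait names, the `min_ips` threshold and the helper functions are illustrative assumptions; the IP addresses come from documentation-only ranges.

```python
import hashlib
from collections import defaultdict


def fingerprint(traits: dict[str, str]) -> str:
    """Stable hash of observed client traits; key order does not matter."""
    canonical = "|".join(f"{key}={traits[key]}" for key in sorted(traits))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def correlate(visits: list[tuple[str, dict[str, str]]]) -> dict[str, set[str]]:
    """Group visits to ONE site by fingerprint, collecting the source IPs of each group."""
    groups: dict[str, set[str]] = defaultdict(set)
    for ip, traits in visits:
        groups[fingerprint(traits)].add(ip)
    return groups


def suspicious_clusters(visits, min_ips: int = 3) -> dict[str, set[str]]:
    """Fingerprints seen from many distinct IPs, as with residential-proxy rotation."""
    return {fp: ips for fp, ips in correlate(visits).items() if len(ips) >= min_ips}


# Hypothetical traits; IPs are from documentation-only ranges.
attacker = {"screen": "1920x1080", "fonts": "f1,f2", "tz": "UTC"}
human = {"screen": "1440x900", "fonts": "f3", "tz": "America/Chicago"}
visits = [
    ("203.0.113.1", attacker),
    ("198.51.100.7", attacker),
    ("192.0.2.44", attacker),
    ("192.0.2.9", human),
]
clusters = suspicious_clusters(visits)
print(len(clusters))  # 1
```

Many legitimate users share common configurations, so a real system would need far richer traits and a tuned threshold; the fixed cutoff here is only a placeholder. Note that everything in the sketch is scoped to a single site's own traffic.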

Crucially, attackers can represent themselves as having never visited the page before. As far as I can tell, trust tokens do not offer anything for the case of a totally new visitor, such as someone logging in from a library computer which resets its profile between users. So attackers can always represent themselves as belonging to this class of traffic. (Or they could harvest tokens, which just pushes the problem down the road to the token issuers.) And in contrast to deciding whether to count a click as real for the purposes of charging for advertising, this categorization has direct consequences for the user: it's not acceptable to simply decide that any user you've never seen before can't log in to their bank. Trust tokens are fundamentally an opt-in mechanism and therefore fundamentally unsuited for defense against this kind of fraud.

I'm very interested in the privacy budget proposal and in trust tokens as a replacement for fingerprinting, but I'm concerned that trust tokens aren't going to be effective for the kind of fraud discussed above. If the web platform makes it harder to defend against this kind of fraud without providing an effective alternative, we risk making these attacks much more prevalent, at significant cost to users. I'd love to be involved in any further discussions about new web platform capabilities for fighting fraud.

dvorak42 commented 3 years ago

Trust Token attempts to solve part of the problem, notably the case where you've determined traffic is non-fraudulent at one location and are trying to propagate that decision to other places, but I agree that on its own it doesn't solve the problem of classifying traffic that has no previous trust decision attached.

It's a partial solution in the CAPTCHA space: having a token proves to an endpoint/login system/etc. that the user has previously passed some amount of verification showing they are a 'real' user, which lets them bypass any costly verification step (image/audio recognition/login/etc.). (So things like Selenium and Puppeteer would end up being routed to whatever CAPTCHA verification is in place, while real users would be able to skip that step.)
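A minimal sketch of the gating described in this comment, assuming a hypothetical `is_valid` callback that stands in for redeeming the token with its issuer (it is not the Trust Token API). It also makes bakkot's earlier point concrete: a returning user with a token skips verification, but a fresh library-computer profile and an automation script both arrive without one and take the same branch.

```python
from enum import Enum
from typing import Callable, Optional


class Route(Enum):
    ALLOW = "allow"    # skip the costly verification step
    VERIFY = "verify"  # CAPTCHA, step-up login, etc.


def route_request(token: Optional[str], is_valid: Callable[[str], bool]) -> Route:
    """Let requests carrying a valid token skip verification; send everyone else to it.

    `is_valid` is a hypothetical stand-in for redeeming the token with its issuer.
    """
    if token is not None and is_valid(token):
        return Route.ALLOW
    return Route.VERIFY


issued = {"tok-123"}  # tokens the hypothetical issuer would accept


def accept_issued(token: str) -> bool:
    return token in issued


returning_user_token = "tok-123"
library_user_token = None  # profile is reset between users, so no token
bot_token = None           # automation simply never presents one

print(route_request(returning_user_token, accept_issued))  # Route.ALLOW
print(route_request(library_user_token, accept_issued))    # Route.VERIFY
print(route_request(bot_token, accept_issued))             # Route.VERIFY
```

The last two calls are indistinguishable to `route_request`, so whatever happens in the `VERIFY` branch has to carry the full burden of telling them apart.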

But how to perform verification is still an open issue that isn't really solved by Trust Token itself.

bakkot commented 3 years ago

In practice, CAPTCHAs are both trivially easy to break for automation and wildly inaccessible for humans, which is why odds are very good that your bank makes use of our product rather than a CAPTCHA. And, as I say above, our product relies in part on identification mechanisms which Trust Tokens are apparently intended to obviate.

So that doesn't really work - you can't say "Trust Tokens depend on the original establishment of trust" when the original establishment of trust, in all likelihood, depends on mechanisms that Trust Tokens are supposed to be replacing.

dvorak42 commented 3 years ago

Trust Tokens itself is only looking to replace use cases that currently use third-party cookies for cross-site tracking of trust/fraud signals. Trust Tokens isn't trying to be the API for generating the signals in the first place or dealing with things like fingerprinting and other third-party cookie use cases, that would be the domain of other parts of the privacy sandbox. There's now a more general Privacy Sandbox repo for discussing it in a more holistic way (https://github.com/GoogleChromeLabs/privacy-sandbox-dev-support), might be worth migrating the parts of the discussion that aren't Trust Token specific there.

Even the wider privacy sandbox effort is mostly about dealing with the cross-site tracking issue, while it sounds like your product is mostly dependent on single-site based signals? I agree that there is still a gap in the privacy sandbox on how to support various methodologies in the fraud/IVT space that rely on same-site signals, while preventing these signals being used as another form of cross-site tracking. For building up fingerprints of bad traffic, are those typically done on a site by site basis, or do they get shared across many different origins?

bakkot commented 3 years ago

> Trust Tokens isn't trying to be the API for generating the signals in the first place or dealing with things like fingerprinting and other third-party cookie use cases, that would be the domain of other parts of the privacy sandbox.

Hm, ok. The privacy sandbox overview lists Trust Tokens as the sole part of the project aimed at combatting fraud. Which part of the privacy sandbox is intended to address the use case given in the OP?

> There's now a more general Privacy Sandbox repo for discussing it in a more holistic way (https://github.com/GoogleChromeLabs/privacy-sandbox-dev-support), might be worth migrating the parts of the discussion that aren't Trust Token specific there.

Sure, I'll continue this discussion there, thanks.

> while it sounds like your product is mostly dependent on single-site based signals?

I'm not sure there's a relevant distinction, currently. For example, among many other things we make use of the rich information which has historically been available in navigator.userAgent to bucket users (after verifying it against other observable characteristics of the browser, of course). Is that a "single-site based signal"?

It's true that we don't need to do cross-site tracking, as I mention in the OP. The problem is that the tools we use for correlating a single visitor across visits to a single site have a lot of overlap with the tools used for correlating a single visitor across visits to different sites.
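To illustrate the overlap described above, here is a sketch (not something proposed in this thread) contrasting a plain hash of client traits, which comes out identical on every site that computes it, with an HMAC keyed by a secret held by a single site, which stays stable across repeat visits to that site but does not match the value another site computes. The trait string and the per-site secrets are hypothetical.

```python
import hashlib
import hmac

TRAITS = "screen=1920x1080|fonts=f1,f2|tz=UTC"  # hypothetical raw client signals
SECRET_A = b"site-a-secret"                     # hypothetical, known only to site A
SECRET_B = b"site-b-secret"                     # hypothetical, known only to site B


def global_id(traits: str) -> str:
    # Same input on every site -> same output, so the ID is joinable across sites.
    return hashlib.sha256(traits.encode()).hexdigest()[:16]


def site_scoped_id(traits: str, site_secret: bytes) -> str:
    # Keyed by one site's secret: stable for repeat visits to that site only.
    return hmac.new(site_secret, traits.encode(), hashlib.sha256).hexdigest()[:16]


on_site_a = {"global": global_id(TRAITS), "scoped": site_scoped_id(TRAITS, SECRET_A)}
on_site_b = {"global": global_id(TRAITS), "scoped": site_scoped_id(TRAITS, SECRET_B)}

print(on_site_a["global"] == on_site_b["global"])  # True: linkable across sites
print(on_site_a["scoped"] == on_site_b["scoped"])  # False: not linkable by this ID
```

The scoped ID only limits joining if sites cannot read the raw traits themselves; any site that can observe them can still compute the global hash, which is why the same signals serve both first-party defense and cross-site tracking.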

> For building up fingerprints of bad traffic, are those typically done on a site by site basis, or do they get shared across many different origins?

I can't speak to other companies' mechanisms, although informed speculation would tell us that a company like Akamai is likely to be sharing that information across different origins. The product I work on is able to be effective without sharing fingerprints across origins.

dvorak42 commented 3 years ago

Yeah, that's a place where we need to work on clarifying the discourse and how we position Trust Token to make it clearer that it's intended to be a solution to cross-site sharing of trust/fraud information, rather than a solution for all the fraud needs here. Some of the mitigations talk about how to allow for fraud use cases (https://github.com/bslassey/ip-blindness/blob/master/willful_ip_blindness.md has a section on DoS/fraud prevention), but we probably need some more holistic discussions about how to support fraud/DoS needs in the space while preventing arbitrary vectors that can be used to track a user across different sites/contexts.

bakkot commented 3 years ago

> we probably need some more holistic discussions about how to support fraud/DoS needs in the space while preventing arbitrary vectors that can be used to track a user across different sites/contexts.

I would love to be involved in any such discussion, which I would think needs to happen before any of the mechanisms currently used by anti-fraud tools are neutered or removed.