WICG / trust-token-api

Trust Token API
https://wicg.github.io/trust-token-api/

issuer redemption statistics #44

Open erik-anderson opened 4 years ago

erik-anderson commented 4 years ago

The Edge team has been partnering with folks at Microsoft who regularly deal with detecting fraudulent activity. We'd like to explore an aggregated statistics extension for the Trust Token API that gives an issuer some context about how the tokens were redeemed without revealing too much about where that activity happened.

We wrote up an explainer to go over the proposed primitives and discuss privacy considerations: https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/TrustTokenExtensions/IssuerRedemptionStatistics.md

We're happy to discuss in more detail in this issue and/or issues filed against the MSEdgeExplainers repo. We're looking forward to feedback and refining the proposed extension.

dvorak42 commented 4 years ago

Thanks, will take a look at the proposal.

One initial question (possibly as a separate issue for Trust Token in general) is where transferability of tokens fits into the threat model. Trust Tokens can currently be used in a different browser context from where they were issued since they're not bound to the browser. Malware running on the device could outright steal the tokens from the token pool, a bot could set up a user-CA and intercept the raw request, or a malicious extension could intercept redemption requests and steal the tokens. I imagine the line falls somewhere on that spectrum. Do you see a lot of fraudulent activity from more limited malicious extensions that can make requests from whatever context they want, but lack the full set of permissions needed to intercept requests or access storage? (Or are you imagining trying to make Trust Token resilient to extension inspection/interception?)

erik-anderson commented 4 years ago

Yes, the threat model is a bit unusual.

As you note, there is a diverse range of malicious actors. For a sufficiently sophisticated actor, this would not be a very meaningful signal: they could lie in the statistics, or the redemptions could happen on a different device.

This is more targeted at detecting less sophisticated actors; e.g., individuals manually clicking on ads served on their own sites, malware that attempts to run under the covers without the user noticing, or even automated attackers who might not put sufficient energy into faking this signal to mask their intentions.

Malicious extensions are certainly a concern. Separate from this proposal, we would be interested in exploring if there are meaningful ways to limit what an extension can do w.r.t. Trust Token signals. For example, if the only way to clear these signals via the browsingData API was to clear in such a way that it would likely be end-user visible, the extension may choose not to try to clear it. If we could limit their ability to intercept and extract the tokens that would be useful as well. Even if something like a malicious extension managed to exfiltrate some of the tokens, if these statistics could reveal that the same user is coming back for more even though they had spent a small fraction, that might be revealing.

dvorak42 commented 3 years ago

A few comments:

Of the signals, Redemption-Rate and Redemption-Distribution seem most useful for helping detect non-standard RR usage and for feeding back into RR lifetimes.

Redemption-Redemptions if it could be done in a privacy-preserving way would also be a useful signal to detect abuse patterns, though I'm not sure how to manage a ranking that is privacy-preserving, and depending on the size of the issuer, the period of time for the bucket classification may need to be significantly larger. A small issuer that only sees a little bit of traffic can target specific sites that they want to 'tag' with a specific Rank and use that at issuance time to see whether the user visited those sets of sites.

Is Redemption-Rate completely subsumed by Redemption-Count? It seems like one is just the average of the other?

1) Even just a Redemption-Count stat that shows how many redemptions happened since the last issuance might be helpful, and isn't directly exposed by the API. Though there's always the potential for privacy-leakage based on the number of tokens that a user has used between issuances.

2) It might make sense to have a separate Sec-Trust-Token-Redemption-Stats header that is sent on issuance with all the stats the UA is willing to send bundled up, so that it doesn't interfere with other data that an issuer might want to send in the POST body (solution to a CAPTCHA, challenge-response payload, etc).

3) There's also the question of how many significant figures are okay to expose without providing too much leakage/information.

4) This also would only be viable for issuance that happens in a first-party context. If you're on A.com, which issues against a 3P issuer.com, the issuer would receive the same redemption stats as on B.com, which also issues against issuer.com, and that would introduce cross-origin data transfer. Fuzzing the values each time might help slightly, but could be bypassed with enough attempts. Resetting stats on each token issuance might also be sufficient to mitigate this slightly, combined with withholding stats for some period of time.
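
To make points 3) and 4) concrete, here is a minimal sketch in Python. It is not taken from the explainer or the spec: the function names, the uniform noise model, and the example count of 37 are all assumptions chosen for illustration. It shows rounding a statistic to a fixed number of significant figures, and why fresh per-request fuzzing can be averaged away by an issuer that triggers issuance many times.

```python
import math
import random


def round_sig(value: float, digits: int) -> float:
    """Round value to `digits` significant figures (illustrative helper)."""
    if value == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(value)))
    return float(round(value, digits - 1 - magnitude))


def fuzzed_report(true_count: int, noise: int, rng: random.Random) -> int:
    """Hypothetical UA behaviour: add fresh uniform noise on every issuance."""
    return true_count + rng.randint(-noise, noise)


def averaged_estimate(true_count: int, noise: int, attempts: int, seed: int = 0) -> float:
    """What an issuer could learn by averaging many independently fuzzed reports."""
    rng = random.Random(seed)
    total = sum(fuzzed_report(true_count, noise, rng) for _ in range(attempts))
    return total / attempts


if __name__ == "__main__":
    # Coarse, deterministic reporting: the same input always gives the same output.
    print(round_sig(1234, 1), round_sig(1234, 2))
    # A single fuzzed report is off by up to +/-10, but the mean converges on 37.
    print(averaged_estimate(true_count=37, noise=10, attempts=10_000))
```

Under these assumptions, deterministic rounding returns the same coarse value no matter how often it is queried, whereas independently drawn noise shrinks roughly with the square root of the number of attempts. That is one reason to pair fuzzing with resetting or withholding stats, as suggested above.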

Then one nit: references to SRRs (now RRs) should probably be replaced with 'redemptions', since most of this isn't particularly tied to using redemption records (except for Redemption-Rate, which could instead be the number of redemptions on a site, with RR transmissions counting as separate instances).