Straw Proposal: Unlimited issuers while protecting privacy

WICG / trust-token-api

Trust Token API

https://wicg.github.io/trust-token-api/

Straw Proposal: Unlimited issuers while protecting privacy #23

Open laughinghan opened 4 years ago

laughinghan commented 4 years ago

Straw proposal for letting a site do trust scoring with an unlimited number of issuers, without compromising privacy (the explainer currently suggests a limit of 2 issuers per site, but there are concerns about how that would affect the Web).

Ideas:

document.countTrustTokensFrom('<space-delimited list of issuers>'), returns how many sites on the list the user agent has trust tokens from, but not which ones
document.trustTokensThreshold(threshold, { <map from issuers to numerical score> }), sums up the score given to each trusted issuer, but only returns whether the sum exceeds the given threshold

(Edit) These methods would be single-use, would be available only if hasTrustTokens() has never been called, and would be mutually exclusive with each other.
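To make the intended semantics concrete, here is a rough model of the two methods as plain functions. Neither method exists in any browser; the `heldIssuers` parameter is a hypothetical stand-in for the user agent's private token store, and the issuer names are placeholders. The single-use restriction is left out to keep the sketch short.

```javascript
// Hypothetical model only: neither method exists in any browser.
// `heldIssuers` stands in for the set of issuers the user agent holds tokens from.
function countTrustTokensFrom(heldIssuers, issuerList) {
  // The proposal takes a space-delimited list of issuers.
  const asked = new Set(issuerList.split(/\s+/).filter(Boolean));
  let count = 0;
  for (const issuer of asked) {
    if (heldIssuers.has(issuer)) count += 1;
  }
  return count; // how many issuers matched, never which ones
}

function trustTokensThreshold(heldIssuers, threshold, scores) {
  let sum = 0;
  for (const [issuer, score] of Object.entries(scores)) {
    if (heldIssuers.has(issuer)) sum += score;
  }
  return sum > threshold; // only whether the sum exceeds the threshold
}

const exampleHeld = new Set(["a.com", "c.com"]);
const exampleCount = countTrustTokensFrom(exampleHeld, "a.com b.com c.com");
const exampleTrusted = trustTokensThreshold(exampleHeld, 5, {
  "a.com": 3,
  "b.com": 4,
  "c.com": 3,
});
```

In both cases the caller sees only an aggregate (a count or a boolean) rather than one answer per issuer.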

dvorak42 commented 4 years ago

Not sure I understand the proposal. Wouldn't a malicious site then be able to say:

call countTrustTokensFrom({'a.com', 'b.com'}), then either itself or a partner site calls countTrustTokensFrom({'a.com', 'b.com', 'c.com'}) to see if the user has c.com tokens, and continue repeating?

Threshold summing also complicates things, since the site could choose numbers such that it learns whether the user has issuer A, or a combination of specific issuers?
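A minimal sketch of both concerns, assuming the methods could be called repeatedly (which the proposal's single-use rule is meant to prevent). The functions are hypothetical models that take the user agent's issuer set as an explicit `heldIssuers` argument; `probeIssuers` is an illustrative name, not part of any API.

```javascript
// Hypothetical model of the count query with NO single-use restriction.
function countTrustTokensFrom(heldIssuers, issuers) {
  return issuers.filter((issuer) => heldIssuers.has(issuer)).length;
}

// Differential probing: grow the list one issuer at a time and watch the count.
function probeIssuers(heldIssuers, candidates) {
  const learned = [];
  let previous = 0;
  for (let i = 1; i <= candidates.length; i += 1) {
    const current = countTrustTokensFrom(heldIssuers, candidates.slice(0, i));
    if (current > previous) learned.push(candidates[i - 1]);
    previous = current;
  }
  return learned;
}

// Hypothetical model of the threshold query.
function trustTokensThreshold(heldIssuers, threshold, scores) {
  let sum = 0;
  for (const [issuer, score] of Object.entries(scores)) {
    if (heldIssuers.has(issuer)) sum += score;
  }
  return sum > threshold;
}

const victim = new Set(["a.com", "c.com"]);
const recovered = probeIssuers(victim, ["a.com", "b.com", "c.com"]);

// Skewed weights: the result is true exactly when a.com is present,
// no matter what the other issuers contribute.
const skewed = { "a.com": 10, "b.com": 1, "c.com": 1 };
const hasIssuerA = trustTokensThreshold(victim, 9, skewed);
```

With repeated calls the probe recovers the full issuer set. The skewed-weight query still yields only one bit per call, but that bit can be targeted at a specific issuer.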

laughinghan commented 4 years ago

Oh wait, the server can't just trust the client, there has to be a way to actually redeem a token to validate.

(Edit) I have two ideas for how to do this:

a probabilistic way based on hash functions that still reveals at least one of the issuers to the server
a potentially much better way where lying is impossible and no issuers are revealed to the server, and the server can even ask for a thresholded weighted sum and the client can reveal solely whether the threshold is met—but it depends on elliptic curve cryptography and I'm not a cryptographer, so ¯\_(ツ)_/¯

(I moved it to its own Gist to braindump less wall-of-text into this ticket.)

laughinghan commented 4 years ago

@dvorak42 To clarify, the site could only call countTrustTokensFrom() once, or trustTokensThreshold() once, or hasTrustTokens() twice—if any are called, the others cannot be called.

The whole point of these methods is that a single call lets an honest site do the kind of thing it would otherwise wish to do with lots of calls to hasTrustTokens(), but without revealing as much fingerprinting information. No honest site in the target use case should ever want to call either of these methods multiple times.
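The call limits described above could be modeled roughly as follows. `QueryBudget` and `tryUse` are hypothetical names for illustration; a real user agent would keep this state per site (or per some other scope) and would not expose it to script.

```javascript
// Per-method call limits as described in the comment above.
const CALL_LIMITS = new Map([
  ["countTrustTokensFrom", 1],
  ["trustTokensThreshold", 1],
  ["hasTrustTokens", 2],
]);

class QueryBudget {
  constructor() {
    this.method = null; // the one method this site has committed to
    this.calls = 0;
  }

  // Returns true and records the call if it is allowed, false otherwise.
  tryUse(method) {
    if (!CALL_LIMITS.has(method)) {
      throw new TypeError("unknown method: " + method);
    }
    // Mutual exclusion: once one method is used, the others are locked out.
    if (this.method !== null && this.method !== method) return false;
    if (this.calls >= CALL_LIMITS.get(method)) return false;
    this.method = method;
    this.calls += 1;
    return true;
  }
}

const budget = new QueryBudget();
const firstCall = budget.tryUse("countTrustTokensFrom");  // allowed
const secondCall = budget.tryUse("countTrustTokensFrom"); // refused: single-use
const otherCall = budget.tryUse("hasTrustTokens");        // refused: mutually exclusive
```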

(Edit) Only later noticed "or a partner site". How would the site and partner site know they're fingerprinting the same user? Via a redirect? How is that worse than the current proposal, where if sites coordinate like that, they'd have 2 bits per site of fingerprinting entropy? The same mitigations would work for both, e.g. limit the number of calls per mini-session, not just per site, where redirects without user activation change the site but not the mini-session.
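A rough sketch of the mini-session idea, assuming a mini-session only rolls over on a user-activated navigation. `MiniSessionTracker`, `navigate`, and `tryQuery` are hypothetical names, and the limit of one query per mini-session is just an example value.

```javascript
class MiniSessionTracker {
  constructor() {
    this.sessionId = 0;
    this.callsBySession = new Map();
  }

  // A redirect without user activation changes the site but keeps the mini-session.
  navigate({ userActivated }) {
    if (userActivated) this.sessionId += 1;
  }

  // The budget is charged to the mini-session, not to whichever site is current.
  tryQuery(limit = 1) {
    const used = this.callsBySession.get(this.sessionId) ?? 0;
    if (used >= limit) return false;
    this.callsBySession.set(this.sessionId, used + 1);
    return true;
  }
}

const tracker = new MiniSessionTracker();
const onFirstSite = tracker.tryQuery();       // allowed
tracker.navigate({ userActivated: false });   // automatic redirect to a partner site
const onPartnerSite = tracker.tryQuery();     // refused: same mini-session
tracker.navigate({ userActivated: true });    // user clicks a link
const afterUserClick = tracker.tryQuery();    // allowed: new mini-session
```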

laughinghan commented 4 years ago

Generalization:

What if we could pass the API an arbitrary pure function that takes a list of issuers, and outputs a single boolean indicating "trustworthy" or "untrustworthy"? If purity and return type are enforced, and the API is single-use, no fingerprintable info could be exfiltrated except for the single boolean.

This could be implemented as a WebAssembly module or Frozen Realm aka SES passed to the API in uninstantiated form. The API could instantiate it and call it with all available issuers, it could return a single boolean and then be destroyed, with the boolean return value being all that is left when the API returns control to the script.
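A minimal sketch of the shape of such an API, using an ordinary JavaScript function in place of a WebAssembly module or SES compartment. `makeSingleUseTrustCheck` is a hypothetical name. This sketch enforces only the single-use rule and the boolean return type; it does not enforce purity, which is the part that would actually need the sandbox.

```javascript
// `heldIssuers` stands in for the user agent's private list of issuers.
function makeSingleUseTrustCheck(heldIssuers) {
  let used = false;
  return function checkTrust(predicate) {
    if (used) throw new Error("trust check already used");
    used = true;
    // Hand the predicate a frozen copy so it cannot mutate the real store.
    const result = predicate(Object.freeze([...heldIssuers]));
    // Only a single boolean may leave the API.
    if (typeof result !== "boolean") {
      throw new TypeError("predicate must return a boolean");
    }
    return result;
  };
}

const checkTrust = makeSingleUseTrustCheck(new Set(["a.com", "c.com"]));

// Example predicate: trustworthy if at least two known issuers vouch for the user.
const known = new Set(["a.com", "b.com", "c.com"]);
const verdict = checkTrust(
  (issuers) => issuers.filter((issuer) => known.has(issuer)).length >= 2
);
```

A second call to `checkTrust` throws, so the script ends up with one boolean and nothing else.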

(No idea how the server could verify this though.)