explainers-by-googlers / Web-Environment-Integrity

"device identifier" could be per-origin; would that alleviate concerns? #2

Open bakkot opened 1 year ago

bakkot commented 1 year ago

Per the proposal:

We strongly feel the following data should never be included: A device ID that is a unique identifier accessible to API consumers

I absolutely agree with this as phrased, since it would trivially allow tracking across origins.

However, you can imagine doing hash(origin || per-device ID) and including that instead. That would not allow any cross-origin tracking. And it would enable the server to do rate-limiting for physical devices, which would be extremely useful (though it wouldn't entirely obviate the usefulness of having some other rate-limiting indicator built in).

Would you consider such a thing to be verboten the way a cross-origin ID is?

(The design would need to be slightly more complicated in that the attester would need to include the origin as well, so it couldn't be spoofed just by faking the origin to the attester and thereby getting a new unique ID.)
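A minimal sketch of the hash(origin || per-device ID) idea, for illustration only. It assumes the device holds a secret that never leaves it (here the hypothetical `device_secret`) and swaps the plain concatenated hash for HMAC-SHA256 keyed with that secret, which avoids ambiguity about where the origin ends and the ID begins. The function name `per_origin_id` and the example origins are made up for this sketch and are not part of the proposal.

```python
import hashlib
import hmac


def per_origin_id(device_secret: bytes, origin: str) -> str:
    """Derive an ID that is stable per (device, origin) pair.

    Assumption: device_secret is a stable per-device value that is
    never exposed to websites directly.
    """
    # Keying the HMAC with the device secret plays the role of
    # hash(origin || per-device ID) without raw concatenation.
    return hmac.new(device_secret, origin.encode("utf-8"), hashlib.sha256).hexdigest()


device_secret = b"hypothetical-device-secret"
id_a = per_origin_id(device_secret, "https://a.example")
id_b = per_origin_id(device_secret, "https://b.example")
```

The same device always gets the same ID on one origin, so a server can rate-limit it, while the IDs seen by two different origins cannot be compared without knowing the device secret.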

RupertBenWiser commented 1 year ago

Thanks for the suggestion @bakkot. I think there will be some interesting thoughts to chew on here.

It is currently being proposed at a high level in the explainer that the attester does not know what the origin is. As you've called out, that could cause complications - that would allow cross origin tracking for the attester.

The explainer is currently proposing to partition across incognito so we'd also have to make a difficult decision on if this is sent to the attester. E.g.: hash(origin || per-device ID || isIncognito || any other partitioning)

I think these factors will make it difficult to use this approach in its current form.

Perhaps the browser could inform the attester in some way that it wants a new "session" that represents "origin, incognito" etc?

bakkot commented 1 year ago

As you've called out, that could cause complications - that would allow cross origin tracking for the attester.

Hm. I think this could maybe be mitigated - for example, the browser could get the stable per-device ID, mix it with the origin, and then provide that to the attester to be included in the attestation.

Can you say more about the threat model here, though? I've been imagining attesters as being like OS-level features like Play Protect or whatever, and of course the OS already can do cross origin tracking.

The explainer is currently proposing to partition across incognito so we'd also have to make a difficult decision on if this is sent to the attester.

As long as the number of partitions is strictly limited, I think it's fine to exclude other non-origin partitioning information (including the incognito bit).

From the anti-fraud provider's perspective, a single physical device being able to appear as two different devices is only marginally worse than appearing as one (at least in the use cases I've thought about). The problem is when it is able to appear as N for arbitrary N. (That does unfortunately rule out the "browser wants a new session" design, however.)
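The difference between "two identities" and "N identities for arbitrary N" can be sketched as follows. This is an illustration under stated assumptions: `MAX_PARTITIONS`, `partitioned_id`, and `distinct_identities` are hypothetical names, the cap of 2 stands for something like normal plus incognito, and in a real design the cap would have to be enforced by a party the client cannot bypass.

```python
import hashlib
import hmac

MAX_PARTITIONS = 2  # hypothetical cap, e.g. normal and incognito


def partitioned_id(device_secret: bytes, origin: str, partition: int) -> str:
    """Derive a per-origin ID for one of a strictly limited set of partitions."""
    if not 0 <= partition < MAX_PARTITIONS:
        raise ValueError("partition index outside the allowed range")
    # The numeric partition prefix and "|" separator keep inputs unambiguous.
    message = f"{partition}|{origin}".encode("utf-8")
    return hmac.new(device_secret, message, hashlib.sha256).hexdigest()


def distinct_identities(device_secret: bytes, origin: str) -> set:
    """Every identity one physical device can present to a single origin."""
    return {partitioned_id(device_secret, origin, p) for p in range(MAX_PARTITIONS)}
```

With the cap in place, one device can appear to a given origin as at most `MAX_PARTITIONS` devices. A design where the browser can request a fresh session at will removes that bound.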

philippp commented 1 year ago

of course the OS already can do cross origin tracking.

The owner of the OS kernel can theoretically intercept all user-level data, but here we are considering an identifier that would be sent to services run by the attester (as opposed to a signal computed in the OS kernel that stays on the machine). Currently no such join key is being sent, and we don't want to introduce one.

There is also the challenge of resets - if the user clears state on their browser, the identifier itself needs to be cleared to prevent unwanted cross-user / cross-context tracking. Consequently, things like "rate limiting" should probably be backed by the attester, and not user-resettable in the same way (otherwise attackers would constantly reset their rate limiting state).

That said, if there is utility for a user-resettable, same-eTLD+1 scoped, attester-masked identifier, let's dig into those use cases!

bakkot commented 1 year ago

The owner of the OS kernel can theoretically intercept all user-level data, but here we are considering an identifier that would be sent to services run by the attester (as opposed to a signal computed in the OS kernel that stays on the machine). Currently no such join key is being sent, and we don't want to introduce one.

But right now the owner of the OS kernel is the same entity as the attester, isn't it? I'm still confused here. Can you write the threat model here out in more detail? Who are the parties, what is assumed about each of their capabilities and level of coordination, that sort of thing?

There is also the challenge of resets - if the user clears state on their browser, the identifier itself needs to be cleared to prevent unwanted cross-user / cross-context tracking.

It's not obvious to me that accepting this constraint is making the right tradeoff; either way it would be good to document. That said, taking it as given, you can still recover most of the value by including a low-precision "how long ago was state last reset" signal: possibly even a single bit for "was state last reset at least one day ago" (or some other suitably-chosen timeframe).
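The single-bit signal could look roughly like this. It is a sketch only: the name `reset_at_least_one_day_ago`, the use of Unix timestamps, and the one-day threshold are assumptions chosen to match the example in the comment above, not anything specified in the explainer.

```python
import time
from typing import Optional

ONE_DAY_SECONDS = 24 * 60 * 60  # assumed threshold; any suitable timeframe works


def reset_at_least_one_day_ago(last_reset_ts: float, now: Optional[float] = None) -> bool:
    """Return the one-bit signal: was state last reset at least one day ago?

    Only this boolean would be exposed, never the timestamp itself,
    so the signal carries very little identifying information.
    """
    if now is None:
        now = time.time()
    return (now - last_reset_ts) >= ONE_DAY_SECONDS
```

A server could then treat clients whose bit is false as recently reset and apply stricter limits to them, without learning anything that links the old and new identifiers.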

mokrates commented 1 year ago

You don't need to expose a device identifier to the website at all. You could attest to the Client that it is real by signing a challenge token (possibly with added requirements regarding the features of the client), and the Client can then choose to relay the signed token back to the webserver.

The challenge token can be generated by running a fast distributed random() algorithm between the webserver and the client, such that the webserver puts hidden information into the token.
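One possible reading of the "distributed random()" step is a commit-reveal exchange, sketched below. This is an interpretation, not something the comment spells out: the function names, the 32-byte share size, and the `challenge-v1` label are all hypothetical, and the signing of the token by the attester is left out.

```python
import hashlib
import secrets

SHARE_BYTES = 32  # assumed fixed size, so concatenation below is unambiguous


def commit(share: bytes) -> bytes:
    """Commitment the webserver sends before it sees the client's share."""
    return hashlib.sha256(share).digest()


def verify_reveal(commitment: bytes, revealed_share: bytes) -> bool:
    """Client-side check that the revealed share matches the commitment."""
    return secrets.compare_digest(commit(revealed_share), commitment)


def derive_token(server_share: bytes, client_share: bytes) -> bytes:
    """Challenge token that depends on both parties' randomness."""
    return hashlib.sha256(b"challenge-v1" + server_share + client_share).digest()


# 1. The webserver picks its share and sends only the commitment.
server_share = secrets.token_bytes(SHARE_BYTES)
commitment = commit(server_share)
# 2. The client answers with its own random share.
client_share = secrets.token_bytes(SHARE_BYTES)
# 3. The webserver reveals its share; the client checks it, then both derive the token.
reveal_ok = verify_reveal(commitment, server_share)
token = derive_token(server_share, client_share)
```

Neither side can choose the token alone, since each share is fixed before the other is known. The server's share is the part of the token it contributed and can recognise later.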

But srsly: Don't implement that.