fingerprinting when js can store exact values

wanderview commented 4 years ago

I'm a bit confused by the fingerprinting section. Any js that can inspect cache_storage can also write into it. This means js that wants to do some kind of tracking can put a UUID into a cache object somewhere out of the way. Given that, how concerned do we really need to be about fingerprinting of metadata?

wanderview commented 4 years ago

This bug is regarding:

https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/master/CacheAPIResponseMetadata/explainer.md

aarongustafson commented 4 years ago

@wanderview Just so I’m aware of the attack vector you’re bringing up, are you talking about generating a synthetic response and caching that for a given resource?

wanderview commented 4 years ago

I was thinking that a script could do this:

const url = '/foo.html';
const c = await caches.open('tracker');
await r = await c.match(url);
if (!r) {
  // new user
  const uuid = generateUUID();
  await c.put(url, new Response(uuid));
  logNewUser(uuid);
} else {
  // returning user
  const uuid = await r.text();
  logUserSeen(uuid);
}

Which seems much more likely to exactly track a user than fingerprinting metadata associated with responses.

aarongustafson commented 4 years ago

Thanks for that @wanderview, that’s what I thought you were thinking about as well. I think this is an issue that should be filed against the SW spec itself (if such an issue has not already been filed). @jungkees is an editor on that spec and can help with that.

I’m going to go ahead and close this issue here as it is not an issue against this specific explainer.

wanderview commented 4 years ago

I don't see why this is an issue for the service worker spec. All storage has this property; localstorage, IDB, etc.

Our main protection against abuse of storing state is that its bound by the same-origin policy. So the origin can only access/modify its own storage. Some browsers go further and double-key storage in 3rd party iframe contexts.

But doesn't this apply to the restrictions you discuss in the privacy considerations of this explainer? It talked about limiting the granularity of access times, etc. Its unclear to me that is necessary.

jyasskin commented 4 years ago

Because it's possible to store a user ID directly in CacheStorage (or any other storage a service worker has access to), and the metadata in this new spec is 1) not shared with any other origin (or across any storage boundaries for browsers with multi-keyed storage) and 2) cleared at the same time as the rest of the origin's storage, I believe you can remove all the mitigations discussed in https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/master/CacheAPIResponseMetadata/explainer.md#privacy-considerations.

aarongustafson commented 4 years ago

Looping in @melanierichards.

melanierichards commented 4 years ago

Thanks Aaron! Perhaps going a little bit down a philosophical road in this issue, but I think it's worth avoiding new bits of entropy where we can, even if a more useful fingerprinting vector is available in extant APIs. The pertinent question to my mind is: what fidelity do web developers actually need here? Can most use cases be solved by snapping to the day, as proposed, or might it be reasonable to look at another interval that addresses developer needs while exposing the least amount of information necessary?

wanderview commented 4 years ago

You can already do this:

cache.put(performance.now());

To store a high resolution timer. It seems there is next to nil chance this API will ever restrict what can be stored in a response body in the future.

I don't see any benefit to arbitrarily rounding the metadata associated with the response when the js can storage the exact values in the response itself.

In any case, can we at least re-open this issue since there seems to be some disagreement?

wanderview commented 4 years ago

Maybe it would be useful to understand the threat model you are considering here.

aarongustafson commented 4 years ago

The pertinent question to my mind is: what fidelity do web developers actually need here? Can most use cases be solved by snapping to the day, as proposed, or might it be reasonable to look at another interval that addresses developer needs while exposing the least amount of information necessary?

That’s been key for me too, thanks @melanierichards!

Maybe it would be useful to understand the threat model you are considering here.

@wanderview Here’s the W3C PING threat model that @jyasskin and @tomlowenthal have been working on. During the drafting of this explainer, I also consulted the Self-Review Questionnaire: Security and Privacy.

I don't see any benefit to arbitrarily rounding the metadata associated with the response when the js can storage the exact values in the response itself.

As @jyasskin pointed out in another issue, there may be some performance implications to using high-fidelity values.

In any case, can we at least re-open this issue since there seems to be some disagreement?

I can re-open it for discussion. Just keep in mind that guidance we receive from the PING—which will balance privacy concerns with developer wants—will ultimately shape the final decision on this stuff.

wanderview commented 4 years ago

@jyasskin pointed me at this update to his threat model:

https://github.com/w3cping/privacy-threat-model/pull/6/files

I haven't read it all, but the part that jumps out at me is:

A privacy harm only occurs if the user expects not to be associated between two
This conversation was marked as resolved by jyasskin
visits, but the site can still determine with high probability that the two
visits came from the same user.

A user's expectation that their two visits won't be associated might come from:

* Using a browser that promises to avoid such correlation.
* Using their browser's private browsing mode. ([[WHAT-DOES-PRIVATE-BROWSING-DO]])
* Using two different browser profiles between the two visits.
* Explicitly clearing the site's cookies or storage.

This recognition is generally accomplished by either "supercookies" or [=browser
fingerprinting=].

In this case I don't think the harm is realized. Any metadata will be cleared with the storage. Its not visible across profiles or private browsing mode since storage is not accessible across that boundary.

aarongustafson commented 1 year ago

I believe this issue is resolved.

MicrosoftEdge / MSEdgeExplainers

fingerprinting when js can store exact values #146