WICG / pending-beacon

A better beaconing API
Other
43 stars 9 forks source link

[Privacy] Address privacy requirement around "network changes" #30

Open mingyc opened 2 years ago

mingyc commented 2 years ago

The discussion from https://github.com/WICG/unload-beacon/issues/28#issuecomment-1237609079 suggests that one of the privacy requirements around network change, as stated below, is too ambiguous:

Beacons are only sent over the same network that was active when the beacon was registered (e.g. > if the user goes offline and moves to a new network, discard pending beacons).

The foundamental privacy issue here is to prevent beacon from sending to fulfill the expectation that users expect a network cannot know about their previous navigation history when (1) they have already left a page (either by closing a tab or by navigating away) AND (2) they have already left the origin network

The following example is the same from the linked comment:

  1. User is on network1 (e.g. mobile network)
  2. User navigates to site1.com
  3. site1.com creates and queues beacon1 on the page.
  4. User navigates away from site1.com A. where site1.com is put into bfcache, and beacon1 is still pending. B. browser checks whether network has changed or not => No
  5. User switches to network2 (e.g. office wifi), and thinks that network2 wouldn't know about the navigation history of site1.com
  6. beacon1 is sent by browser via network2 (when either page evicted from bfcache or timeout/backgroundTimeout timer ends) => an unexpected behavior

Note that Step 5 can always happen after 4-B as the device's network change takes some time to propagate from OS to browser.

Proposed solution

The current proposal is to force the browser to send out all beacons when a page is no longer open, i.e.

and replaces 4-B with sending out all beacons.

Cons

  1. There can still be leak to network2 if a network change happens very quickly in between B & D A. navigating away B. browser forces sending out all beacon1 from network1 C. (network changes to network2) D. The browser's network API actually sends out request constructed from beacon1.
  2. This potentially makes backgroundTimeout unusable for "navigating away" use case, as the the time between this timers starts (i.e. page enters hidden state) and forced sending (i.e. page enters bfcache) can be very short. But it can still work when the page visibility change but navigation doesn't happen, e.g. switching to another tab.
mingyc commented 2 years ago

A followup question: whether to force sending out on different-page navigation or different-site navigation.

Note that a beacon is attached to a document (page).

fergald commented 2 years ago

If the user still has another instance of site1.com open or opens one later on or there is a service worker for site1.com, then it should be fine to deliver the beacon, we're not doing anything the site itself couldn't do (e.g. if it wrote data to IDB and managed delivery itself).

I think this would leave us a small risk of dropping data due to network change.

We still need to decide if it's OK to wait while in BFCache or not (since there is still a race with the network change event).

tungnh28 commented 1 year ago

Hmm, there're quite a few APIs having the same privacy concerns here (such as report API and background sync). However, I think this leak is quite limited by the fact that pending beacon requests must be HTTPS only, so the request contents are not revealed (only IP will be leaked), and only under a certain situation of racing between network change event and beacon request setup (because we would discard entirely pending requests after getting network change).

Ideally, beacons must only be sent persistently over only one network (tagged at the time the beacon was registered), and discard the future sending if we find the network is different than the old one. There was an immature exploration of Chrome regarding the idea (unfortunately I actually don't know the state or underneath tech difficulties, still asking around). I guess it's time to push it forward.

mingyc commented 1 year ago

... only under a certain situation of racing between network change event and beacon request setup (because we would discard entirely pending requests after getting network change).

@tungnh28 Do you suggest that it's ok for us to try with best-effort, i.e. dropping pending beacons on network changes, after the tab is closed, in the browser process?

Ideally, beacons must only be sent persistently over only one network

We would argue that a beacon should be sendable over different network as long as the tab that creates the beacon is still open when network changes, as developers can achieve this same behavior by other existing APIs.

tungnh28 commented 1 year ago

Do you suggest that it's ok for us to try with best-effort, i.e. dropping pending beacons on network changes, after the tab is closed, in the browser process?

Given the current risks are pretty low, the best-effort approach sounds good to me (I think we are doing the same with report API).

We would argue that a beacon should be sendable over different network as long as the tab that creates the beacon is still open when network changes, as developers can achieve this same behavior by other existing APIs.

Yeah, indeed, and with extending in the comment, (with double keyed because we don't have to concern about user's friction).

mingyc commented 1 year ago

@tungnh28 Could you please confirm if beacons can only be sent over the same network can be bypassed if background sync is enabled given this comment?

(I also asked similar question here)

tungnh28 commented 1 year ago

I don't think so, we had the same privacy concern/limitation in background sync as well (that's one of the motivation of background-sync preference setting, for users who knew the risks and want to disable background sync).

mingyc commented 1 year ago

OK. Then background-sync permission just grant use to bypass checking other live site instance?

I've updated this change. @tungnh28 Could you also please take a look? Thanks!

fergald commented 1 year ago

@tungnh28 can you explain why this sending wouldn't just be governed by the background-sync permission? As you say, it's the same concerns, so it seems like it should be the same permission.

In particular, a site that wants to send data in the background can do so from a service worker if it has that permission. This seems like another case where we could be pushing devs to hand-roll something instead of using this API.

The workaround for this are either to

  1. always send on pagehide and lose some batching opportunities
  2. pay attention to network changes and sometimes lose data

1 is probably OK but I feel inconsistent for background sync not to apply to this network activity.

tungnh28 commented 1 year ago

can you explain why this sending wouldn't just be governed by the background-sync permission?

As the primary privacy concern here, we are trying to prevent beacon from revealing the previous navigation history (host) in some certain situations, apparently gating behind background-sync does not help us resolving it. However, it might be the right thing to do since we are expecting to provide an alternative API which should not be less capable than the existing one.

The privacy risks seem to be quite reduced and I don't think that concern should block us moving forward. I'd like to friendly tag someone who knows the old privacy story of background-sync for better ideas. @beverloo, maybe?

tungnh28 commented 1 year ago

Friendly tag @sauski and I am looking forward to the feedback of privacy team.

Sauski commented 1 year ago

I realise that, from a privacy perspective, we don't have a consistent stance on APIs which want to communicate after the page has been unloaded. After chatting with the Chrome privacy folks, we've come up with a stance that applies to any API which wants to send data at some point after the page has been "unloaded" (with the exclusion of periodic background sync, which is limited to PWAs and is a different beast).

Starting at time T0, defined as when the eTLD+1 for the page which wants to communicate (e.g. send a beacon / report / sync message) is no longer "fully active" in any tab - e.g. considered as "non-existent” from the user’s perspective. Note that entering the BFCache does explicitly not extend this.

At time T0, the user agent could send any beacons / reports / sync messages, regardless of the background sync user permission. Acknowledging that unload handlers + fetch get polyfill this functionality % reliability.

Past time T0 (assuming the eTLD+1 remains "non-existent"), any outbound communication requires the background sync user permission.

Additionally, past time T0, any network change must result in any pending communications being dropped (regardless of why they are pending, e.g. including because they are in BFCache). The time between T0 and a network change is not relevant - a user may have closed tabs just before swapping networks, precisely because they don't want those sites communicating on the new network.

We also consider some "timeout" time past T0, where regardless of network change, it would be unreasonable to send the communication. Elsewhere it was discussed that Firefox has a 40 minute timeout, and this would definitely be a hard upper bound. Below that, we don't have strong feelings - obviously shorter is more private, but below this hard limit, it seems OK for the API to decide.

Acknowledge that this stance isn't respected by some APIs (e.g. background sync doesn't consider network change) - from a privacy perspective, we'd consider this a privacy deficiency, and will be working to rectify that. Going forward, from a privacy perspective, we'd be looking to uphold the bar defined above for all APIs.

mingyc commented 1 year ago

Thanks for the updates!

@Sauski I have the following questions

defined as when the eTLD+1 for the page which wants to communicate (e.g. send a beacon / report / sync message) is no longer "fully active" in any tab

  1. Will it still be eTLD+1 after 3P Storage Partitioning Key is launched everywhere, which changes the key to (Root Frame eTLD+1, current frame Origin)?

Past time T0 (assuming the eTLD+1 remains "non-existent"), any outbound communication requires the background sync user permission.

  1. Does this imply that opening another eTLD+1 tab after previous T0 will lead to another T0?
Sauski commented 1 year ago

Will it still be eTLD+1 after 3P Storage Partitioning Key is launched everywhere, which changes the key to (Root Frame eTLD+1, current frame Origin)?

I could see this going either way, from a raw privacy perspective, the embedded origin could post message up to the top frame and have it act on its behalf, recovering the raw eTLD+1 scope. On the other hand, if we don't have a good reason to break away from the strong model, I don't see why we should.

Does this imply that opening another eTLD+1 tab after previous T0 will lead to another T0?

It's my intuition is that it would (@tungnh28 for thoughts). E.g. if you had something pending send, you could refresh the pending time when the user re-opened the site (or the site came out of bfcache etc).

I worry however, that this would lead to some potentially perverse interpretations (e.g. holding onto some data beyond the lifetime described above on the hope that another "T0" occurs and everything becomes valid again)

tungnh28 commented 1 year ago

Thank you very much for the insight.

It's my intuition is that it would (@tungnh28 for thoughts). E.g. if you had something pending send, you could refresh the pending time when the user re-opened the site (or the site came out of bfcache etc).

It sounds reasonable to me, although I am still a bit concerned about breaking the gap between embedded iframe and 1P frame. Given the case the pending beacon request is coming from iframe (please correct me if it's not the case), and there's exactly an active instance of the same top frame + different subframe (or we are reopening the top level page), it seems not right to me.

Edit: I was saying the case of reopening the same tld, but different subframes.

mingyc commented 1 year ago

@Sauski

the embedded origin could post message up to the top frame and have it act on its behalf, recovering the raw eTLD+1 scope.

But if we only focuses on behaviors after T0 (defined as when the eTLD+1 for the page which wants to communicate is no longer "fully active" in any tab), the page should not be able to make any kinds of communication to recover the scope?

-- Some follow up questions:

  1. Do you think crash recovry from disk storage will be banned by this privacy decision?
    • The timing when a browser crashes can be considered as T0
    • Background sync permission for a site may or may not have changed after reopening the browser.
  2. If a dedicated worker is running for a page, it should also follow this rule that pending beacons (created by the worker) should not be sent after T0?

@tungnh28 Could you please explain more with examples? I don't quite get your concern.

Pending beacons can bet sent by any type of documents (may not be iframe).

tungnh28 commented 1 year ago

Is it the case of reopening top level frame, causing different subframes? I was saying due to some web apps (for example google app script) or user interaction, the subframe A would be injected (or result of navigation) and trigger pending beacon requests. However reopening the top level page is going back to the original state (where A is not available)

Sauski commented 1 year ago

Some more thoughts:

Do you think https://github.com/WICG/pending-beacon/issues/34 will be banned by this privacy decision? The timing when a browser crashes can be considered as T0 Background sync permission for a site may or may not have changed after reopening the browser.

I don't think banned would be the interpretation - difficult to have meet the proposed privacy requirements might be better. Can you help me better understand why browser level crashes are important to include provisions for? Crashes in the renderer, sure, but then we can continue to track network changes at the browser level.

From a security perspective, I'm not sure we'd even want to reveal that a browser level crash occurred.

If a dedicated worker is running for a page, it should also follow this rule that pending beacons (created by the worker) should not be sent after T0?

It's not that they shouldn't be sent, but I can't see any reason that they shouldn't follow the same requirements as above (e.g. they'd at least need the background sync capability).

mingyc commented 9 months ago

Re-surfacing this discussion.

Per latest discussion from Chrome Privacy reviewers, it is expected that any new API implemented in Chrome should not perform any background communication if none of the original document that initiates such communication is alive, even if such behavior can already be achieved by using service worker. This is because a stricter requirement is enforced on new APIs. (Also see previous conclusion for PendingBeacon API in https://github.com/WICG/pending-beacon/issues/30#issuecomment-1333869614)

Hence, a simplified version is currently implemented in Chrome for upcoming Origin Trial, that any pending fetchLater requests for a document will be flushed if that document enters BFCache, without checking if any of other same-origin documents is still open. In the future, a less stricter version should be considere by checking for the existence of other same-origin documents before flushing.

@noamr PTAL. We probably have to update the spec to include this behavior.