WICG / background-sync

A design and spec for ServiceWorker-based background synchronization
https://wicg.github.io/background-sync/spec/
Apache License 2.0

Periodic Background Sync has serious security risks, which are not described or adequately mitigated #169

Open othermaciej opened 4 years ago

othermaciej commented 4 years ago

(a) Periodic BackgroundSync could be used to build BotNets along the lines in this paper: https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_01B-2_Papadopoulos_paper.pdf

(b) More specifically, a mechanism to periodically phone home could turn an installed base of apps into an active BotNet at any time with no prior warning. Even with no further vulnerabilities, it could be used for purposes such as DDOS, CryptoMining or mass fraud (albeit somewhat mitigated by limits on execution time and frequency).

(c) A mechanism to periodically phone home can be used to greatly extend the attack scope of 0-day vulnerabilities and can make it more efficient to abuse n-day vulnerabilities. Assume a sandbox escape vulnerability usable from a Service Worker is revealed. Periodic background sync allows it to be used against the whole pool of users who have granted the permission right away, perhaps before they have had time to install the patch.

(d) I pointed out a number of similar risks for models with persistent background content (then called “persistent workers”) in 2009: https://lists.w3.org/Archives/Public/public-whatwg-archive/2009Jul/0868.html

(e) All these vulnerabilities are exacerbated by the fact that domains and websites can be purchased. Even if the actor registering for periodic background sync is trustworthy at the time, their assets could be purchased at a later time by a malicious entity. For a website, users can simply stop visiting, but with periodic background sync, they may continue to be vulnerable even if they don’t visit/launch any more.

(f) Concerningly, the specification does not even have a Security Considerations section, even though these types of risks have been known for years. Perhaps mitigations to these threats exist, but one wouldn’t know it from reading the spec.

mugdhalakhani commented 4 years ago

Thanks Maciej! See responses to the points you raised inline:

(a) Periodic BackgroundSync could be used to build BotNets along the lines in this paper: https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_01B-2_Papadopoulos_paper.pdf (b) More specifically, a mechanism to periodically phone home could turn an installed base of apps into an active BotNet at any time with no prior warning. Even with no further vulnerabilities, it could be used for purposes such as DDOS, CryptoMining or mass fraud (albeit somewhat mitigated by limits on execution time and frequency).

Crypto-mining: Since periodic background sync runs in a service worker, it inherits the lifetime aspects, and more specifically includes time limits, which mitigates the CryptoMining issue.

Mass fraud: I'm not sure what you're specifically referring to by "mass fraud", can you explain the attack?

DDOS: In terms of DDOS, the attack in the paper you reference relies on a bug in Chrome's service worker implementation that was fixed in July 2018 and that had allowed the service worker to stay alive indefinitely. The paper was pretty bad at making this clear, and an update (see current status) was eventually posted on the university’s web page to clarify that the issue had long been addressed before the paper was published.

BotNet: "BotNet" sounds scary, so let's try to define what it means in real terms:

If the user is on a web page, that origin and any origins in child frames can run JS and make fetches. If you're a web property that users generally keep open (Twitter, Gmail, Facebook, iCloud) you're already in a position where you could 'command' these users to make fetches and run JS at a particular time. This is less true on mobile, where pages are routinely killed, but this is an implementation detail due to memory pressure, and not a specified mitigation for this issue.

So let's look at how periodic background sync changes the landscape here: If the user visits a web page once a month, and the browser chose to grant the origin a background sync every 10 minutes, that would massively increase the amount of time the origin can execute JS and run fetches. However, the specified scheduler gives the browser multiple opportunities to arbitrarily throttle this, which Chrome's implementation makes full use of.

For instance, Chrome’s implementation awards an engagement score to an origin every time the user intentionally interacts with it. We associate different sync intervals based on this engagement score as follows:

Engagement Score | Periodic Sync Interval (hours)
NONE | Never
MINIMAL | 36
LOW | 24
MEDIUM | 24
HIGH | 12
MAX | 12

For reference, it takes at least four days of consistent intentional use by the user for an installed web app to go from LOW site engagement to HIGH.

The MINIMAL bucket represents ~25% of sites, whereas the LOW bucket represents a further ~65% of sites. This means that for most of the sites, they’ll only get the opportunity to sync in the background every 24 or 36 hours, for the service worker execution timeout period, which is 3 minutes in the current implementation. This opportunity is further guarded by the requirement to be an installed web app. Examples of intentional use are media playback on the site, active time on the site, and direct navigations to a site (not link clicks, nor pop-ups, etc).
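For illustration only, the table and the installed-app requirement above could be expressed roughly as follows. This is a sketch, not Chrome's code: the `EngagementLevel` type, `SYNC_INTERVAL_HOURS`, and `syncIntervalMs` are hypothetical names, and `null` is an assumed stand-in for "Never".

```typescript
// Engagement buckets and intervals as quoted in the table above.
// All identifiers here are illustrative, not Chrome's actual names.
type EngagementLevel = "NONE" | "MINIMAL" | "LOW" | "MEDIUM" | "HIGH" | "MAX";

const HOUR_MS = 60 * 60 * 1000;

// null is an assumed encoding of "Never".
const SYNC_INTERVAL_HOURS: Record<EngagementLevel, number | null> = {
  NONE: null,
  MINIMAL: 36,
  LOW: 24,
  MEDIUM: 24,
  HIGH: 12,
  MAX: 12,
};

// Returns the sync interval in milliseconds, or null if the origin never syncs.
function syncIntervalMs(level: EngagementLevel, isInstalled: boolean): number | null {
  if (!isInstalled) {
    return null; // only installed web apps get an opportunity to sync
  }
  const hours = SYNC_INTERVAL_HOURS[level];
  return hours === null ? null : hours * HOUR_MS;
}
```

With this encoding, an origin whose engagement has decayed to NONE and an origin that is not installed are treated the same way: neither gets a sync.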

Once the user hasn’t interacted with a site for 2 hours, its engagement levels begin to decay. The higher the engagement level, the more aggressive the decay. If the user intentionally uses the site, it stops the decay, and the engagement score can go up again. It only takes a few days of no interaction with the site, to see a site’s engagement score decay to zero, at which point, Chrome’s implementation will deny that site’s opportunity to sync in the background.

But these aren’t the only options. For instance, an implementation could be extremely conservative by performing the sync just before it expects, with high enough confidence, that the user would visit the site anyway, meaning the user has up-to-date content available immediately. Since this is just before the user would visit anyway, it isn't particularly useful as a "BotNet".

All of these opportunities to mitigate the concerns were designed into the spec, but I guess we need to be clearer that these aspects are intended to mitigate issues like this.

So, I propose that we:

Does that help? Let's track adding this in issue 170

We also have some implementation specific notes, but I want other browsers to be able to innovate and adapt here.

(c) A mechanism to periodically phone home can be used to greatly extend the attack scope of 0-day vulnerabilities and can make it more efficient to abuse n-day vulnerabilities. Assume a sandbox escape vulnerability usable from a Service Worker is revealed. Periodic background sync allows it to be used against the whole pool of users who have granted the permission right away, perhaps before they have had time to install the patch.

A browser vulnerability like this would impact every user who visits any site until the patch is released, regardless of background sync. If the suggestions above are followed in terms of sync frequency, I don't think this would make a 0-day worse.

(d) I pointed out a number of similar risks for models with persistent background content (then called “persistent workers”) in 2009: https://lists.w3.org/Archives/Public/public-whatwg-archive/2009Jul/0868.html

I think the points raised on that thread have been addressed here. Let me know if there's something I've missed.

(e) All these vulnerabilities are exacerbated by the fact that domains and websites can be purchased. Even if the actor registering for periodic background sync is trustworthy at the time, their assets could be purchased at a later time by a malicious entity. For a website, users can simply stop visiting, but with periodic background sync, they may continue to be vulnerable even if they don’t visit/launch any more.

The spec allows the browser to throttle syncing to the point where the next sync is infinite-time away, effectively suspended. This happens in Chrome's implementation. Hopefully, adding the previously noted points in the spec will make this clear.

The spec doesn't state when the next sync should be suspended, as I wanted browsers to be able to innovate there. We could add a note to effective-minimum-sync-interval-for-origin suggesting that "infinity" should be returned if the user hasn't visited the origin in the past n days.
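A rough sketch of what such a note might describe. Everything here is an assumption for illustration: the function signature, passing timestamps explicitly, and the `maxIdleDays` parameter standing in for the unspecified n.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper modelled on the idea in the comment above:
// return Infinity (sync suspended) once the user has been away too long.
function effectiveMinimumSyncIntervalForOrigin(
  baseIntervalMs: number, // interval the browser would otherwise use
  lastVisitMs: number,    // timestamp of the user's last visit to the origin
  nowMs: number,          // current timestamp
  maxIdleDays: number,    // the "n days" threshold, left open in the discussion
): number {
  const idleMs = nowMs - lastVisitMs;
  if (idleMs > maxIdleDays * DAY_MS) {
    return Infinity; // next sync is infinitely far away, i.e. suspended
  }
  return baseIntervalMs;
}
```

For example, with `maxIdleDays` set to 7, an origin last visited ten days ago would get `Infinity`, while one visited yesterday would keep its base interval.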

(f) Concerningly, the specification does not even have a Security Considerations section, even though these types of risks have been known for years. Perhaps mitigations to these threats exist, but one wouldn’t know it from reading the spec.

I think we could have done a better job here. A lot of the issues are covered in the privacy and resource usage sections, but an explicit security section, and further detail in existing notes, would have been useful. We’ll do so.

The intent here is to allow a site the user trusts to have content ready for them ahead of their visit. E.g., news ready for their morning commute, up-to-date weather reports available to them even if they're now in an area without connectivity.

Ignoring the current spec and implementation for a moment, is this a use case Apple is interested in giving users of the web, or is it something that should only be available to native apps, and why?

mischmerz commented 4 years ago

Seriously? Periodic background syncing in 12-36 hrs? For installed web-apps? Well - and I thought we developers get something we can work with.

m.

othermaciej commented 4 years ago

Thanks for the lengthy reply @mugdhalakhani. At a high level, I'd say it is essential for a specification and proposed standard to address security in the specification, not just in the implementation. In increasing order of goodness:

  1. A Security Considerations section that describes the possible risks (but without necessarily suggesting or requiring mitigations). This should be table stakes for a web standard dealing with powerful capabilities or security-sensitive matter. In particular, if a specifier is also an implementor and has found it necessary to implement security defenses around a feature, that should be a bright flashing light that at minimum these risks need to be described in the spec.

  2. Not only a description of security risks but also non-normative suggestions of possible mitigations. This is probably the minimum needed to adequately review whether such a specification is at least possible to implement securely.

  3. Not just a description of risks and suggestions of mitigations, but also normatively required defenses so the spec is secure by design, not just potentially securable. What you think of as implementation freedom I think of as a possible need to reverse-engineer the market-leading implementation, particularly when the operation of a security mitigation is web-observable. And it's often possible to put limits without totally constraining the implementation. For example, instead of hardcoding Chrome's interval table into the spec, maybe there should be a global minimum interval of 12 hours or the like.

I would hope for (3) from that list. But even some combination of (1) and (2) would be a major improvement.

Comments on some specific items, and answers to questions you asked:

I'm not sure what you're specifically referring to by "mass fraud", can you explain the attack?

If it's possible to commit fraud of some manner through ServiceWorker http requests (for example fake ad conversions or fake account registration), and one defense is noticing whether requests come from a common IP block, periodic sync might be a way to evade that.

Since periodic background sync runs in a service worker, it inherits the lifetime aspects, and more specifically includes time limits, which mitigates the CryptoMining issue.

This depends on what the time limits and minimum execution interval are; both are entirely unspecified. There aren't even suggested limits to avoid such risks as far as I can tell.
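To make that dependence concrete, here is a back-of-the-envelope sketch. The `backgroundDutyCycle` helper is hypothetical; the 3-minute timeout and 12-hour interval are the Chrome figures quoted earlier in this thread, and the 10-minute interval is the hypothetical grant from the earlier example. It assumes the worst case, where the origin uses its full execution limit on every sync.

```typescript
// Fraction of wall-clock time an origin could spend running in the background,
// assuming it uses the full execution limit on every sync (a worst case).
function backgroundDutyCycle(executionLimitMs: number, minIntervalMs: number): number {
  if (executionLimitMs <= 0 || minIntervalMs <= 0) {
    throw new RangeError("both limits must be positive");
  }
  return Math.min(1, executionLimitMs / minIntervalMs);
}

const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// Figures quoted for Chrome: 3-minute timeout, 12-hour interval.
const chromeLike = backgroundDutyCycle(3 * MINUTE_MS, 12 * HOUR_MS);
// Same timeout with a 10-minute interval, which the spec text does not rule out.
const unthrottled = backgroundDutyCycle(3 * MINUTE_MS, 10 * MINUTE_MS);

console.log((chromeLike * 100).toFixed(2) + "%");  // prints "0.42%"
console.log((unthrottled * 100).toFixed(2) + "%"); // prints "30.00%"
```

The gap between the two results is the point: whether background execution is negligible or substantial is decided entirely by values the spec leaves open.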

In terms of DDOS, the attack in the paper you reference relies on a bug in Chrome's service worker implementation that was fixed in July 2018 and that had allowed the service worker to stay alive indefinitely.

You can take the position that this paper was all about one specific bug. I take it to illustrate a class of problems when JavaScript execution may continue an indefinite time after the page is closed.

But in any case, DDOS risk was not meant to be a mention of just this one paper. You can see I mentioned such a risk in 2009, well before the paper was published.

So let's look at how periodic background sync changes the landscape here: If the user visits a web page once a month, and the browser chose to grant the origin a background sync every 10 minutes, that would massively increase the amount of time the origin can execute JS and run fetches. However, the specified scheduler gives the browser multiple opportunities to arbitrarily throttle this, which Chrome's implementation makes full use of.

That sounds good for Chrome. But the spec makes no suggestion of what a safe limit might be. The fact that the time interval is specified in milliseconds does not suggest that in practice the UA should require it to be hours or days.
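For context, this is roughly how a page asks for a periodic sync, as I understand the API shape: `periodicSync.register(tag, { minInterval })`, with `minInterval` in milliseconds. The `RegistrationLike` interface and the `requestPeriodicSync` helper are hypothetical wrappers so that the sketch stands alone.

```typescript
// Minimal structural types for the part of the API used here, so the sketch
// does not depend on DOM typings. These interface names are assumptions.
interface PeriodicSyncManagerLike {
  register(tag: string, options?: { minInterval: number }): Promise<void>;
}

interface RegistrationLike {
  periodicSync?: PeriodicSyncManagerLike;
}

// Asks the user agent for a periodic sync and reports whether the request
// was accepted. The UA remains free to sync less often than requested.
async function requestPeriodicSync(
  registration: RegistrationLike,
  tag: string,
  minIntervalMs: number,
): Promise<boolean> {
  if (!registration.periodicSync) {
    return false; // feature not available in this user agent
  }
  try {
    await registration.periodicSync.register(tag, { minInterval: minIntervalMs });
    return true;
  } catch {
    return false; // registration rejected, e.g. permission not granted
  }
}
```

In a page, the `registration` argument would come from `navigator.serviceWorker.ready`. A site can pass a value as small as a few minutes, and nothing in the call itself tells the developer that the effective interval may be hours, days, or never.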

NONE | Never

Nothing in the specification suggests that "Never" is a conforming UA choice of interval. The specified interval is defined as a long long, and the UA is allowed to fire events at a greater interval, but +Infinity is not in the value range of a long long. Clarifying that it's conforming for the UA to never fire the sync timer would be a helpful improvement. (Though without saying anything about when the UA might validly choose "never", it makes the spec hard to rely on for websites.)

It only takes a few days of no interaction with the site, to see a site’s engagement score decay to zero, at which point, Chrome’s implementation will deny that site’s opportunity to sync in the background.

This seems like a useful mitigation. It would be nice if potential implementers of the spec could know about it from the spec, and not just from observing Chrome's behavior (or code or docs).

A browser vulnerability like this would impact every user who visits any site until the patch is released, regardless of background sync. If the suggestions above are followed in terms of sync frequency, I don't think this would make a 0-day worse.

A weekly-use site could potentially create daily frequency risk, if installed. Such intervals could fall in a patch uptake window. The existence of a "never" time interval makes this a lot less bad, but, again, from the spec it's hard to tell that this is allowed, let alone advisable.

We could add a note to effective-minimum-sync-interval-for-origin suggesting that "infinity" should be returned if the user hasn't visited the origin in the past n days.

Agree (except then you'd have to make clear the result type is not a "long long", or instead you could specify this as a "never sync" flag on the side).
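One way to picture the "never sync" flag on the side is a tagged union rather than a sentinel number. This is only a sketch of the idea; `SyncSchedule` and `nextSyncTimeMs` are hypothetical names and do not appear in the spec.

```typescript
// "Never" is carried as its own case instead of being squeezed into the
// numeric interval, so the interval can stay an ordinary integer.
type SyncSchedule =
  | { kind: "never" }
  | { kind: "interval"; minIntervalMs: number };

// Earliest time the next sync may fire, or null if syncing is suspended.
function nextSyncTimeMs(schedule: SyncSchedule, lastSyncMs: number): number | null {
  if (schedule.kind === "never") {
    return null;
  }
  return lastSyncMs + schedule.minIntervalMs;
}
```

The design choice is that callers have to handle the suspended case explicitly, rather than doing arithmetic on a value that might be infinite.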

Also worth noting, nothing in the spec suggests that this capability should be limited to installed web apps.

The intent here is to allow a site the user trusts to have content ready for them ahead of their visit. Ignoring the current spec and implementation for a moment, is this a use case Apple is interested in giving users of the web, or is it something that should only be available to native apps, and why?

This question seems off topic for a security issue against the spec. Surely the need to address these issues is independent of Apple's plans.

But in the interest of being helpful I'll answer. And I'll state this as my opinion, though I expect many of my Apple colleagues would agree with me. I believe the Web should have certain critical properties: an open universal platform where the user is kept safe wherever they go (both with respect to privacy and security). Adding useful capabilities to the web is great, but it must be done in a way that preserves these properties. Sometimes it's not possible to square that circle. And in such cases, I believe it's better to leave a dangerous capability out of the web platform. Sometimes it's possible, but non-obvious. It's good to uncover when that is the case.

Native apps are different. While some protection comes from engineered safety measures, to a significant extent, modern app platforms rely on curation, knowledge of the publisher's identity, revocation, trust, and so forth. Native platforms also often have started from a base of extensive capabilities and power. Curtailing that power may be necessary, but it is a difficult process that takes time. On the other hand, the web is not curated. And that's one of the great things about the web.

In my view, it is reasonable to give installed web apps some additional capabilities, as install is a signal of user interest of sorts. But even so, I believe installed web apps should have a security and privacy model closer to the web than to native. Install-like actions often feel more like bookmarking than like an app store install.

mischmerz commented 4 years ago

Maciej - I have to chime in here. We all know that what you call 'curation' (others might call it something different) hasn't prevented bad apps on any platform. The web environment is thankfully already a bulwark against malicious actors, and installed web-apps profit from that protection. Additionally, browsers are aware of the URLs being called and can intercept any action, even from web-apps, at any time if malicious activity is detected. So I don't understand most of your security concerns.

Web-apps also offer unprecedented flexibility for both, developers and users. As the code is written in Javascript, it's transparent and visible to anybody.

In other words - I don't see any technological reason to overly impede webapps compared to native. As I am not an employee of any browser manufacturer I am allowed to be a bit more frank: Your browser is not exactly known as a serious web-app supporter. The fact that y'all don't even support webpush on some platforms makes me wonder if security is the only reason you resist the evolution of web-apps.

But .. in dubio pro reo. Let's all sit down and design a way that unlocks enhanced native-like capabilities for installed web-apps based on a known developer infrastructure. Y'all all have developer key environments in place. Let's extend that for web-apps and we're good to go.

Michaela

othermaciej commented 4 years ago

@mischmerz Mozilla has also expressed serious concerns about this spec (largely on privacy grounds). Do you suspect them too of secretly being against web apps? That wouldn't be fair to them, nor do I think it is fair to us. We may have different priorities, but browser engine developers all share the goal of moving the web forward.

Furthermore, it seems like this issue has resulted in productive discussion which will lead to positive improvements to the spec. We got to learn that Chrome has mitigations which aren't required or even suggested by the spec, which might reduce both security and privacy risk. It will improve the spec to address these points. Would you rather we hadn't given this feedback at all? Is it helpful to respond to security issue reports with suspicion of motives?

mischmerz commented 4 years ago

@Maciej I understand where Mozilla is coming from and you're right - it wouldn't be fair to them to suggest any ulterior motives.

We - as a company - are very concerned about privacy issues but web platforms need to authenticate their users and IP number information is often used for localization, language selection or to conform with organizational or legal requirements. So - while IP numbers may be of some concern in regard to privacy - there are ways for privacy minded users to avoid exposing their details to a web-application. IP numbers are IMHO not a serious problem in regard to background sync.

I also don't want to sound unfair, but WebKit is causing a lot of problems for web developers, especially when it comes to webapps and, of course, iOS - not only because some APIs are not implemented for whatever reason, but also because of bugs that don't get enough attention.

In regard to the topic: I now understand periodic background sync to be limited to 12-36 hour intervals. This is IMHO far too narrow in its application, so I don't see many use cases anyway.

That said - I still don't think that there are serious security concerns in regard to this API specifically or webapps in general. As I mentioned before - the browser is aware of all connections at all times and can always pull the plug if a URL matches some malicious address.

And - I was dead serious about suggesting a verified developer environment that would allow relaxing the rules for web apps from verified developers.

Happy hacking :)

Michaela

tomayac commented 4 years ago

Native apps are different. While some protection comes from engineered safety measures, to a sign[i]ficant extent, modern app platforms rely on curation, knowledge of the publisher's identity, revocation, trust, and so forth. Native platforms also often have started from a base of extensive capabilities and power. Curtailing that power may be necessary, but it is a difficult process that takes time. On the other hand, the web is not curated. And that's one of the great things about the web.

The Safari Safe Browsing feature (and Safe Browsing in general, which in one form or the other exists on most UAs) can be seen as a form of curation.

[Screenshot: "Safari & Privacy" terms, with the following section highlighted: "Before visiting a website, Safari may send information calculated from the website address to Google Safe Browsing and Tencent Safe Browsing to check if the website is fraudulent".]

mugdhalakhani commented 4 years ago

At a high level, I'd say it is essential for a specification and proposed standard to address security in the specification, not just in the implementation. In increasing order of goodness:

(1) A Security Considerations section that describes the possible risks (but without necessarily suggesting or requiring mitigations). This should be table stakes for a web standard dealing with powerful capabilities or security-sensitive matter. In particular, if a specifier is also an implementor and has found it necessary to implement security defenses around a feature, that should be a bright flashing light that at minimum these risks need to be described in the spec.

(2) Not only a description of security risks but also non-normative suggestions of possible mitigations. This is probably the minimum needed to adequately review whether such a specification is at least possible to implement securely.

(3) Not just a description of risks and suggestions of mitigations, but also normatively required defenses so the spec is secure by design, not just potentially securable. What you think of as implementation freedom I think of as a possible need to reverse-engineer the market-leading implementation, particularly when the operation of a security mitigation is web-observable. And it's often possible to put limits without totally constraining the implementation. For example, instead of hardcoding Chrome's interval table into the spec, maybe there should be a global minimum interval of 12 hours or the like.

I would hope for (3) from that list. But even some combination of (1) and (2) would be a major improvement.

Already working on (1) and (2). PR here: https://github.com/WICG/BackgroundSync/pull/173

The spec does already provide a default of 12 hours as the values for minimum periodic sync interval for any origin and for minimum periodic sync interval across origins.

I quote: “minimum periodic sync interval across origins MUST be greater than or equal to minimum periodic sync interval for any origin. If undefined, these are set to 43200000, which is twelve hours in milliseconds.”
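As a sanity check on the quoted text, the default and the ordering constraint can be sketched as below. The `MinimumIntervalConfig` shape and the `resolveMinimumIntervals` function are hypothetical; only the 43200000 default and the "greater than or equal to" requirement come from the quoted spec text.

```typescript
// 12 hours in milliseconds: 12 * 60 * 60 * 1000 = 43200000.
const DEFAULT_MIN_PERIODIC_SYNC_INTERVAL_MS = 12 * 60 * 60 * 1000;

// Hypothetical configuration shape; undefined fields fall back to the default.
interface MinimumIntervalConfig {
  forAnyOriginMs?: number;
  acrossOriginsMs?: number;
}

function resolveMinimumIntervals(
  config: MinimumIntervalConfig,
): { forAnyOriginMs: number; acrossOriginsMs: number } {
  const forAnyOriginMs =
    config.forAnyOriginMs !== undefined
      ? config.forAnyOriginMs
      : DEFAULT_MIN_PERIODIC_SYNC_INTERVAL_MS;
  const acrossOriginsMs =
    config.acrossOriginsMs !== undefined
      ? config.acrossOriginsMs
      : DEFAULT_MIN_PERIODIC_SYNC_INTERVAL_MS;
  // The quoted requirement: across origins MUST be >= for any origin.
  if (acrossOriginsMs < forAnyOriginMs) {
    throw new RangeError("interval across origins must be >= interval for any origin");
  }
  return { forAnyOriginMs, acrossOriginsMs };
}
```

Note that in this sketch nothing stops an implementation from defining both values as something far smaller than 12 hours, which is the gap between a default and a normative floor.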

The difference is that you’re suggesting making this normative, requiring any browser implementation to set minimum periodic sync interval across origins to a value of at least 12 hours.

The limit was left non-normative because the spec allows browsers to implement UI that notifies the user of any ongoing background synchronization, and a browser with such UI might not want to impose such strict limits on frequency.

I’ll create a spec bug to decide whether a cap on frequency should be normative, and what cap the spec should enforce. Let’s track this in #174

The installability requirement will also be left non-normative, even though Chrome implements it. This is because the spec provides leeway for the user agent to implement UI to avoid surprising the user when synchronization happens in the background and an implementation might choose to not restrict to installed apps when they have such UI.

In my view, it is reasonable to give installed web apps some additional capabilities, as install is a signal of user interest of sorts. But even so, I believe installed web apps should have a security and privacy model closer to the web than to native. Install-like actions often feel more like bookmarking than like an app store install.

Agreed, and I believe that with the restrictions on frequency of synchronization which don’t apply to native apps, the Periodic Background Sync capability is more conservative in comparison.

radu-at commented 2 years ago

As a developer, I don't think I will start implementing the Periodic Background Sync API; with a minimum interval of at least 12 hours, it is almost useless. All developers today are looking for near-real-time syncs if possible.

radu-at commented 2 years ago

Periodic Background Sync should be a user-defined setting under Site Settings, from 1m to 1 year. For privacy reasons, the browser should not need to know how often users interact with my website in order to pick a sync interval between 12 and 36 hours. As a site owner, I should be able to instruct my users to set a Periodic Background Sync interval based on my needs, because the Push API can't be relied on under heavy traffic. My infrastructure, my users, my responsibility! The browser shouldn't decide the Periodic Background Sync interval by itself; it should only impose a best sync interval that the user can override. We as site owners, together with our users, should have the flexibility to decide this usage and the Periodic Background Sync interval ourselves.

aarongustafson commented 2 years ago

@othermaciej Do you feel that #173 addressed this issue to the point where you feel this issue can be closed?