Discussion about out-of-band ruleset updates

jeremyn commented 7 years ago

(@Hainish, @cowlicks:)

There's a code branch https://github.com/EFForg/https-everywhere/tree/sign-rulesets that is intended to allow ruleset updates out-of-band from regular version updates. The idea is that HTTPS Everywhere will periodically request ruleset updates from a central EFF server, so that rulesets can change between releases without the year.month.day version number changing.

Privacy Badger does something similar for a whitelist it uses. Recently there were some problems with the updates which led to a frustrating experience for me, see https://github.com/EFForg/privacybadger/issues/1466. After I understood what was happening, I argued against it, see for example https://github.com/EFForg/privacybadger/issues/1466#issuecomment-315792144 and the related discussion.

I think this is also a bad idea for HTTPS Everywhere and I'd like to understand why this change is being made. I would at least like to avoid the user experience of things cycling between breaking and working for no apparent reason that I had with Privacy Badger.

ghost commented 7 years ago

I am not sure why you are asking us not to update rulesets outside of the extension update mechanism. The current update mechanism is secure enough that it is not the weakest link. The weakest link is the browser extension update mechanism, since Chrome uses SHA-1 and Firefox uses both SHA-1 and MD5 (!!!) for extension update verification, while we use SHA-256. I guess this issue may be closed. Thanks for letting others hear your opinion, though.

ghost commented 7 years ago

uBlock Origin and Privacy Badger both update their data out of band, and I am not sure anyone sees this as a problem. The extension "phoning home" may be a concern, but that can be addressed by making out-of-band updates optional, with regular extension updates still including the default ruleset library. This "phoning home" is also not a concern for Tor Browser users, since Tor anonymizes the update downloads.

jeremyn commented 7 years ago

@koops76 My concerns aren't about the crypto, it's about the surrounding user and maintainer experience. You can read through https://github.com/EFForg/privacybadger/issues/1466 to get an idea of what I mean.

In any case I would specifically like to hear from the EFF why they are doing this so I understand the motivation.

Bisaloo commented 7 years ago

@jeremyn, I know you are mainly waiting for the EFF's answer, but if I may chime in:

I understand and I agree with your concerns. On the other hand, separating the ruleset updates from the core updates has several upsides in my opinion, which would still make this change a positive move for our users.

Right now, when a ruleset fix is pushed on master, we have to either:

rush an update because it concerns a high traffic website (https://github.com/EFForg/https-everywhere/pull/9110)
tell the users to wait a month for this change to reach them (and we know this is problematic because it's common to have reports saying "X is broken" when a fix has been pushed for weeks but hasn't been released yet)

In short, I think this could be a net gain depending on how frequent the ruleset updates would be. So I guess my follow-up question is: how often do you plan to push the rulesets updates @Hainish?

In any case, this change and this behaviour should be made very clear at different places (changelog, add-on description, FAQ, etc.) to alleviate some of the concerns raised by @jeremyn.

(As a broader point, I think it would be really helpful if you kept the community in the loop when you are making big changes like this one. Not necessarily to argue but at least to inform so we can prepare for what's coming. This was already an issue in https://github.com/EFForg/https-everywhere/pull/10538.)

cschanaj commented 7 years ago

So I guess my follow-up question is: how often do you plan to push the rulesets updates

AFAIK, we make a release every 2 weeks, with exceptions when there are blocking PRs. The benefit of out-of-band updates is not obvious unless we push much more frequently than that.

Unlike uBlock, HTTPSE and PB are privacy-focused extensions. We should avoid phoning home without the users noticing (as they are privacy/security conscious). IMHO, out-of-band ruleset updates should be made on request (requiring user interaction) or on an opt-in basis.

Besides, we rarely make core code changes. @Hainish How often are we going to make core releases when we have out of band ruleset updates? We should make the (ruleset & core code) release policies clearly documented.

@jeremyn thanks for bringing this issue to the discussion!!

ghost commented 7 years ago

Unlike uBlock

uBlock Origin is a privacy focused extension, since blocking ads also decreases the amount of tracking and there are also lists (some enabled by default) specifically made to block trackers, even if they are not showing any ads.

ghost commented 7 years ago

Besides, we rarely make core code changes.

Exactly the reason why out-of-band ruleset database updates are desired.

ghost commented 7 years ago

IMHO, out of band ruleset updates should be made on request (require user interactions) or on an opt-in basis.

Privacy Badger already pings EFF for updates, and I'm not sure if that can be disabled. I guess this is OK, since a user who does not trust EFF would not have installed the extension in the first place. I'm not sure whether the option should be opt-out or opt-in, with the extension asking the user when it is first installed. Opt-out seems better, since the user may close the tab the extension opens on first run without paying any attention to its content, and would then not get ruleset updates quickly.

Hainish commented 7 years ago

There are a number of reasons for out-of-band ruleset updates.

1. Lax security of AMO updates: A year ago today, Ryan Duff posted about a vulnerability, discovered accidentally, in the way Mozilla does key pinning in Firefox. This would allow anyone granted a misissued certificate for addons.mozilla.org to update extensions arbitrarily. Given the broad permissions traditional extensions are granted, this is a serious issue that is mitigated in part by the XPCOM deprecation. But it's a troubling sign that the security of AMO was circumvented by pure happenstance, and it does not engender confidence in the Mozilla addon update infrastructure.
2. Waiting for code review: On the topic of AMO, we have in the past had to wait up to a couple of weeks before a new release is reviewed and published. In communication with Mozilla, we've been able to have them prioritize review, and this time span has shortened. But occasionally a new version will again take a long time to be reviewed. Of course, this is not a problem for the self-hosted version. But if a critical site needs a fix, the process of delivering it is made more difficult.
3. Reliance on AMO, Chrome Webstore, Opera Addons: This is related to, but not the same as, the above. The release process is made more difficult and cumbersome by having to upload to the various browser addon stores every time a release is made. This is also the case for the self-hosted Firefox version, since a signature from Mozilla is required before posting. It makes the release process much more laborious.
4. Keys we control: Given (1), Tor Browser is preparing to disable the update channel for extensions. In order to ensure ruleset updates are reliably delivered, we have to move to an out-of-band delivery channel. With the sign-rulesets branch merged, updating the rulesets will be possible using keys that we control. This will rely on our own infrastructure and on keys that are pinned in the extension itself.
5. Multiple update channels: Moving in this direction also makes the extension more modular. With update_channels.js, we allow the extension to specify multiple update channels. This will make it possible, for example, for an intranet to deliver an additional ruleset bundle to its users without publicly divulging secret FQDNs. It will also allow Tor Browser to deliver its own ruleset bundle, with its own pinned key, via HTTPS Everywhere. This possibility is powerful for the reasons Paul Syverson, one of Tor's creators, describes in this whitepaper.
6. Possibility of granular updates in the future: Currently, the rulesets make up the vast majority of the extension download size. Of the 1.7M compressed xpi, only 421K goes to everything else, mostly translations. This means that delivering the rulesets is currently roughly equivalent to delivering the extension itself. But this transition opens up the possibility of delivering granular ruleset updates, which would vastly lower the total download size per unit of time.
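For a rough sense of scale, the figures above work out as follows, assuming 1.7M and 421K are decimal units of the same kind (the units aren't stated):

```latex
\text{rulesets} \approx 1.7\,\text{M} - 0.421\,\text{M} = 1.279\,\text{M} \approx 1.28\,\text{M},
\qquad
\frac{1.279\,\text{M}}{1.7\,\text{M}} \approx 0.75
```

So the rulesets are roughly three quarters of the download, which is why shipping them separately currently costs about as much as shipping the whole extension.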

In the future we may also expose the update channel to users, so that they themselves can compile ruleset bundles. But this will require us weighing the costs and benefits of adding an additional maintenance burden.
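To make the multiple-channel idea concrete, here is a minimal Python sketch. The field names (`name`, `update_path_prefix`, `pinned_key`) and the `fetch_bundle` callback are hypothetical illustrations, not the actual contents of update_channels.js; the point is only that each channel carries its own pinned key, and a failure in one channel doesn't affect the others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Bundle = Dict[str, str]  # hypothetical: ruleset name -> ruleset XML


@dataclass(frozen=True)
class UpdateChannel:
    name: str                # e.g. "EFF", "Tor Browser", or an intranet's channel
    update_path_prefix: str  # where this channel's bundle is fetched from
    pinned_key: str          # public key pinned for this channel only


def collect_bundles(
    channels: List[UpdateChannel],
    fetch_bundle: Callable[[UpdateChannel], Optional[Bundle]],
) -> Dict[str, Bundle]:
    """Return one verified bundle per channel, keyed by channel name.

    fetch_bundle is assumed to download a channel's bundle from
    update_path_prefix and verify it against that channel's pinned_key,
    returning None on any failure. A failing channel is skipped, so it
    cannot block or overwrite the other channels' rulesets.
    """
    bundles: Dict[str, Bundle] = {}
    for channel in channels:
        bundle = fetch_bundle(channel)
        if bundle is not None:
            bundles[channel.name] = bundle
    return bundles
```

Keeping bundles keyed by channel also means an intranet channel's rulesets (and the FQDNs in them) never need to pass through the public EFF channel.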

jeremyn commented 7 years ago

@Hainish Thanks for the detailed response.

I would say that my concerns could be met by adding the following:

Inform users of the auto-update mechanism and allow them to enable/disable it. The Observatory UI is an example of what this might look like.
Allow users to reset the rulesets to the "stock" set that came with their current version of the add-on.
Version the ruleset updates and display this version to the user somewhere.

Optionally:

Publish information somewhere about the EFF infrastructure responsible for serving updates, to give confidence about its reliability. Maybe a status page/dashboard would be good here.

The motivations for all of these are, I hope, obvious, but I can elaborate if needed. None of these should interfere with the reasons you've given.
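The three main suggestions above (an on/off switch, a reset to the stock rulesets, and a visible ruleset version) could be modeled roughly like this. This is a hypothetical Python sketch; `RulesetState` and its methods are invented for illustration, and the switch has no default because opt-in versus opt-out is still being discussed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RulesetState:
    # Rulesets bundled with the installed add-on version ("stock").
    stock: Dict[str, str]
    # User-visible switch for out-of-band updates (suggestion 1).
    auto_update_enabled: bool
    # Rulesets currently in use; starts out as a copy of the stock set.
    active: Dict[str, str] = field(init=False)
    # Version of the applied update, e.g. "2017-09-20"; None means stock.
    update_version: Optional[str] = None

    def __post_init__(self) -> None:
        self.active = dict(self.stock)

    def apply_update(self, rulesets: Dict[str, str], version: str) -> bool:
        """Apply an already-verified update unless the user turned updates off."""
        if not self.auto_update_enabled:
            return False
        self.active = dict(rulesets)
        self.update_version = version
        return True

    def reset_to_stock(self) -> None:
        """Suggestion 2: go back to the rulesets shipped with this add-on version."""
        self.active = dict(self.stock)
        self.update_version = None

    def displayed_version(self, addon_version: str) -> str:
        """Suggestion 3: the ruleset version to show the user."""
        if self.update_version is None:
            return f"{addon_version} (stock)"
        return self.update_version
```

Copying the stock dict on reset means a bad update can never corrupt the known-good set it falls back to.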

Also, to repeat, my experience with similar functionality in PB was that the update target (the whitelist) was repeatedly and quietly wiped. If we move to this update model for HTTPS Everywhere then I consider it a more-than-theoretical possibility that the rulesets could be similarly wiped or broken. The above items give users the ability to lock down or reset HTTPS Everywhere to a known-good version.

Hainish commented 7 years ago

@jeremyn

Also, to repeat, my experience with similar functionality in PB was that the update target (the whitelist) was repeatedly and quietly wiped. If we move to this update model for HTTPS Everywhere then I consider it a more-than-theoretical possibility that the rulesets could be similarly wiped or broken. The above items give users the ability to lock down or reset HTTPS Everywhere to a known-good version.

This bug was due to the fact that the whitelisted resources in Privacy Badger were not being verified before being applied. When the website had a bug, the whitelist would return a blank result. In this unfortunate edge-case, Privacy Badger happily applied this whitelist, thus causing the extension to malfunction.

This will not happen with the ruleset updates, since every ruleset update is cryptographically signed by the extension developers, then verified in each user's extension before being applied.
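A minimal sketch of the verify-before-apply flow, assuming the bundle arrives as JSON bytes plus a detached signature. The real scheme verifies a public-key signature against a key pinned in the extension; Python's standard library has no public-key signature API, so the sketch swaps in an HMAC-SHA256 verifier purely as a stand-in. The emptiness check is an extra assumption, not something described here.

```python
import hashlib
import hmac
import json
from typing import Callable, Dict, Optional

Verifier = Callable[[bytes, bytes], bool]


def verify_and_parse(payload: bytes, signature: bytes,
                     verify: Verifier) -> Optional[Dict[str, str]]:
    """Return the parsed bundle only if it is authentic and non-empty.

    Anything that fails verification (e.g. a blank or truncated response
    from a misbehaving server) is rejected, so the caller keeps whatever
    rulesets it already has.
    """
    if not verify(payload, signature):
        return None
    try:
        bundle = json.loads(payload)
    except ValueError:  # covers json.JSONDecodeError and bad encodings
        return None
    # Extra sanity check (an assumption): even a correctly signed bundle
    # must contain at least one ruleset.
    if not isinstance(bundle, dict) or not bundle:
        return None
    return bundle


def hmac_sha256_verifier(key: bytes) -> Verifier:
    """Stand-in for this sketch only: HMAC-SHA256 with a shared key.

    The actual design checks a public-key signature against a pinned key;
    a symmetric MAC is used here just to keep the example stdlib-only.
    """
    def verify(payload: bytes, signature: bytes) -> bool:
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)
    return verify
```

The Privacy Badger failure mode maps onto the first check: a blank response carries no valid signature, so it is discarded instead of applied.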

Given this, in what case would you envision the auto-update mechanism causing a failure so large that we should expose extra options to the user? If privacy is the issue, we should keep in mind that the user's browser already pings the extension store every day to check for new updates. EFF has a more stringent privacy policy than AMO or Google, so that shouldn't be a concern.

Hainish commented 7 years ago

This being said, I do think you're right in this:

Version the ruleset updates and display this version to the user somewhere.

Hainish commented 7 years ago

The ruleset updates will already have a timestamp associated with them, when applied. This is to keep track of whether we need to update the rulesets upon subsequent checks. We can simply use this timestamp, formatted as YYYY-MM-DD, as the version that is displayed to the user.
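A small sketch of how one timestamp could serve both purposes, assuming it is a Unix timestamp in seconds interpreted as UTC (the format isn't specified here). The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_update(applied_timestamp: Optional[int], remote_timestamp: int) -> bool:
    """Apply a bundle only if it is newer than the one already applied."""
    return applied_timestamp is None or remote_timestamp > applied_timestamp


def display_version(applied_timestamp: int) -> str:
    """Format a Unix timestamp (seconds, UTC) as the YYYY-MM-DD version string."""
    applied = datetime.fromtimestamp(applied_timestamp, tz=timezone.utc)
    return applied.strftime("%Y-%m-%d")
```

For example, `display_version(1505865600)` gives `"2017-09-20"`. One consequence of a date-only version is that two bundles published on the same day would display the same version, even though `needs_update` still tells them apart.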

jeremyn commented 7 years ago

The problem isn't with a specific bug, it's about the process. It's like your criticism of Mozilla in "Lax security of AMO updates": a bug was found, Mozilla fixed it, but now you're uncomfortable with their update process. Same thing here, right? A bug in the EFF update process was found, the EFF fixed it, but now a reasonable person could be uncomfortable with the EFF's update process. This reasonable person should have the ability to disable or revert EFF updates.

For privacy, it's not about the privacy policy. It's about giving users control over outbound network requests and software updates. To quote myself from https://github.com/EFForg/privacybadger/issues/1466#issuecomment-315843564:

...there is an endless list of software creators who want to build in quiet, uncontrollable update processes to their software. They always think they are justified. Users complain because they feel a loss of control over their system and because it causes unexpected breakage. This is no different. The EFF heavily criticized Microsoft for this sort of thing during the Windows 10 rollout.

HTTPSNowhere commented 7 years ago

@Hainish

There's still the issue of needing a fallback mirror. For example, EFF infrastructure is blocked in China, so people there with HTTPS Everywhere installed can't benefit from this feature. Is there any way to add a fallback mirror, for example serving directly from GitHub (which is thankfully not blocked there)?

cschanaj commented 7 years ago

@HTTPSNowhere I guess that GitHub is also blocked in China... see https://en.wikipedia.org/wiki/Censorship_of_GitHub#China

HTTPSNowhere commented 7 years ago

@cschanaj

No, it is no longer blocked.

ghost commented 7 years ago

@HTTPSNowhere Is CloudFront blocked in China?

HTTPSNowhere commented 7 years ago

@koops76

No (I'm not in China but it's known that CloudFront isn't blocked there, and it's the type of domain front that works with Tor there = i.e. meek-amazon in Tor Browser).

Also, using CloudFront may seem like a waste when there are GitHub releases that are free and already offer "unlimited" bandwidth.

ghost commented 7 years ago

@HTTPSNowhere GitHub won't be happy if we would use it as a free CDN.

HTTPSNowhere commented 7 years ago

@koops76

It's only for fallback downloads, which will affect only the small percentage of users for whom EFF infrastructure is blocked. But if they're happy with using CloudFront for all users, then that's great as well.

Hainish commented 7 years ago

@HTTPSNowhere the fallback mirror is a good point. I expect if we simply register a new single-purpose domain to serve the rulesets from, this will be allowed through the firewall, but redundancy is always a good thing. We could just serve this from an Amazon S3 instance which is not blocked in China.
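A minimal sketch of the mirror fallback, with the primary single-purpose domain tried first. The `fetch` callback and the mirror URLs are hypothetical; whichever mirror answers, the result would still have to pass signature verification, since a mirror is no more trusted than the network.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(urls: Iterable[str],
                        fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Try each mirror in order and return the first successful response.

    Returns None if every mirror fails; the caller then keeps the rulesets
    it already has (bundled or previously applied).
    """
    for url in urls:
        try:
            return fetch(url)
        except OSError:  # connection refused, DNS failure, timeout, etc.
            continue
    return None
```

In practice `fetch` might wrap `urllib.request.urlopen(url, timeout=10).read()`, whose `URLError` is a subclass of `OSError`. Returning `None` when every mirror fails lets the caller simply keep its current rulesets.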

ghost commented 7 years ago

@Hainish ~~We may use S3 directly instead of CloudFront, but that will be much more expensive than using CloudFront over S3.~~ It would be better to use Google Cloud Storage.

The advantage is that if the URL is similar to https://s3.amazonaws.com/EFForg/rulesets.json the only way to block it is to block the entire domain s3.amazonaws.com.

ghost commented 7 years ago

We may also use Google Cloud Storage, which has global edge caching and uses the same domain for all buckets, making it difficult to block HTTPS Everywhere ruleset updates specifically. It may also be cheaper than S3. Google is blocked in China; GCS isn't currently blocked, but it likely will be in the future.

ghost commented 7 years ago

@HTTPSNowhere We will need a CDN anyway. We're talking millions of downloads for each ruleset update.

HTTPSNowhere commented 7 years ago

@Hainish

Thanks for stating that a new domain will be used exclusively for this task; I initially thought it would be a subdomain of eff.org.

jeremyn commented 7 years ago

Assuming my suggestions in https://github.com/EFForg/https-everywhere/issues/12606#issuecomment-330706469 for enabling/disabling this update functionality are not implemented and users are required to accept these out-of-band updates, what's the intended behavior if the requests for updates always fail? An example might be users behind a firewall that simply drops outbound requests it doesn't recognize.

jeremyn commented 7 years ago

More concerns:

What about users with very limited or expensive bandwidth? There may be users who don't want to use or can't afford even tiny amounts of extra bandwidth. Currently these users can get all updates offline, if someone else downloads the entire extension to a disk and then shares the disk with them.

Regarding my UI suggestions, unfortunately it seems creating a UI for WebExtensions in Firefox for Android is especially challenging (see @Hainish's comment here: https://github.com/EFForg/https-everywhere/issues/9958#issuecomment-317824367), so maybe it won't be possible to implement my UI suggestions there even if we wanted to. I suppose mobile users are also the most likely to be sensitive to bandwidth availability and price.

Also, what about users in areas who might be endangered by visiting eff.org or a known mirror? They should be able to turn off these requests.

ghost commented 7 years ago

@jeremyn I don't think we should approve third parties to supply updates through physical media. Even if the rulesets are signed the media may contain malware in addition to the rulesets.

ghost commented 7 years ago

@jeremyn Re. the third point: These users should not install HTTPSE if it's too dangerous. We shouldn't have others' safety depend on how well we hide our requests.

ghost commented 7 years ago

Overall, I suggest multiple mirrors for people who don't want their update requests to go directly to EFF.

jeremyn commented 7 years ago

@koops76 You can download HTTPS Everywhere directly from the EFF at https://www.eff.org/files/https-everywhere-latest.xpi with right-click "Save Link As". In fact it might be the same person who downloads the .xpi in one location and then takes the disk with them to another location to install on other systems. In this case the only parties involved are the EFF and the user. HTTPS Everywhere is also distributed indirectly through Tor, Tails, and Debian, all of which can also go on physical media.

For

These users should not install HTTPSE if it's too dangerous.

I don't think most users expect HTTPS Everywhere to make outbound calls so they might not be prepared for it to do this. I was surprised when I found out that PB does this.

HTTPS Everywhere is recommended at https://techsolidarity.org/resources/basic_security.htm, which is aimed at what I think is an important use case: a less-technical activist or journalist, one who possibly has had their computer/phone/internet security configured by a technical person in a one-off setting. Take a look at this Hacker News discussion regarding the Tech Solidarity link I gave, in particular the comment from Thomas Ptacek (tptacek) that reads in part:

These instructions are written for unsophisticated users, particularly journalists and activists, and were written with feedback from those users....

We're simultaneously working with the airport lawyer groups (there's a huge one at ORD). It's been jarring to realize how many compromises are required to make things workable for groups of non-experts to use. Just getting software installed is a major hassle, so anything you install or customize needs to be really worth the effort.

It is not reasonable to assume that these less-technical end users will understand and manage the risk involved with their phone contacting eff.org when they take it with them into for example a war zone.

I can't speak for the EFF and I can't say for certain whether it endorses this class of user for HTTPS Everywhere. I strongly believe that it does, or at least gives the impression that it does (see https://ssd.eff.org/en/module/how-circumvent-online-censorship), and if so, it needs to take this use case into account.

ghost commented 7 years ago

I suggest obfuscating the requests behind innocuous domains provided by cloud services.

Examples:

s3.amazonaws.com
storage.googleusercontent.com
raw.githubusercontent.com
cdn.rawgit.com

ghost commented 7 years ago

@jeremyn Solution: do not go into a war zone?

HTTPSNowhere commented 7 years ago

@jeremyn

I think @Hainish took the best decision when he said,

the fallback mirror is a good point. I expect if we simply register a new single-purpose domain to serve the rulesets from, this will be allowed through the firewall, but redundancy is always a good thing. We could just serve this from an Amazon S3 instance which is not blocked in China.

So there's no worry about someone trying to connect to *.eff.org since ruleset updates will be fetched from an entirely different domain, and if they fail they'll be downloaded from a fallback mirror at s3.amazonaws.com.

@koops76

For what it's worth, googleusercontent.com and rawgit.com are blocked in China.

Bisaloo commented 7 years ago

On a related note, will it be possible to route ruleset updates through Tor? I seem to recall that the SSL observatory had this option.

cschanaj commented 7 years ago

Using CDNs for redundancy could be dangerous if EFF infrastructure were blocked by local governments. This is because local governments may (through legal action) require CDN service providers to collect users' IP addresses for various "legal reasons" as a condition of being allowed to operate in the market.

Routing updates through Tor could be a remedy. (Otherwise, the updates should fail AS-IS).

jeremyn commented 7 years ago

@HTTPSNowhere You make a good point that requests to domains like s3.amazonaws.com should be less obviously suspicious than requests to eff.org.

On the other hand, any unusual activity can be used to fingerprint or highlight users. I have no idea what non-suspicious internet traffic looks like in places like rural Afghanistan or South Sudan but I guess there are places where even requests to s3.amazonaws.com can flag users as suspicious. It may also be that mirror A is normal in one place but mirror B isn't, with the reverse being true somewhere else. Maybe all the popular sites in one place use Amazon as a CDN, but all the popular sites in another place use Google instead.

Also I agree with @cschanaj that mirrors are at risk of legal pressure. I mean, rawgit.com was suggested, but apparently rawgit.com is run by just one person. He might be a swell guy but that's obviously a huge single point of failure. That's an extreme example, of course.

I suppose it is possible to set up mirrors in a way that minimizes risk to users, but I think that would require some careful thought and actual research, not just speculation like we're all doing here.

I don't understand the suggestions to use Tor. Are people arguing that we should include a mini-Tor client in HTTPS Everywhere? Or are people talking about just the use case involving the Tor Browser?

Bisaloo commented 7 years ago

I don't understand the suggestions to use Tor. Are people arguing that we should include a mini-Tor client in HTTPS Everywhere? Or are people talking about just the use case involving the Tor Browser?

As far as I remember, there was an option in the SSL observatory to route the traffic through Tor if it was set up on the computer.

I am not saying this would completely solve the issues you are raising but it could help in some cases and I think it would be nice to have as an option.

Bisaloo commented 7 years ago

Other than that, I personally like the out-of-band updates, but I fully agree that an offline option is necessary as well, for the various reasons brought up by @jeremyn.

I am not sure which should be the default though.

jeremyn commented 7 years ago

@Bisaloo I found this ( https://tor.stackexchange.com/a/1879 ) regarding HTTPS Everywhere and Tor from @diracdeltas :

When you enable SSL Observatory, we POST a copy of the certificate chain you saw to an EFF server, along with the time, server domain, and server IP address. We don't keep logs of your IP address, but for extra anonymity, we try to send the POST request through Tor if you have Tor installed. You don't have to be running Tor browser for this to happen.

So you already have to have Tor installed and running; the Tor support doesn't work with HTTPS Everywhere alone.

ghost commented 7 years ago

In WebExtensions, to make a request over a proxy, you would need a native application installed separately on the user's system.

One example of an extension that needs a native application to work is DNSSEC Validator.

ghost commented 7 years ago

@jeremyn There are places where ANY request to foreign websites is suspicious, let alone using Tor, so your argument is not valid. S3 is often used by many websites to host files. Not sure why a request to S3 can be suspicious.

jeremyn commented 7 years ago

@koops76 As you say, "There are places where ANY request to foreign websites is suspicious...". HTTPS Everywhere needs to be very careful before forcing customers to make these requests.

ghost commented 7 years ago

@jeremyn They're screwed either way, to install HTTPSE they would have to access Google, Mozilla or EFF servers.

jeremyn commented 7 years ago

@koops76

they would have to access Google, Mozilla or EFF servers.

The user can download the extension while they're somewhere safe as I described before and then disable all updates while they're somewhere less safe.

I'm not really sure where you're coming from in this discussion. The point I'm making at the moment is something like, "Unexpected and uncontrollable web requests can potentially put some users at risk, so HTTPS Everywhere shouldn't do that." Do you entirely disagree with that? Do you think the UI suggestions I made wouldn't fix it? Do you think the potential risk is so small that it's not worth the added complexity? It's okay if we disagree but I'm not sure what we are disagreeing about, exactly.

ghost commented 7 years ago

It's still possible to detect the redirects HTTPS Everywhere makes.

ghost commented 7 years ago

Need a code example?

jeremyn commented 7 years ago

@koops76 "Need a code example?" No, I can imagine what you mean.

It might help to recognize what I'm talking about as a form of user fingerprinting. Just because it's possible to attack/observe a user in one way doesn't mean HTTPS Everywhere should go out of its way to make it easier to do that in other ways. For example the EFF talks about steps people can take to defend against browser fingerprinting at https://panopticlick.eff.org/about#defend-against even though a complete defense is effectively impossible against a really determined adversary.

Hainish commented 7 years ago

I'll remind participants in this discussion to maintain a respectful tone with regard to other contributors and maintainers. The tone of this thread has veered at times to be less than respectful, and this is a project that necessitates an atmosphere where everyone feels welcome, regardless of difference of opinion or background.

We shouldn't prescribe use cases for HTTPS Everywhere users, but should assume that all sorts of populations around the world will be using the extension. This includes at-risk populations, and we should be mindful that these populations are spread across very different parts of the world. We should also assume that there is a point of diminishing returns when considering the different scenarios users may face, and at some point make a decision using our best judgement. This is all we can do without rabbit-holing into a discussion of calculated risk and which group deserves more attention vis-a-vis another at-risk group. At some point you just need to make the call based on the available information.

On a related note, will it be possible to route ruleset updates through Tor? I seem to recall that the SSL observatory had this option.

In the past, we've detected whether Tor was already running on a system before making available the option to submit certificates through Tor. This entailed first attempting a SOCKS request on port 9150, the port used by the Tor daemon bundled with Tor Browser. As a fallback, we've tried port 9050 to see if a system Tor process was running.
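The detection described above can be sketched in Python as a SOCKS5 greeting sent to each candidate port in turn. This is an illustration, not the code HTTPS Everywhere actually used: `find_tor_socks_port` is a hypothetical name, and a successful greeting only confirms that something on the port speaks SOCKS5 without authentication, not that it is Tor.

```python
import socket
from typing import Iterable, Optional


def find_tor_socks_port(ports: Iterable[int] = (9150, 9050),
                        host: str = "127.0.0.1",
                        timeout: float = 1.0) -> Optional[int]:
    """Return the first local port that answers a SOCKS5 greeting, else None.

    9150 is tried first (Tor Browser's Tor), then 9050 (a system Tor).
    """
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                # SOCKS5 greeting: version 5, one auth method, 0x00 = no auth.
                sock.sendall(b"\x05\x01\x00")
                reply = sock.recv(2)
        except OSError:  # nothing listening, refused, or timed out
            continue
        # A SOCKS5 server accepting "no auth" replies with version 5, method 0.
        if reply == b"\x05\x00":
            return port
    return None
```

Even when a port answers, the resulting Tor connection is itself observable on the network, which is the risk noted below.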

As certificate introspection is not (yet) available within WebExtensions, we don't have this code included in HTTPS Everywhere. It is important to keep in mind that use of Tor is in most cases detectable, unless the user is using a pluggable transport which specifically tries to obfuscate traffic analysis. This itself may put users at risk.

Choosing a single-purpose domain for ruleset distribution, with a fallback, seems the most reasonable path at the moment. The usage of HTTPS Everywhere is already fingerprintable. It has a massive effect on browser traffic patterns which is trivial to detect. So the fact that the domain has a single purpose, to distribute rulesets, places our users at no additional risk. The fallback option has to be handled internally, pursuant to our privacy policy, since any CDN that we use will be able to see the IP addresses of our users. We have to ensure that they aren't irresponsible with this data, which is a matter of policy.
