EFForg / https-everywhere

A browser extension that encrypts your communications with many websites that offer HTTPS but still allow unencrypted connections.
https://eff.org/https-everywhere

Provide ruleset updates over an onion service #14979

Closed J0WI closed 3 years ago

J0WI commented 6 years ago

Split from https://github.com/EFForg/https-everywhere/issues/14907#issuecomment-375753526 Type: feature request

Ruleset updates for the Tor Browser should be shipped over an onion service. The main benefits are:

gopherit commented 6 years ago
jeremyn commented 6 years ago

I think the traffic analysis argument is compelling. The extension is supposed to check for updates daily. You can guess that a given user will usually check for updates around the same time of day, and maybe at some particular part of their session (the start or end). An exit node watching enough traffic might be able to connect different sessions and make guesses about the timezone, habits, or configuration of a particular user.

Also I wonder how the Tails developers will handle these updates, since Tails by its nature does not use persistent storage. If they enable the updates then the updates will be downloaded every session. In that case an exit node can guess that a user who is downloading an update is statistically more likely to be a Tails user than a regular Tor Browser user.

@gopherit I don't think there should be an automatic fallback from a more secure to a less secure URL. If we add something like that, the user should be prompted somehow, or maybe the update should just fail (some users will not know how to safely evaluate the prompt options).

ghost commented 6 years ago

We'll need an EV TLS certificate (only EV certificates can cover .onion domains; I recommend DigiCert EV), since by default the only onion service authentication is an 80-bit truncation of the service public key's SHA-1 thumbprint. While brute-force attacks are still impractical, collision attacks on SHA-1 are practical and were demonstrated by Google.
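For context, a rough sketch (mine, not from the thread) of why that 80-bit truncation is the whole story for v2 addresses: the address is just the first 10 bytes of the SHA-1 of the DER-encoded public key, base32-encoded.

```python
import base64
import hashlib

def onion_v2_address(der_public_key: bytes) -> str:
    # A v2 onion address is the first 80 bits (10 bytes) of the SHA-1
    # digest of the DER-encoded RSA public key, base32-encoded.
    digest = hashlib.sha1(der_public_key).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"

# Dummy bytes for illustration only -- not a real DER-encoded key.
addr = onion_v2_address(b"example-public-key-bytes")
print(addr)  # 16 base32 characters followed by ".onion"
```

Anyone who can find a second key hashing to the same 80-bit prefix can impersonate the service at the Tor layer, which is why the TLS certificate adds a second, independent authentication layer.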

ghost commented 6 years ago

I have a powerful GPU (GTX 980 Ti) and want to help by generating a .onion domain starting with httpsrules.

Since it will use TLS, I won't be able to masquerade as the legitimate domain even with the key. In addition, the rulesets themselves are signed. Regardless of that, I will securely erase the key after I send it to the EFF.

The key will be ready in about a day. If you still want to generate your own domain, you can use Scallion, but you will need a very strong GPU.
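As a back-of-envelope check (my estimate, not a claim from the thread) on why a vanity prefix needs serious GPU time: base32 has 32 symbols, so under a uniform-output assumption each extra prefix character multiplies the expected number of candidate keys by 32.

```python
# Expected candidate keys a tool like Scallion must try to hit a given
# v2 .onion prefix, assuming uniform base32 output (32 symbols).
def expected_attempts(prefix: str) -> int:
    return 32 ** len(prefix)

print(expected_attempts("httpsrules"))  # 32**10 = 1125899906842624
```

At roughly a billion hashes per second, the full 10-character "httpsrules" prefix works out to around 2^50 trials on average, which is why only a strong GPU makes this feasible at all.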

jeremyn commented 6 years ago

@Hainish Could you please comment here, particularly about @epicminecrafting's offer to generate an httpsrules .onion domain? I'm not supporting or opposing it at the moment; I'm just asking you to comment, since @epicminecrafting is offering to use/is currently using their own hardware to generate the domain.

ghost commented 6 years ago

To clarify, I could send the generated key to the NSA and they wouldn't be able to do anything with it, because to push malicious rulesets to users they would need to break both TLS and the ruleset signatures. I'm currently running Scallion; the GPU is under 100% load and I can't game or do anything GPU-intensive, so if you don't need this key, tell me.

jeremyn commented 6 years ago

Yes, however, you may want to save the electricity/computing costs until @Hainish has confirmed that this is something the EFF is willing to use. It is up to you though.

ghost commented 6 years ago

@jeremyn Don't worry about that, it's fine.

ghost commented 6 years ago

I'm going to add that while all requests to https-rulesets.org go to Fastly, all requests to httpsrulesets.onion would go directly to the backend. That also means it would get more load than almost any other .onion service, and I'm not sure it would be able to handle such a load.

Hainish commented 6 years ago

Thanks for raising this issue @J0WI, and thank you for the offer @epicminecrafting.

Facebook has set up a CDN for its onion service, but the setup is rather complicated. I've spoken with the Facebook engineers involved in setting this up, and it's quite an engineering feat. I think this alone would make it impractical for us to use a .onion service, unless CDNs start supporting it, which I find unlikely. All Tor Browser clients directly accessing a .onion service we control would likely be more bandwidth than we could sustain.

As for traffic analysis, Tor Browser rotates its circuits for a given site every two hours. This means that by the time a new request for https-rulesets.org is made, a new circuit (and thus exit node) would be used. But more fundamentally, Tor Browser features first-party Tor circuit isolation, meaning that no first-party request can learn anything about requests to any other first party. This means that as an exit node, even in the absence of circuit rotation, I wouldn't be able to determine anything about a particular instance of Tor Browser except that it's connecting to https-rulesets.org on a periodic basis. Which is public information anyway.

It's worth noting that this is already the case for connections to addons.mozilla.org, except for the fact that in that case, the connection is a bit more fingerprintable. If you don't have the Tor Browser default set of addons, analysis of TLS stream length may be enough to fingerprint you. Even then, it won't be able to de-anonymize you, since every other first-party connection will run through a different circuit.

As for saving bandwidth on exit nodes: true, there would be no exit nodes in the onion service circuit. But if you measure overall bandwidth, this makes matters worse, because onion service traffic traverses six relays: three chosen by the client and three by the onion service. This means a greater load on the Tor network overall. Granted, exit nodes are a rarer commodity, and one could argue that exit node bandwidth is more valuable. But in terms of connectivity, this increases latency for the client and introduces more potential points of failure.

All other things being equal, the end-to-end argument alone would offer one potential additional layer of security. However, given that we're signing rulesets with an isolated RSA key as well as providing rulesets over TLS, I don't think this actually gives us much. Sure, we'd be less reliant on the CA infrastructure, but where does the threat lie in that? Is it in the potential of state actors to possibly MITM TLS connections? If this is the case, we could pin the https-rulesets.org TLS key into the extension itself.
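A minimal sketch of what pinning the https-rulesets.org key could look like, assuming HPKP-style pins (base64 of the SHA-256 of the SubjectPublicKeyInfo). The pinned value below is a placeholder (the SHA-256 of empty input), not the site's real key:

```python
import base64
import hashlib

# Placeholder pin for illustration -- a real deployment would ship
# base64(SHA-256(SubjectPublicKeyInfo)) of the live certificate's key.
PINNED_SPKI_SHA256 = "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="

def spki_matches_pin(spki_der: bytes) -> bool:
    # Compare the hash of the presented public key against the pin
    # baked into the extension; reject the connection on mismatch.
    digest = hashlib.sha256(spki_der).digest()
    return base64.b64encode(digest).decode("ascii") == PINNED_SPKI_SHA256

print(spki_matches_pin(b""))   # True for this placeholder pin
print(spki_matches_pin(b"x"))  # False
```

The catch, noted below, is that a WebExtension currently has no API to inspect the TLS key of a connection, so this check has nowhere to run.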

Given what I've outlined here, I can see significant drawbacks in offering an onion service, with only limited or elusive benefits.

ghost commented 6 years ago

[...] If this is the case, we could pin the https-rulesets.org TLS key into the extension itself.

To do that, we would need to get the webRequest TLS introspection API proposal accepted.

jeremyn commented 6 years ago

@Hainish My understanding has been that the default extensions that the Tor Browser provides, like HTTPS Everywhere, are updated with the Tor Browser itself over Tor, and not separately through AMO. Am I wrong about this? For example https://blog.torproject.org/tor-browser-80a5-released advertises that it updates HTTPS Everywhere to 2018.3.13.

For the traffic analysis situation I mentioned at https://github.com/EFForg/https-everywhere/issues/14979#issuecomment-377255455 , a made-up example is that if an exit node knows that statistically a user who updates at time Y is 80% likely to be in timezone X, and the exit node sees an update at time Y, it can guess with reasonable certainty that the user is in timezone X. I'm not sure if that is what you meant by "public information", but it is probably not information that the typical Tor user expects the exit node to be able to guess about them.

gopherit commented 6 years ago

a made-up example is that if an exit node knows that statistically a user who updates at time Y is 80% likely to be in timezone X, and the exit node sees an update at time Y, it can guess with reasonable certainty that the user is in timezone X. I'm not sure if that is what you meant by "public information", but it is probably not information that the typical Tor user expects the exit node to be able to guess about them.

In this hypothetical example, if this is a real threat worth addressing, maybe randomize when TBB connects to https-rulesets.org? Instead of once per 24h, update at a random time >24h since the last update && <48h?

Is an exit node geo-fingerprinting a TBB user's time zone even a realistic scenario? i.e. is there historical data available (via Fastly?) that's granular enough to know when Tor exit node IP addresses connect to update HTTPSE? Enough to build a statistical model of when Tor users are likely to be active and somehow associate that data with the geographic origin of said request? e.g. "Most Tor users who launch TBB at 17:00 UTC are likely to be in the USA since it's daytime there"? Seems like a limited attack scenario with a very high degree of error.

jeremyn commented 6 years ago

@gopherit Randomization can't fix the problem entirely because a user's device will never update while it is off or not connected to a network, and you can guess something about the user during these times, for example that they are asleep.

I'm not sure what you mean by realistic. There are certainly governments and companies out there in a position to watch a statistically significant amount of traffic and see the patterns I'm suggesting, if these patterns exist. You can guess the patterns for Tor users based on patterns for regular users, which is convenient because there are a lot more regular users than Tor users.

An invented attack scenario based loosely on the Edward Snowden situation might be this: you're the NSA and you suspect you have a leaker working for you in Hawaii. You also suspect this leaker has an email account at Lavabit. You run a lot of Tor exit nodes and can guess a user's time zone based on their HTTPS Everywhere ruleset update time. One day at one of your exit nodes you see a user update their rulesets at a time where you guess they are in Hawaii, then the user visits Lavabit, and then they look up travel information for Hong Kong. You look at flights from Hawaii to Hong Kong and see one of your contractors has a flight that leaves in a week. You've got them.

How realistic this scenario is depends on how many Tor users are in Hawaii, how strong the update patterns are, how many NSA Tor exit nodes there are, and how badly the NSA wants to identify this leaker. I can only guess at these factors, but I think the situation is plausible.

Anyway I don't mean to say that these situations absolutely require the EFF to provide a .onion service, but I'd like to make sure this risk is understood, since @Hainish suggested a .onion service would provide only "limited or elusive benefits".

ghost commented 6 years ago

@jeremyn This won't work since Tor uses a separate exit node for each website.

J0WI commented 6 years ago

I just stumbled over https://trac.torproject.org/projects/tor/ticket/17216 , so this seems to be a general enhancement for Tor. Such an onion update infrastructure could be controlled by the Tor Project directly.

Granted, exit nodes are a rarer commodity, and one could argue that exit node bandwidth is more valuable. But in terms of connectivity, this increases latency for the client and introduces more potential points of failure.

I still think bandwidth is more important than latency in the Tor network. The update process is seamless for the user.

In this hypothetical example, if this is a real threat worth addressing, maybe randomize when TBB connects to https-rulesets.org? Instead of once per 24h, update at a random time >24h since the last update && <48h?

That's a common practice to reduce load peaks on update servers. But I'm not sure whether this would work for us/Tor users, because it depends on how much time you spend in the browser.

Also I wonder how the Tails developers will handle these updates, since Tails by its nature does not use persistent storage.

Good point. With some statistics it might also be possible to estimate the number of Tor users, Tails users, or even the traffic per user (say the entry point is unknown).

Hainish commented 6 years ago

@Hainish My understanding has been that the default extensions that the Tor Browser provides, like HTTPS Everywhere, are updated with the Tor Browser itself over Tor, and not separately through AMO. Am I wrong about this? For example https://blog.torproject.org/tor-browser-80a5-released advertises that it updates HTTPS Everywhere to 2018.3.13.

HTTPS Everywhere is built into the Tor Browser at compile time. The update mechanism used to be through AMO, but for the last few years the extension has been updated via the self-hosted eff.org endpoint. NoScript is updated via AMO, though. So in that respect, requests to https-rulesets.org are no different than what is already happening with extension updates in Tor Browser.

Again, the traffic analysis wouldn't work unless there was an ongoing Sybil attack on Tor (since short-burst Sybil attacks are very detectable). And at that point, the fundamental anonymity guarantee that Tor provides would already be broken.

jeremyn commented 6 years ago

@YegorIevlev Tor doesn't use a separate exit node for each website, but instead uses the same circuit/path for a short period of time, currently 10 minutes. This would be enough time for the user in my previous hypothetical scenario to do the things that I said might identify them.

@J0WI Does the ticket you linked mean that Tor doesn't download its own updates over a .onion service? I thought it did.

@Hainish So to clarify are you saying that currently a given version of Tor Browser will check for updates to HTTPS Everywhere using a non-.onion server? In that case I agree that adding a new check to https-rulesets.org isn't significantly worse for the user, in the sense that they were already vulnerable to the sort of attack I described.

This is getting a little sidetracked here, but: for your second paragraph I'm not sure what you are arguing, but I assume (without proof) that the US government owns or monitors as many Tor nodes as it can get away with, particularly exit nodes; that most of these are activated in a way that is indistinguishable from small private owners without very sophisticated analysis; and that its control of the network is only seriously limited by its own desire not to compromise the value of the Tor network to itself, and by other security agencies and powerful actors that are also trying to exert control over the network. (I only glanced through your linked article so I apologize if these assumptions are directly addressed in there.)

Anyway could you please clarify exactly where this discussion is at? If you're sympathetic to making a .onion update service but the expense is simply too much then that's one thing, or, if it's not a matter of cost but rather that you think these updates are harmless, then that's another thing. In other words I'd like to know what is primarily blocking this from happening and if there is any discussion we could have to overcome that block. Otherwise this is not going to be a very productive discussion going forward.

ghost commented 6 years ago

@Hainish We could host only the version data on an .onion service, and keep the actual downloads on the CDN. What do you think?

@gopherit

In this hypothetical example, if this is a real threat worth addressing, maybe randomize when TBB connects to https-rulesets.org? Instead of once per 24h, update at a random time >24h since the last update && <48h?

That's a good idea for another reason as well (avoiding network overload). I'll implement it shortly.
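The randomized schedule suggested above is simple to sketch (illustrative only; the function name is mine, not the extension's):

```python
import random

DAY = 24 * 60 * 60  # seconds

def next_update_delay() -> int:
    # Pick the next update check uniformly between 24h and 48h after
    # the last one, instead of a fixed 24h cadence, so that check
    # times drift and don't cluster into a recognizable daily pattern.
    return random.randint(DAY, 2 * DAY)

delay = next_update_delay()
print(DAY <= delay <= 2 * DAY)  # True
```

Note the caveat raised earlier in the thread: jitter only shifts the check time; it can't hide the hours during which a device is simply offline.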

ghost commented 6 years ago

@jeremyn

Tor doesn't use a separate exit node for each website

We're talking about the Tor Browser, and it does use a different circuit for each first-party domain: just open the Tor Browser and go to eff.org, then open a new tab, go to ghacks.net, and compare the two circuits.

jeremyn commented 6 years ago

I didn't know that the Tor Browser uses a different circuit for each first party ("address bar") domain, thanks for clarifying. @Hainish said that at https://github.com/EFForg/https-everywhere/issues/14979#issuecomment-380894795 but I either didn't read that part or didn't understand it. In that case I agree that the tracking I described in my scenario at https://github.com/EFForg/https-everywhere/issues/14979#issuecomment-381109901 is much less likely or actually impossible.

I guess this is in the documentation at https://www.torproject.org/projects/torbrowser/design/#identifier-linkability (see also https://tor.stackexchange.com/a/14635 ) which says in part:

The Cross-Origin Identifier Unlinkability design requirement is satisfied through first party isolation of all browser identifier sources. First party isolation means that all identifier sources and browser state are scoped (isolated) using the URL bar domain.

I assume, but have been unable to confirm in the linked document, that "identifier sources and browser state" includes or implies the Tor circuit. Subsection 8, "Tor circuit and HTTP connection linkability", of the linked section says that new circuits are used for third party requests to the same domain which originate from different first party requests, so I assume that if they are doing that for the same third party requests, they are also doing it for different first party requests.
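A toy model (my illustration, not Tor Browser's actual code) of what first-party isolation implies for the attack scenarios above: every request is assigned a circuit keyed by the URL-bar domain, so requests under different first parties never share a circuit or exit node.

```python
# Map each first-party (URL-bar) domain to its own circuit id.
circuits: dict[str, int] = {}

def circuit_for(first_party: str) -> int:
    # Requests under the same first party reuse one circuit; a new
    # first party always gets a fresh circuit (and thus exit node).
    if first_party not in circuits:
        circuits[first_party] = len(circuits)
    return circuits[first_party]

print(circuit_for("eff.org") == circuit_for("eff.org"))     # True
print(circuit_for("eff.org") == circuit_for("ghacks.net"))  # False
```

Under this model, an exit node seeing the ruleset update can never also see the same user's Lavabit or travel-search traffic, which is what defeats the linking in the earlier scenario.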

Mikaela commented 5 years ago

Would OnionBalance be any help towards this issue?

The OnionBalance software allows for Tor hidden service requests to be distributed across multiple backend Tor instances. OnionBalance provides load-balancing while also making onion services more resilient and reliable by eliminating single points-of-failure.

Debian and the Tor Project say they are using it, and I think Debian might have more traffic than HTTPS Everywhere would, or at least larger files.

zoracon commented 3 years ago

Considering that this extension is on its way to sunset around next year, I am closing this out.