cbeuw / Cloak

A censorship circumvention tool to evade detection by authoritarian state adversaries
GNU General Public License v3.0

Question: redirection to vhost #50

Open beanslel opened 5 years ago

beanslel commented 5 years ago

As RedirAddr I use the IP of one of my personal websites. On the client, I specify the domain that resolves to this IP as ServerName. This website has an SSL certificate for the domain specified in ServerName.

However, the website is hosted on a shared hosting and the domain name is used as a virtual host. As such, simply browsing to the RedirAddr shows an error page saying "this website is not installed" - because it expects a domain name as vhost. The SSL certificate of the error page does not match the domain specified in ServerName (a certificate with the hoster's domain name is served e.g. server1.hostingcompany.com).

Browsing to the cloak public IP therefore does redirect all traffic to the correct IP where my website is hosted, but it does not serve my personal website (it serves the error page), and the SSL certificate does not match ServerName.

Does this impact the active probing mitigation of cloak?

cbeuw commented 5 years ago

Behaviour like this is to be expected on all websites where direct IP access isn’t allowed (such as websites behind a CDN). When you visit a website by domain name through your browser, the browser tells the server the domain name, so the server can decide which actual site to serve. If you visit the site with only an IP, the browser doesn’t know which domain you want to visit, so the server doesn’t know either. Even when direct IP access is allowed, TLS certificates are generally issued to domains, not IPs, so the browser will warn of an invalid certificate: the user never gave it a domain name, so it has nothing to check against the domain name in the certificate.

This isn’t relevant to cloak because cloak is transparent. The behaviour you get from visiting the site with or without a domain name is exactly the same as when you visit the cloak server. When the adversaries are probing, they will provide a domain name in their probe packets if they want to imitate a normal browser, or they won’t provide the domain name to imitate direct IP access. In either case cloak won’t affect the behaviour of the site at all so the prober would believe this is a normal website.

If you edit your hosts file and force the system to resolve your domain name to cloak’s IP, then when you visit the domain you’ll actually connect to cloak, but since you’ve provided a domain name, the website will work as normal.
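The two kinds of probe described above (with and without a domain name) can be sketched in Python. This is only an illustration, not part of Cloak: the function names are made up for this example, and it assumes the server presents a certificate that chains to a trusted root (a self-signed certificate would raise an error instead).

```python
import socket
import ssl


def dns_names(cert):
    """Extract DNS names from a dict as returned by SSLSocket.getpeercert()."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def served_names(ip, server_name=None, port=443, timeout=5.0):
    """Connect to ip:port and return the DNS names on the certificate served.

    server_name is sent as SNI; None imitates direct IP access (no SNI).
    """
    ctx = ssl.create_default_context()
    # Hostname checking is off so that a mismatching certificate can be
    # inspected instead of aborting the handshake. The chain is still
    # verified, so an untrusted certificate raises SSLCertVerificationError.
    ctx.check_hostname = False
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=server_name) as tls:
            return dns_names(tls.getpeercert())
```

If cloak is transparent as described, calling `served_names` against the cloak IP and against the RedirAddr IP should give the same answer, both with `server_name=None` and with `server_name` set to the ServerName domain.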

Sent with GitHawk

beanslel commented 5 years ago

Thanks for the comprehensive reply. I understand that this is expected behaviour and for an external observer there is no difference. Editing the host file indeed works as expected.

The reason I was wondering about the impact on active probing is because most large websites don't allow direct IP access, but they will still serve the correct SSL certificate. For example, browsing to 204.79.197.200 will give an error, but an SSL certificate issued to www.bing.com is served. I was wondering if this situation is not "better" than my example with my personal website, since this leaves no doubt that the IP I'm visiting belongs to bing.com, and not another domain. But like you say, the adversary will probably include the domain name in their probes, so I suppose it shouldn't matter.

Another point of thought: in one of the docs you mentioned that we're making it look like Cloak is a CDN node. If our Cloak were a real CDN node, then shouldn't ServerName resolve to the Cloak IP? My reasoning is as follows: to the observer, we are connecting to the Cloak IP, but we're making it look like we're connecting to ServerName. ServerName, however, resolves to a different IP (RedirAddr). If we browse to RedirAddr directly (and present the proper domain name), we are not redirected to the Cloak IP, but the website is served from that IP. So then why are we connecting to the Cloak IP in the first place? Even more: how did we even find the Cloak IP? Our website didn't redirect us to it, and the domain name doesn't resolve to it. Doesn't this indicate to the observer that something is wrong?

cbeuw commented 5 years ago

The mismatch between ServerName's resolution and Cloak server's public IP has indeed always been an issue. There is some leeway: since a domain can have multiple DNS records, the firewall may not be confident that it has seen all the potential ANAMEs, and so may not flag a resolution mismatch; although it would be very easy just to check against the root record.
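To make the mismatch concrete, here is a minimal Python sketch of the kind of check an observer could run. The function names are hypothetical, and a real firewall would likely work from its own DNS data rather than a live lookup.

```python
import socket


def resolve_all(name):
    """Return the set of IP addresses that name currently resolves to."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def resolution_matches(server_name, observed_ip, resolve=resolve_all):
    """True if observed_ip is among the addresses server_name resolves to."""
    return observed_ip in resolve(server_name)
```

A single lookup only sees the records returned to that resolver at that moment, which is the leeway mentioned above: a `False` result does not prove the IP never serves that domain.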

It may be better to set the ServerName to your own domain so that it resolves to Cloak server's IP, and I think that's what most users who have their own domain are doing. However there are two issues: 1. Not everyone has a domain since it's not free, and 2. Your own domain generally gets very little traffic and the firewall wouldn't be hesitant to block the domain since it has little collateral damage.

It's easy to change IP (with AWS EC2 you only need to stop and start the server to be assigned a new IP), but hard to change domain. If the ServerName is something like www.bing.com, they can't block that domain but only the IP of your VPS; if your own domain gets blocked, you'll have to buy a new one.

So to sum it up, if you use a big name as ServerName, it's easier to detect but hard to block; if you use your own domain as ServerName so that the resolution result matches correctly, it's harder to detect but easy to block. So far I don't know which one is a better practice. We'll just have to find out.

There is a "middle ground" though, which is to do a reverse DNS lookup on your Cloak server's IP, and set that domain as ServerName. Though these domains generally look like ec2-12-34-56-58.eu-north-1.compute.amazonaws.com, so I don't know how well that'll work. But it does make the resolution match; and the "domain" is easy to change in case it gets blocked.
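The reverse lookup can be done with any DNS tool; as a rough Python sketch (the function name is made up, and it only handles IPv4 because `socket.gethostbyname_ex` does), one could also confirm that the name resolves back to the same IP before using it as ServerName:

```python
import socket


def reverse_name(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return the reverse-DNS name for ip if it resolves back to ip, else None."""
    try:
        name, _aliases, _addrs = reverse(ip)
        _canonical, _aliases, addrs = forward(name)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return name if ip in addrs else None
```

The `reverse` and `forward` parameters exist only so the lookups can be swapped out for testing; by default the system resolver is used.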

There are 2 potential solutions to this:

  1. Use ESNI (https://blog.cloudflare.com/esni/). This is very easy to implement (it actually reduces the complexity) and it removes the need to set a ServerName at all, since the firewall simply doesn't see the domain. ESNI hasn't been widely implemented by browsers and web servers yet, so the use of ESNI may itself trigger the firewall, but I do see this being the solution for the future.

  2. Use meek-style domain fronting. The idea is that in the TLS handshake we set the server name to a domain that's allowed, but inside the TLS connection we make an HTTP request whose Host header specifies the host name or IP pointing to Cloak's server. Cloak then actually connects to a CDN like CloudFront. The CDN will then forward this HTTP request to Cloak's server, according to the host specified in the HTTP header. Since one CDN server may serve many domains, we can set ServerName to a domain that resolves to the same CDN server, and the DNS record would match, but the firewall can't simply block the domain either due to high collateral damage. The problems with this are:

    1. It adds an additional hop which may increase latency and reduce throughput.
    2. Some CDNs, especially CloudFlare, check the server name in the TLS handshake against the host in the HTTP request, and refuse to serve it if they don't match. We can't set them both to a legitimate domain because then the CDN would forward the traffic to that legitimate server according to the HTTP header, instead of the Cloak server; we can't set them both to our own domain name either because that domain may be easily blocked.

    Some CDNs, like Amazon's CloudFront, don't care about the mismatch between the TLS server name and the HTTP header's host. In this case, Cloak would be really difficult to block, since both domain blocking and IP blocking (the IP will be the CDN's) would cause very high collateral damage.
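As a rough illustration of the fronting idea above (not how Cloak does or will implement it), here is a Python sketch using only the standard library. The domain names passed in are placeholders, and `cdn_accepts` is just a toy model of the SNI/Host check described in problem 2, not any CDN's real logic.

```python
import http.client
import ssl


def fronted_headers(hidden_host):
    """Headers for the inner HTTP request; Host names the real destination."""
    return {"Host": hidden_host}


def cdn_accepts(sni, host_header, checks_sni):
    """Toy model: a strict CDN refuses when SNI and Host differ."""
    return (not checks_sni) or sni.lower() == host_header.lower()


def fronted_get(front_domain, hidden_host, path="/", timeout=10.0):
    """Send a GET whose TLS server name is front_domain but whose Host is hidden_host."""
    ctx = ssl.create_default_context()
    # HTTPSConnection uses front_domain both for DNS resolution and as SNI,
    # so this is the only name visible to an observer of the handshake.
    conn = http.client.HTTPSConnection(front_domain, 443, timeout=timeout, context=ctx)
    try:
        # Passing an explicit Host header stops http.client from adding its own.
        conn.request("GET", path, headers=fronted_headers(hidden_host))
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

With a CDN that checks the names, `cdn_accepts("allowed.example", "hidden.example", checks_sni=True)` is `False`, which is the refusal case; with `checks_sni=False` the request would go through to the hidden host.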

cbeuw commented 5 years ago

The next big step for Cloak would be to add meek-style CDN domain fronting support, but I believe Cloak will use WebSockets instead of HTTP. This is because HTTP follows a strict request-response model, and the server cannot push data to the client. Meek solves this problem by having the client poll the server for data regularly, which not only adds a tremendous amount of latency but also adds a lot of risk of being identified through packet timing measurements. WebSockets are widely supported by CDNs and can be used just like a regular duplex TCP connection.

Though this may have to wait until I finish binge watching The Expanse first.
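To put a number on the polling latency mentioned above, here is a small Python sketch. It assumes a simplified model in which the client polls at a fixed interval (times are integer milliseconds); this is an assumption made for illustration, not a description of meek's actual polling schedule.

```python
def polling_delays(ready_times, interval):
    """Delay before each item is picked up by the next poll.

    Polls happen at 0, interval, 2 * interval, ... and ready_times are the
    moments at which the server has data waiting. All values are integers.
    """
    delays = []
    for t in ready_times:
        # Round t up to the next multiple of interval.
        next_poll = ((t + interval - 1) // interval) * interval
        delays.append(next_poll - t)
    return delays
```

With a duplex connection such as a WebSocket the server can send as soon as data is ready, so this extra wait disappears, and so does the regular rhythm of poll requests.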

bash99 commented 4 years ago

> So to sum it up, if you use a big name as ServerName, it's easier to detect but hard to block; if you use your own domain as ServerName so that the resolution result matches correctly, it's harder to detect but easy to block. So far I don't know which one is a better practice. We'll just have to find out.

Can they detect fake big-name usage by checking the CA chain in the TLS handshake? Surely we cannot get a real cert for the "big name".

aboka2k commented 4 years ago

@cbeuw hi, I found this thread while searching for a solution on how to set up CloudFront @ https://github.com/cbeuw/Cloak/issues/126

I followed everything on the wiki, but still couldn't make it work. Could you give some advice on that? Thanks for sharing this wonderful work of yours here. Cheers,