GrapheneOS / os-issue-tracker

Issue tracker for GrapheneOS Android Open Source Project hardening work. Standalone projects like Auditor, AttestationServer and hardened_malloc have their own dedicated trackers.
https://grapheneos.org/

DNS queries leak if always-on VPN with killswitch enabled malfunctions #3442

Open ryrona2 opened 2 months ago

ryrona2 commented 2 months ago

I was testing the VPN functionality while monitoring all network traffic on my Wi-Fi hotspot, intentionally trying to sabotage the connection to check that the killswitch works as intended.

It turns out it is possible to cause DNS queries to leak outside the VPN when it malfunctions. This breaks the security a VPN with a killswitch is expected to provide: that your internet activity, such as which sites you visit, remains hidden from your ISP even when the VPN malfunctions.

Steps to reproduce:

  1. Install the Mullvad VPN app from the F-Droid repository.
  2. Log in to your Mullvad account and connect.
  3. Use any app, like the Vanadium web browser. No DNS requests leak outside the VPN.
  4. Go to Settings, App permissions and remove the Network permission for Mullvad VPN.
  5. Use any app, like the Vanadium web browser. Now DNS requests go out outside the VPN, even though no other network traffic works.

This cannot possibly be an issue with the Mullvad VPN app, since I have revoked its network permissions so it cannot be the one sending the DNS queries, so the leak is somewhere in GrapheneOS.

thestinger commented 2 months ago

> This cannot possibly be an issue with the Mullvad VPN app, since I have revoked its network permissions so it cannot be the one sending the DNS queries, so the leak is somewhere in GrapheneOS.

No, this is an assumption you're making. Mullvad is setting up the configuration for how the OS VPN functionality works. You're assuming that it's not doing something wrong.

ryrona2 commented 2 months ago

> > This cannot possibly be an issue with the Mullvad VPN app, since I have revoked its network permissions so it cannot be the one sending the DNS queries, so the leak is somewhere in GrapheneOS.
>
> No, this is an assumption you're making. Mullvad is setting up the configuration for how the OS VPN functionality works. You're assuming that it's not doing something wrong.

Okay. Yes I assumed that all network traffic got routed through the app. Is there some VPN app known to be coded right that I can try to reproduce this issue with too? Or can I dump the VPN configuration the app has set up somehow to check if it is correctly set up? If the DNS query is not made by the app, shouldn't the killswitch block it anyway?

thestinger commented 2 months ago

Try using the official WireGuard app instead.

ryrona2 commented 2 months ago

> Try using the official WireGuard app instead.

Unfortunately this didn't tell me anything, because the official WireGuard app detects when the Network permission is removed from it and immediately disables the VPN. No DNS is leaked in that case, but all apps believe there is no internet, so maybe they are not even trying. The Mullvad VPN app, in contrast, still believes there is internet, so it doesn't disable its VPN configuration.

I would have expected the killswitch to block the DNS query anyway, regardless of how the Mullvad VPN app has set up its configuration, since the killswitch should be blocking all traffic not going out over the VPN, with the VPN app having no say in the matter.

Tryptamine9 commented 2 months ago

So you have confirmed that a correctly written VPN app, such as the official WireGuard app, that is known to function correctly DOES kill all network traffic when it loses network connection and the kill switch is activated!

Also you have found that a poorly written VPN app DOES NOT kill all traffic when it loses connectivity, and there is a leak when DNS requests are made. Sounds to me like this should be filed with the development team behind Mullvad VPN, not here...

ryrona2 commented 2 months ago

@Tryptamine9 Sure, the Mullvad VPN app could probably act better here. But isn't the whole idea with a kill switch that apps shouldn't be able to make connections outside the VPN when the VPN is malfunctioning? The kill switch is named "Block connections without VPN". This is not happening here. DNS queries go out without the VPN. And the kill switch is provided by GrapheneOS, not any VPN app. So even if this specific VPN app certainly could improve, it sounds like the actual security issue is in GrapheneOS, either by implementation or expectation of functionality.

thestinger commented 2 months ago

GrapheneOS uses the standard implementation of this with no changes to it. None of the features or bugs with VPN support have to do with GrapheneOS at the moment.

ryrona2 commented 2 months ago

> GrapheneOS uses the standard implementation of this with no changes to it. None of the features or bugs with VPN support have to do with GrapheneOS at the moment.

Okay. Do you expect me to file a bug ticket upstream, or does the GrapheneOS team handle it? Which bug tracker? I am not familiar with upstream here.

thestinger commented 2 months ago

It would be best to file an Android security issue and we'll do our own investigation to fix it early ourselves. It still needs to be determined what's happening and if it's an OS or app side issue. For example, it could simply be that the API is hard to use correctly and there is supposed to be some configuration done that's not being done by the apps.

ryrona2 commented 2 months ago

@thestinger Yeah, I will investigate more before escalating to AOSP or app developer. The short investigation I did a few days ago showed that the VPN app detected the Network permission being removed, and tore down the VPN interface and then immediately set it up again. But the kill switch was not torn down, and it looks like it is set up in a proper way. And all the rest was set up in an identical way to before removing the Network permission, so I do not yet understand what caused DNS queries to suddenly go out outside VPN.

My current suspicion is that apps do not actually send DNS queries themselves. Instead, a system-wide DNS resolver running as another UID performs all DNS queries. Something then happens so that apps can suddenly query that resolver, which was not bridged back to the VPN app properly during the rapid teardown and setup of the VPN interface. Since the resolver is not blocked by the kill switch, it ends up sending the queries outside the VPN.

If you know how DNS is handled in GrapheneOS, please tell me, since that could speed up my investigations a little. Also if you know how the removal of Network permissions is implemented. I have at least confirmed the permission removal isn't implemented the same way as the kill switch, but still don't know how. I initially suspected they may clash, but that is probably not what is happening after all.

I will prioritize the multicast leak ticket since that is a confirmed real problem. For this one, if the issue is only happening if Network permission is removed from the VPN app, it is not such a serious issue, since there is no logical reason a user would remove the Network permission for the VPN app. Still a bug, but not so serious. I will try to find another way to reproduce the issue that works on AOSP too, but I guess the easiest way is to first find what the actual issue is.

thestinger commented 2 months ago

@ryrona2 Most apps use the system DNS resolver, which is meant to send requests through the VPN-provided DNS implementation. Native DNS requests are handled differently from other requests due to the caching, etc.

Rawa commented 1 month ago

Hello! I'm a developer from Mullvad VPN who has been looking into this issue over the last week, and I thought I might chime in and give some more context.

The DNS leak is possible to reproduce in the WireGuard app as well. We have reported the issue to Google; you can find the issue here, including steps to reproduce: https://issuetracker.google.com/issues/337961996

From our testing we can observe the following (also stated in our issue to Google):

  1. If no DNS server is configured on the VPN, DNS requests may leak.
  2. When a tunnel is torn down, the system will leak as well. So setting up two tunnels in the WireGuard app, both with DNS, and then switching between them will cause a DNS leak as well.

From our testing we see that queries made through the Android DnsResolver, or through DatagramSocket with DatagramPacket, won't go out, but some browsers will leak (e.g. Chrome), maybe because they use the native DNS mentioned by @thestinger.

Also, below you can find a gist with an HTML file that has JavaScript embedded. The HTML file can be opened in your browser of choice and will do GET requests to unique URLs, thus resulting in new DNS requests. By running this in Chrome with cases 1 and 2 and observing network traffic with e.g. tcpdump, you can see this leak in action. https://gist.github.com/Rawa/dcc636e45f95143a8ea65ba3ca366ae8
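
The idea behind such a test page, fresh hostnames that no cache can answer, can also be sketched outside the browser. The Python below is a minimal illustration only and is not the contents of the gist: the function names and the `example.com` base domain are assumptions. Since the leak described above was observed with browsers such as Chrome and not with every resolver path, a script like this may well not reproduce it on Android.

```python
import socket
import uuid
from typing import Dict, List


def unique_hostnames(base_domain: str, count: int) -> List[str]:
    """Return `count` hostnames with random labels so no cache can answer them."""
    return [f"{uuid.uuid4().hex}.{base_domain}" for _ in range(count)]


def trigger_lookups(hostnames: List[str]) -> Dict[str, bool]:
    """Ask the system resolver for each name and record whether it resolved.

    Failed lookups are expected (the names are random); the point is only
    that each call forces a new DNS query onto the network.
    """
    results = {}
    for name in hostnames:
        try:
            socket.getaddrinfo(name, 443)
            results[name] = True
        except socket.gaierror:
            results[name] = False
    return results
```

Calling `trigger_lookups(unique_hostnames("example.com", 5))` while tcpdump runs on the underlying network would show whether any of the lookups appear there in plaintext.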

Thanks for creating this issue request and reporting it also to us.

no-usernames-left commented 1 month ago

Mullvad has a good, information-dense writeup on this issue here.

thestinger commented 1 month ago

Their post acknowledges that they fixed a bug in their app which resolved a major part of the issues. The issue while reconnecting looks a whole lot like a race condition and it's not yet clear if the issue is on the OS side or the app side. The other multicast issue looks like an OS bug, but this one needs further research to determine the cause. It's likely that the OS can prevent these leaks by working around how apps behave even if it's an app bug but that doesn't imply that the leak blocking toggle was meant to do that. It's meant to block access when the VPN is down, not if the VPN sets a partial configuration or does the setup in the wrong order.

thestinger commented 1 month ago

Their previous post about connectivity checks claims something that's working by design without issues is a leak. It was highly misleading and largely inaccurate. Android VPN configuration is per-profile and system wide traffic goes through the Owner user VPN by default. Connectivity checks, NTP and the traffic from the VPN app itself are explicitly opted out from going through the VPN. GrapheneOS doesn't use NTP because it's insecure and we simply have our HTTPS network time updates go through the Owner user VPN since they use TCP rather than UDP which may not work through a VPN. We're also fine with users having to fix their clock if it's incorrect to the point that the VPN certificates aren't verifying. Connectivity checks would not work if they went through the VPN. The whole point is detecting if each underlying network works and triggering captive portals which then triggers a UI for handling them which also doesn't get routed through the VPN so users can deal with a captive portal without fully disabling their VPN.

The new post about these DNS leaks is making a lot of assumptions about it and we don't agree with the conclusions that are being drawn. We believe the issue is a race condition where DNS configuration is updated after VPN configuration and we believe it may be possible for apps to avoid this on their own. We plan to implement some form of synchronization in the OS to prevent this but that's not going to help people outside GrapheneOS. As far as we're concerned, it's a very good thing for the OS to provide this functionality instead of each app being given immense privileged access and trying to implement it on their own with no incompatibilities with other apps using those privileges and full support for all the complex functionality supported by the OS in the way that it's intended to work. How is it realistic to do it any other way? If this is an app bug, it's still probably possible to work around it in the OS and block these kinds of leaks, which is a major advantage of the approach. Portraying it as a bad approach and insecure is very silly. It's not as if these apps have anything close to a spotless reputation of avoiding leaks elsewhere.

Connectivity checks are not leaks and it hurts Mullvad's credibility each time that's claimed. The fact that the connectivity check article is positive about GrapheneOS doesn't change how we feel about that.

no-usernames-left commented 1 month ago

> Their previous post about connectivity checks claims something that's working by design without issues is a leak.

Daniel, I am the first one to admit that you have forgotten more about Android than I am ever going to know. However, step 10 of their reproduction instructions is pretty damning; if VPN killswitch is enabled at the OS level then I believe it is a completely fair expectation that absolutely no plaintext DNS queries should ever be "on the wire", and to see them in Wireshark on the other end of the Wi-Fi is absolutely a bug (one that is not Graphene's fault).

In some countries, those leaks could get someone killed.

thestinger commented 1 month ago

@no-usernames-left It's entirely supported to send DNS queries to the regular network DNS while using a VPN with the kill switch enabled. The feature is very flexible and allows doing a lot of different things. Bear in mind that the feature is not only used with an actual VPN and is meant to support split tunneling features natively without the VPN having to split things itself, although in practice the VPN has to handle that itself if it wants non-DNS traffic to be "leaked" on purpose to either the local network or specific apps passing through. It's a supported configuration to not send DNS through it though. The leak toggle is there to prevent leaks that the application can't avoid itself because it's down or when it's up. Half of what Mullvad is calling a bug doesn't really appear to be a bug. The other half may be a bug, but it's possible that could be avoided by the application too by avoiding the race itself. Perhaps it should be figured out what the cause is and whether the app can fully avoid it BEFORE assuming that it's actually a bug in Android's toggle, just a thought.

thestinger commented 1 month ago

> In some countries, those leaks could get someone killed.

Perhaps you should use the built-in IPsec support if it's serious. We can't make any promises about whether apps leak since we don't control them. If it was our app we'd be looking into whether we could change how it brings up the VPN to avoid telling the OS that the VPN is up before the DNS configuration is processed.

ryrona2 commented 1 month ago

> It's entirely supported to send DNS queries to the regular network DNS while using a VPN with the kill switch enabled.

I just want to say this is totally unexpected behavior from the user's perspective, whether it is as designed or not.

If you set up a kill switch for the VPN on Linux, you would add an iptables rule that blocks all network traffic not going to the specific IP:port combination belonging to the VPN. This would bring peace of mind that even if something does go wrong with the VPN, neither the system DNS resolver nor anything else can send out traffic on the network.
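
As a rough sketch of the kind of rules meant here, the Python below only builds and prints iptables commands; it does not run them. The endpoint `203.0.113.10:51820`, the `tun0` interface name and the function name are placeholders chosen for illustration, and a real setup would also need matching ip6tables rules.

```python
import shlex
from typing import List


def killswitch_rules(endpoint_ip: str, endpoint_port: int,
                     tun_iface: str = "tun0") -> List[List[str]]:
    """Return iptables argv lists that only let traffic out via the VPN."""
    return [
        # Loopback traffic never leaves the machine.
        ["iptables", "-A", "OUTPUT", "-o", "lo", "-j", "ACCEPT"],
        # Anything routed into the tunnel interface is allowed.
        ["iptables", "-A", "OUTPUT", "-o", tun_iface, "-j", "ACCEPT"],
        # The encrypted tunnel packets themselves, to the VPN server only.
        ["iptables", "-A", "OUTPUT", "-p", "udp", "-d", endpoint_ip,
         "--dport", str(endpoint_port), "-j", "ACCEPT"],
        # Default policy: drop everything else, including plaintext DNS.
        ["iptables", "-P", "OUTPUT", "DROP"],
    ]


for rule in killswitch_rules("203.0.113.10", 51820):
    print(shlex.join(rule))
```

With rules like these, a resolver that tries to reach the local network's DNS server directly hits the DROP policy, whatever state the VPN client is in.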

Also, the kill switch is really just a poor approximation of how a VPN setup should be designed when privacy matters. Whonix is doing it right with their Workstation and Gateway VMs, and QubesOS supports doing a similar setup for VPNs. That is leak proof in every single way.

With that said, I am happy if this leak is resolved, whether by the app or the OS or both. I acknowledge GrapheneOS may want to keep a small delta to AOSP, so I also think it is good the Mullvad developers reported this issue to AOSP, as I think that is where it should be fixed if it isn't GrapheneOS specific, which it looks like it isn't.

thestinger commented 1 month ago

The built-in OS VPN support can do a better fail safe kill switch than how VPN apps need to work because the OS doesn't know what the VPN app is meant to do and has to allow all traffic it sends and DNS configurations it chooses to use.

thestinger commented 1 month ago

Should be blocked in the latest release:

https://grapheneos.org/releases#2024050900

Some major improvements should really be made to this infrastructure but doing it downstream would be quite questionable.

no-usernames-left commented 1 month ago

Thanks Daniel!

mateusz-markowicz commented 1 month ago

We did some testing on our side (ProtonVPN), and what's happening is that after establishing the tunnel, all API requests that our app makes fail with DNS errors. When opening the tunnel we set the DNS server in VpnService.Builder to 10.2.0.1 (see https://developer.android.com/reference/android/net/VpnService.Builder#addDnsServer(java.net.InetAddress)). We noticed that when the app doesn't set a DNS server at all, or when we set it to some public DNS like 1.1.1.1, everything works fine. So it seems like your fix somehow interferes with the way we set the DNS server for the connection.
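
One visible difference between the two addresses in this report is that 10.2.0.1 lies in a private range while 1.1.1.1 is publicly routable. The short Python sketch below shows only that distinction, using the standard `ipaddress` module; the helper name is made up for illustration, and whether this difference explains the failure is an assumption, not something established in this thread.

```python
import ipaddress


def is_private_address(dns_server: str) -> bool:
    """True for addresses in private ranges such as 10.0.0.0/8."""
    return ipaddress.ip_address(dns_server).is_private


for server in ("10.2.0.1", "1.1.1.1"):
    kind = "private" if is_private_address(server) else "public"
    print(f"{server}: {kind}")
```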

albin-mullvad commented 1 month ago

Hey! I'm working with @Rawa on the Mullvad VPN Android app. We also did some testing and could see similar results as reported by @mateusz-markowicz. By the time we were going to report back here a few days ago we saw that the related fix already had been reverted (https://github.com/GrapheneOS/platform_system_netd/commit/296ccdc5eb955f67ff8ced0a2612e90a5bd77624).

While the fix was addressing the original issue, we could see that regular API communication was broken while having a connected tunnel. In our test setup we could see that DNS queries would leak outside the tunnel BUT still use the tunnel DNS server. In the following capture, 192.168.1.128 is the device and 10.64.0.1 is the DNS server configured for the tunnel.

09:32:59.797907 IP 192.168.1.128.10157 > 10.64.0.1.53: 42899+ A? ipv4.am.i.mullvad.net. (39)
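
For anyone checking their own captures, a line like the one above can be picked apart mechanically. This is a minimal Python sketch, assuming the capture was taken on the underlying network (where only encrypted tunnel traffic should appear) and that lines follow the same IPv4 tcpdump format as this one; the type and function names are illustrative.

```python
import re
from typing import NamedTuple, Optional


class DnsQuery(NamedTuple):
    src_ip: str
    dst_ip: str
    dst_port: int
    qtype: str
    name: str


# Matches e.g. "<time> IP <src>.<port> > <dst>.<port>: <id><flags> A? <name> (<len>)"
_PATTERN = re.compile(
    r"^\S+ IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.(\d+): "
    r"\d+\S* (\w+)\? (\S+) \(\d+\)$"
)


def parse_dns_query(line: str) -> Optional[DnsQuery]:
    """Parse one tcpdump DNS query line, or return None if it does not match."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    src_ip, dst_ip, dst_port, qtype, name = match.groups()
    return DnsQuery(src_ip, dst_ip, int(dst_port), qtype, name)


def is_leak_to_tunnel_dns(query: DnsQuery, device_ip: str, tunnel_dns: str) -> bool:
    """Plaintext query from the device's LAN address to the tunnel's DNS server."""
    return (query.src_ip == device_ip
            and query.dst_ip == tunnel_dns
            and query.dst_port == 53)


LINE = ("09:32:59.797907 IP 192.168.1.128.10157 > 10.64.0.1.53: "
        "42899+ A? ipv4.am.i.mullvad.net. (39)")
query = parse_dns_query(LINE)
print(query, is_leak_to_tunnel_dns(query, "192.168.1.128", "10.64.0.1"))
```

Here the source is the device's LAN address and the destination is the tunnel's DNS server on port 53, which matches the description of a query that left the tunnel while still targeting the tunnel DNS server.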

thestinger commented 1 month ago

This didn't work out and will be reverted in the next GrapheneOS release.