GrapheneOS / os-issue-tracker

Issue tracker for GrapheneOS Android Open Source Project hardening work. Standalone projects like Auditor, AttestationServer and hardened_malloc have their own dedicated trackers.
https://grapheneos.org/

Apps can bypass VPN with killswitch by sending multicast packets #3443

Closed: ryrona2 closed this issue 3 days ago.

ryrona2 commented 5 months ago

I have been testing the security of the VPN functionality in GrapheneOS to verify that there are no leaks or VPN bypasses when the kill switch is enabled. I have monitored the network traffic using Wireshark on my Wi-Fi hotspot.

Here I have observed that apps are able to send multicast packets that do not go out over the VPN but instead go to the local network, despite the kill switch being enabled. This happens even with apps like the Mullvad VPN app, which clearly states that it blocks all local network traffic. This is a clear privacy issue in the common use case for a VPN where the local network is untrusted, because the user expects that no one on the network can learn about their apps or usage while on the VPN. This issue is reproducible with both the Mullvad VPN app and the official WireGuard app, and probably others, so it is likely an issue in GrapheneOS rather than in the apps.

Steps to reproduce:

  1. Install the Mullvad VPN app from F-Droid or the official WireGuard app from its website.
  2. Log in to or configure your VPN app, and make sure always-on and the kill switch are enabled for the VPN in Settings.
  3. Install the VLC app from F-Droid.
  4. Start VLC and click through the menus to browse videos, audio and other media. Multicast packets are sent out on the local network to scan for media sources, bypassing the VPN.

The packets are regular IP multicast packets, of both the UDP and IGMP kind. I don't know whether the app can receive replies to them.

ryrona2 commented 5 months ago

Possibly relevant forum thread; Spotify may be sending multicast packets as well: https://discuss.grapheneos.org/d/10337-spotify-communicating-with-other-devices-on-local-network/6

Quote from John-longson:

Long story short, I made an anonymous Spotify account (that has never seen the internet without a vpn and has never interacted with another account) and was listening to music on WiFi when my girlfriend got home and turned on her Spotify and mine had a pop up asking me to join her session. This shouldn't be possible since we both use a VPN. I contacted the VPN and also tried other VPNs and the result is always the same. The VPN has LAN disabled, and all other settings set to avoid this, including killswitch.

Spotify must have found some vulnerability either in the OS or the networking and is able to see other devices on the network or has just enough data to fingerprint your network and discover if two Spotify devices are on the same network. I confirmed this by turning off WiFi, and her device disappeared.

This clearly illustrates the privacy violation taking place, and the nefarious things apps can do that the user does not expect when distrusting the local network by using a VPN with a kill switch. Later in the thread, it was found that mDNS multicast packets are likely responsible.

Quote from alfred:

I was not able to replicate getting the Spotify session to come up, but it could be my set up. Verified with a packet capture that Spotify does use mdns (Multicast DNS), which does not go over the VPN even if the kill switch and blocking LAN is enabled.
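To make the mDNS traffic concrete, here is a minimal Python sketch that builds the kind of DNS-SD query an app might multicast to 224.0.0.251 on UDP port 5353. The helper names are hypothetical, and the service name `_services._dns-sd._udp.local` (the generic DNS-SD service enumeration name) is an assumption for illustration; it has not been checked against what Spotify actually sends. `send_mdns_query` is what a test app could call to reproduce the leak while watching Wireshark.

```python
import socket
import struct

MDNS_GROUP_V4 = "224.0.0.251"  # mDNS IPv4 multicast group (RFC 6762)
MDNS_PORT = 5353


def encode_name(name: str) -> bytes:
    """Encode a dotted DNS name as length-prefixed labels terminated by a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid DNS label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


def build_mdns_query(name: str, qtype: int = 12) -> bytes:
    """Build a single-question mDNS query; qtype 12 is PTR and qclass 1 is IN."""
    # Header: ID 0, flags 0, QDCOUNT 1, ANCOUNT/NSCOUNT/ARCOUNT 0.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!2H", qtype, 1)


def send_mdns_query(payload: bytes) -> None:
    """Multicast the query to the mDNS group on the local network (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 255)
        sock.sendto(payload, (MDNS_GROUP_V4, MDNS_PORT))
```

Calling `send_mdns_query(build_mdns_query("_services._dns-sd._udp.local"))` from an app behind the VPN should, if the leak is present, show up in the hotspot capture as UDP traffic to 224.0.0.251:5353.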

ryrona2 commented 5 months ago

I have been digging a little bit into this issue. The source code suggests routing tables are modified using netlink to enable and disable the kill switch. It does not look like iptables or similar is being used to implement the kill switch at all, judging from the source code alone.

I could actually print all routing tables and rules without root permissions on the release build of the device, so I could look into this before having managed to set up my build environment. I couldn't check whether iptables is used, though, since that requires root, but as seen below, the kill switch is indeed set up using routing rules.

Here is the output of running "ip rule", which lists the rules for IPv4. The comments are mine.

# Loopback and broadcast permitted on all interfaces including real ones? Uncertain about what this is.
0:  from all lookup local 

# Dead rule, will be skipped
10000:  from all fwmark 0xc0000/0xd0000 lookup 99

# This allows the system (uid=0) to send over Wifi (wlan0), it also sets Wifi as default route
11000:  from all iif lo oif dummy0 uidrange 0-0 lookup 1002 
11000:  from all iif lo oif wlan0 uidrange 0-0 lookup 1046

# Dead rule, will be skipped
12000:  from all iif tun0 lookup 97 

# This allows all apps to send over VPN (tun0), it also sets VPN as default route
13000:  from all fwmark 0x0/0x20000 iif lo uidrange 1300000-1399999 lookup 1051 
13000:  from all fwmark 0xc0067/0xcffff lookup 1051 

# This is the kill switch that blocks the packet from being sent, unless sent by a rule above
14000:  from all fwmark 0x0/0x20000 iif lo uidrange 1300000-1310152 prohibit
14000:  from all fwmark 0x0/0x20000 iif lo uidrange 1310154-1320152 prohibit
14000:  from all fwmark 0x0/0x20000 iif lo uidrange 1320154-1399999 prohibit

... there are many more rules, but regular app traffic should have been blocked at this point ...
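A rough way to read the rules above: the kernel walks them in ascending priority, skips any rule whose selectors don't match, and when a table has no matching route it falls through to the next rule. The Python toy model below assumes exactly that, covers only the fwmark, uidrange and `iif lo` selectors, and omits the oif and `iif tun0` rules. All names are hypothetical, and it reflects the reporter's reading of the rules, not necessarily how Android enforces the kill switch (later replies point to eBPF).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    priority: int
    action: str                                  # routing table name, or "prohibit"
    fwmark: Optional[Tuple[int, int]] = None     # (value, mask): matches if mark & mask == value
    uid_range: Optional[Tuple[int, int]] = None  # inclusive (low, high)
    iif_lo: bool = False                         # "iif lo": locally generated packets only


def rule_matches(rule: Rule, uid: int, mark: int, local_origin: bool) -> bool:
    if rule.fwmark is not None:
        value, mask = rule.fwmark
        if (mark & mask) != value:
            return False
    if rule.uid_range is not None:
        low, high = rule.uid_range
        if not low <= uid <= high:
            return False
    if rule.iif_lo and not local_origin:
        return False
    return True


def resolve(rules, tables, uid, mark, dst, local_origin=True) -> str:
    """Walk rules by priority; a table lookup with no matching route falls through."""
    addr = ip_address(dst)
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule_matches(rule, uid, mark, local_origin):
            continue
        if rule.action == "prohibit":
            return "prohibit"
        if any(addr in net for net in tables.get(rule.action, ())):
            return rule.action
    return "unreachable"  # no rule produced a route


# Subset of the rules and tables shown in this comment.
RULES = [
    Rule(0, "local"),
    Rule(13000, "1051", fwmark=(0x0, 0x20000), uid_range=(1300000, 1399999), iif_lo=True),
    Rule(14000, "prohibit", fwmark=(0x0, 0x20000), uid_range=(1300000, 1310152), iif_lo=True),
    Rule(14000, "prohibit", fwmark=(0x0, 0x20000), uid_range=(1310154, 1320152), iif_lo=True),
    Rule(14000, "prohibit", fwmark=(0x0, 0x20000), uid_range=(1320154, 1399999), iif_lo=True),
]
TABLES = {
    "local": [ip_network(a) for a in ("10.0.0.1/32", "127.0.0.0/8", "192.168.0.0/32",
                                       "192.168.0.100/32", "192.168.0.255/32")],
    "1051": [ip_network("0.0.0.0/0")],
}
```

With table 1051 removed from `TABLES`, app UIDs in this model reach the `prohibit` rules, while the UIDs left out of the ranges (1310153 and 1320153) are not covered by any of them.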

Here are the actual routing tables:

$ ip route list table 1051                                              
default dev tun0 proto static scope link 
10.0.0.1 dev tun0 proto static scope link

$ ip route list table local                                             
local 10.0.0.1 dev tun0 proto kernel scope host src 10.0.0.1 
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1 
broadcast 192.168.0.0 dev wlan0 proto kernel scope link src 192.168.0.100
local 192.168.0.100 dev wlan0 proto kernel scope host src 192.168.0.100
broadcast 192.168.0.255 dev wlan0 proto kernel scope link src 192.168.0.100

I am wondering about those broadcast routes: do they perhaps permit multicast too? I will try to dig a little deeper later, but this is as far as I have got for now.
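The broadcast question can at least be narrowed down. Assuming wlan0 has 192.168.0.100/24 (inferred from the 192.168.0.0 and 192.168.0.255 entries), the broadcast routes cover only the subnet's network and broadcast addresses, and a multicast destination such as 224.0.0.251 matches neither. This Python sketch uses hypothetical helper names; it only shows that these entries don't cover multicast and says nothing about how the kernel actually routes multicast.

```python
from ipaddress import ip_address, ip_interface


def local_table_entries(iface_addr: str) -> dict:
    """Derive the 'local' and 'broadcast' entries seen above for one IPv4 interface address."""
    iface = ip_interface(iface_addr)
    net = iface.network
    return {
        "local": str(iface.ip),
        "broadcast": [str(net.network_address), str(net.broadcast_address)],
    }


def matches_broadcast_entry(dst: str, iface_addr: str) -> bool:
    """True only if dst is one of the broadcast addresses listed in the local table."""
    entries = local_table_entries(iface_addr)["broadcast"]
    return ip_address(dst) in {ip_address(a) for a in entries}
```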

maade93791 commented 5 months ago

@ryrona2 Filtering for the kill switch VPN is done per UID via netd, not iptables. See https://github.com/GrapheneOS/platform_frameworks_base/blob/14/services/core/java/com/android/server/connectivity/Vpn.java#L2046

thestinger commented 5 months ago

@ryrona2 It's implemented with eBPF, not routing tables.

no-usernames-left commented 5 months ago

I can confirm that Spotify cannot detect a Spotify Connect enabled speaker on my home LAN unless I disable Always-On mode for the VPN, even though Local Network Access is always enabled from within the Mullvad app.

Perhaps the multicast discovery packets are making it out (hence the popup that the forum poster saw from his girlfriend's Spotify), but the replies are being lost?

FYI, Spotify Lite is much less bloated (and likely more privacy-friendly), but with limited functionality (such as no Spotify Connect).

ryrona2 commented 5 months ago

@maade93791 Yes. I have gone down that path already, and then to netd, and there I saw that netd sets up the kill switch using netlink to manipulate the routing table rules, and nothing else. So, I do not believe iptables is being used either.

@thestinger Really? I was fairly confident that I had found the actual logic being used, and that was manipulation of routing table rules. But I guess I will know soon anyway: I have made a userdebug build and am trying to figure out how to actually flash it. After that, I should easily be able to confirm whether I have found the right place by simply removing the routing rules I believe implement the kill switch and seeing what happens.

@no-usernames-left That could be so. I know the packets are definitely sent out, but not more than that. And since iptables isn't being used, it is a bit hard for me to understand what the rules actually do and do not block. But I will see if I can figure it out. It is a bit odd that they didn't go with iptables, since iptables is designed to support use cases like this.

alfred6427 commented 5 months ago

@ryrona2 I wonder if the issue is not related to the VPN kill switch. It sounds like eBPF is used to filter/tag traffic, so I wonder if multicast traffic is not evaluated in the eBPF program, or maybe not assigned to the user profile.

The IPv4 multicast range is 224.0.0.0/4, so one potential solution would be to block that range as part of the VPN kill switch.
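A minimal Python sketch of this suggestion, assuming a hypothetical set of UIDs that are locked down to the VPN. Python's `ipaddress` module treats 224.0.0.0/4 and, for IPv6, ff00::/8 as multicast, so the same check also covers IPv6 destinations such as ff02::fb (mDNS over IPv6). This models the decision only; it is not eBPF code and not how GrapheneOS implemented the fix.

```python
from ipaddress import ip_address


def is_multicast(dst: str) -> bool:
    """IPv4 224.0.0.0/4 or IPv6 ff00::/8."""
    return ip_address(dst).is_multicast


def kill_switch_blocks(uid: int, dst: str, lockdown_uids: set) -> bool:
    """Hypothetical check: drop multicast from any UID that is locked down to the VPN."""
    return uid in lockdown_uids and is_multicast(dst)
```

The limited broadcast address 255.255.255.255 is not in 224.0.0.0/4, so a real filter would need to handle broadcast separately.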

thestinger commented 5 months ago

I think the issue is likely with the eBPF code. Linux tries VERY hard to receive and send packets. It will receive and send them in essentially any way by default. Filtering has to be implemented via eBPF or netfilter (iptables/nftables). It appears Android is using eBPF for this right now. If you looked at the iptables rules and determined it wasn't done there, that doesn't mean it's done with route configuration.

thestinger commented 5 months ago

On our servers, we use nftables to emulate a strong host model for input, but not currently output, and we exclude loopback since there are many things requiring the weak host model for that including some of the services we run and the synproxy functionality we use as part of DDoS protection:

https://github.com/GrapheneOS/infrastructure/blob/3b1c43d29fa4132cf44ac33126d25bbfe31187c9/nftables/nftables-ns1.conf#L35-L38
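The strong host model mentioned above means an inbound packet is accepted only if its destination address belongs to the interface it arrived on, whereas the weak host model (the Linux default) accepts it if the address belongs to any interface. A minimal Python sketch of that decision for unicast traffic, with loopback exempt as described; the helper names, interface names and addresses are made up, and this is not a translation of the linked nftables rules.

```python
from ipaddress import ip_address


def strong_host_accept(in_iface: str, dst: str, iface_addrs: dict) -> bool:
    """Accept inbound unicast only if dst is assigned to the interface it arrived on."""
    if in_iface == "lo":
        return True  # loopback is exempt, as described above
    assigned = {ip_address(a) for a in iface_addrs.get(in_iface, ())}
    return ip_address(dst) in assigned


def weak_host_accept(dst: str, iface_addrs: dict) -> bool:
    """Weak host model: accept if dst is assigned to any interface."""
    return any(ip_address(dst) == ip_address(a)
               for addrs in iface_addrs.values() for a in addrs)
```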

ryrona2 commented 5 months ago

@alfred6427 @thestinger There is something in the routing rules about tagged packets, so if either of you knows where the eBPF code is, I would appreciate a hint. I cannot promise I will get any time to look into this soon, though, so anyone else is free to pick up the task. I don't know a thing about eBPF, even though I know iptables/nftables well, so it would probably be faster for someone else to find the issue anyway. Or maybe we should just report it upstream.

alfred6427 commented 4 months ago

@ryrona2 I don't know much about eBPF or where the code would be. I think those files typically have a .bp extension.

thestinger commented 1 month ago

We have a partial fix for this implemented. It blocks apps from sending or receiving multicast packets. However, it doesn't yet block the kernel-generated IGMP packets triggered by apps. There may also be kernel-generated MLD packets for IPv6.

It's not merged yet since some tweaks may be required. This issue will be closed when it has landed, and we'll open a new issue for the remaining IGMP and potentially also MLD issue. We can probably have a single issue for both IGMP and MLD if applicable, since the fix likely wouldn't be specific to one or the other.
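For context on why kernel-generated IGMP is harder to block: an app does not send the IGMP membership report itself. It asks the kernel to join a group with `setsockopt(IP_ADD_MEMBERSHIP)` and a `struct ip_mreq` (group address followed by local interface address), and the kernel then sends the report on its behalf. A minimal Python sketch with hypothetical helper names; on IPv6 the analogous call is `IPV6_JOIN_GROUP`, which leads to MLD reports.

```python
import socket
import struct


def ip_mreq(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Pack struct ip_mreq: imr_multiaddr, then imr_interface (0.0.0.0 lets the kernel pick)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))


def join_group(sock: socket.socket, group: str, iface_addr: str = "0.0.0.0") -> None:
    # The app only makes this system call; any IGMP membership report is sent by the kernel.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, ip_mreq(group, iface_addr))
```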

thestinger commented 2 weeks ago

This is now fully fixed by the combination of https://github.com/GrapheneOS/platform_packages_modules_Connectivity/commit/615c33e677bd19ee023178e4aab11c43989123c7 and https://github.com/GrapheneOS/platform_system_netd/commit/61811e6b628b5183375a516ab4328edb2393b29b.

thestinger commented 2 weeks ago

Fixing this was much more involved than we had expected. We needed eBPF enforcement to address what was reported here but we discovered other issues requiring more complexity for the eBPF enforcement. We also discovered an issue where users could send multicast via each other's VPN which we had to address via iptables since it wasn't clear how to do that via eBPF. It's a lot more code than anticipated but it's not at all invasive and should hopefully be easy to keep maintained and ported to new versions. We're not going to try to upstream it in the near future.

thestinger commented 2 weeks ago

These are the relevant release notes:

This shouldn't be able to cause any compatibility issues like those we experienced with DNS leak blocking. Next, we need to revisit the DNS leak blocking to make it stricter while avoiding the app compatibility issues caused by our initial approach.

thestinger commented 1 week ago

This caused minor app compatibility issues, which we can likely resolve easily, and unfortunately major carrier/network compatibility issues, which weren't reported during ~20 hours of Beta testing. We need to rush out a new release reverting these changes, and we have a big support issue to deal with. This is unfortunately likely going to be delayed until after Android 15, and we're going to have to be very cautious about shipping it.

thestinger commented 3 days ago

Will be resolved by the next release and hopefully we don't need to roll it back again.

no-usernames-left commented 2 days ago

issues which weren't reported during ~20 hours of Beta testing

With the greatest of respect, this is not a sufficient period for beta testing, and this example is proof.

Beta testing should be AT LEAST a week to catch the edge cases. There is almost never a rush to release, and if there is then a hotfix can be pushed with one tightly-scoped patch, with everything else waiting for the next release.

thestinger commented 2 days ago

Beta testing should be AT LEAST a week to catch the edge cases.

We can't do that for this, we have lots of security patches to release regularly and other important fixes.

thestinger commented 2 days ago

There is almost never a rush to release, and if there is then a hotfix can be pushed with one tightly-scoped patch, with everything else waiting for the next release.

The release had a very important sandboxed Google Play compatibility fix and security patches. We do not have the resources to do a huge number of OS releases. We already do a large number of app and OS releases, especially considering we are building separate releases for a total of 18 devices for the main releases right now, which will drop to 15 devices with Android 15, with 5th-generation devices moving to legacy extended support on a legacy Android 14 branch.

thestinger commented 2 days ago

The new release is in alpha now.

no-usernames-left commented 2 days ago

Beta testing should be AT LEAST a week to catch the edge cases.

We can't do that for this, we have lots of security patches to release regularly and other important fixes.

It is my opinion that new and untested functionality/refactoring should not be released with the same urgency as security fixes.

The release had a very important sandboxed Google Play compatibility fix and security patches.

Then the release could (and, in hindsight, should) have included only those, and the VPN refactoring should have been held back for the next release, where it would have benefitted from a few days of beta testing, unhurried by unrelated higher-priority items which you'd have already released.

thestinger commented 2 days ago

The multicast leak blocking improvements are security patches from our perspective. The initial release with them also had various other Linux kernel security patches, since that's a never-ending stream. Most of our releases contain security patches due to the Linux kernel having an endless stream of them.

The fact that it took so long is because it was very difficult to implement a solution and then got delayed by compatibility issues with networks and apps. It was one of our highest priorities since it was reported, but we initially didn't have someone to work on it. We hired someone who spent a lot of their initial time working on both the DNS and multicast leaks alongside 2-factor fingerprint authentication which they had largely completed as a volunteer before we hired them.

This is the solution to what was reported here and additional issues we discovered along the way since apps can trigger the kernel to send multicast packets via lots of system calls related to multicast, which is difficult to handle:

https://github.com/GrapheneOS/platform_packages_modules_Connectivity/commit/558cc240147744955d3b4d64e959cd76fc673774

We also discovered that multicast bypassed the existing prevention of apps using VPN tunnels created by other profiles, which we solved with netfilter despite not expecting that would be part of what we needed to do:

https://github.com/GrapheneOS/platform_system_netd/commit/036d9afd8c3c240fd4ae3a0d2a5059bcaf43fd91
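The per-profile restriction relies on the fact that an Android UID encodes the user (profile) it belongs to: UID = user ID × 100000 + app ID, which is also why the `ip rule` output earlier in this thread uses ranges such as 1300000-1399999 for a single profile. A minimal Python sketch of that mapping and of a hypothetical check that a UID may only use a VPN owned by its own profile; it illustrates the idea and is not the linked netfilter change.

```python
PER_USER_RANGE = 100_000  # UIDs per Android user (UserHandle.PER_USER_RANGE)


def user_id(uid: int) -> int:
    """Profile (user) that a UID belongs to."""
    return uid // PER_USER_RANGE


def app_id(uid: int) -> int:
    """Per-user app ID part of a UID."""
    return uid % PER_USER_RANGE


def may_use_vpn(uid: int, vpn_owner_user: int) -> bool:
    """Hypothetical check: an app may only send via a VPN set up in its own profile."""
    return user_id(uid) == vpn_owner_user
```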

Regardless of the fact that this was very hard to implement and has a lot of subtleties, it's still a fix for what we consider a fairly serious security issue.

thestinger commented 2 days ago

and the VPN refactoring should have been held back for the next release

Issues detected in testing are very rare and we don't want to start expecting failure and therefore having to do many more releases to work around it. We had tested the changes quite a lot ourselves before releasing them.

The apps with compatibility issues with our changes were clearly buggy but we still have to work around buggy apps. It's not a bug in the initial changes which caused the app breakage though.

It is a bug in our changes which caused the IPv6 network compatibility issues, and it's extremely strange that none of the people who experienced it reported it. There were certainly people who had the issue and did not report it. There are a lot of Beta / Alpha testers, and a 24-hour total period is more than enough time to report that mobile data completely broke with their carrier. You assume that if we had left it in the testing channels longer we would have gotten reports, but that's not necessarily true. We got reports quickly once it was in Alpha, but quickly is relative and most people are already updated by around 5 hours after a release.