QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Improve fail-closed behavior in Qubes Firewall #5269

Open strugee opened 4 years ago

strugee commented 4 years ago

Qubes OS version

R4.0

Affected component(s) or functionality

Qubes Firewall

Brief summary

If a VM has DNS names in its firewall configuration, a single entry whose DNS name fails to resolve causes all traffic from that VM to be filtered.

To Reproduce

  1. Create a new VM (or do this to an existing VM if you want, but creating a new one is easier)
  2. Set the new VM's network to be filtered with the firewall (i.e. "Limit outgoing connections to..." in the "Firewall rules" tab of the VM settings window)
  3. Add a nonsensical domain to the firewall rules list; I used nonexistant.foobar
  4. Open a terminal in the VM
  5. Run ping 1.1.1.1

(You can also see in sys-firewall's systemd journal that qubes-firewall.service logs the failed DNS query and reports that it is denying traffic.)

Expected behavior

The ping succeeds.

Actual behavior

ping reports "Packet filtered" even though ICMP is always supposed to work unless you muck with qvm-firewall directly. Note also that if you add a second, valid entry to the firewall list, you can't connect to the address in that second entry either, because the first, invalid entry causes all traffic to be dropped.

Solutions you've tried

None, besides removing the offending entry.

Relevant documentation you've consulted

https://www.qubes-os.org/doc/firewall/

Related, non-duplicate issues

#3641

marmarek commented 4 years ago

This works as intended: if the firewall for a given VM fails to load for any reason, traffic is blocked to avoid unintended leaks, mistakes, etc. Did you get any notification about the typo? If not, this should be improved.
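The fail-closed behavior described above can be modeled roughly as follows. This is a hypothetical sketch, not the actual qubes-firewall code: `Rule`, `ResolvedRule`, `load_all_or_nothing` and `fake_resolve` are illustrative names, the address `192.0.2.10` is a documentation address, and returning `None` stands in for whatever block-everything rule set the real service installs.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    action: str   # "accept" or "drop"
    dsthost: str  # DNS name or literal IP address from the rule


@dataclass
class ResolvedRule:
    action: str
    addresses: List[str]


def system_resolve(host: str) -> List[str]:
    """Resolve a name to its distinct addresses using the system resolver."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def load_all_or_nothing(
    rules: List[Rule],
    resolve: Callable[[str], List[str]] = system_resolve,
) -> Optional[List[ResolvedRule]]:
    """Resolve every rule; None means "install a block-everything rule set"."""
    resolved: List[ResolvedRule] = []
    for rule in rules:
        try:
            resolved.append(ResolvedRule(rule.action, resolve(rule.dsthost)))
        except OSError:  # socket.gaierror is a subclass of OSError
            return None  # one failed lookup discards every rule, valid or not
    return resolved


def fake_resolve(host: str) -> List[str]:
    """Hypothetical stand-in for DNS so the example runs offline."""
    table = {"example.org": ["192.0.2.10"]}
    if host not in table:
        raise socket.gaierror(f"cannot resolve {host}")
    return table[host]


print(load_all_or_nothing([Rule("accept", "example.org")], fake_resolve))
# [ResolvedRule(action='accept', addresses=['192.0.2.10'])]
print(load_all_or_nothing(
    [Rule("accept", "example.org"), Rule("accept", "nonexistant.foobar")],
    fake_resolve,
))
# None
```

The valid `example.org` rule is lost together with the broken one, which matches the symptom in the report.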

One thing that could be changed is to block everything only if a "deny" rule fails to load; if it was only an "allow" rule, the VM will have less access, so it should be safe. But if notification doesn't work, it would be even harder to debug what is wrong.
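Why skipping a failed "allow" rule should be safe can be checked with a toy first-match evaluator. This assumes, for illustration only, that rules are evaluated in order, the first matching rule decides, and unmatched traffic falls through to a default drop; the rule list and the `verdict`/`without` helpers are hypothetical. Removing an accept rule only sends its traffic on to a later rule or the default, which is never more permissive than accept. Removing a drop rule, by contrast, can let a later accept rule match.

```python
from typing import List, Set, Tuple

# Each rule is (action, set of destination addresses it matches): a toy model.
Rule = Tuple[str, Set[str]]


def verdict(rules: List[Rule], dst: str, default: str = "drop") -> str:
    """Return the action of the first rule matching dst, else the default policy."""
    for action, addresses in rules:
        if dst in addresses:
            return action
    return default


def without(rules: List[Rule], index: int) -> List[Rule]:
    """Copy of the rule list with one rule removed, as if it failed to resolve."""
    return rules[:index] + rules[index + 1:]


rules: List[Rule] = [
    ("drop", {"198.51.100.7"}),                   # a deny rule for a resolved name
    ("accept", {"192.0.2.10"}),                   # an allow rule for a resolved name
    ("accept", {"198.51.100.7", "203.0.113.5"}),  # a broader allow rule
]

print(verdict(rules, "198.51.100.7"))              # drop
print(verdict(without(rules, 0), "198.51.100.7"))  # accept: losing the deny rule widens access
print(verdict(rules, "192.0.2.10"))                # accept
print(verdict(without(rules, 1), "192.0.2.10"))    # drop: losing the allow rule only narrows access
```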

strugee commented 4 years ago

> Did you get any notification about the typo? If not, this should be improved.

I did not, except in the firewall logs.

> One thing that could be changed, is to block everything only if "deny" rule fails to load, but if that was only "allow" rule, then VM will have less access, so it should be safe.

Right, this is essentially the behavior that this ticket is proposing. I just assumed it was a bug instead of an enhancement :P

If the notification doesn't work, it'll still be hard to debug what's wrong... but I don't think it will be any more difficult than before. It might even be easier because then it would be obvious that just the one host wasn't reachable, so you'd have an idea that maybe you should look at that particular rule in the firewall.

What originally triggered this bug for me (I believe) is that I had cdn-fastly.debian.org plaintext HTTP traffic allowed in my firewall config to let the VM take APT updates, but Debian seems to have stopped using this particular domain name. (Previously deb.debian.org would redirect there - IIRC, if you used plaintext HTTP.) In that case if the firewall had just ignored that failing allow rule, I never would've even noticed a problem because the domain wasn't being used anyway.

strugee commented 4 years ago

It's also worth noting that I've been observing a race condition for many months that I think is explained by this bug. Basically I would start Qubes, observe that several of my VMs didn't have network access, open the VM's settings, and change to the firewall tab and hit "Apply". This usually fixed the networking for the VM. I think the firewall's DNS resolution was failing because the network wasn't fully up yet, and then this problem was triggering. If that's a separate bug, let me know and I'd be happy to file a followup issue.

andrewdavidwong commented 4 years ago


This sounds like a separate bug to me.

JarrahG commented 4 years ago

I am also experiencing this bug, except that in my case the firewall is set to "allow all outgoing internet connections" in the Qube Manager GUI. Any invalid DNS-based rule, even when disabled, causes that VM to have no network connectivity.

3hhh commented 4 years ago

> This works as intended - if firewall for a given VM fails to load for any reason, traffic is blocked to avoid unintended leaks, mistakes etc. Did you get any notification about the typo? If not, this should be improved.

> One thing that could be changed, is to block everything only if "deny" rule fails to load, but if that was only "allow" rule, then VM will have less access, so it should be safe. But if notification doesn't work, it would be even harder to debug what is wrong.

The notification issue is #3880

PetrVladimirov commented 3 years ago

> This works as intended - if firewall for a given VM fails to load for any reason, traffic is blocked to avoid unintended leaks, mistakes etc. Did you get any notification about the typo? If not, this should be improved.

> One thing that could be changed, is to block everything only if "deny" rule fails to load, but if that was only "allow" rule, then VM will have less access, so it should be safe. But if notification doesn't work, it would be even harder to debug what is wrong.

@marmarek, thank you for your comment. I agree that if the firewall fails, traffic should be blocked, but I would suggest treating the name resolution error itself as a bug in the firewall and fixing it, so that the firewall does not fail when it encounters such an issue (an unresolvable domain name).

In practice, this means that if the firewall cannot resolve the domain name in a particular record, it should just skip that record (and notify the user, ideally in the GUI) rather than block the other, resolvable records.

There are a few use cases where such an approach would be beneficial, and I was not able to find any serious security drawback:

v6ak commented 3 years ago

This makes some sense for allowing rules (i.e., probably most of the rules), but not for denying rules (though denying rules with DNS names are likely less frequent). And even for allowing rules, it can cause some confusion (i.e., most of the network works, but some part does not), so I am not convinced the benefits outweigh the drawbacks. Maybe better communication of the issue is more important than fault tolerance.
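The compromise discussed here (skip failed allow rules, keep failing closed on failed deny rules, and surface a message either way) can be sketched as below. This is a hypothetical illustration, not existing Qubes code: `Rule`, `ResolvedRule`, `load_skipping_failed_accepts` and `fake_resolve` are made-up names, returning `None` stands in for a block-everything rule set, and the returned `warnings` list stands in for whatever notification mechanism would actually reach the user.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Rule:
    action: str   # "accept" or "drop"
    dsthost: str  # DNS name or literal IP address from the rule


@dataclass
class ResolvedRule:
    action: str
    addresses: List[str]


def system_resolve(host: str) -> List[str]:
    """Resolve a name to its distinct addresses using the system resolver."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def load_skipping_failed_accepts(
    rules: List[Rule],
    resolve: Callable[[str], List[str]] = system_resolve,
) -> Tuple[Optional[List[ResolvedRule]], List[str]]:
    """Return (rules to install, or None to block everything; messages for the user)."""
    resolved: List[ResolvedRule] = []
    warnings: List[str] = []
    for rule in rules:
        try:
            addresses = resolve(rule.dsthost)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            if rule.action == "accept":
                # Skipping an allow rule can only narrow access: warn and continue.
                warnings.append(f"skipped allow rule for {rule.dsthost}: {exc}")
                continue
            # A missing deny rule could widen access: keep failing closed.
            warnings.append(f"deny rule for {rule.dsthost} failed; blocking all traffic")
            return None, warnings
        resolved.append(ResolvedRule(rule.action, addresses))
    return resolved, warnings


def fake_resolve(host: str) -> List[str]:
    """Hypothetical stand-in for DNS so the example runs offline."""
    table = {"example.org": ["192.0.2.10"]}
    if host not in table:
        raise socket.gaierror(f"cannot resolve {host}")
    return table[host]


resolved, warnings = load_skipping_failed_accepts(
    [Rule("accept", "example.org"), Rule("accept", "nonexistant.foobar")],
    fake_resolve,
)
print(resolved)  # [ResolvedRule(action='accept', addresses=['192.0.2.10'])]
print(warnings)  # ['skipped allow rule for nonexistant.foobar: cannot resolve nonexistant.foobar']
```

Returning the messages alongside the rules keeps the "better communication" concern in the same code path as the fault tolerance, so a skipped rule is never silent.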

PetrVladimirov commented 3 years ago

No doubt that proper communication is absolutely critical, e.g. a pop-up when rules are being applied and graying out the unresolvable record in the GUI.

If everything works except a single service or site (as proposed), the first place a typical user would look is the firewall tab of the relevant qube. If it indicates that a particular record doesn't work (is unresolvable), the user will have to make a decision. E.g. he may decide to leave it for now, reboot the qube, or change the network or DNS provider.

At the same time, if he doesn't have the time or need to investigate, everything else in the qube should work as normal. It is a matter of usability, and potentially of security degradation, because some users may prefer simplicity ("allow any to any") over the complexity of managing and investigating DNS records all the time. Moreover, an adversary that controls the DNS server may exploit this to force a Qubes user to change the filtering policy (make it broader, up to "allow any to any").

I've reread this thread and it is still not quite clear to me what the drawbacks are of skipping rules with unresolvable records (with proper communication) and letting everything else work as expected. At least for allow rules, it should be even more secure (a smaller attack surface).