StreisandEffect / streisand

Streisand sets up a new server running your choice of WireGuard, OpenConnect, OpenSSH, OpenVPN, Shadowsocks, sslh, Stunnel, or a Tor bridge. It also generates custom instructions for all of these services. At the end of the run you are given an HTML file with instructions that can be shared with friends, family members, and fellow activists.

Iptables lock race with `streisand-openconnect.service` service at boot #948

Closed (witte-de-with closed this issue 7 years ago)

witte-de-with commented 7 years ago

Expected behavior:

All Streisand-related services to start up after bouncing a VM that has been Streisand-provisioned.

Actual Behavior:

Lately I'm seeing one service failing to start up pretty frequently on boot (this is Ubuntu on Linode) -- see systemctl status output below. When that service has failed, the VPN won't work. If I ssh in and start the service, things work again. Here's the status info:

root@localhost:~# systemctl status streisand-openconnect.service 
● streisand-openconnect.service - LSB: Persist OpenConnect firewall rules for Streisand
   Loaded: loaded (/etc/init.d/streisand-openconnect; bad; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2017-09-16 19:37:35 UTC; 28min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 1206 ExecStart=/etc/init.d/streisand-openconnect start (code=exited, status=4)

Sep 16 19:37:35 localhost systemd[1]: Starting LSB: Persist OpenConnect firewall rules for Streisand...
Sep 16 19:37:35 localhost streisand-openconnect[1206]: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Sep 16 19:37:35 localhost systemd[1]: streisand-openconnect.service: Control process exited, code=exited status=4
Sep 16 19:37:35 localhost systemd[1]: Failed to start LSB: Persist OpenConnect firewall rules for Streisand.
Sep 16 19:37:35 localhost systemd[1]: streisand-openconnect.service: Unit entered failed state.
Sep 16 19:37:35 localhost systemd[1]: streisand-openconnect.service: Failed with result 'exit-code'.

Steps to Reproduce:

Provision a Linode Streisand instance, then reboot the instance. The service above will fail to start.

cpu commented 7 years ago

Hi @i-s-o-g-r-a-m - thanks for taking the time to file an issue.

> When that service has failed, the VPN won't work.

To be 100% clear - when you say VPN here, do you mean that an OpenConnect VPN connection fails to work? I suspect that's the case, but it's best to confirm :-)

witte-de-with commented 7 years ago

@cpu ah, yeah I should clarify that a bit: so I can still make a successful VPN connection from my laptop (using Viscosity on macOS), but then no network traffic will seem to make it through -- I can't talk to the world -- until disconnecting again from the Streisand-based VPN. If I go and start the service manually after ssh'ing in, everything is back to working as expected. But again: this doesn't prevent a successful VPN connection -- it just prevents me from actually sending traffic over the VPN once connected. It's weird because I only started to see this recently. It may be that it just happens sometimes and not other times if it is a race-y thing. I happen to reboot the VPS that the VPN is hosted on quite a bit -- maybe others do not do this, so don't see the problem. Thanks!

cpu commented 7 years ago

@i-s-o-g-r-a-m Thanks for clarifying. That definitely sounds like the behaviour I would expect if the problem was with the firewall rules. Looking at streisand-openconnect.service, it does nothing other than apply a firewall rule. I also notice that the log output you shared contains an iptables error:

> Another app is currently holding the xtables lock. Perhaps you want to use the -w option?

I think that -w option is exactly what we need as a quick fix. The manual says:

       -w, --wait [seconds]
              Wait  for  the xtables lock.  To prevent multiple instances of the program from running concurrently, an attempt will
              be made to obtain an exclusive lock at launch.  By default, the program will exit if the  lock  cannot  be  obtained.
              This  option  will  make  the  program  wait  (indefinitely  or for optional seconds) until the exclusive lock can be
              obtained.
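
To illustrate the difference between failing fast and waiting on an exclusive lock, here is a minimal sketch that does not touch the firewall at all. It assumes `flock(1)` from util-linux is installed and uses a temporary file as a stand-in for the xtables lock; the variable names are made up for the example, and this is an analogy rather than what iptables itself runs.

```shell
#!/bin/sh
# Stand-in for the xtables lock: a temporary lock file (hypothetical).
LOCK=$(mktemp)

# Holder: take the exclusive lock and keep it for 3 seconds, like another
# service that is busy editing firewall rules at boot.
flock -x "$LOCK" sleep 3 &
sleep 1   # give the holder time to acquire the lock

# No waiting (analogous to iptables without -w): gives up immediately.
if flock -n -x "$LOCK" true; then
    nowait_result="acquired"
else
    nowait_result="failed"
fi

# Bounded wait (analogous to iptables -w 10): blocks until the lock is free.
if flock -w 10 -x "$LOCK" true; then
    wait_result="acquired"
else
    wait_result="failed"
fi

wait            # let the background holder finish
rm -f "$LOCK"
echo "no-wait: $nowait_result, wait: $wait_result"
```

Run as an unprivileged user, this should print `no-wait: failed, wait: acquired`, which is the same pattern as the failed unit above: the second process loses the race unless it is told to wait.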

I think you're right that this is a race-y thing. There are a couple of other services (openvpn, l2tp-ipsec, and wireguard) that also set up iptables rules at boot, so any of them may be holding the xtables lock when the openconnect script runs. Long term there's probably a better way to manage a system-wide firewall than with a combination of ufw and iptables thrown through /etc/init.d. I opened https://github.com/jlund/streisand/issues/950 to track that larger effort.

In the meantime I'll add -w across the iptables invocations in the services and we can see if that prevents the race from happening on your system.
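
As a rough sketch of what "adding -w everywhere" could look like in a script, one option is to route every rule through a small wrapper that always passes `--wait`. The `ipt` function, the `IPTABLES` override, and the example rule and subnet are all hypothetical and only for illustration; they are not taken from the Streisand scripts.

```shell
#!/bin/sh
# Hypothetical wrapper: every rule goes through ipt(), which always passes
# --wait so iptables blocks on the xtables lock instead of exiting with an error.
# IPTABLES can be overridden (e.g. IPTABLES=echo) for a dry run without root.
IPTABLES="${IPTABLES:-iptables}"

ipt() {
    "$IPTABLES" --wait "$@"
}

# Dry run: print the arguments that would be passed to iptables.
# The subnet is a documentation placeholder, not Streisand's real configuration.
dry_run=$(IPTABLES=echo ipt -t nat -A POSTROUTING -s 192.0.2.0/24 -j MASQUERADE)
echo "$dry_run"
```

Keeping the flag in one place means a new rule cannot be added without it by accident; the dry-run override only exists so the sketch can be tried without root.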

cpu commented 7 years ago

I have a PR up to add --wait to all of the iptables invocations: https://github.com/jlund/streisand/pull/951. I would be interested in hearing whether this branch prevents any further issues with openconnect for you after reboots.

witte-de-with commented 7 years ago

@cpu Thank you! Looks like you got this sorted out, and some folks have already tested it. Let me know if I can help any further with testing, etc. Thanks again for your work on Streisand 👍

cpu commented 7 years ago

@i-s-o-g-r-a-m Thanks! It's merged to master now. Please let me know if you run into this bug again now that we believe it's fixed. I appreciate the bug report!

witte-de-with commented 7 years ago

@cpu will do, thanks again!