TritonDataCenter / pkgsrc-legacy

Automatically updated conversion of the "pkgsrc" module from anoncvs.netbsd.org
http://www.pkgsrc.org

OpenVPN client can NOT reconnect if something goes wrong #442

Open jcea opened 7 years ago

jcea commented 7 years ago

I have been using OpenVPN as a client for ages. One of my recent clients is a SmartOS native zone instance (base-64 16.3.1). If the VPN connection is severed in any way (timeout, server restart, etc.), the SmartOS client cannot connect again.

My OpenVPN is just a stock "pkg install". Restarting the SMF service does nothing. After a few attempts I just restart the zone, and everything is fine after that.

Checking the logfile I see this:

[...]
Wed Dec 14 15:48:31 2016 open_tun: got dynamic interface 'tun0'
Wed Dec 14 15:48:31 2016 Can't set PPA 0: File exists (errno=17)
Wed Dec 14 15:48:31 2016 Exiting due to fatal error
[ Dec 14 15:48:31 Stopping because all processes in service exited. ]

The SMF service keeps restarting the OpenVPN client, which repeats the same cycle of connecting and crashing until I restart the zone.

I use the same configuration (ovpn file) in my iOS devices and multiple Linux machines without any issue.

jperkin commented 7 years ago

I'm pretty sure this is caused by joyent/smartos-live#626 and is an issue in the tun/tap driver rather than the pkgsrc openvpn package. I'll leave this bug open in the meantime though so we have an extra data point and test candidate when that bug is resolved.

jcea commented 7 years ago

Any workaround in the meantime? Current situation is quite painful because it requires manual intervention involving a zone reboot.

jclulow commented 7 years ago

@jcea I make use of the OpenVPN client in a few places. I think I had to add persist-tun to the configuration file to make it work correctly on reconnection.

The full configuration appears below:

client
dev tun
persist-tun
proto udp
resolv-retry infinite
nobind
ca ssl/ca.crt
cert ssl/client.crt
key ssl/client.key
comp-lzo
verb 3
remote-cert-tls server

remote X.X.X.X 1194

jcea commented 7 years ago

@jclulow I already have "persist-tun" in my configuration. I think the problem occurs when the server refuses the connection because of, for instance, a full disk. I have a connection validation script on the server, and if something goes wrong with the database, for instance, connections are temporarily refused. The OpenVPN client in this zone then dies, which is fine, but when SMF restarts the process, it fails. This situation persists until the zone is rebooted.

Regular reconnections seem to work OK.

@jperkin This issue is not related to interference between zones, I think. I have more native zones on this machine, but none of the others is using TUN at the same time.

I have a trivial way to reproduce this problem:

  1. Have an OpenVPN client running in a native zone.
  2. Locate the PID of the OpenVPN client and kill it with "kill -9". Use "-9" to be sure the process cannot do any cleanup.
  3. Now monitor your SMF OpenVPN log and watch the process being created, connecting to the OpenVPN server, and dying, looping forever.
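The steps above could be scripted roughly as follows. This is only a sketch under assumptions not stated in the thread: that the service FMRI matches "openvpn", that `svcs -L` prints the path of its SMF log on this platform image, and that ten seconds is long enough for SMF to restart the client. The function names and the `RUN_REPRO` variable are made up for illustration.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Assumptions: the SMF service FMRI
# matches "openvpn" and `svcs -L` prints the path of its log file.

# Count how many times the fatal PPA error appears in a log file.
count_ppa_failures() {
    grep -c "Can't set PPA" "$1"
}

reproduce() {
    # Step 2: kill the client without giving it a chance to clean up.
    pkill -9 -x openvpn
    # Step 3: give SMF time to restart it (arbitrary delay), then
    # inspect the service log.
    sleep 10
    log=$(svcs -L openvpn)
    echo "PPA failures so far: $(count_ppa_failures "$log")"
    tail -20 "$log"
}

# Only kill anything when explicitly asked to (hypothetical guard variable).
if [ "${RUN_REPRO:-}" = "1" ]; then
    reproduce
fi
```

Running it as `RUN_REPRO=1 sh repro.sh` inside the zone should show the failure count growing on each run if the restart loop described above is happening.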

YanChii commented 7 years ago

Hi, I believe the problem is this:

Can't unlink interface(ip): Not owner (errno=1)

It can be worked around by manually unplumbing the tun interface after the service stops:

ifconfig tun0 unplumb

Adding it to /opt/local/lib/svc/method/openvpn (before openvpn is started) works for me. But if you use multiple tun interfaces, you need to work out which tunX interface to unplumb.
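One possible way to handle the multiple-interface case is sketched below. It is not taken from the packaged method script: the function names are hypothetical, and it assumes that `ifconfig -a` prints each interface on a header line of the form `tun0: flags=...`. Note that it unplumbs every tun interface it finds, so it is only suitable when no other OpenVPN instance in the zone is still using one.

```shell
#!/bin/sh
# Print the tun interface names found in `ifconfig -a`-style output on stdin.
list_tun_interfaces() {
    # Header lines look like "tun0: flags=..."; keep only the name.
    # sort -u collapses duplicates (e.g. separate IPv4 and IPv6 entries).
    sed -n 's/^\(tun[0-9][0-9]*\):.*/\1/p' | sort -u
}

# Unplumb every leftover tun interface before openvpn is started.
# Assumption: nothing else in this zone is using a tun interface.
unplumb_stale_tuns() {
    for ifname in $(ifconfig -a | list_tun_interfaces); do
        ifconfig "$ifname" unplumb || echo "could not unplumb $ifname" >&2
    done
}
```

With these functions pasted into the method script, calling `unplumb_stale_tuns` just before the line that starts openvpn would replace the single hard-coded `ifconfig tun0 unplumb`.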

Jan

cron2 commented 4 years ago

There is an open bug on the OpenVPN side for this as well:

https://community.openvpn.net/openvpn/ticket/1078

Unfortunately, I am totally lost as to what all these PPA and I_PUSH etc. things do on Solaris. On restart (because a server went away and came back) we do try to clean up the existing tun/tap device and then re-init from scratch.

Unfortunately, something seems to be missing in the "cleaning up", so we get these "EEXIST" errors. Help from someone who understands Solaris networking better is certainly welcome.