Open MixyMisus opened 1 year ago
Your description definitely sounds like a bug (and a long-standing one at that).
I tried to reproduce this with current master, connecting to a (non-local) SOCKs proxy, and then onward to a TCP server. Triggering the two different kind of reconnects with SIGUSR1
or SIGHUP
did not do anything unusual (so it's not totally easily reproduced). Will try timeouting and network errors next.
"Killing the SOCKS proxy" is also not sufficient, this will produce a correct reconnect towards the SOCKS proxy (and as soon as it's back, proper re-establishment). So, network errors between proxy and server, and timeout next...
Mmmh, this is going to be a tougher nut. Tried --keepalive 10 60
on the client, without --ping
on the server, which triggers a timeout restart after 60 seconds without user traffic
2023-08-03 12:26:49 SIGUSR1[soft,ping-restart] received, process restarting
2023-08-03 12:26:49 Restart pause, 1 second(s)
2023-08-03 12:26:50 TCP/UDP: Preserving recently used remote address: [AF_INET]194.97.144.213:1080
2023-08-03 12:26:50 Socket Buffers: R=[65536->65536] S=[32768->32768]
2023-08-03 12:26:50 Attempting to establish TCP connection with [AF_INET]194.97.144.213:1080
2023-08-03 12:26:50 TCP connection established with [AF_INET]194.97.144.213:1080
2023-08-03 12:26:50 TCPv4_CLIENT link local: (not bound)
2023-08-03 12:26:50 TCPv4_CLIENT link remote: [AF_INET]194.97.144.213:1080
2023-08-03 12:26:50 TLS: Initial packet from [AF_INET]194.97.144.213:1080, sid=1af58c31 a8b415b8
... which nicely reconnects to the SOCKs proxy...
@MixyMisus I wonder if we're doing something special to "localhost" here. Can you reproduce if the SOCKS proxy is accessed via the LAN IP (so, still running on the same machine, but no "127.0.0.1" in internal data structures)? Wild guess, though...
Okay, ignore the tests above, this was all about TCP (I misread and tested TCP server, not UDP server) - so I retried with UDP, and I can reproduce this.
It does NOT happen on SIGUSR1 or SIGHUP restarts, but it DOES happen on ping-restart timeouts:
SIGHUP:
2023-08-03 12:32:34 TCP/UDP: Preserving recently used remote address: [AF_INET]194.97.144.213:1080
2023-08-03 12:32:34 Socket Buffers: R=[42080->42080] S=[9216->9216]
2023-08-03 12:32:34 Attempting to establish TCP connection with [AF_INET]194.97.144.213:1080
2023-08-03 12:32:34 TCP connection established with [AF_INET]194.97.144.213:1080
2023-08-03 12:32:34 SOCKS proxy wants us to send UDP to [AF_INET]194.97.144.213:26792
2023-08-03 12:32:34 UDPv4 link local: (not bound)
2023-08-03 12:32:34 UDPv4 link remote: [AF_INET]199.102.77.82:51194
2023-08-03 12:32:34 TLS: Initial packet from [AF_INET]199.102.77.82:51194, sid=a954d6f2 78b6de9a
ping-restart misconnects once, then does a full SIGUSR1 "socks-error" restart, and succeeds
2023-08-03 12:37:31 SIGUSR1[soft,ping-restart] received, process restarting
2023-08-03 12:37:31 Restart pause, 1 second(s)
2023-08-03 12:37:32 TCP/UDP: Preserving recently used remote address: [AF_INET]199.102.77.82:51194
2023-08-03 12:37:32 Socket Buffers: R=[42080->42080] S=[9216->9216]
2023-08-03 12:37:32 Attempting to establish TCP connection with [AF_INET]199.102.77.82:51194
2023-08-03 12:37:32 TCP connection established with [AF_INET]199.102.77.82:51194
2023-08-03 12:37:37 socks_handshake: TCP port read timeout expired: Operation now in progress (errno=36)
2023-08-03 12:37:37 SIGUSR1[soft,socks-error] received, process restarting
2023-08-03 12:37:37 Restart pause, 1 second(s)
2023-08-03 12:37:38 TCP/UDP: Preserving recently used remote address: [AF_INET]194.97.144.213:1080
2023-08-03 12:37:38 Socket Buffers: R=[42080->42080] S=[9216->9216]
2023-08-03 12:37:38 Attempting to establish TCP connection with [AF_INET]194.97.144.213:1080
2023-08-03 12:37:38 TCP connection established with [AF_INET]194.97.144.213:1080
2023-08-03 12:37:38 SOCKS proxy wants us to send UDP to [AF_INET]194.97.144.213:45751
2023-08-03 12:37:38 UDPv4 link local: (not bound)
2023-08-03 12:37:38 UDPv4 link remote: [AF_INET]199.102.77.82:51194
2023-08-03 12:37:39 TLS: Initial packet from [AF_INET]199.102.77.82:51194, sid=a1e70ebb a057c8ad
(not sure why it failed so fast, and takes longer for @MixyMisus to give up and retry)
Summoning @schwabe - do you have an idea? I guess we're caching something somewhere that we shouldn't (easy guess), but why is this reset on normal SIGUSR1 restarts, and not on ping-restart
SIGUSR1?
@cron2 Ha! Caught a real bug! Lucky me!
Upon some thought, I think that I've encountered at least two more odd quirks (possible bugs) in OpenVPN(UDP)+SOCKS5 behavior, and one is probably related here (the other I'll submit separately if it "keeps happening")
So... The relevant possible additional bug here is odd timeout disconnects with SOCKS proxy (have reproduced it with localhost SOCKS5 and with a remote SOCKS on a VPS) and remote VPN servers (which is relevant by virtue of triggering the above bug)
It is most reliably reproducible when socks proxy is part of proxy chain or a Shadowsocks-rust setup (but I've reproduced it with a vanilla server/client hev-socks5-server )
The setup goes for shadowsocks
OpenVPN(UDP)=SOCKS5(sslocal process)=[remote sserver instance]=[remote OpenVPN*]
For HEV I reproduced like in the following manner:
OpenVPN(UDP)=SOCKS5(remote VPS running hev-socks5-server)==[remote OpenVPN*]
*Remote VPN for reproducing can be just any VPN service such a Mullvad VPN service or a free tier Proton VPN server
After some time the connection will timeout, triggering a reset and thus by extension the originally reported bug.
MEANWHILE an OpenVPN instance that connects to the remote server without SOCKS, a-la
OpenVPN(UDP)=[remote OpenVPN*]
runs just fine for hours and hours and hours (both test boxes share same wifi, so it's unlikely to be a network problem)
Hi! Just found this thing via googling around and reporting that 1) it reproduces on all my OpenVPN installs and 2) it is somewhat disruptive to my stuff. I understand there are higher priority concerns and this is open source product so like, no hard feelings on my part, just leaving my two cents here hoping this bug will be picked up and fixed one day
Thanks :)
Describe the bug
I am running OpenVPN through a SOCKS proxy (UDP mode). The connection is established normally and runs okay for minutes to hours. Periodically the connection tears down with timeout (I am fully confident it is not a network issue) and then OpenVPN attempts restart at which point it hangs for minutes. After running it with high verbosity logs I found out that while the "hang" part consists of two stages
stage 1: Logs do not show anomalies however the software communicating via OpenVPN's tun0 loses connectivity (timeout)
stage 2
For some reason, OpenVPN attempts to reconnect via TCP to the remote OpenVPN server (NOT THE SOCKS PROXY WHICH IS RUNNING ON LOCALHOST)
At this point it can spend minutes (unsuccessfully) trying to reach the remote server directly, finally concludes that network is unreachable and does full restart, which successfully restores connectivity
Here are the captured logs (exclamation marks added by me, interesting part also reproduced in github issue text below)
To Reproduce Start OpenVPN in UDP mode with SOCKS5 enabled (socks-proxy in config, so socks proxy must support UDP. Ideally it should be on localhost since in my setup it is)
Run it for a while, with moderate load.
P.S. Upon further experiments I discovered that the "strange attempt to bypass socks proxy" also happens upon "normal" connectivity loss events (turning off the wifi router to which the machine with VPN client is connected, waiting for timeout then turning on)
Expected behavior Everything to work without weird connectivity losses when there is no good reason for connectivity loss.
If a connectivity loss happens for normal reasons (such as physical connection getting temporarily disconnected) do not perform useless attempt to connect to remote server bypassing the socks proxy and re-connect to socks proxy (which it eventually does but only after very long wait), when socks proxy option is set the re-connect attempt should be performed to the socks proxy (and not to the last remembered remote OpenVPN server as it appears to attempt in log above)
Version information (please complete the following information):
Tested on two following OpenVPN versions (reproduces):
OpenVPN 2.4.7 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Apr 28 2021 library versions: OpenSSL 1.1.1n 15 Mar 2022, LZO 2.10
OpenVPN 2.6.5 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] [DCO] library versions: OpenSSL 3.1.1 30 May 2023, LZO 2.10
Additional context Socks capability is why I'm using openvpn in the first place so any suggestion on how to work around this would be extremely welcome