abrasive / nxBender

Open source client for netExtender SSL VPNs
BSD 3-Clause "New" or "Revised" License
41 stars 20 forks source link

Connection drops fairly quickly with "Broken pipe" #11

Open tomkcook opened 4 years ago

tomkcook commented 4 years ago

This appears to be related to the latter comments on #9. When I start a VPN connection with nxBender and then try to move some traffic over it, I get:

INFO     Logging in...
DEBUG    Starting new HTTPS connection (1): censored.com:4433
send: 'POST /cgi-bin/userLogin HTTP/1.1\r\nHost: censored.com:4433\r\nAccept-Encoding: identity\r\nX-NE-pda: true\r\nUser-Agent: SonicWALL NetExtender for Linux 8.6.800\r\nX-NE-SESSIONPROMPT: true\r\nContent-Length: 65\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\nusername=tom.cook&domain=censored&password=censored&login=true'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: SonicWALL SSLVPN Web Server
header: X-NE-tfresult: 0
header: MC-bookmarks: 1
header: Set-Cookie: swap=MzdmNDhjMjhub2ZyYXJhcA==; path=/;
header: Connection: close
header: Content-Type: text/html; charset=UTF-8
DEBUG    https://censored.com:4433 "POST /cgi-bin/userLogin HTTP/1.1" 200 None
INFO     Starting session...
DEBUG    Resetting dropped connection: censored.com
send: 'GET /cgi-bin/sslvpnclient?launchplatform=mac&neProto=3&supportipv6=no HTTP/1.1\r\nHost: censored.com:4433\r\nAccept-Encoding: identity\r\nX-NE-pda: true\r\nUser-Agent: SonicWALL NetExtender for Linux 8.6.800\r\nCookie: swap=MzdmNDhjMjhub2ZyYXJhcA==\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: SonicWALL SSLVPN Web Server
header: Set-Cookie: swap=37f48c28nofrarap; path=/;
header: Connection: close
header: Content-Type: text/html; charset=UTF-8
DEBUG    https://censored.com:4433 "GET /cgi-bin/sslvpnclient?launchplatform=mac&neProto=3&supportipv6=no HTTP/1.1" 200 None
DEBUG    srv_option 'NELaunchX1.userName' = '"tom.cook"'
DEBUG    srv_option 'NELaunchX1.domainName' = '"LocalDomain"'
DEBUG    srv_option 'SessionId' = 'QkMO6MFoLUdjNiCNLyakRw=='
DEBUG    srv_option 'Route' = 'censored'
DEBUG    srv_option 'Route' = 'censored'
DEBUG    srv_option 'Route' = 'censored'
DEBUG    srv_option 'Route' = 'censored'
DEBUG    srv_option 'Route' = 'censored'
DEBUG    srv_option 'Route' = 'censored'
DEBUG    srv_option 'dns1' = 'censored'
DEBUG    srv_option 'dns2' = 'censored'
DEBUG    srv_option 'ipv6Support' = 'no'
DEBUG    srv_option 'dnsSuffix' = 'censored.com'
DEBUG    srv_option 'dnsSuffixes' = 'censored.com'
INFO     Duplicated srv_options value dnsSuffixes = censored.local
DEBUG    srv_option 'dnsSuffixes' = 'censored.local'
INFO     Duplicated srv_options value dnsSuffixes = censored.net
DEBUG    srv_option 'dnsSuffixes' = 'censored.net'
INFO     Duplicated srv_options value dnsSuffixes = censored.local
DEBUG    srv_option 'dnsSuffixes' = 'censored.local'
INFO     Duplicated srv_options value dnsSuffixes = censored.local
DEBUG    srv_option 'dnsSuffixes' = 'censored.local'
DEBUG    srv_option 'pppFrameEncoded' = '0'
DEBUG    srv_option 'PppPref' = 'async'
DEBUG    srv_option 'TunnelAllMode' = '0'
DEBUG    srv_option 'ExitAfterDisconnect' = '0'
DEBUG    srv_option 'UninstallAfterExit' = '0'
DEBUG    srv_option 'NoProfileCreate' = '0'
DEBUG    srv_option 'AllowSavePassword' = '1'
DEBUG    srv_option 'AllowSaveUser' = '1'
DEBUG    srv_option 'AllowSavePasswordInKeychain' = '1'
DEBUG    srv_option 'AllowSavePasswordInKeystore' = '1'
DEBUG    srv_option 'ClientIPLower' = '"192.168.40.100"'
DEBUG    srv_option 'ClientIPHigh' = '"192.168.40.149"'
INFO     Logging out ...
DEBUG    Resetting dropped connection: censored.com
send: 'GET /cgi-bin/userLogout HTTP/1.1\r\nHost: censored.com:4433\r\nAccept-Encoding: identity\r\nX-NE-pda: true\r\nUser-Agent: SonicWALL NetExtender for Linux 8.6.800\r\nCookie: swap=MzdmNDhjMjhub2ZyYXJhcA==\r\n\r\n'
reply: ''
INFO     Dialing up tunnel...
INFO     Remote routing configured, VPN is up
ERROR    Broken pipe
INFO     Shutting down...

This appears to be related to the amount of traffic carried, not the time the connection has been up. It definitely carries traffic for a bit, and can stay up for some minutes if no traffic goes across it, but if I try to copy a large file across it lasts perhaps ten seconds.

Happy to test any changes or provide further information if requested.

tomkcook commented 4 years ago

Here's a sample syslog where this occurred:

May 27 08:40:04 localhost pppd[14034]: pppd options in effect:
May 27 08:40:04 localhost pppd[14034]: debug debug#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: logfd 2#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: ktune#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: dump#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: noauth#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: lock#011#011# (from /etc/ppp/options)
May 27 08:40:04 localhost pppd[14034]: crtscts#011#011# (from /etc/ppp/options)
May 27 08:40:04 localhost pppd[14034]: local#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: noaccomp#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: asyncmap 0#011#011# (from /etc/ppp/options)
May 27 08:40:04 localhost pppd[14034]: mru 1280#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: mtu 1280#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: nopcomp#011#011# (from command line)
May 27 08:40:04 localhost systemd-udevd[760]: Network interface NamePolicy= disabled on kernel command line, ignoring.
May 27 08:40:04 localhost pppd[14034]: lcp-echo-failure 2#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: lcp-echo-interval 10#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: hide-password#011#011# (from /etc/ppp/options)
May 27 08:40:04 localhost pppd[14034]: novj#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: noipdefault#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: usepeerdns#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: noccp#011#011# (from command line)
May 27 08:40:04 localhost pppd[14034]: noipx#011#011# (from /etc/ppp/options)
May 27 08:40:04 localhost pppd[14034]: pppd 2.4.7 started by ubuntu, uid 0
May 27 08:40:04 localhost pppd[14034]: using channel 7
May 27 08:40:04 localhost pppd[14034]: Using interface ppp0
May 27 08:40:04 localhost pppd[14034]: Connect: ppp0 <--> /dev/pts/1
May 27 08:40:04 localhost pppd[14034]: sent [LCP ConfReq id=0x1 <mru 1280> <asyncmap 0x0> <magic 0x21dea40>]
May 27 08:40:04 localhost pppd[14034]: sent [LCP ConfReq id=0x1 <mru 1280> <asyncmap 0x0> <magic 0x21dea40>]
May 27 08:40:04 localhost systemd-udevd[14035]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 27 08:40:04 localhost networkd-dispatcher[1319]: WARNING:Unknown index 19 seen, reloading interface list
May 27 08:40:04 localhost dnsmasq[25771]: reading /etc/resolv.conf
May 27 08:40:04 localhost dnsmasq[25771]: using nameserver 127.0.0.53#53
May 27 08:40:04 localhost dnsmasq[25771]: reading /etc/resolv.conf
May 27 08:40:04 localhost dnsmasq[25771]: using nameserver 127.0.0.53#53
May 27 08:40:04 localhost dnsmasq[25760]: reading /etc/resolv.conf
May 27 08:40:04 localhost dnsmasq[25760]: using nameserver 127.0.0.53#53
May 27 08:40:04 localhost pppd[14034]: rcvd [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x827f31d4>]
May 27 08:40:04 localhost pppd[14034]: sent [LCP ConfAck id=0x1 <asyncmap 0x0> <magic 0x827f31d4>]
May 27 08:40:04 localhost pppd[14034]: rcvd [LCP ConfAck id=0x1 <mru 1280> <asyncmap 0x0> <magic 0x21dea40>]
May 27 08:40:04 localhost pppd[14034]: sent [LCP EchoReq id=0x0 magic=0x21dea40]
May 27 08:40:04 localhost pppd[14034]: sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
May 27 08:40:04 localhost pppd[14034]: rcvd [IPCP ConfReq id=0x1 <addr 192.0.2.1> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
May 27 08:40:04 localhost pppd[14034]: sent [IPCP ConfRej id=0x1 <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
May 27 08:40:04 localhost pppd[14034]: rcvd [CCP ConfReq id=0x1 <mppe -H -M -S -L -D +C>]
May 27 08:40:04 localhost pppd[14034]: Unsupported protocol 'Compression Control Protocol' (0x80fd) received
May 27 08:40:04 localhost pppd[14034]: sent [LCP ProtRej id=0x2 80 fd 01 01 00 0a 12 06 00 00 00 01]
May 27 08:40:04 localhost pppd[14034]: rcvd [LCP EchoRep id=0x0 magic=0x827f31d4]
May 27 08:40:04 localhost pppd[14034]: rcvd [IPCP ConfNak id=0x1 <addr 192.168.40.102>]
May 27 08:40:04 localhost pppd[14034]: sent [IPCP ConfReq id=0x2 <addr 192.168.40.102> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
May 27 08:40:04 localhost pppd[14034]: rcvd [IPCP ConfReq id=0x2 <addr 192.0.2.1>]
May 27 08:40:04 localhost pppd[14034]: sent [IPCP ConfAck id=0x2 <addr 192.0.2.1>]
May 27 08:40:04 localhost pppd[14034]: rcvd [IPCP ConfAck id=0x2 <addr 192.168.40.102> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
May 27 08:40:04 localhost systemd-networkd[1198]: ppp0: Gained carrier
May 27 08:40:04 localhost pppd[14034]: local  IP address 192.168.40.102
May 27 08:40:04 localhost pppd[14034]: remote IP address 192.0.2.1
May 27 08:40:04 localhost pppd[14034]: Script /etc/ppp/ip-up started (pid 14041)
May 27 08:40:04 localhost dnsmasq[25771]: reading /etc/resolv.conf
May 27 08:40:04 localhost dnsmasq[25771]: using nameserver 127.0.0.53#53
May 27 08:40:04 localhost dnsmasq[25771]: reading /etc/resolv.conf
May 27 08:40:04 localhost dnsmasq[25771]: using nameserver 127.0.0.53#53
May 27 08:40:04 localhost dnsmasq[25760]: reading /etc/resolv.conf
May 27 08:40:04 localhost dnsmasq[25760]: using nameserver 127.0.0.53#53
May 27 08:40:04 localhost dnsmasq[25760]: reading /etc/resolv.conf
May 27 08:40:04 localhost dnsmasq[25760]: using nameserver 127.0.0.53#53
May 27 08:40:04 localhost pppd[14034]: Script /etc/ppp/ip-up finished (pid 14041), status = 0x0
May 27 08:40:14 localhost pppd[14034]: sent [LCP EchoReq id=0x1 magic=0x21dea40]
May 27 08:40:14 localhost pppd[14034]: rcvd [LCP EchoRep id=0x1 magic=0x827f31d4]
May 27 08:40:24 localhost pppd[14034]: sent [LCP EchoReq id=0x2 magic=0x21dea40]
May 27 08:40:24 localhost pppd[14034]: rcvd [LCP EchoRep id=0x2 magic=0x827f31d4]
May 27 08:40:34 localhost pppd[14034]: sent [LCP EchoReq id=0x3 magic=0x21dea40]
May 27 08:40:34 localhost pppd[14034]: rcvd [LCP EchoRep id=0x3 magic=0x827f31d4]
May 27 08:40:44 localhost pppd[14034]: sent [LCP EchoReq id=0x4 magic=0x21dea40]
May 27 08:41:04 localhost pppd[14034]: sent [LCP EchoReq id=0x6 magic=0x21dea40]
May 27 08:41:04 localhost pppd[14034]: Hangup (SIGHUP)
May 27 08:41:04 localhost systemd-networkd[1198]: ppp0: Lost carrier
May 27 08:41:04 localhost pppd[14034]: Connect time 1.0 minutes.
May 27 08:41:04 localhost pppd[14034]: Sent 13288 bytes, received 5769 bytes.
May 27 08:41:04 localhost dnsmasq[25771]: reading /etc/resolv.conf
May 27 08:41:04 localhost dnsmasq[25760]: reading /etc/resolv.conf
May 27 08:41:04 localhost dnsmasq[25771]: using nameserver 127.0.0.53#53
May 27 08:41:04 localhost dnsmasq[25760]: using nameserver 127.0.0.53#53
May 27 08:41:04 localhost pppd[14034]: Script /etc/ppp/ip-down started (pid 14102)
May 27 08:41:04 localhost pppd[14034]: sent [LCP TermReq id=0x3 "User request"]
May 27 08:41:04 localhost pppd[14034]: Script /etc/ppp/ip-down finished (pid 14102), status = 0x0
May 27 08:41:07 localhost pppd[14034]: sent [LCP TermReq id=0x4 "User request"]
May 27 08:41:10 localhost pppd[14034]: Connection terminated.
May 27 08:41:10 localhost dnsmasq[25771]: reading /etc/resolv.conf
May 27 08:41:10 localhost dnsmasq[25771]: using nameserver 127.0.0.53#53
May 27 08:41:10 localhost dnsmasq[25760]: reading /etc/resolv.conf
May 27 08:41:10 localhost dnsmasq[25760]: using nameserver 127.0.0.53#53
May 27 08:41:10 localhost pppd[14034]: Modem hangup
May 27 08:41:10 localhost pppd[14034]: Exit.
richardash1981 commented 4 years ago

I was getting something rather like this - I could fetch data down the link OK, but sending stuff up (of any size) caused python to go into an infinite loop (100% CPU), which did not transfer pppd session data, and so pppd timed out and the session terminated.

pppd: sent [LCP EchoReq id=0x5 magic=0x868b6c49]
pppd: rcvd [LCP EchoRep id=0x5 magic=0xc6035427]
pppd: sent [LCP EchoReq id=0x6 magic=0x868b6c49]
pppd: sent [LCP EchoReq id=0x7 magic=0x868b6c49]
pppd: No response to 2 echo-requests
pppd: Serial link appears to be disconnected.
pppd: Connect time 1.4 minutes.
pppd: Sent 76165 bytes, received 128951 bytes.
pppd: Script /etc/ppp/ip-down started (pid 14062)
pppd: sent [LCP TermReq id=0x4 "Peer not responding"]
pppd: Script /etc/ppp/ip-down finished (pid 14062), status = 0x0
pppd: sent [LCP TermReq id=0x5 "Peer not responding"]
pppd: Connection terminated.
pppd: Modem hangup
ERROR    pppd exited with code 16

Note two LCP EchoReq sent and no RX. During this time the python process was at 100% CPU spinning the PPPSession::_pump() method but not achieving anything.

In nxbender/sslconn.py, the write_pump() method pushes the data from pppd up over the TLS socket in blocks of maximum 1514 bytes at a time:

             packet = self.wbuf[:1514]
             buf = struct.pack('>L', len(packet)) + packet
             self.s.sendall(buf)
             self.wbuf = self.wbuf[len(packet):]

I noticed that in the ~/.netextender configuration file used by the official binary client, there is a (global) line mtu=1280. So I took a guess and changed the constant in the above code to 1280, so that we send shorter lines to the server (more of them, obviously).

             packet = self.wbuf[:1280]
             buf = struct.pack('>L', len(packet)) + packet
             self.s.sendall(buf)
             self.wbuf = self.wbuf[len(packet):]

This seems to fix the problem - the web page submission which was 100% repeat-ably getting the client into the infinite loop now does not do so, and I seem to have a stable connection.

I don't believe that 1280 is the optimum size here, just that it's small enough to fit down whatever the actual link MTU is. I suspect that is smaller than 1500 (and hence the original setup was having to fragment everything) because my internet connection is Virgin Media cable (DOCSIS) and the MTU is said to be fixed at 1500 bytes ...

This suggests that a 1514 byte payload will produce fragmented packets, a 1280 byte payload will not. If 1500 is the path max (and it seems to be), then we want to limit to 1500 minus the overhead, which should be:

So looks like we need an MTU option with a conservative default value, which users can tune upwards if they care about performance, and their link is capable of larger packets.

tomkcook commented 4 years ago

Yes! It works! Brilliant! Thank you! More exclamation marks!!!!

richardash1981 commented 4 years ago

There are some comments on #15, this is not really MTU or MSS, just a line length thing in the remote server. But capping it downwards does make the connection stable.