robmukai closed 7 years ago
This message is printed by code recently added to openfortivpn (74dc069):
DEBUG: waitpid: pppd exit status code 16
According to the pppd documentation it means:
The link was terminated because the peer is not responding to echo requests.
For some reason pppd is not able to reach its peer - the gateway. Could be a pppd error in the worst case, or problems with the VPN tunnel itself.
I really don't know how a microwave connection works. Is this a little bit like Wi-Fi, where the connection could be reset, resulting for example in a new DHCP lease? If so, could you check the logs of the microwave connection and find whether something happened when pppd failed?
@DimitriPapadopoulos The microwave connection is pretty transparent. The antenna connects directly into my wifi router. There really isn't anything on my end to look at. Looking at the router logs, there isn't anything that jumps out.
If I do restart the openfortivpn, the rsync continues until it hits another random spot, then the vpn dies again with the same message. So it appears that the openfortivpn is somehow losing a connection maybe? Or maybe it is timing out too quickly?
Among possible causes:
I can't help much. Unless some other maintainer can help, I can only suggest:
@DimitriPapadopoulos Thanks for following up.
On your two suggestions
Thanks for your help!
Thank you for trying these suggestions.
Since openfortivpn/Ubuntu and FortiClient/Ubuntu share the same behavior, this is probably not an openfortivpn bug - at worst this is a "feature" shared by both clients! More seriously, my gut feeling is that this is related to networking parameters (such as MRU and MTU) - cause 4 in my list of possible causes above. Since FortiClient/Windows in VPN-SSL mode does not share the same behavior and works properly, it could be that these networking parameters are properly set on Windows. It could be interesting to compare network settings on both systems - I'm not sure how to collect these settings off the top of my head though.
If a direct rsync doesn't work, then that's definitely a network issue you need to debug without VPN. But I believe it will work. By the way, it would have been better to rsync to the same server with and without VPN - but that's probably not possible since you need a VPN in the first place! If I understand correctly, the problem is that each additional encapsulation adds its own overhead to packets, which means you may need lower initial MTU/MRU values to leave room for that overhead. Unfortunately all this happens in network layer 2 (data link), which I'm even less familiar with than layer 3...
You could perhaps try this recipe, where you increase the size of packets sent by ping until it stops working: Troubleshooting MTU size over IPSEC VPN
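For readers who want to automate that recipe, here is a minimal Python sketch, not part of openfortivpn. It assumes the Linux iputils `ping` (the `-M do`, `-s`, `-c` and `-W` flags) and assumes that pings succeed up to some payload size and fail above it. The function names `ping_ok` and `largest_payload` are made up for this illustration.

```python
import subprocess


def ping_ok(host, payload):
    """Send one ping with the Don't Fragment flag set and `payload` data bytes.

    Assumes Linux iputils ping. Returns True if a reply was received.
    """
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(probe, low=0, high=1472):
    """Binary search for the largest size for which probe(size) is True.

    Assumes probe is monotonic (True up to some size, False above it).
    The default upper bound 1472 is 1500 minus 28 bytes of IPv4 + ICMP headers.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works, the answer is mid or larger
        else:
            high = mid - 1  # mid fails, the answer is smaller
    return low
```

It could be called as `largest_payload(lambda size: ping_ok(host, size))`, where `host` is the address of a machine reachable through the connection being tested. A lost ping on a flaky link would skew the result, so repeating the search a few times is probably wise.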
I'll try playing around with that and see if that makes a difference. Thanks for the ideas.
OK, so I'm not sure what I am looking at, but
ping -M do -s 1326 XXX.XXX.XXX.XXX
gives a good ping:
PING XXX.XXX.XXX.XXX (XXX.XXX.XXX.XXX) 1326(1354) bytes of data.
1334 bytes from XXX.XXX.XXX.XXX: icmp_seq=1 ttl=63 time=274 ms
whereas
ping -M do -s 1327 XXX.XXX.XXX.XXX
fails with:
ping: local error: Message too long, mtu=1354
So what would you suggest I set the MTU on the Wifi Connection at?
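The numbers in that ping output can be related with a little arithmetic. The sketch below is only an illustration in Python, assuming a plain IPv4 header without options (20 bytes) and the 8-byte ICMP echo header. The helper names are made up for this example.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo request header


def packet_size(payload):
    """Total IPv4 packet size for a ping with `payload` data bytes (ping -s)."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_payload(mtu):
    """Largest ping payload that fits in one unfragmented packet for `mtu`."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(packet_size(1326))  # 1354, matches "1326(1354) bytes of data"
print(packet_size(1327))  # 1355, one byte over the reported mtu=1354
print(max_payload(1354))  # 1326
```

By this arithmetic, 1326 is the largest ICMP payload and 1354 is the size of the largest packet that goes through without fragmentation.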
I think the MTU is set on the inner encapsulated layer (here that would be pppd?) but again I'm not a specialist. For pppd the MTU can be set in the relevant options file of pppd (probably somewhere in /etc/ppp) or passed as a parameter to pppd (for that the openfortivpn code would need to be modified to pass proper options to pppd).
So I'd try modifying options using a pppd options file (probably under /etc/ppp). Unfortunately I don't have time to help much more right now, and I don't know how to set options in pppd.
I'd try setting the MTU on the Wi-Fi only if rsync and/or ping fail also without VPN.
@DimitriPapadopoulos I think we can close this. After testing for a day, changing the MTU on the WIFI connection to 1326 makes it work as well as the windows version does on my connection. Which is to say, it still closes, but only randomly and after it has run for a long time. Thanks for your help in thinking this through!
@robmukai Thank you for coming back to us. This will hopefully help other users of the software.
This does look like an issue with your network setup after all. However, I'm not 100% certain there's nothing we could do to help within openfortivpn - such as adding an option to set the MTU for pppd or at least writing a paragraph about MTU in the documentation.
Also, how long is a long time in your case? Please note that there's a default timeout on the FortiGate server - set by default at 8 hours if I recall correctly.
@DimitriPapadopoulos I'm wondering about that. I'd be surprised if Windows 10 handled fragmented packets better than Ubuntu. If that is not the case, is there something in the way openfortivpn handles fragmented packets that causes the shutdown? I don't know the answer to that, but the workaround seems to be working well.
Not sure what a "Long Time" is. I usually run it over night, and it is down when I get to it in the morning. However, large files (as in GB sized files) have been transferred. It does occasionally drop in less than 8 hours as well, but that could be due to instability on the Microwave connection. Is there a way to log uptime on the connection? I'll see what the timeout is set for on the FortiGate. Also, is there a reason for the default timeout on the Fortigate?
@robmukai I doubt Ubuntu handles fragmented packets any worse than Windows. Some of the web pages I've read these last few days say that fragmented packets may be dropped by firewalls because they are a security issue (DoS) - in this case the FortiGate could be dropping the fragmented packets.
It could just be that the MTU is set correctly on Windows but not Ubuntu. Perhaps because there's some sort of driver for the microwave link on the Windows machine - which could perhaps properly set the MTU at 1326.
About the timeout, it's best to have a look at the logs FortiGate-side and check whether it shows a reason for the connection closing. Perhaps Forti support can help. Also ask them the rationale behind the FortiGate-side timeout.
@DimitriPapadopoulos So I ran a quick check on the Windows 10 box and get this:
netsh interface ipv4 show subinterfaces
       MTU  MediaSenseState   Bytes In  Bytes Out  Interface
      1354                1       3372      35244  fortissl
4294967295                1       1304      46329  Loopback Pseudo-Interface 1
      1500                5          0          0  Ethernet 4
      1500                1   13897964    1967937  Wi-Fi 3
      1500                5          0          0  Ethernet 5
      1500                5          0          0  Local Area Connection* 18
So the MTU of the FortiClient connection (fortissl) is 1354 (less 28 is 1326). So somewhere, the MTU is being set correctly in Windows. Not sure if it is FortiClient or Windows itself doing it. You'll notice that the Wi-Fi connection is at 1500.
The microwave connection is completely transparent to the machines downstream. I actually run a small Inn and my guests don't have to do anything special to connect, and they bring all manner of devices from windows, apple, android, etc. I don't have any drivers or anything installed for it.
I'll keep an eye on the connection. If I can find the time it drops out, I can have my buddy, who owns the FortiGate, check the logs to see what the FortiGate sees at the time of disconnect.
Thanks so much for your help!
@robmukai Great to have all this information, it will help debug future issues.
For future reference, let's recap what we know so far:
Input/output error.
fortissl interface (1354 instead of usual 1500), and you experience no errors with FortiClient/Windows.

What I don't know is whether MTU should always be set to a value lower than 1500, or only sometimes depending on MTU values along the path.
Also should MTU be set to a constant value, and if so which one, or variable values depending on MTU values along the path? In the latter case, how to discover MTU values along the path?
Other sources refer to setting MSS, not MTU.
On Linux there are tools to discover MTU values along the path. See for example tracepath. I have also read "MTU woes in IPsec tunnels and how you can fix it" and "Path MTU discovery in practice", and although I don't have time to really understand them, setting the MTU does not seem to be that robust a technique after all...
Some links:
@DimitriPapadopoulos I agree with all 5 of your bullet points. Unfortunately, your questions go beyond my abilities. I guess my thought would be to see if we can figure out how FortiClient/Windows figures out the MTU and how it lowers it on the tunnel.
@robmukai For what it's worth, I've just looked up MTU values of the different interfaces on Ubuntu 16.04 LTS :
wlp2s0b1 Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
inet addr:xxx.xxx.x.xx Bcast:xxx.xxx.x.xxx Mask:255.255.255.0
inet6 addr: xxxx::xxxx:xxxx:xxxx:xxxx/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
ppp0 Link encap:Point-to-Point Protocol
inet addr:xxx.xxx.xx.x P-t-P:1.1.1.1 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1354 Metric:1
The MTU of the PPP connection is set to 1354 automatically. I haven't had to force or specify anything here. Isn't that the case when you run openfortivpn?
@DimitriPapadopoulos OK, this is really weird. I reset the MTU on my Wi-Fi connection back to "auto" and now the ppp0 connection is showing MTU:1354 as well. Funny thing is, it now works without changing the MTU. Not sure what would have caused it to not work before?
Strange indeed. As far as I can see, openfortivpn does not set the MTU. It has to be handled by pppd.
From the PPPD(8) man page:
mtu n Set the MTU [Maximum Transmit Unit] value to n. Unless the peer requests a smaller value via MRU negotiation, pppd will request that the kernel networking code send data packets of no more than n bytes through the PPP network interface. Note that for the IPv6 protocol, the MTU must be at least 1280.
My guess is that this was a problem with path MTU discovery over pppd. Sometimes MTU discovery doesn't work because of poorly configured “security” appliances. Perhaps something changed along the path?
If MTU discovery does not work as expected, users should probably work around the issue in the software responsible for MTU discovery, namely pppd as far as I can tell.
So one answer might be that this is not an openfortivpn issue.
On the other hand openfortivpn could have an option to force MTU, that would simply be passed to pppd as option mtu.
For the record the MRU is actually set by openfortivpn to 1354:
char *args[] = {
"/usr/sbin/pppd", "38400", "noipdefault", "noaccomp",
"noauth", "default-asyncmap", "nopcomp", "receive-all",
"nodefaultroute", ":1.1.1.1", "nodetach",
"lcp-max-configure", "40", "mru", "1354",
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
NULL
};
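To make the idea of a forced MTU concrete, here is a hedged sketch of how such an option could be appended to the argument list quoted above. It is written in Python purely for illustration (openfortivpn itself is C). The function `build_pppd_args` and its `mtu` parameter are hypothetical and do not exist in openfortivpn. The only pppd feature relied on is the `mtu n` option from the man page excerpt above.

```python
def build_pppd_args(mtu=None):
    """Hypothetical helper: mirror the quoted args[] array, optionally adding 'mtu n'.

    `mtu` stands for an imagined user-supplied setting; None keeps the
    argument list exactly as quoted, so pppd behaves as it does today.
    """
    args = [
        "/usr/sbin/pppd", "38400", "noipdefault", "noaccomp",
        "noauth", "default-asyncmap", "nopcomp", "receive-all",
        "nodefaultroute", ":1.1.1.1", "nodetach",
        "lcp-max-configure", "40", "mru", "1354",
    ]
    if mtu is not None:
        if mtu <= 0:
            raise ValueError("mtu must be a positive number of bytes")
        # pppd expects the option name followed by its value as separate arguments
        args += ["mtu", str(mtu)]
    return args


print(build_pppd_args(1354)[-4:])  # ['mru', '1354', 'mtu', '1354']
```

Leaving the option out by default would keep the current behavior, where pppd arrives at the MTU on its own.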
This is a mystery! I'll close the issue for now, but do not hesitate to come back to us if needed.
I'm using Fedora 26. I installed version 1.5.0.
I'm getting this issue: pppd: The link was terminated because the peer is not responding to echo requests.
What can I do?
Two things you can do :-)
This may be similar to #154, so if it is, please close it. I am running Ubuntu 16.04.3 with the latest version of openfortivpn compiled from source. I run a backup over SSH through openfortivpn to a box behind a FortiGate. I am in a pretty remote location and my internet is over a microwave connection, although the signal is usually pretty good. I can also run this backup on a Windows 10 machine using the FortiClient from Fortinet. It does disconnect on occasion, but more randomly.
What happens is, I can connect to the Fortigate through the openfortivpn just fine. I can also start the Rsync process just fine. However at the same point in the backup for each directory, it seems to "hang", and the openfortivpn closes. I back up a few different directories, and all the directories will "hang" this way. This happens using different source hard drives, and different destination hard drives.
Here is the end of the session: ` DEBUG: pppd ---> gateway (201 bytes) pppd: 00 21 45 00 00 c7 7e 51 40 00 01 11 ff d9 0a 00 01 01 ef ff ff fa cf ee 07 6c 00 b3 68 54 4d 2d 53 45 41 52 43 48 20 2a 20 48 54 54 50 2f 31 2e 31 0d 0a 48 4f 53 54 3a 20 32 33 39 2e 32 35 35 2e 32 35 35 2e 32 35 30 3a 31 39 30 30 0d 0a 4d 41 4e 3a 20 22 73 73 64 70 3a 64 69 73 63 6f 76 65 72 22 0d 0a 4d 58 3a 20 31 0d 0a 53 54 3a 20 75 72 6e 3a 64 69 61 6c 2d 6d 75 6c 74 69 73 63 72 65 65 6e 2d 6f 72 67 3a 73 65 72 76 69 63 65 3a 64 69 61 6c 3a 31 0d 0a 55 53 45 52 2d 41 47 45 4e 54 3a 20 47 6f 6f 67 6c 65 20 43 68 72 6f 6d 65 2f 36 31 2e 30 2e 33 31 36 33 2e 39 31 20 4c 69 6e 75 78 0d 0a 0d 0a
DEBUG: pppd ---> gateway (201 bytes) pppd: 00 21 45 00 00 c7 7e b7 40 00 01 11 ff 73 0a 00 01 01 ef ff ff fa cf ee 07 6c 00 b3 68 54 4d 2d 53 45 41 52 43 48 20 2a 20 48 54 54 50 2f 31 2e 31 0d 0a 48 4f 53 54 3a 20 32 33 39 2e 32 35 35 2e 32 35 35 2e 32 35 30 3a 31 39 30 30 0d 0a 4d 41 4e 3a 20 22 73 73 64 70 3a 64 69 73 63 6f 76 65 72 22 0d 0a 4d 58 3a 20 31 0d 0a 53 54 3a 20 75 72 6e 3a 64 69 61 6c 2d 6d 75 6c 74 69 73 63 72 65 65 6e 2d 6f 72 67 3a 73 65 72 76 69 63 65 3a 64 69 61 6c 3a 31 0d 0a 55 53 45 52 2d 41 47 45 4e 54 3a 20 47 6f 6f 67 6c 65 20 43 68 72 6f 6d 65 2f 36 31 2e 30 2e 33 31 36 33 2e 39 31 20 4c 69 6e 75 78 0d 0a 0d 0a
DEBUG: pppd ---> gateway (25 bytes) pppd: c0 21 05 02 00 17 50 65 65 72 20 6e 6f 74 20 72 65 73 70 6f 6e 64 69 6e 67
DEBUG: pppd ---> gateway (25 bytes) pppd: c0 21 05 03 00 17 50 65 65 72 20 6e 6f 74 20 72 65 73 70 6f 6e 64 69 6e 67
ERROR: read: Input/output error
INFO: Cancelling threads...
INFO: Setting ppp interface down.
INFO: Restoring routes...
DEBUG: ip route del to XX.XXX.XXX.XXX/255.255.255.255 via XXX.XXX.X.X dev wlp2s0
INFO: Removing VPN nameservers...
DEBUG: Waiting for pppd to exit...
DEBUG: waitpid: pppd exit status code 16
INFO: Terminated pppd.
INFO: Closed connection to gateway.
DEBUG: Gateway certificate validation failed.
DEBUG: Gateway certificate digest found in white list.
INFO: Logged out. `
The last pppd message is:
À!Peer not responding
The ERROR: read: Input/output error is the same as in #154, but the cause is different. Any and all help is appreciated. I am willing to do any testing that may be required.