adrienverge / openfortivpn

Client for PPP+TLS VPN tunnel services
GNU General Public License v3.0

sudden delay spikes #643

Open Zappelphilipp opened 4 years ago

Zappelphilipp commented 4 years ago

I don't know if this is already known:

For me everything is working perfectly (even DNS and routing), but every 30 seconds I get no response for about 10 seconds for every packet I send (about 10 pings come back with response times of 8000 to 20000 ms).

This means every RDP session etc. freezes for at least 10 seconds every 30 seconds.

Tested this on Ubuntu and Debian. No problems with the official Windows and Mac FortiClient.

mnsgs commented 4 years ago

Hi @Zappelphilipp I am seeing the same behavior (ubuntu 19.10) - occurs on both this client and the proprietary (FortiClient SSLVPN 4.4.2336). It is driving me nuts :confused:

My colleagues working from Windows do not seem to run into it and it seems to work for most of my other colleagues working from Linux as well (we are a few encountering this).

Zappelphilipp commented 4 years ago

It's really weird. I should also mention that I am using the official Ubuntu and Debian repository packages, so I am on version 1.10.0-1, BUT I already used openfortivpn at this version a few months ago without any issues.

DimitriPapadopoulos commented 4 years ago

@Zappelphilipp So what has changed exactly between the moment it worked a few months ago and the moment you started experiencing these freezes every 30s? I understand the version of openfortivpn has not changed.

Is there any way we can easily reproduce that? For example by pinging a machine behind the VPN server?

By the way it's good to know FortiClient shares the same problem: it probably means this is not an openfortivpn issue :smiley: Perhaps it's specific to VPN SSL (always used by Linux clients) as opposed to VPN IPSec (only used by Windows and macOS clients).

DimitriPapadopoulos commented 4 years ago

I cannot reproduce that on my Ubuntu 20.04 workstation. Isn't there anything of interest in the system journal? A change in routing, a DHCP lease, other events?

Zappelphilipp commented 4 years ago

> Is there any way we can easily reproduce that? For example by pinging a machine behind the VPN server?

Yes, I "debugged" it by pinging a random server behind the firewall (inside the network), and it looks like this:

64 bytes from 192.168.99.84: icmp_seq=468 ttl=127 time=20.5 ms
64 bytes from 192.168.99.84: icmp_seq=469 ttl=127 time=31.2 ms
64 bytes from 192.168.99.84: icmp_seq=470 ttl=127 time=172 ms
64 bytes from 192.168.99.84: icmp_seq=471 ttl=127 time=25.9 ms
64 bytes from 192.168.99.84: icmp_seq=472 ttl=127 time=32.1 ms
64 bytes from 192.168.99.84: icmp_seq=473 ttl=127 time=21.1 ms
64 bytes from 192.168.99.84: icmp_seq=474 ttl=127 time=19.8 ms
64 bytes from 192.168.99.84: icmp_seq=475 ttl=127 time=21.3 ms
64 bytes from 192.168.99.84: icmp_seq=476 ttl=127 time=19.5 ms
64 bytes from 192.168.99.84: icmp_seq=477 ttl=127 time=24.5 ms
64 bytes from 192.168.99.84: icmp_seq=478 ttl=127 time=43.0 ms
64 bytes from 192.168.99.84: icmp_seq=479 ttl=127 time=190 ms
64 bytes from 192.168.99.84: icmp_seq=480 ttl=127 time=21.7 ms
64 bytes from 192.168.99.84: icmp_seq=481 ttl=127 time=19.9 ms
64 bytes from 192.168.99.84: icmp_seq=482 ttl=127 time=21.9 ms
64 bytes from 192.168.99.84: icmp_seq=483 ttl=127 time=25.8 ms
64 bytes from 192.168.99.84: icmp_seq=484 ttl=127 time=24.0 ms
64 bytes from 192.168.99.84: icmp_seq=485 ttl=127 time=22.1 ms
64 bytes from 192.168.99.84: icmp_seq=486 ttl=127 time=101 ms
64 bytes from 192.168.99.84: icmp_seq=487 ttl=127 time=24.0 ms
64 bytes from 192.168.99.84: icmp_seq=488 ttl=127 time=435 ms
64 bytes from 192.168.99.84: icmp_seq=489 ttl=127 time=22.7 ms
64 bytes from 192.168.99.84: icmp_seq=490 ttl=127 time=30.3 ms
64 bytes from 192.168.99.84: icmp_seq=491 ttl=127 time=159 ms
64 bytes from 192.168.99.84: icmp_seq=492 ttl=127 time=24.4 ms
64 bytes from 192.168.99.84: icmp_seq=493 ttl=127 time=22.4 ms
64 bytes from 192.168.99.84: icmp_seq=494 ttl=127 time=53.4 ms
64 bytes from 192.168.99.84: icmp_seq=495 ttl=127 time=24.9 ms
64 bytes from 192.168.99.84: icmp_seq=496 ttl=127 time=28.7 ms
64 bytes from 192.168.99.84: icmp_seq=497 ttl=127 time=20.2 ms
64 bytes from 192.168.99.84: icmp_seq=498 ttl=127 time=8035 ms
64 bytes from 192.168.99.84: icmp_seq=499 ttl=127 time=7072 ms
64 bytes from 192.168.99.84: icmp_seq=500 ttl=127 time=6048 ms
64 bytes from 192.168.99.84: icmp_seq=501 ttl=127 time=5024 ms
64 bytes from 192.168.99.84: icmp_seq=502 ttl=127 time=4000 ms
64 bytes from 192.168.99.84: icmp_seq=503 ttl=127 time=2980 ms
64 bytes from 192.168.99.84: icmp_seq=504 ttl=127 time=1956 ms
64 bytes from 192.168.99.84: icmp_seq=505 ttl=127 time=932 ms
64 bytes from 192.168.99.84: icmp_seq=506 ttl=127 time=31.3 ms
64 bytes from 192.168.99.84: icmp_seq=507 ttl=127 time=24.9 ms
64 bytes from 192.168.99.84: icmp_seq=508 ttl=127 time=26.5 ms
64 bytes from 192.168.99.84: icmp_seq=509 ttl=127 time=35.9 ms
64 bytes from 192.168.99.84: icmp_seq=510 ttl=127 time=23.4 ms

I am working on Pop!_OS 19.10 x86_64 and on DeepinOS 15.11, same problem on both. I am also working on macOS Sierra (I think) and a freshly patched Windows 10 Pro, both with the official client, and see no problems there.

I don't know what changed. The firmware of the FortiGate at least did not. Are there any logs for openfortivpn?
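To make stalls like the one in the ping output above easier to spot in long captures, here is a minimal Python sketch that parses iputils-style ping lines and groups consecutive slow replies into one stall. The function names and the 500 ms threshold are my own choices for illustration, not anything openfortivpn provides.

```python
import re

# Matches the "icmp_seq=N ttl=N time=X ms" part of an iputils-style ping reply line.
PING_LINE = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")

# A few lines copied from the capture above.
SAMPLE = """\
64 bytes from 192.168.99.84: icmp_seq=496 ttl=127 time=28.7 ms
64 bytes from 192.168.99.84: icmp_seq=497 ttl=127 time=20.2 ms
64 bytes from 192.168.99.84: icmp_seq=498 ttl=127 time=8035 ms
64 bytes from 192.168.99.84: icmp_seq=499 ttl=127 time=7072 ms
64 bytes from 192.168.99.84: icmp_seq=500 ttl=127 time=6048 ms
64 bytes from 192.168.99.84: icmp_seq=501 ttl=127 time=5024 ms
64 bytes from 192.168.99.84: icmp_seq=502 ttl=127 time=4000 ms
64 bytes from 192.168.99.84: icmp_seq=503 ttl=127 time=2980 ms
64 bytes from 192.168.99.84: icmp_seq=504 ttl=127 time=1956 ms
64 bytes from 192.168.99.84: icmp_seq=505 ttl=127 time=932 ms
64 bytes from 192.168.99.84: icmp_seq=506 ttl=127 time=31.3 ms
"""


def parse_ping(text):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output."""
    return [(int(m.group(1)), float(m.group(2))) for m in PING_LINE.finditer(text)]


def _summarize(run):
    """Collapse a run of slow samples into (first_seq, last_seq, peak_rtt_ms)."""
    return (run[0][0], run[-1][0], max(rtt for _, rtt in run))


def find_stalls(samples, threshold_ms=500.0):
    """Group consecutive samples slower than threshold_ms into stalls."""
    stalls, run = [], []
    for seq, rtt in samples:
        if rtt > threshold_ms:
            run.append((seq, rtt))
        elif run:
            stalls.append(_summarize(run))
            run = []
    if run:  # capture ended in the middle of a stall
        stalls.append(_summarize(run))
    return stalls


print(find_stalls(parse_ping(SAMPLE)))
```

In this capture the round-trip times inside the stall drop by roughly 1000 ms per packet (8035, 7072, 6048, ...). Assuming ping's default one-second interval, that is consistent with the replies all arriving at about the same moment, i.e. traffic being held somewhere for about 8 seconds and then released together, rather than each packet being slow on its own.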

DimitriPapadopoulos commented 4 years ago

I suspect openfortivpn logs won't help: the logs will just lag behind like the rest without giving an explanation for the lag. Anyway, openfortivpn can be run in verbose mode by adding multiple -v options (up to 4 if I recall correctly for maximum verbosity). Then pppd logs can be collected using option --pppd-log=<file>.
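As a rough illustration of combining those two options, here is a small Python sketch that only builds the command line. The log path and the helper name are hypothetical, and any connection arguments (host, user, config file) are left for the caller to pass in.

```python
import shlex


def build_command(verbosity=2, pppd_log=None, extra_args=()):
    """Build an openfortivpn argument list with repeated -v and an optional pppd log.

    verbosity: how many times to repeat -v (the thread recalls up to 4).
    pppd_log:  file path passed to --pppd-log (hypothetical path in the example).
    extra_args: whatever connection arguments you normally use.
    """
    if verbosity < 0:
        raise ValueError("verbosity must not be negative")
    cmd = ["openfortivpn"] + ["-v"] * verbosity
    if pppd_log:
        cmd.append(f"--pppd-log={pppd_log}")
    cmd.extend(extra_args)
    return cmd


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(build_command(verbosity=2, pppd_log="/tmp/pppd.log")))
```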

It would be much more interesting to:

What sort of machines do you connect to on the other end? These machines may have changed. See for example: https://superuser.com/questions/1481191/remote-desktop-intermittently-freezing

mnsgs commented 4 years ago

@Zappelphilipp, for me it does not freeze every 30 seconds but more sporadically. We migrated to the service about a month ago, and for me this has always been an issue. AFAICS, it appears to be 1) agnostic to the client connecting through and 2) agnostic to the network it connects to, so I suspect the Fortinet infrastructure (load, perhaps?) is the cause. What puzzles me is that it appears to occur for only a few Linux users.

I am using gping towards either Linux machines or network switches, which visualizes the issue very well. I haven't managed to find any entry in the system logs when the issue occurs.

DimitriPapadopoulos commented 4 years ago

If it's specific to RDP sessions, it could be related to IP fragmentation and MTU or to RDP issues: https://www.google.com/search?q=RDP+IP+fragmentation+MTU
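One way to check the MTU theory is to find the largest ICMP payload that crosses the tunnel without fragmentation. Below is a minimal Python sketch of that idea, assuming a Linux client with iputils ping (where `-M do` sets the Don't Fragment flag and `-s` sets the payload size). The function names and the 1472-byte upper bound (1500 minus 28 bytes of IPv4 and ICMP headers) are my own choices.

```python
import subprocess


def ping_fits(host, payload_size):
    """Return True if one unfragmented echo request of this payload size gets a reply.

    Assumes Linux iputils ping: -M do forbids fragmentation, -s sets the payload
    size, -c 1 sends a single probe, -W 2 waits at most 2 seconds for the reply.
    """
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload_size), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(fits, low=0, high=1472):
    """Binary-search the largest size in [low, high] for which fits(size) is True.

    Assumes fits is monotonic: everything up to some size passes, everything
    above it fails. Returns None if even `low` does not pass.
    """
    if not fits(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

With the tunnel up, something like `largest_payload(lambda n: ping_fits("192.168.99.84", n))` should return the largest payload that passes; adding 28 bytes gives the usable IPv4 path MTU. A single lost probe during one of the stalls would skew the result, so it is worth repeating the measurement a few times.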

On the other hand if you can reproduce the lags with other software and plain pings, I don't know.

Note that Linux clients are restricted to VPN SSL while Windows and macOS clients often use VPN IPSec, at least by default.

mnsgs commented 4 years ago

@DimitriPapadopoulos , I use plain ping and ssh - no RDP

DimitriPapadopoulos commented 4 years ago

It could be that VPN SSL has issues that VPN IPSec does not have on this Fortinet appliance. It would be interesting to compare SSL and IPSec from a Windows machine with a FortiClient capable of both SSL and IPSec.

mrbaseman commented 4 years ago

I could reproduce these spikes on Ubuntu 16.04 with a heavy load on the tunnel. My ping times were quite stable at about 22 ms, but went up to 800-1000 ms when I started to sync large files through scp on the same vpn tunnel. The 30 seconds could be an email client that regularly checks the imap folders, something like that.

DimitriPapadopoulos commented 4 years ago

Interesting, then it would be worth closing mail clients and other usual suspects. If it doesn't help, inspecting the network traffic with Wireshark might give a clue.

mrbaseman commented 4 years ago

We have even noticed that one user cannot ping through his VPN tunnel anymore (heavy packet loss) when another user transfers large files through another VPN connection. Maybe the fact that we have configured a software switch (which is not recommended) plays a role here.