OpenVPN / tap-windows6

Windows TAP driver (NDIS 6)

rxpath: fix broken TCP connections on Windows Server #147

Closed lstipakov closed 2 years ago

lstipakov commented 2 years ago

The userspace OpenVPN implementation doesn't support concurrent pending writes to tun. That is not a problem as long as a tun write completes instantaneously (and why would it not?).

In short, this is what the current driver implementation does on a tun write (see the sketch after the list):

  1. allocate NBL
  2. mark write requests as "pending"
  3. indicate NBL to NDIS
  4. when NDIS returns NBL ownership, complete pending write request
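
A minimal sketch of that sequence, assuming illustrative names (TAP_ADAPTER, NBL_IRP(), TapIndicateWrite and TapReturnNetBufferLists are placeholders, not the driver's actual symbols); only the NDIS/WDM calls themselves are real:

```c
/* Illustrative sketch of the pend-and-wait receive path. */
NTSTATUS TapIndicateWrite(TAP_ADAPTER *Adapter, PIRP Irp)
{
    /* 1. allocate an NBL wrapping the user's write buffer (MDL) */
    PNET_BUFFER_LIST Nbl = NdisAllocateNetBufferAndNetBufferList(
        Adapter->NblPool, 0, 0,
        Irp->MdlAddress, 0, MmGetMdlByteCount(Irp->MdlAddress));
    if (Nbl == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    NBL_IRP(Nbl) = Irp;            /* stash the request on the NBL */
    IoMarkIrpPending(Irp);         /* 2. mark the write as pending */

    /* 3. hand the NBL to NDIS; with no flags set, NDIS takes ownership
     *    and keeps it until the protocol stack is done with the data */
    NdisMIndicateReceiveNetBufferLists(Adapter->MiniportHandle, Nbl,
                                       NDIS_DEFAULT_PORT_NUMBER, 1, 0);
    return STATUS_PENDING;
}

/* 4. NDIS returns ownership -- on Server editions sometimes more than
 *    a second later -- and only then can the pending write complete */
VOID TapReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext,
                             PNET_BUFFER_LIST NetBufferLists,
                             ULONG ReturnFlags)
{
    PIRP Irp = NBL_IRP(NetBufferLists);
    NdisFreeNetBufferList(NetBufferLists);
    Irp->IoStatus.Status = STATUS_SUCCESS;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
}
```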

On Windows Server 2019 and 2022 there are scenarios where there is a noticeable (over a second) delay between 3. and 4., and during that time no tun write requests are performed. This breaks TCP communication inside the tunnel, e.g. "iperf3 -R".

I managed to break into the debugger during that delay, and the current owner of the NBL was the TCP/IP protocol driver. It is not clear to me why this delay in the TCP/IP driver happens, and apparently it happens only on Server editions.

More info: https://community.osr.com/discussion/293415/delay-in-calling-returnnetbufferlistshandler-on-windows-server-2022

As a workaround, pass the NDIS_RECEIVE_FLAGS_RESOURCES flag to the NdisMIndicateReceiveNetBufferLists call. This tells NDIS to make its own copy of the NBL, which lets us free the NBL and complete the write request in place. According to MSFT, using this flag may have performance implications, but the driver's performance is already poor, and according to my tests this change doesn't make it worse. For performance one should use dco-win. A sketch of the changed indicate path follows.
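
A sketch of the workaround, under the same illustrative names as above: with NDIS_RECEIVE_FLAGS_RESOURCES the indicate call is synchronous, ownership of the NBL stays with the driver when it returns, and the return handler is never involved for this NBL.

```c
/* Sketch of the workaround; names are the same placeholders as above. */
NTSTATUS TapIndicateWrite(TAP_ADAPTER *Adapter, PIRP Irp)
{
    ULONG Length = MmGetMdlByteCount(Irp->MdlAddress);
    PNET_BUFFER_LIST Nbl = NdisAllocateNetBufferAndNetBufferList(
        Adapter->NblPool, 0, 0, Irp->MdlAddress, 0, Length);
    if (Nbl == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    /* NDIS_RECEIVE_FLAGS_RESOURCES: the stack must copy the data before
     * the call returns, ownership of the NBL stays with us, and
     * MiniportReturnNetBufferLists will not be called for this NBL. */
    NdisMIndicateReceiveNetBufferLists(Adapter->MiniportHandle, Nbl,
                                       NDIS_DEFAULT_PORT_NUMBER, 1,
                                       NDIS_RECEIVE_FLAGS_RESOURCES);

    /* Free the NBL and complete the write immediately, in place. */
    NdisFreeNetBufferList(Nbl);
    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = Length;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}
```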

Signed-off-by: Lev Stipakov <lev@openvpn.net>

lstipakov commented 2 years ago

@cron2 @mattock @selvanair Are you guys able to test this fix?

To reproduce, you need Windows Server 2019/2022 with openvpn-gui installed and some Linux machine. Make a VPN connection between Windows and Linux, run "iperf3 -s" on the Linux side and "iperf3 -c 10.8.0.1 -R" on the Windows side. With the existing driver that won't work:

c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 12:57:38 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
      Cookie: WIN-SUV09ADT77Q.1649768258.143831.09
      TCP MSS: 0 (default)
[  4] local 10.8.0.2 port 50780 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-15.04  sec  32.8 KBytes  17.9 Kbits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
[  4]  15.04-15.04  sec  0.00 Bytes  0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-15.04  sec   108 KBytes  58.8 Kbits/sec   21             sender
[  4]   0.00-15.04  sec   108 KBytes  58.8 Kbits/sec                  receiver
CPU Utilization: local/receiver 97.4% (45.2%u/52.2%s), remote/sender 0.0% (0.0%u/0.0%s)

iperf Done.

With fix:

c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 13:00:07 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
      Cookie: WIN-SUV09ADT77Q.1649768407.354551.03
      TCP MSS: 0 (default)
[  4] local 10.8.0.2 port 52801 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  41.1 MBytes   344 Mbits/sec
[  4]   1.00-2.00   sec  44.3 MBytes   372 Mbits/sec
[  4]   2.00-3.00   sec  44.6 MBytes   374 Mbits/sec
[  4]   3.00-4.00   sec  42.6 MBytes   358 Mbits/sec
[  4]   4.00-5.00   sec  44.6 MBytes   374 Mbits/sec
[  4]   5.00-6.00   sec  44.3 MBytes   372 Mbits/sec
[  4]   6.00-7.00   sec  46.6 MBytes   391 Mbits/sec
[  4]   7.00-8.00   sec  40.8 MBytes   342 Mbits/sec
[  4]   8.00-9.00   sec  44.1 MBytes   370 Mbits/sec
[  4]   9.00-10.00  sec  45.2 MBytes   380 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   439 MBytes   368 Mbits/sec  271             sender
[  4]   0.00-10.00  sec   439 MBytes   368 Mbits/sec                  receiver
CPU Utilization: local/receiver 40.5% (16.7%u/23.8%s), remote/sender 0.2% (0.0%u/0.1%s)

iperf Done.

And this is why you need to use dco-win:

c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 13:01:22 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
      Cookie: WIN-SUV09ADT77Q.1649768482.843634.65
      TCP MSS: 0 (default)
[  4] local 10.8.0.2 port 54982 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   132 MBytes  1.11 Gbits/sec
[  4]   1.00-2.00   sec   135 MBytes  1.13 Gbits/sec
[  4]   2.00-3.00   sec   136 MBytes  1.14 Gbits/sec
[  4]   3.00-4.00   sec   129 MBytes  1.09 Gbits/sec
[  4]   4.00-5.00   sec   133 MBytes  1.12 Gbits/sec
[  4]   5.00-6.00   sec   125 MBytes  1.05 Gbits/sec
[  4]   6.00-7.00   sec   139 MBytes  1.17 Gbits/sec
[  4]   7.00-8.00   sec   132 MBytes  1.11 Gbits/sec
[  4]   8.00-9.00   sec   111 MBytes   928 Mbits/sec
[  4]   9.00-10.00  sec   158 MBytes  1.33 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.30 GBytes  1.12 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  1.30 GBytes  1.12 Gbits/sec                  receiver
CPU Utilization: local/receiver 21.7% (5.5%u/16.2%s), remote/sender 3.3% (0.1%u/3.2%s)

iperf Done.

For comparison, here is wintun:

c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 13:02:09 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
      Cookie: WIN-SUV09ADT77Q.1649768529.396845.64
      TCP MSS: 0 (default)
[  4] local 10.8.0.2 port 60078 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  40.7 MBytes   340 Mbits/sec
[  4]   1.00-2.00   sec  42.2 MBytes   354 Mbits/sec
[  4]   2.00-3.00   sec  41.4 MBytes   348 Mbits/sec
[  4]   3.00-4.00   sec  41.7 MBytes   350 Mbits/sec
[  4]   4.00-5.00   sec  41.7 MBytes   350 Mbits/sec
[  4]   5.00-6.00   sec  42.6 MBytes   358 Mbits/sec
[  4]   6.00-7.00   sec  41.0 MBytes   344 Mbits/sec
[  4]   7.00-8.00   sec  41.5 MBytes   348 Mbits/sec
[  4]   8.00-9.00   sec  43.1 MBytes   361 Mbits/sec
[  4]   9.00-10.00  sec  41.9 MBytes   351 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   418 MBytes   351 Mbits/sec   64             sender
[  4]   0.00-10.00  sec   418 MBytes   350 Mbits/sec                  receiver
CPU Utilization: local/receiver 54.5% (23.4%u/31.1%s), remote/sender 0.5% (0.0%u/0.4%s)

iperf Done.

rozmansi commented 2 years ago

Initially, we used NDIS_RECEIVE_FLAGS_RESOURCES in Wintun for simplicity too. We replaced it mainly to avoid the additional memcpy done by NdisMIndicateReceiveNetBufferLists(). However, I remember clearly that testing Wintun revealed the MINIPORT_RETURN_NET_BUFFER_LISTS handler was always called before NdisMIndicateReceiveNetBufferLists() even returned – regardless of NDIS_RECEIVE_FLAGS_RESOURCES – and on the same thread/call stack. NdisMIndicateReceiveNetBufferLists() proved synchronous on all my test runs. I see you have found a case where NdisMIndicateReceiveNetBufferLists() actually does go async. The NDIS/TCP-IP stack is evidently not completely the same on Windows and Windows Server.

So, I would suggest you go forward and turn on NDIS_RECEIVE_FLAGS_RESOURCES. You will get a simpler and obviously more reliable driver.

No, we haven’t had any bad experience with NDIS_RECEIVE_FLAGS_RESOURCES in Wintun v0.1-v0.4.