@cron2 @mattock @selvanair Are you guys able to test this fix?
To reproduce, you need Windows Server 2019/2022 with openvpn-gui installed and some Linux machine. Make a VPN connection between Windows and Linux, run "iperf3 -s" on the Linux side and "iperf3 -c <server ip> -R" on the Windows side.
Without fix:
c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 12:57:38 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
Cookie: WIN-SUV09ADT77Q.1649768258.143831.09
TCP MSS: 0 (default)
[ 4] local 10.8.0.2 port 50780 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-15.04 sec 32.8 KBytes 17.9 Kbits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
[ 4] 15.04-15.04 sec 0.00 Bytes 0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-15.04 sec 108 KBytes 58.8 Kbits/sec 21 sender
[ 4] 0.00-15.04 sec 108 KBytes 58.8 Kbits/sec receiver
CPU Utilization: local/receiver 97.4% (45.2%u/52.2%s), remote/sender 0.0% (0.0%u/0.0%s)
iperf Done.
With fix:
c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 13:00:07 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
Cookie: WIN-SUV09ADT77Q.1649768407.354551.03
TCP MSS: 0 (default)
[ 4] local 10.8.0.2 port 52801 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 41.1 MBytes 344 Mbits/sec
[ 4] 1.00-2.00 sec 44.3 MBytes 372 Mbits/sec
[ 4] 2.00-3.00 sec 44.6 MBytes 374 Mbits/sec
[ 4] 3.00-4.00 sec 42.6 MBytes 358 Mbits/sec
[ 4] 4.00-5.00 sec 44.6 MBytes 374 Mbits/sec
[ 4] 5.00-6.00 sec 44.3 MBytes 372 Mbits/sec
[ 4] 6.00-7.00 sec 46.6 MBytes 391 Mbits/sec
[ 4] 7.00-8.00 sec 40.8 MBytes 342 Mbits/sec
[ 4] 8.00-9.00 sec 44.1 MBytes 370 Mbits/sec
[ 4] 9.00-10.00 sec 45.2 MBytes 380 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 439 MBytes 368 Mbits/sec 271 sender
[ 4] 0.00-10.00 sec 439 MBytes 368 Mbits/sec receiver
CPU Utilization: local/receiver 40.5% (16.7%u/23.8%s), remote/sender 0.2% (0.0%u/0.1%s)
iperf Done.
And this is why you need to use dco-win:
c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 13:01:22 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
Cookie: WIN-SUV09ADT77Q.1649768482.843634.65
TCP MSS: 0 (default)
[ 4] local 10.8.0.2 port 54982 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 132 MBytes 1.11 Gbits/sec
[ 4] 1.00-2.00 sec 135 MBytes 1.13 Gbits/sec
[ 4] 2.00-3.00 sec 136 MBytes 1.14 Gbits/sec
[ 4] 3.00-4.00 sec 129 MBytes 1.09 Gbits/sec
[ 4] 4.00-5.00 sec 133 MBytes 1.12 Gbits/sec
[ 4] 5.00-6.00 sec 125 MBytes 1.05 Gbits/sec
[ 4] 6.00-7.00 sec 139 MBytes 1.17 Gbits/sec
[ 4] 7.00-8.00 sec 132 MBytes 1.11 Gbits/sec
[ 4] 8.00-9.00 sec 111 MBytes 928 Mbits/sec
[ 4] 9.00-10.00 sec 158 MBytes 1.33 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.30 GBytes 1.12 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.30 GBytes 1.12 Gbits/sec receiver
CPU Utilization: local/receiver 21.7% (5.5%u/16.2%s), remote/sender 3.3% (0.1%u/3.2%s)
iperf Done.
For comparison, here is wintun:
c:\Temp\iperf>iperf3.exe -c 10.8.0.1 -R -V
iperf 3.1.3
CYGWIN_NT-10.0 WIN-SUV09ADT77Q 2.5.1(0.297/5/3) 2016-04-21 22:14 x86_64
Time: Tue, 12 Apr 2022 13:02:09 GMT
Connecting to host 10.8.0.1, port 5201
Reverse mode, remote host 10.8.0.1 is sending
Cookie: WIN-SUV09ADT77Q.1649768529.396845.64
TCP MSS: 0 (default)
[ 4] local 10.8.0.2 port 60078 connected to 10.8.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 40.7 MBytes 340 Mbits/sec
[ 4] 1.00-2.00 sec 42.2 MBytes 354 Mbits/sec
[ 4] 2.00-3.00 sec 41.4 MBytes 348 Mbits/sec
[ 4] 3.00-4.00 sec 41.7 MBytes 350 Mbits/sec
[ 4] 4.00-5.00 sec 41.7 MBytes 350 Mbits/sec
[ 4] 5.00-6.00 sec 42.6 MBytes 358 Mbits/sec
[ 4] 6.00-7.00 sec 41.0 MBytes 344 Mbits/sec
[ 4] 7.00-8.00 sec 41.5 MBytes 348 Mbits/sec
[ 4] 8.00-9.00 sec 43.1 MBytes 361 Mbits/sec
[ 4] 9.00-10.00 sec 41.9 MBytes 351 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 418 MBytes 351 Mbits/sec 64 sender
[ 4] 0.00-10.00 sec 418 MBytes 350 Mbits/sec receiver
CPU Utilization: local/receiver 54.5% (23.4%u/31.1%s), remote/sender 0.5% (0.0%u/0.4%s)
iperf Done.
Initially, we used NDIS_RECEIVE_FLAGS_RESOURCES in Wintun for simplicity too. We replaced NDIS_RECEIVE_FLAGS_RESOURCES mainly to avoid the additional memcpy done by NdisMIndicateReceiveNetBufferLists(). However, I remember clearly that testing Wintun revealed that the MINIPORT_RETURN_NET_BUFFER_LISTS handler was always called before NdisMIndicateReceiveNetBufferLists() even returned, regardless of NDIS_RECEIVE_FLAGS_RESOURCES, and on the same thread/call stack. NdisMIndicateReceiveNetBufferLists() proved synchronous on all my test runs. I see you found a case where NdisMIndicateReceiveNetBufferLists() actually does go async. The NDIS/TCP-IP stack is not completely the same on Windows and Windows Server.
So, I would suggest you go forward and turn on NDIS_RECEIVE_FLAGS_RESOURCES. You will get a simpler and obviously more reliable driver.
No, we haven't had any bad experience with NDIS_RECEIVE_FLAGS_RESOURCES in Wintun v0.1-v0.4.
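To make the ownership difference concrete, here is a minimal sketch of the two indication models, assuming a toy ADAPTER context. The names are illustrative only; this is neither Wintun's nor this driver's actual code.

```c
/*
 * Illustrative sketch of the two NBL ownership models.
 * ADAPTER and the function names are made up.
 */
#include <ndis.h>

typedef struct _ADAPTER
{
    NDIS_HANDLE MiniportHandle; /* miniport adapter handle */
} ADAPTER;

/* Without NDIS_RECEIVE_FLAGS_RESOURCES: NDIS owns the NBL after the
 * indication and gives it back via the return handler below. The delay
 * observed on Windows Server happens between these two points. */
static VOID
IndicateOwnedByNdis(ADAPTER *Adapter, NET_BUFFER_LIST *Nbl)
{
    NdisMIndicateReceiveNetBufferLists(
        Adapter->MiniportHandle, Nbl, NDIS_DEFAULT_PORT_NUMBER, 1, 0);
    /* Nbl must NOT be touched here; wait for ReturnNetBufferLists. */
}

/* MINIPORT_RETURN_NET_BUFFER_LISTS handler. */
static VOID
ReturnNetBufferLists(NDIS_HANDLE MiniportAdapterContext,
                     NET_BUFFER_LIST *NetBufferLists, ULONG ReturnFlags)
{
    UNREFERENCED_PARAMETER(MiniportAdapterContext);
    UNREFERENCED_PARAMETER(ReturnFlags);

    /* Only now are the NBLs (and their buffers) ours to free or reuse. */
    for (NET_BUFFER_LIST *Nbl = NetBufferLists, *Next; Nbl != NULL; Nbl = Next)
    {
        Next = NET_BUFFER_LIST_NEXT_NBL(Nbl);
        NdisFreeNetBufferList(Nbl);
    }
}

/* With NDIS_RECEIVE_FLAGS_RESOURCES: NDIS copies the data before the
 * call returns, so the NBL is ours again immediately and the return
 * handler is never invoked for it. */
static VOID
IndicateWithResourcesFlag(ADAPTER *Adapter, NET_BUFFER_LIST *Nbl)
{
    NdisMIndicateReceiveNetBufferLists(
        Adapter->MiniportHandle, Nbl, NDIS_DEFAULT_PORT_NUMBER, 1,
        NDIS_RECEIVE_FLAGS_RESOURCES);
    NdisFreeNetBufferList(Nbl); /* safe: the data was already copied */
}
```

The trade-off is exactly the one described above: the resources flag costs NDIS one extra copy per packet, but the driver never has to wait for the return handler.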
The userspace OpenVPN implementation doesn't support concurrent pending writes to tun. This is not an issue, assuming that a tun write is instantaneous (and why would it not be?).
In short, this is what the current driver implementation does on tun write:
1. Get a write request from userspace.
2. Allocate an NBL pointing to the request's buffer.
3. Indicate the NBL to NDIS with NdisMIndicateReceiveNetBufferLists().
4. When NDIS returns the NBL via the miniport's return handler, free the NBL and complete the write request.
On Windows Server 2019 and 2022 there are scenarios where there is a noticeable (over a second) delay between 3. and 4., and during that time no tun write requests are performed. This breaks TCP communication inside the tunnel, such as "iperf3 -R".
I managed to break into the debugger during that delay, and the current owner of the NBL was the TCP/IP protocol driver. It is not clear to me why this delay happens in the TCP/IP driver, and apparently it occurs only on Server editions.
More info: https://community.osr.com/discussion/293415/delay-in-calling-returnnetbufferlistshandler-on-windows-server-2022
As a workaround, use the NDIS_RECEIVE_FLAGS_RESOURCES flag with the NdisMIndicateReceiveNetBufferLists call, which tells NDIS to make a copy of the NBL and enables us to immediately free the NBL and complete the request in place (sketched below). According to Microsoft, using this flag may have performance implications, but the driver's performance is already poor, and according to my tests this change doesn't make it worse. For performance, one should use dco-win.
Signed-off-by: Lev Stipakov lev@openvpn.net
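For illustration, a minimal sketch of what the workaround's write path looks like, assuming a WDF-style write callback. TUN_CONTEXT, QUEUE_CONTEXT, and their wiring are hypothetical; this is the shape of the workaround, not the driver's actual code.

```c
/*
 * Hypothetical sketch of the workaround in the tun write path.
 * TUN_CONTEXT/QUEUE_CONTEXT and their setup are made up.
 */
#include <ndis.h>
#include <wdf.h>

typedef struct _TUN_CONTEXT
{
    NDIS_HANDLE MiniportHandle; /* miniport adapter handle */
    NDIS_HANDLE NblPool;        /* from NdisAllocateNetBufferListPool */
} TUN_CONTEXT;

typedef struct _QUEUE_CONTEXT
{
    TUN_CONTEXT *Tun;
} QUEUE_CONTEXT;
WDF_DECLARE_CONTEXT_TYPE(QUEUE_CONTEXT);

/* EvtWdfIoQueueIoWrite callback: one tun write request per packet. */
static VOID
EvtIoWrite(WDFQUEUE Queue, WDFREQUEST Request, size_t Length)
{
    TUN_CONTEXT *Tun = WdfObjectGet_QUEUE_CONTEXT(Queue)->Tun;
    PMDL Mdl;

    NTSTATUS Status = WdfRequestRetrieveInputWdmMdl(Request, &Mdl);
    if (!NT_SUCCESS(Status))
    {
        WdfRequestComplete(Request, Status);
        return;
    }

    /* Wrap the request buffer in an NBL, no extra allocation or copy. */
    NET_BUFFER_LIST *Nbl = NdisAllocateNetBufferAndNetBufferList(
        Tun->NblPool, 0, 0, Mdl, 0, Length);
    if (Nbl == NULL)
    {
        WdfRequestComplete(Request, STATUS_INSUFFICIENT_RESOURCES);
        return;
    }

    /* NDIS_RECEIVE_FLAGS_RESOURCES makes NDIS copy the packet before
     * this call returns, so there is no wait for the return handler. */
    NdisMIndicateReceiveNetBufferLists(
        Tun->MiniportHandle, Nbl, NDIS_DEFAULT_PORT_NUMBER, 1,
        NDIS_RECEIVE_FLAGS_RESOURCES);

    /* The NBL and the request buffer are ours again: free and complete
     * in place instead of waiting (possibly seconds, per the report
     * above) for ReturnNetBufferLists. */
    NdisFreeNetBufferList(Nbl);
    WdfRequestCompleteWithInformation(Request, STATUS_SUCCESS, Length);
}
```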