OpenVPN / tap-windows6

Windows TAP driver (NDIS 6)

Packet lost for pppoe over openvpn tap #158

Closed Jimmy01240397 closed 1 year ago

Jimmy01240397 commented 1 year ago

I tried to set up PPPoE over an OpenVPN TAP interface. When my Windows client connected to OpenVPN and then to PPPoE, packets were lost on the way to the server. Here is a pcap comparison of the VPN server and the client. image

Here are pcap files. pcap.zip

Here are config files for server and client. openvpnconf.zip

BTW, I also tried a Linux client and it works normally.

cron2 commented 1 year ago

Hi,

On Fri, Apr 07, 2023 at 01:41:35PM -0700, Chumy wrote:

I tried to setup pppoe over openvpn tap. When my windows client connected openvpn and pppoe. There had package lost at server.

packet loss can always happen if the VPN is using UDP transport - so that's not a bug in itself but the nature of UDP. PPPoE can deal with it, and retransmit the lost packet.

Now, if it is always the same packet (PPP ACK) that is lost, and PPPoE handshake never finishes, this hints at a problem somewhere.

gert -- "If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany @.***

Jimmy01240397 commented 1 year ago


It's not because of UDP. I set OpenVPN to unencrypted and checked both the VPN interface and the physical interface on the Windows client. The client never even sent a VPN packet containing the PPP ACK out of the physical interface.

cron2 commented 1 year ago

Hi,

On Sun, Apr 09, 2023 at 04:00:04PM -0700, Arne Schwabe wrote:

the PPP ACK that the VPN/application that uses the tap interface. That has even less to do with the TAP driver.

I think what the original poster is doing is

- run openvpn in TAP mode
- run PPPoE over the TAP interface

I'm not sure why one would want to do this (there is nothing Windows PPPoE can do that OpenVPN couldn't do itself) but the TAP interface should be sufficiently transparent so that this should work...

gert -- Gert Doering - Munich, Germany @.***

Jimmy01240397 commented 1 year ago

In fact, I have tested this many times with OpenVPN GUI on different Windows clients. Every time I connect to the VPN server and run PPPoE, an ACK is lost; if I switch to a Linux client, it works every time. So I can only conclude that the problem is in either the OpenVPN GUI for Windows or the tap-windows driver.

Jimmy01240397 commented 1 year ago

I think tap-windows has a bug when it sends a packet with a packet length below a certain value to the openvpn server.

Jimmy01240397 commented 1 year ago

I found the problem in txpath.c, in the tapNetBufferListNetBufferLengthsValid function. It only sends a packet when the packet length is >= Ethernet header (14) + IPv4 header (20) = 34 bytes, but my PPP IPCP ACK is only 32 bytes long. A PPPoE IPCP frame has no IP header.

image
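The arithmetic behind this can be sketched in a few lines of C. This is only an illustration, not code from the driver: the 14- and 20-byte values are the ones quoted above, the PPPoE, PPP and IPCP sizes are the standard on-wire sizes, and the assumption that the ACK carries exactly one IP-Address option is a guess that happens to add up to the reported 32 bytes. The function names are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Header sizes in bytes. */
enum {
    ETH_HEADER_LEN       = 14, /* dst MAC + src MAC + EtherType */
    IPV4_HEADER_LEN      = 20, /* IPv4 header without options */
    PPPOE_HEADER_LEN     = 6,  /* ver/type, code, session id, length */
    PPP_PROTOCOL_LEN     = 2,  /* PPP protocol field (0x8021 for IPCP) */
    IPCP_HEADER_LEN      = 4,  /* code, identifier, length */
    IPCP_IP_ADDR_OPT_LEN = 6   /* option type, option length, IPv4 address */
};

/* Compile-time check: the old threshold is 34 bytes. */
static_assert(ETH_HEADER_LEN + IPV4_HEADER_LEN == 34,
              "Ethernet + IPv4 minimum should be 34 bytes");

/* Assumed layout: a Configure-Ack carrying a single IP-Address option. */
size_t ipcp_ack_frame_len(void)
{
    return ETH_HEADER_LEN + PPPOE_HEADER_LEN + PPP_PROTOCOL_LEN
         + IPCP_HEADER_LEN + IPCP_IP_ADDR_OPT_LEN;
}

/* Hypothetical stand-in for the driver's minimum-length test. */
int passes_min_len(size_t frame_len, size_t min_len)
{
    return frame_len >= min_len;
}
```

With these definitions, `passes_min_len(ipcp_ack_frame_len(), ETH_HEADER_LEN + IPV4_HEADER_LEN)` is false, while the same frame passes a minimum of `ETH_HEADER_LEN` alone, which is consistent with the frame being dropped only on the Windows client.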

cron2 commented 1 year ago

Good find.

(The ASSERT() is not what is causing the packet drop, but the length comparison two lines down)
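The distinction between the assertion and the comparison can be modelled roughly as follows. This is a simplified sketch and not the code in txpath.c: the function name, the array-of-lengths interface, the 0xFFFF bound in the assertion and the relaxed Ethernet-only minimum are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Candidate minimum frame lengths in bytes (illustrative names). */
#define MIN_LEN_ETH_PLUS_IPV4 34 /* 14-byte Ethernet + 20-byte IPv4 header */
#define MIN_LEN_ETH_ONLY      14 /* Ethernet header only */

/*
 * Hypothetical model of a "are all buffer lengths valid?" check.
 * lens[] stands in for the data lengths of the buffers in one list.
 */
bool frame_lengths_valid(const size_t *lens, size_t count, size_t min_len)
{
    for (size_t i = 0; i < count; ++i) {
        /* Sanity assertion: compiled out when NDEBUG is defined, and it
         * never rejects a short frame, so it is not the cause of the drop. */
        assert(lens[i] <= 0xFFFF);

        /* The comparison is what rejects the list: one frame shorter
         * than min_len makes the whole list invalid. */
        if (lens[i] < min_len) {
            return false;
        }
    }
    return true;
}
```

For a list with lengths `{60, 32, 1514}`, the check fails with `MIN_LEN_ETH_PLUS_IPV4` because of the 32-byte frame and succeeds with `MIN_LEN_ETH_ONLY`. Whether the actual fix lowers the minimum in exactly this way is not stated in this thread.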

lstipakov commented 1 year ago

@Jimmy01240397 can you test this installer and see if the problem is now fixed?

Jimmy01240397 commented 1 year ago

@lstipakov My CPU architecture is amd64. Can you build a version for amd64?

lstipakov commented 1 year ago

Sorry, feel free to pick the right architecture :)

https://github.com/OpenVPN/openvpn-build/actions/runs/4805886974


Jimmy01240397 commented 1 year ago

It makes Windows crash when connecting :( image image image

lstipakov commented 1 year ago

Is this amd64? Can you share memory.dmp?

Jimmy01240397 commented 1 year ago

Yes it is amd64. Here is openvpn log and memory.dmp. logandmemdump.7z.001.gz logandmemdump.7z.002.gz

lstipakov commented 1 year ago

I cannot open it:

Microsoft (R) Windows Debugger Version 10.0.22621.755 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Temp\dump\MEMORY.DMP]
Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available.

Symbol search path is: srv*
Executable search path is: 
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
Unable to read PsLoadedModuleList
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
KdDebuggerData.KernBase < SystemRangeStart
Windows 10 Kernel Version 22000 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Machine Name:
Kernel base = 0x00000000`00000000 PsLoadedModuleList = 0xfffff801`344296b0
Debug session time: Wed Apr 26 18:40:45.681 2023 (UTC + 3:00)
System Uptime: 0 days 0:24:35.509
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
Unable to read PsLoadedModuleList
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
KdDebuggerData.KernBase < SystemRangeStart
Loading Kernel Symbols
Unable to read PsLoadedModuleList
GetContextState failed, 0xD0000147
CS descriptor lookup failed
GetContextState failed, 0xD0000147
For analysis of this file, run !analyze -v
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
?: kd> !analyze -v
GetContextState failed, 0xD0000147
Unable to get program counter
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common BugCheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff80135f008c2, The address that the exception occurred at
Arg3: ffff8408e90bea38, Exception Record Address
Arg4: ffff8408e90be250, Context Record Address

Debugging Details:
------------------

GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
***** Debugger could not find nt in module list, module list might be corrupt, error 0x80070057.

GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
ReadControl failed - kernel symbols must be loaded first
ReadControl failed - kernel symbols must be loaded first
GetContextState failed, 0xD0000147
ReadControl failed - kernel symbols must be loaded first

and so on.

Jimmy01240397 commented 1 year ago

I got the same message. image

lstipakov commented 1 year ago

Okay, so we need to get a proper dump first.

There are a few changes in the latest driver which might cause the BSOD - one is the PPPoE-related change, another is the TCP performance fix for Windows Server 2022.

I've built a driver which has the Windows Server 2022 TCP fix but not the PPPoE fix. Can you give it a try - here is the link to the installer. Does it still give you a BSOD?

Jimmy01240397 commented 1 year ago

Yes, it still gave me a BSOD.

lstipakov commented 1 year ago

Thanks, can you test this one? It includes your fix but not the Windows Server 2022 TCP fix.

Jimmy01240397 commented 1 year ago

Good! PPPoE connects successfully and there is no BSOD. image

Thanks!

lstipakov commented 1 year ago

The fix is in 2.6.3-I003.