Closed Jimmy01240397 closed 1 year ago
Hi,
On Fri, Apr 07, 2023 at 01:41:35PM -0700, Chumy wrote:
I tried to set up PPPoE over an OpenVPN TAP interface. When my Windows client connected to OpenVPN and then to PPPoE, packets were lost at the server.
packet loss can always happen if the VPN is using UDP transport - so that's not a bug in itself but the nature of UDP. PPPoE can deal with it, and retransmit the lost packet.
Now, if it is always the same packet (PPP ACK) that is lost, and PPPoE handshake never finishes, this hints at a problem somewhere.
gert -- "If was one thing all people took for granted, was conviction that if you feed honest figures into a computer, honest figures come out. Never doubted it myself till I met a computer with a sense of humor." Robert A. Heinlein, The Moon is a Harsh Mistress
Gert Doering - Munich, Germany @.***
It's not because of UDP. I set OpenVPN to unencrypted mode and checked both the VPN interface and the physical interface on the Windows client. The client never even sent a VPN packet containing the PPP ACK from the physical interface.
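When comparing captures from the two interfaces, it helps to pick out the frame in question mechanically. The following is a minimal sketch, assuming untagged Ethernet frames available as raw bytes; the function name `is_ipcp_configure_ack` is hypothetical, and the constants come from the PPPoE and PPP specifications (EtherType 0x8864 for the PPPoE session stage, PPP protocol 0x8021 for IPCP, code 2 for Configure-Ack).

```python
import struct

ETHERTYPE_PPPOE_SESSION = 0x8864  # PPPoE session stage (RFC 2516)
PPP_PROTO_IPCP = 0x8021           # IPCP (RFC 1332)
PPP_CODE_CONFIGURE_ACK = 2        # Configure-Ack (RFC 1661)

# 14 (Ethernet) + 6 (PPPoE) + 2 (PPP protocol) + 4 (IPCP code, id, length)
MIN_PARSEABLE_LEN = 26


def is_ipcp_configure_ack(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame carries a PPPoE IPCP Configure-Ack."""
    if len(frame) < MIN_PARSEABLE_LEN:
        return False
    # EtherType sits after the two 6-byte MAC addresses.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_PPPOE_SESSION:
        return False
    # The PPP protocol field follows the 6-byte PPPoE header.
    (ppp_proto,) = struct.unpack_from("!H", frame, 20)
    ipcp_code = frame[22]
    return ppp_proto == PPP_PROTO_IPCP and ipcp_code == PPP_CODE_CONFIGURE_ACK
```

Running this over the frames seen on the TAP interface and over the decapsulated payloads seen on the physical interface would show on which side the ACK disappears.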
Hi,
On Sun, Apr 09, 2023 at 04:00:04PM -0700, Arne Schwabe wrote:
the PPP ACK that the VPN/application that uses the tap interface. That has even less to do with the TAP driver.
I think what the original poster is doing is
- run openvpn in TAP mode
- run PPPoE over the TAP interface
I'm not sure why one would want to do this (there is nothing Windows PPPoE can do that OpenVPN couldn't do itself) but the TAP interface should be sufficiently transparent so that this should work...
gert -- Gert Doering - Munich, Germany @.***
In fact, I have tested this many times with OpenVPN GUI on different Windows clients. Every time I connect to the VPN server and run PPPoE, an ACK is lost; when I switch to a Linux client, it works every time. So I can only conclude that the problem is either in OpenVPN GUI for Windows or in the tap-windows driver.
I think tap-windows has a bug when it sends a packet shorter than a certain length to the OpenVPN server.
I found the problem in txpath.c, in the tapNetBufferListNetBufferLengthsValid function. It only sends a packet when the packet length is >= Ethernet header (14) + IPv4 header (20), i.e. 34 bytes, but my PPP IPCP ACK is 32 bytes long. A PPPoE IPCP frame has no IP header.
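The arithmetic can be checked with a small sketch. It builds a PPPoE IPCP Configure-Ack carrying a single IP-Address option (an assumption about what the frame in question contained) and applies a length check modelled on the description above. The function names are hypothetical and this is not the driver's actual code.

```python
import struct

ETH_HEADER_LEN = 14
IPV4_HEADER_LEN = 20
# Minimum length as described above: Ethernet header plus IPv4 header.
MIN_LEN_DESCRIBED = ETH_HEADER_LEN + IPV4_HEADER_LEN  # 34


def build_ipcp_configure_ack(session_id: int, ident: int, ip: bytes) -> bytes:
    """Build an untagged Ethernet frame with a PPPoE IPCP Configure-Ack."""
    # IPCP option: type 3 (IP-Address), length 6, then the 4-byte address.
    option = struct.pack("!BB", 3, 6) + ip
    # IPCP packet: code 2 (Configure-Ack), identifier, total IPCP length.
    ipcp = struct.pack("!BBH", 2, ident, 4 + len(option)) + option
    # PPP protocol field 0x8021 marks the payload as IPCP.
    ppp = struct.pack("!H", 0x8021) + ipcp
    # PPPoE session header: version/type 0x11, code 0, session id, payload length.
    pppoe = struct.pack("!BBHH", 0x11, 0x00, session_id, len(ppp)) + ppp
    # Placeholder (all-zero) MAC addresses, then the PPPoE session EtherType.
    ethernet = bytes(6) + bytes(6) + struct.pack("!H", 0x8864)
    return ethernet + pppoe


def passes_described_check(frame: bytes) -> bool:
    """Model of the length comparison as described in this thread."""
    return len(frame) >= MIN_LEN_DESCRIBED


frame = build_ipcp_configure_ack(session_id=1, ident=1, ip=bytes([10, 0, 0, 2]))
print(len(frame), passes_described_check(frame))
```

The frame is 14 (Ethernet) + 6 (PPPoE) + 2 (PPP protocol) + 10 (IPCP) = 32 bytes, two bytes short of the 34-byte minimum, so a check of this form would reject it.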
Good find.
(The ASSERT() is not what is causing the packet drop, but the length comparison two lines down)
@Jimmy01240397 can you test this installer and see if the problem is now fixed?
@lstipakov My CPU architecture is amd64. Can you build an amd64 version?
Sorry, feel free to pick the right architecture :)
https://github.com/OpenVPN/openvpn-build/actions/runs/4805886974
It makes Windows crash when connecting :(
Is this amd64? Can you share memory.dmp?
Yes, it is amd64. Here are the OpenVPN log and memory.dmp. logandmemdump.7z.001.gz logandmemdump.7z.002.gz
I cannot open it:
Microsoft (R) Windows Debugger Version 10.0.22621.755 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Temp\dump\MEMORY.DMP]
Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available.
Symbol search path is: srv*
Executable search path is:
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
Unable to read PsLoadedModuleList
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
KdDebuggerData.KernBase < SystemRangeStart
Windows 10 Kernel Version 22000 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Machine Name:
Kernel base = 0x00000000`00000000 PsLoadedModuleList = 0xfffff801`344296b0
Debug session time: Wed Apr 26 18:40:45.681 2023 (UTC + 3:00)
System Uptime: 0 days 0:24:35.509
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
Unable to read PsLoadedModuleList
**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
KdDebuggerData.KernBase < SystemRangeStart
Loading Kernel Symbols
Unable to read PsLoadedModuleList
GetContextState failed, 0xD0000147
CS descriptor lookup failed
GetContextState failed, 0xD0000147
For analysis of this file, run !analyze -v
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
?: kd> !analyze -v
GetContextState failed, 0xD0000147
Unable to get program counter
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common BugCheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff80135f008c2, The address that the exception occurred at
Arg3: ffff8408e90bea38, Exception Record Address
Arg4: ffff8408e90be250, Context Record Address
Debugging Details:
------------------
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
***** Debugger could not find nt in module list, module list might be corrupt, error 0x80070057.
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
ReadControl failed - kernel symbols must be loaded first
ReadControl failed - kernel symbols must be loaded first
GetContextState failed, 0xD0000147
ReadControl failed - kernel symbols must be loaded first
and so on.
I got the same message.
Okay, so we need to get a proper dump first.
There are a few changes in the latest driver which might cause the BSOD: one is the PPPoE-related change, the other is the TCP performance fix for Windows Server 2022.
I've built a driver which has the Windows Server 2022 TCP fix but not the PPPoE fix. Can you give it a try - here is the link to the installer. Does it still give you a BSOD?
Yes, it still gave me a BSOD.
Thanks, can you test this one? It includes your fix but not the Windows Server 2022 TCP fix.
Good! PPPoE connects successfully and there is no BSOD.
Thanks!
The fix is in 2.6.3-I003.
I tried to set up PPPoE over an OpenVPN TAP interface. When my Windows client connected to OpenVPN and then to PPPoE, packets were lost at the server. Here is a pcap comparison of the VPN server and client.
Here are the pcap files. pcap.zip
Here are the config files for server and client. openvpnconf.zip
BTW, I also tried a Linux client and it works normally.