Closed epiciskandar closed 1 year ago
It looks like a simple kernel buffer overflow happened? VBoxUSB.sys is signed by Oracle and was verified with Windows Driver Verifier. So, who should I expect to solve this problem?
@epiciskandar This is definitely a bug in the VBoxUsb.sys driver, but maybe we can work around it. In any case, it may be worthwhile to report this to VirtualBox as well (together with this analysis).
I found 2 occurrences of MmUnlockPages in the VBoxUsb code:
1) When a URB is created, but something fails half-way, then MmUnlockPages is part of the failure cleanup.
2) When a URB is completed, after it was successfully sent.
The code for (2) looks OK: in that case the URB was fully created, sent, and completed. Nothing went wrong and the completion is very straightforward. Nothing weird there.
So (1) is the suspicious call. And I found something interesting: there is one code path where an MDL was successfully created, and probed (without exception), but the mapping fails:
From VBoxUsbRt.cpp
    /* For some reason, passing a MDL in the URB does not work reliably. Notably
     * the iPhone when used with iTunes fails.
     */
    PVOID pBuffer = MmGetSystemAddressForMdlSafe(pMdlBuf, NormalPagePriority);
    if (!pBuffer)
    {
        AssertMsgFailed((__FUNCTION__": MmGetSystemAddressForMdlSafe failed\n"));
        Status = STATUS_INSUFFICIENT_RESOURCES;
        break;
    }
And then later in the function, the failure cleanup is done with:
    if (pMdlBuf)
    {
        MmUnlockPages(pMdlBuf);
        IoFreeMdl(pMdlBuf);
    }
So, even if the MmGetSystemAddressForMdlSafe fails (and the comment section is already indicating that the developer thinks something fishy is going on...), then still MmUnlockPages is called. I think this call is the BSOD you are seeing.
The example code from Microsoft (https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/using-mdls):
    VOID MyFreeMdl(PMDL Mdl)
    {
        PMDL currentMdl, nextMdl;
        for (currentMdl = Mdl; currentMdl != NULL; currentMdl = nextMdl)
        {
            nextMdl = currentMdl->Next;
            if (currentMdl->MdlFlags & MDL_PAGES_LOCKED)
            {
                MmUnlockPages(currentMdl);
            }
            IoFreeMdl(currentMdl);
        }
    }
Clearly, you are only supposed to call MmUnlockPages if the pages were indeed locked...
@epiciskandar
Now about what usbipd-win can do to avoid this. First thing is to figure out why there is a shortage of resources. I don't think a memory leak in user mode code can cause this (user mode memory is all pageable). I also don't think there is a memory leak in VBoxUsb; somebody should have noticed this before and I've done transfers of many gigabytes over USB.
What I think could be happening is "too many outstanding URBs". Maybe Linux is queuing URBs faster than they are completed. usbipd-win just forwards every URB to VBoxUsb, without limitation. If the completion rate is lower than the submission rate, then surely you will run out of some resource. This also matches your observation that bigger files cause it and smaller ones don't.
Fortunately, if you run usbipd server on the console, with debug logging, then it also logs the number of pending requests. Can you find out if this number is increasing (before the BSOD hits you)? Numbers up to 10 are normal, 20 is the maximum I have ever seen...
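One way to probe the "too many outstanding URBs" theory would be to cap how many requests may be in flight at once. The following is only a minimal Python model of that idea, not usbipd-win code (usbipd-win itself is written in C#); `BoundedSubmitter`, `submit_urb`, `MAX_IN_FLIGHT` and the simulated 1 ms completion time are all hypothetical names and values chosen for illustration.

```python
import asyncio

MAX_IN_FLIGHT = 8  # hypothetical cap, picked only for this sketch


class BoundedSubmitter:
    """Forwards requests, but never lets more than `limit` be outstanding."""

    def __init__(self, limit: int) -> None:
        self._slots = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0

    async def submit_urb(self, seqnum: int) -> int:
        # Wait for a free slot instead of forwarding immediately.
        async with self._slots:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                # Stand-in for the driver completing the URB.
                await asyncio.sleep(0.001)
                return seqnum
            finally:
                self.in_flight -= 1


async def run_demo(count: int = 100, limit: int = MAX_IN_FLIGHT) -> BoundedSubmitter:
    submitter = BoundedSubmitter(limit)
    # The client queues everything at once; the semaphore applies back-pressure.
    await asyncio.gather(*(submitter.submit_urb(n) for n in range(count)))
    return submitter


if __name__ == "__main__":
    result = asyncio.run(run_demo())
    print("peak requests in flight:", result.peak)
```

Compared with a fixed delay per request, a cap like this only slows things down when the completion side actually falls behind.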
Wow, this is a really impressive analysis, I almost believe this is the real corruption point 😃. OK, I will try the suggestion and watch the request count.
Yes, you are right: with a verbose-logging server instance, it does not crash with a BSOD anymore. Is that due to the time the logging takes?
I'm attaching the last part of the logs, in case you can confirm something.
trce: UsbIpServer.AttachedClient[1001]
actual: 8, requested: 512
trce: UsbIpServer.AttachedClient[1001]
USBIP_CMD_SUBMIT, seqnum=1127, flags=512, length=512, ep=3
trce: UsbIpServer.AttachedClient[1001]
actual: 24, requested: 512
trce: UsbIpServer.AttachedClient[1001]
USBIP_CMD_SUBMIT, seqnum=1128, flags=0, length=24, ep=2
trce: UsbIpServer.AttachedClient[1001]
actual: 24, requested: 24
trce: UsbIpServer.AttachedClient[1001]
USBIP_CMD_SUBMIT, seqnum=1129, flags=512, length=512, ep=3
trce: UsbIpServer.AttachedClient[1001]
USBIP_CMD_SUBMIT, seqnum=1130, flags=0, length=24, ep=2
trce: UsbIpServer.AttachedClient[1001]
actual: 24, requested: 512
trce: UsbIpServer.AttachedClient[1001]
actual: 24, requested: 24
trce: UsbIpServer.AttachedClient[1001]
USBIP_CMD_SUBMIT, seqnum=1131, flags=512, length=512, ep=3
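A trace like the one above can be scanned for the number of requests that were submitted but not yet answered. This is a rough sketch under one assumption that the thread does not confirm: that every `actual: …, requested: …` line marks exactly one completed request. Because the excerpt starts mid-stream, the result is only a lower bound on the true pending count.

```python
import re

SUBMIT = re.compile(r"USBIP_CMD_SUBMIT, seqnum=(\d+)")
COMPLETE = re.compile(r"actual: (\d+), requested: (\d+)")


def peak_pending(log_text: str) -> int:
    """Return the highest number of submits seen without a matching completion."""
    pending = 0
    peak = 0
    for line in log_text.splitlines():
        if SUBMIT.search(line):
            pending += 1
            peak = max(peak, pending)
        elif COMPLETE.search(line):
            # Assumption: one "actual/requested" line == one finished request.
            # Clamp at zero because an excerpt can start mid-stream.
            pending = max(0, pending - 1)
    return peak


# The trace lines from the log above, with the "trce:" header lines dropped.
SAMPLE = """\
actual: 8, requested: 512
USBIP_CMD_SUBMIT, seqnum=1127, flags=512, length=512, ep=3
actual: 24, requested: 512
USBIP_CMD_SUBMIT, seqnum=1128, flags=0, length=24, ep=2
actual: 24, requested: 24
USBIP_CMD_SUBMIT, seqnum=1129, flags=512, length=512, ep=3
USBIP_CMD_SUBMIT, seqnum=1130, flags=0, length=24, ep=2
actual: 24, requested: 512
actual: 24, requested: 24
USBIP_CMD_SUBMIT, seqnum=1131, flags=512, length=512, ep=3
"""

print(peak_pending(SAMPLE))  # 2
```

For this excerpt the count never exceeds 2, well inside the "up to 10 is normal" range mentioned earlier.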
Now I will try again without the logging instance. It is highly likely to go down again, so I need to leave this information here before starting.
Edit: without the logging server instance, the BSOD happened again.
> This is definitely a bug in the VBoxUsb.sys driver
@dorssel are you planning to create a new issue for VBox's VBoxUsb.sys implementation?
@mi-hol Eventually, yes. But from experience I know it is quicker to work around it. So, we'll do that first.
> I know it is quicker to work around it
Any idea how to work around this? I can compile and test locally in my environment; this driver problem still sometimes annoys me, even when running with verbose logging.
@epiciskandar To be honest, not really. I thought it was the queue depth. But since the latest master build still exhibits the problem, that cannot be it. The root cause is still unknown. You yourself had some success by running in debug mode. That changes the timing/performance, which seems to help. But any BSOD is always a driver problem; user mode software (like usbipd-win) cannot cause it, even if you try to (at least in theory...).
> That changes the timing/performance, which seems to help
Following this theory, what if I keep adding delays to certain steps; could that reduce the chance of crashing?
Currently the crash rate is about 20%; if it could be reduced to less than 5%, that would be a big improvement for me.
@epiciskandar You know better than I do. So far, you're the only one that can reproduce this. I've tried, but I never got a BSOD. The timing is just a guess, since you reported that running in debug mode made it a little better. But the root cause is unclear. I found some suspicious code in VBoxUsb, but I cannot reverse engineer why/when/how that would be triggered. All my guessing is based on your reports...
Well, that sounds like a bad situation for me... It looks like I eventually have no choice but to dig into this project and try to figure out what's really going on back there.
Edit: adding a 5 ms sleep to SUPUSB_IOCTL.SEND_URB looks like it works, but it dramatically lowers the performance. I know this does not solve the real problem, but it is acceptable to me for now. 🤔
BSOD +1. Just like @epiciskandar, every time I try to debug my app with Android Studio, a BSOD happens.
1: kd> !analyze -v
SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot. If the same BugCheck occurs again the stack trace will
identify the offending driver.
Arguments:
Arg1: 0000000000000302, Type of error.
Arg2: ffffbb80d6f00000
Arg3: 0000000000000000
Arg4: 00000000000d6f00
Debugging Details:
------------------
KEY_VALUES_STRING: 1
Key : Analysis.CPU.mSec
Value: 2093
Key : Analysis.DebugAnalysisManager
Value: Create
Key : Analysis.Elapsed.mSec
Value: 6706
Key : Analysis.Init.CPU.mSec
Value: 2140
Key : Analysis.Init.Elapsed.mSec
Value: 74875
Key : Analysis.Memory.CommitPeak.Mb
Value: 97
FILE_IN_CAB: 042122-7765-01.dmp
DUMP_FILE_ATTRIBUTES: 0x1808
Kernel Generated Triage Dump
BUGCHECK_CODE: da
BUGCHECK_P1: 302
BUGCHECK_P2: ffffbb80d6f00000
BUGCHECK_P3: 0
BUGCHECK_P4: d6f00
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXPNP: 1 (!blackboxpnp)
BLACKBOXWINLOGON: 1
CUSTOMER_CRASH_COUNT: 1
PROCESS_NAME: usbipd.exe
DPC_STACK_BASE: FFFFB80778A37FB0
STACK_TEXT:
ffffb807`78a37318 fffff807`7ed3b10c : 00000000`000000da 00000000`00000302 ffffbb80`d6f00000 00000000`00000000 : nt!KeBugCheckEx
ffffb807`78a37320 fffff807`7ed3aca9 : 00000000`00000d4e ffff868f`412133f0 00000000`00000000 ffff868f`4fba2e60 : nt!MiReleasePtes+0x3ec
ffffb807`78a37470 fffff807`7ed389a1 : 00000000`00000000 ffff868f`4fba2e60 00000000`00000000 fffff807`a3224854 : nt!MmUnmapLockedPages+0x179
ffffb807`78a374e0 fffff807`ffa72ba1 : ffff868f`4fba2e60 ffff868f`41213400 ffff868f`33230dd0 00000000`00000000 : nt!MmUnlockPages+0x71
ffffb807`78a37580 ffff868f`4fba2e60 : ffff868f`41213400 ffff868f`33230dd0 00000000`00000000 ffff868f`40ccf443 : VBoxUSB+0x2ba1
ffffb807`78a37588 ffff868f`41213400 : ffff868f`33230dd0 00000000`00000000 ffff868f`40ccf443 fffff807`7ed35db7 : 0xffff868f`4fba2e60
ffffb807`78a37590 ffff868f`33230dd0 : 00000000`00000000 ffff868f`40ccf443 fffff807`7ed35db7 ffff868f`3934c770 : 0xffff868f`41213400
ffffb807`78a37598 00000000`00000000 : ffff868f`40ccf443 fffff807`7ed35db7 ffff868f`3934c770 00000000`00000000 : 0xffff868f`33230dd0
SYMBOL_NAME: VBoxUSB+2ba1
MODULE_NAME: VBoxUSB
IMAGE_NAME: VBoxUSB.sys
STACK_COMMAND: .cxr; .ecxr ; kb
BUCKET_ID_FUNC_OFFSET: 2ba1
FAILURE_BUCKET_ID: 0xDA_VBoxUSB!unknown_function
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {82481b05-1d94-979d-554d-84d1270c9edb}
Followup: MachineOwner
---------
1: kd> !blackboxbsd
Version: 0xc0
Product type: 1
1: kd> !blackboxntfs
NTFS Blackbox Data
0 Slow I/O Timeout Records Found
0 Oplock Break Timeout Records Found
1: kd> !blackboxpnp
PnpActivityId : {00000000-0000-0000-0000-000000000000}
PnpActivityTime : 132950233952334558
PnpEventInformation: 2
PnpEventInProgress : 0
PnpProblemCode : 24
PnpVetoType : 0
DeviceId : USB\VID_22D9&PID_2772\532916e2
VetoString :
1: kd> lmvm VBoxUSB
Browse full module list
start end module name
fffff807`ffa70000 fffff807`ffaa5000 VBoxUSB T (no symbols)
Loaded symbol image file: VBoxUSB.sys
Image path: VBoxUSB.sys
Image name: VBoxUSB.sys
Browse all global symbols functions data
Timestamp: Tue Oct 19 01:50:33 2021 (616DB3E9)
CheckSum: 0003A1FC
ImageSize: 00035000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Information from resource tables:
After I replaced VBoxUSB with the one from VirtualBox-6.1.34-150636, the BSOD is gone.
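The `VBoxUSB+0x2ba1` frame in the dump above can be cross-checked by hand: subtract the module start printed by `lmvm VBoxUSB` from the return address in the `nt!MmUnlockPages` frame. A small sketch of that arithmetic follows; the helper names `parse_addr` and `module_offset` are made up for this example, and the addresses are copied from the dump.

```python
def parse_addr(text: str) -> int:
    """Convert a WinDbg 64-bit address such as fffff807`ffa72ba1 to an int."""
    return int(text.replace("`", ""), 16)


def module_offset(return_addr: str, module_start: str, module_end: str) -> int:
    """Offset of an address inside a module; raises if it lies outside."""
    addr, start, end = (parse_addr(t) for t in (return_addr, module_start, module_end))
    if not start <= addr < end:
        raise ValueError("address is not inside the module")
    return addr - start


# RetAddr column of the nt!MmUnlockPages frame, and the VBoxUSB start/end
# range reported by `lmvm VBoxUSB`, both taken from the dump above.
offset = module_offset("fffff807`ffa72ba1", "fffff807`ffa70000", "fffff807`ffaa5000")
print(hex(offset))  # 0x2ba1
```

The result matches `BUCKET_ID_FUNC_OFFSET: 2ba1`. Because the offset is relative to the module base, it can be compared between crashes even when the driver is loaded at a different address.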
@yodamaster Thanks for investigating this! I will update the driver that usbipd-win ships with to this version.
Strangely enough, there were no code changes in VBoxUsb itself (see https://www.virtualbox.org/browser/vbox/trunk/src/VBox/HostDrivers/VBoxUSB/win/dev), but there may have been something in libusb that is linked in...
@yodamaster @epiciskandar I've created PR #354 that updates the drivers in the installer. The installer is at https://github.com/dorssel/usbipd-win/actions/runs/2204188147. Could you please test if this solves the problem?
The new installer is much better; the BSOD happened only once today, so I can bear it at the moment. Anyway, thanks a lot!
These days I noticed that the BSOD seems to be related only to adb debugging, not to transferring data, and these two things always happen one after the other (install, then the debugger attaching). If I transfer and install the .apk file manually, no BSOD happens.
Anyway, I'm trying this new package.
Now released in 2.3.0.
I haven't been working with usbipd much these days, but it still happened once; I'll keep trying.
Currently it works fine and barely happens anymore, so I'm closing this issue.
@klaus-vb For your information, please have a look at https://github.com/dorssel/usbipd-win/issues/248#issuecomment-1030681671
BSOD happened many times today, I've updated Windows 11 to latest beta build, not sure if this is related. So, this problem still exists apparently. @dorssel 😔
@epiciskandar That's sad... Can you confirm the post-mortem still points at VBoxUSB.sys? I have notified @klaus-vb from VirtualBox.
> Can you confirm the post-mortem still points at VBoxUSB.sys?
Confirmed.
# Child-SP RetAddr Call Site
00 fffff803`57f43b28 fffff803`5a09d54a nt!KeBugCheckEx
01 fffff803`57f43b30 fffff803`5a09d2f1 nt!MiReleasePtes+0x20a
02 fffff803`57f43c80 fffff803`5a09c5eb nt!MmUnmapLockedPages+0x191
03 fffff803`57f43cf0 fffff803`a0f62ba1 nt!MmUnlockPages+0x6b
04 fffff803`57f43d90 fffff803`5a09b5e4 VBoxUSB+0x2ba1
05 fffff803`57f43dc0 fffff803`5a09b417 nt!IopfCompleteRequest+0x1b4
06 fffff803`57f43eb0 fffff803`5e1c3cc6 nt!IofCompleteRequest+0x17
07 (Inline Function) --------`-------- Wdf01000!FxIrp::CompleteRequest+0x13 [minkernel\wdf\framework\shared\inc\private\km\FxIrpKm.hpp @ 75]
08 fffff803`57f43ee0 fffff803`5e1c2031 Wdf01000!FxRequest::CompleteInternal+0x246 [minkernel\wdf\framework\shared\core\fxrequest.cpp @ 869]
SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
I have a similar issue. I cannot connect a Logitech F710 joystick controller to WSL2. The joystick has a Direct input mode and an XInput mode.
In Direct input mode Windows recognizes the joystick differently and I cannot navigate in the basic Windows menu. In XInput mode Windows recognizes the joystick as an XBOX 360 controller, I can navigate with it in the Windows menu, and this online tester also recognizes it: https://gamepad-tester.com/ (it is not recognized by the online tester in Direct input mode).
I can attach the joystick to WSL2 in Direct input mode, but it's not working there, just as it's not working in Windows. When I switch to XInput mode, I cannot attach the joystick to WSL2; I get the following error:
After a restart I ran usbipd wsl list in PowerShell, and the joystick was listed as XBOX 360 controller, but it was not recognized by the online tester and I could not navigate with it in the Windows menu. I had luck attaching the joystick to WSL2 this way; I could list the device on the WSL2 side with the command lsusb.
However, I tried to use the online tester in WSL2 and it did not recognize the joystick. I also tried jstest-gtk, which I installed in WSL2; still no luck.
After pulling the receiver out of my USB port and putting it back, Windows could recognize the joystick, but then I could not attach it to WSL2 again. I got the same errors as in the picture above.
I have the following system:
Dell laptop
Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz 2.60 GHz
NVIDIA GeForce 940MX
Windows 10 Enterprise 21H2, Build number: 19044.1706
WSL 2 Kernel version: 5.10.102.1
Ubuntu 20.04 inside WSL2
USBIPD: 2.3.0+42.Branch.master.Sha.3d9f5c5acc4e133ab8147684ad1463cbaec43240
Please let me know what I'm doing wrong, or is this an issue with USBIPD?
Update:
I also tried my laptop's integrated webcam and integrated Bluetooth (they both work without any issue from the Windows side if not attached). I can attach them without any issue, but they are not recognized by the WSL Ubuntu.
From what I could read about this, should I build a custom WSL kernel to make this work?
I am reproducing the SYSTEM_PTE_MISUSE on W11 stable branch on WSL2, latest usbipd-win release. Same steps as OP
Same here.
It happens pretty much every time I try to debug a big apk for Android (consisting of two 350+MB .so).
usbipd-win 2.3.0
Microsoft Windows [Version 10.0.22000.739]
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot. If the same BugCheck occurs again the stack trace will
identify the offending driver.
Arguments:
Arg1: 0000000000000302, Type of error.
Arg2: ffff8d0147e80000
Arg3: 0000000000000000
Arg4: 0000000000147e80
Debugging Details:
------------------
Page 13442c not present in the dump file. Type ".hh dbgerr004" for details
Page 13442e not present in the dump file. Type ".hh dbgerr004" for details
Page 132330 not present in the dump file. Type ".hh dbgerr004" for details
KEY_VALUES_STRING: 1
Key : Analysis.CPU.mSec
Value: 2296
Key : Analysis.DebugAnalysisManager
Value: Create
Key : Analysis.Elapsed.mSec
Value: 5037
Key : Analysis.Init.CPU.mSec
Value: 405
Key : Analysis.Init.Elapsed.mSec
Value: 29007
Key : Analysis.Memory.CommitPeak.Mb
Value: 112
Key : Bugcheck.DumpVsMemoryMatch
Value: True
Key : Dump.Attributes.AsUlong
Value: 1800
Key : WER.OS.Branch
Value: co_release
Key : WER.OS.Timestamp
Value: 2021-06-04T16:28:00Z
Key : WER.OS.Version
Value: 10.0.22000.1
FILE_IN_CAB: MEMORY.DMP
DUMP_FILE_ATTRIBUTES: 0x1800
BUGCHECK_CODE: da
BUGCHECK_P1: 302
BUGCHECK_P2: ffff8d0147e80000
BUGCHECK_P3: 0
BUGCHECK_P4: 147e80
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXPNP: 1 (!blackboxpnp)
BLACKBOXWINLOGON: 1
PROCESS_NAME: System
STACK_TEXT:
fffff805`3ae00f78 fffff805`3f08bc2c : 00000000`000000da 00000000`00000302 ffff8d01`47e80000 00000000`00000000 : nt!KeBugCheckEx
fffff805`3ae00f80 fffff805`3f08b7c9 : 00000000`00000d4e ffffc305`275e41e0 00000000`00000000 ffffc305`24855c30 : nt!MiReleasePtes+0x3ec
fffff805`3ae010d0 fffff805`3f0894c1 : 00000000`00000000 ffffc305`24855c30 00000000`00000000 fffff805`3f0817fd : nt!MmUnmapLockedPages+0x179
fffff805`3ae01140 fffff805`52c52ba1 : ffffc305`24855c30 ffffc305`275e41f0 ffffc305`1598c440 00000000`00000000 : nt!MmUnlockPages+0x71
fffff805`3ae011e0 fffff805`3f0868d7 : ffffc305`265edce0 00000000`00000000 fffff805`3ae012b9 ffffc305`31238b3b : VBoxUSB+0x2ba1
fffff805`3ae01210 fffff805`3f086797 : ffffc305`31238750 00000000`00000000 ffffc305`24ad8a00 00000000`00000001 : nt!IopfCompleteRequest+0x127
fffff805`3ae01320 fffff805`40cc8ad0 : ffffc305`31238750 00000000`00000001 00000000`00000002 fffff805`3ae01400 : nt!IofCompleteRequest+0x17
fffff805`3ae01350 fffff805`40cc885f : ffffc305`31238750 fffff805`40cd7240 ffffc305`159ab850 00000000`00000000 : Wdf01000!FxRequest::CompleteInternal+0x240 [minkernel\wdf\framework\shared\core\fxrequest.cpp @ 869]
fffff805`3ae013e0 fffff805`5009e370 : 00000000`ffffff02 ffffc305`248ecab0 ffffc305`24ad8de0 ffffc305`24ad8de0 : Wdf01000!imp_WdfRequestComplete+0x8f [minkernel\wdf\framework\shared\core\fxrequestapi.cpp @ 436]
fffff805`3ae01440 fffff805`5009e1b1 : ffffc305`248ecc50 00000000`00000000 ffffc305`248ecce0 fffff805`3ae01658 : USBXHCI!Bulk_Transfer_CompleteCancelable+0xc8
fffff805`3ae014a0 fffff805`5009dfa0 : 00000000`00000004 fffff805`3ae01610 00000000`00000000 ffffc305`24de0a30 : USBXHCI!Bulk_ProcessTransferEventWithED1+0x1fd
fffff805`3ae01550 fffff805`50093938 : 00000000`00000004 fffff805`3ae01628 00000000`00000008 fffff805`3ae01630 : USBXHCI!Bulk_EP_TransferEventHandler+0x10
fffff805`3ae01580 fffff805`50093188 : ffffc305`15721630 00000001`00000000 ffffc305`157f0df0 ffffc305`15721630 : USBXHCI!Endpoint_TransferEventHandler+0xa8
fffff805`3ae015e0 fffff805`50092b9c : 00000000`00000000 00000000`00000000 0000013a`bf24f61c 00000000`00000000 : USBXHCI!Interrupter_DeferredWorkProcessor+0x5d8
fffff805`3ae016e0 fffff805`40cc25f5 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : USBXHCI!Interrupter_WdfEvtInterruptDpc+0xc
fffff805`3ae01710 fffff805`3f126f71 : fffff805`3ae01ac0 00000000`00000000 fffff805`3aa5f4c0 00000000`00000000 : Wdf01000!FxInterrupt::_InterruptDpcThunk+0xa5 [minkernel\wdf\framework\shared\irphandlers\pnp\km\interruptobjectkm.cpp @ 404]
fffff805`3ae01750 fffff805`3f125f72 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiExecuteAllDpcs+0x491
fffff805`3ae01950 fffff805`3f21b79e : 00000000`00000000 fffff805`3aa5c180 fffff805`3fb35bc0 ffffc305`1ed87080 : nt!KiRetireDpcList+0x2a2
fffff805`3ae01c00 00000000`00000000 : fffff805`3ae02000 fffff805`3adfb000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x9e
SYMBOL_NAME: VBoxUSB+2ba1
MODULE_NAME: VBoxUSB
IMAGE_NAME: VBoxUSB.sys
STACK_COMMAND: .cxr; .ecxr ; kb
BUCKET_ID_FUNC_OFFSET: 2ba1
FAILURE_BUCKET_ID: 0xDA_VBoxUSB!unknown_function
OS_VERSION: 10.0.22000.1
BUILDLAB_STR: co_release
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {82481b05-1d94-979d-554d-84d1270c9edb}
Followup: MachineOwner
@nathanprat Thanks for the debug info, this really helps!
Can you answer these (some questions seem "silly", but please just confirm them to rule it out):
1) Are there any USB filter warnings reported for usbipd list?
2) Is the device on a USB2 or USB3 port?
3) Is the device itself USB3 capable?
4) Are all the hubs in the chain to the device port using stock Microsoft drivers, or are there any vendor drivers involved?
5) What is the Linux driver type when accessing the device (serial? storage? other?)
6) You are writing to the device, correct? Or is it reading?
7) Is your system in any way low on memory or low on other resources right before the crash?
8) Can you try to grab a USB dump as described in https://github.com/dorssel/usbipd-win/wiki/Troubleshooting#usb-capture? I understand the crash itself will truncate/corrupt the dump, but hopefully the last few seconds before the crash may provide some information...
Your dump contains useful information that we didn't have before. Here is my analysis:
SYSTEM_PTE_MISUSE (da) ... Arg1: 0000000000000302, Type of error.
From https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0xda--system-pte-misuse:
The caller is attempting to release a system address that is not currently mapped.
And also:
... nt!MmUnlockPages+0x71 ... VBoxUSB+0x2ba1 ... nt!IopfCompleteRequest+0x127
This contradicts my earlier suspicion in https://github.com/dorssel/usbipd-win/issues/248#issuecomment-1030681671; instead it is now clear this is the MmUnlockPages during IRP completion, on line 1204 of VBoxUsbRt.cpp. From the code, it seems this can only be reached exactly once, after the IRP has indeed completed. There is a weird comment in the completion code:
    case ((USBD_STATUS)0xC0010000L): // USBD_STATUS_CANCELED - too bad usbdi.h and usb.h aren't consistent!
        /// @todo What the heck are we really supposed to do here?
        pUrbInfo->error = USBSUP_XFER_STALL;
        Status = STATUS_SUCCESS;
        break;
But this is in my opinion not related to the MDL lifetime. So, it looks like the MDL must have been wrong at creation time already. The MDL is created on line 1256, and that seems to be all correct. The only thing I can find wrong with the code is the PENDING part of the IRP. https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/example--simple-pass-through-dispatch-and-completion indicates that IoMarkIrpPending should not be called by the dispatcher, but instead conditionally by the completion routine.
What VBox does is:
1) IoMarkIrpPending
2) IoCallDriver
What Microsoft says you should do:
1) IoCallDriver
2) In the completion routine, check whether IoCallDriver returned PendingReturned, and call IoMarkIrpPending if set.
The only other thing I can think of is resource limitation. Maybe there are too many queued/pending URBs for VBox. I don't really see a hard limit, but if there is one then going over that limit may corrupt the internal structures. A USB capture should show that...
I'll pass this on to VirtualBox.
Thanks for the quick response! I will do my best to follow up.
> 1. Are there any USB filter warnings reported for `usbipd list`?
No, not as far as I can see.
> 2. Is the device on a USB2 or USB3 port?
The stacktrace in my first message was on a USB3 port, and the captures at the end of this message are with a USB2 port.
> 3. Is the device itself USB3 capable?
Not sure; USB Device Tree Viewer says:
Device maximum Speed : High-Speed
Device Connection Speed : High-Speed
(It’s a phone, so I would not be surprised if it is only USB2?)
> 4. Are all the hubs in the chain to the device port using stock Microsoft drivers, or are there any vendor drivers involved?
Not sure, see USB Device Tree Viewer: both tries were using the front panel ports of my desktop.
> 5. What is the Linux driver type when accessing the device (serial? storage? other?)
No idea. How can I know? Not sure if related but I am using adb directly; not MTP/PTP.
> 6. You are _writing_ to the device, correct? Or is it _reading_?
I used to think the problem was when writing b/c it happens as soon as Android Studio shows "install...". But when trying to reproduce I did manage to upload the 350MB library using adb push without issue.
> 7. Is your system in any way low on memory or low on other resources right before the crash?
Not really. The original stacktrace was probably at around 90% memory use on the Windows side, and around 50% on the WSL2 side. But when reproducing I did a drop_caches just after building and before starting to debug and I still got a BSOD. At the time of the crash Windows was at around 50-60% RAM used.
> 8. Can you try to grab a USB dump as described in https://github.com/dorssel/usbipd-win/wiki/Troubleshooting#usb-capture? I understand the crash itself will truncate/corrupt the dump, but hopefully the last few seconds before the crash may provide some information...
adb_push_direct_USB2_after_building.pcap is just using adb push /THE/BIG/SHARED_LIB.SO /sdcard/Downloads/ which worked fine.
adb_debug_USB2_drop_caches.pcapng is clicking on Android Studio "Start Debugging" which caused a BSOD.
Those 2 captures are just a few minutes apart, without changing USB port or anything else.
Hope that helps. Tell me if you need anything else.
@dorssel Did you get any response from VirtualBox on your findings above?
@d0n13 Unfortunately, no. Not a single reaction...
Still no response. This bug has kept us from using this and I really hope it gets fixed. Is there anything else that could be done? How hard would it be to write this driver ourselves instead of depending on Oracle?
Also seeing this with vboxusb referenced in the crash report. Is there a bug to follow on the Oracle side?
As an additional data point: I seem to be able to run our program in VirtualBox against USB w/o issue. It's just WSL2 + USBIP that is causing a failure.
I encountered this problem when connecting remotely from an Ubuntu Server LTS 22.04 too. It works fine with fastboot but would get a BSOD on the Windows side when using adb sideload.
@typeless Seems that when you send a lot of data across the link the problem arises. I was using adb and libmobiledevice without issues until I started to use sideloading too.
I wonder can we replace it with a different driver? How complex is it to write such a driver?
@dorssel can you email me at donie dot kelly at g mail dot com on this issue? We would like you to work on it and are willing to pay for your time. Possible?
Does anyone know how we can escalate this with somebody who might know how to fix the driver issue here? Seems like it should be straightforward to somebody who is familiar with the code as @dorssel has shown what the issue may be above.
This happens to me whenever I try to use adb sideload, and specifically adb sideload (I don't have issues pushing multi-GB files). Whenever I use adb sideload inside WSL2, it crashes with a BSOD (pte_misuse).
> I wonder can we replace it with a different driver? How complex is it to write such a driver?
@d0n13, I understand your frustration. Long ago, I was a kernel-mode developer in Windows. I've had my share of tracking backwards from bugchecks. I do have some recommendations. At the same time, please note my knowledge may no longer be 100% current...
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot. If the same BugCheck occurs again the stack trace will
identify the offending driver.
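The registry change recommended in the bugcheck text can be captured as a .reg file. Below is a minimal sketch that only builds the file's text; `track_ptes_reg` is a hypothetical helper name, and saving the file, importing it with administrator rights and rebooting are left to the reader.

```python
KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\Session Manager\Memory Management"
)


def track_ptes_reg(value: int = 3) -> str:
    """Build the text of a .reg file that sets TrackPtes to `value`."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"TrackPtes"=dword:{value:08x}',
        "",
    ]
    # .reg files conventionally use CRLF line endings.
    return "\r\n".join(lines)


print(track_ptes_reg())
```

The default of 3 is the value named in the bugcheck message; HKLM is written out as HKEY_LOCAL_MACHINE because .reg files use the full hive name.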
WARNING
Only mess with driver verifier if you are OK with your computer
being unavailable ... ensure you have another computer around,
in case you get stuck. Oh, and backup your bitlocker recovery
key, if you're using bitlocker.
Driver verifier allows you to enable additional validation of how a driver behaves. It's built into Windows. It may make your machine crash more often, but the bonus is that, when it does, it often gives very specific information on what driver violated a rule, and the stack trace will often show the exact lines of code that are guilty.
See https://gist.github.com/henrygab/044400844e1a8f3cfa730a66cc306d94
Since the bugcheck here appears to repeatedly be that system PTEs are being corrupted, it's likely a bug in a driver, and driver verifier is very likely going to find exactly who is causing the problem, and where.
While I was a kernel mode developer, my knowledge is likely out-of-date.
I cannot commit to any help beyond providing these informational pointers.
At the same time, it seems like @dorssel has the expertise to analyze
memory dumps and the stack traces that !analyze -v (from the WinDbg
Preview debugger) shows ... so you seem to be in good hands!
The bugcheck itself gave a recommendation. Has anyone followed those instructions first?
Yes, I tried that once, after I first got the kernel dump information. I don't fully understand what it does, but I tried it. Unfortunately it brought nothing new, or maybe I'm not good at WinDbg so I didn't find the point.
If the above isn't enough, have you tried to enable driver verifier?
I missed this message, I would try that when convenient.
Yes ... or maybe I'm not good at WinDbg
Even knowing what WinDbg is puts you in an elite field of experts. :)
If the above isn't enough, have you tried to enable driver verifier? I missed this message, I would try that when convenient.
Driver verifier is amazing. Sure, the overhead at times results in a less responsive computer, but when it finds a violation ... so much wasted debug effort avoided. For this purpose, especially as the issue is fairly reproducible, I think the first set of options (where the culprit driver is fairly likely to be known) are likely to bear fruit with a bugcheck that very specifically calls out what violation occurred, and what the code should have done.
If you dabble in "WDM" Windows drivers, then Driver Verifier needs to be part of every development execution of the driver and part of the standard test passes (imho, of course).
IIRC, both KMDF and UMDF also have similar verifier functionality available.
And of course, there's also an "AppVerifier" for user-mode applications.
Development on the Windows platform is made better by such tools.
(imho ... I was formerly a Windows kernel-mode dev, and am still employed there).
Figured I'd throw in my WinDbg analysis to see if it gives you anything new:
Bug Check 1 Bug Check 2 (TrackPtes enabled)
I hope this helps somebody figure this out. I have a question though: if this is figured out and a code change is required, where is the official repo for the drivers? Can this be built by anyone? Does it require signing, etc.?
Thanks for your work @Tsuser1
This definitely points the finger (whether correctly or not) at VBoxUSB as the culprit. Who owns that driver? Can you build it so you have debug symbols and source available?
The driver is maintained by the Oracle VM VirtualBox dev team. It is open source, and in principle anyone can build it (but, I assume for numerous reasons, everyone in the audience is avoiding the big effort). That said, we've just applied the fix, but we're skeptical that it really addresses this issue (which we have never seen in the context of VirtualBox). So far there's no test build out which has the modified code (and, just to make this clear, our test builds do not include drivers attestation-signed by Microsoft).
Hi @klaus-vb, nice work. Any idea if the fix will be reviewed by Oracle and included in a future build? Where is the repo?
As I said, we've applied the fix already (this is what we changed: https://www.virtualbox.org/changeset/98327/vbox ) and it will be included in future builds, at least the 7.0.x ones. Sometime next week we might have a test build which should be usable for people who have Secure Boot disabled. The Subversion repo of VirtualBox is at https://www.virtualbox.org/svn/vbox/trunk/ . We appreciate contributions if it's reasonably clear what they achieve. As I wrote already, we're not sure if this changes anything, but on the other hand it also shouldn't do harm. The code should be buildable by others (I know that several people have built VirtualBox successfully as a whole; we have no reason to invest time into making just the USB driver separately buildable without the big list of prerequisites for VirtualBox as a whole). Either way, building drivers needs a lot of knowledge about the signing bells and whistles. Assume it would take several weeks to get the necessary certs (for $$), with a very steep learning curve and several nervous breakdowns.
we're not sure if this changes anything, but on the other hand it also shouldn't do harm.
Kudos to the VirtualBox team for fixing a bug that did not appear in their own testing.
I encourage the VirtualBox team, if they do not already do so, to use DriverVerifier.exe in its most pedantic modes, to further strengthen their driver.
Again, glad that (at least one of) the related bugs is being resolved! Hoping to hear, when the signed driver appears, that this fully resolves the issues.
@dorssel will you be able to release a version of usbip with the updated driver once it’s made available? Waiting eagerly to test :)
This BSOD could be reproduced on both PCs, so it should be reproducible on other PCs too.
Environment differences:
Both have WSL2 enabled with Ubuntu 20.04 installed, and usbipd-win (2.0 on Windows 11, 2.1 on Windows 10) on the host Windows system.
Reproduction steps (steps 1 to 5 are just the normal steps for using usbipd; step 6 should be the real cause of the BSOD):
usbipd list, then execute usbipd attach -b x-y
lsusb
adb devices could find the device in WSL.
adb install yuanshen_2.4.0.apk
SYSTEM_PTE_MISUSED
I've tried other APK files, but not enough of them to find the file size threshold that triggers the BSOD. But if the APK file is larger than 100MB, the BSOD occurs.
After restarting, I investigated the system core dump file with WinDbg; some important info is listed here:
I'm not very good at kernel debugging, so if any further information is needed to help find the root cause, please let me know.