dorssel / usbipd-win

Windows software for sharing locally connected USB devices to other machines, including Hyper-V guests and WSL 2.
GNU General Public License v3.0

BSOD while using usbipd-win with wsl2 and Android device #248

Closed epiciskandar closed 1 year ago

epiciskandar commented 2 years ago

This BSOD can be reproduced on both of my PCs, so it should be reproducible on other PCs too.

Environment:

Both PCs have WSL2 enabled with Ubuntu 20.04 installed, and usbipd-win on the Windows host (2.0 on Windows 11, 2.1 on Windows 10).

Steps to reproduce (steps 1 to 5 are just the normal usbipd workflow; step 6 should be the actual trigger for the BSOD):

  1. Connect an Android device with a USB cable. I've tested 3 different Android phones, and they all gave the same result.
  2. In an elevated PowerShell, find the busid with usbipd list, then execute usbipd attach -b x-y
  3. In the WSL2 Ubuntu, ensure the device connected successfully with lsusb
  4. Configure the udev rules and restart the corresponding service.
  5. Reconnect the device and attach it again; make sure adb devices can find the device in WSL.
  6. Install a big enough apk with adb: adb install yuanshen_2.4.0.apk
  7. BSOD! The code is SYSTEM_PTE_MISUSE

I've tried other apk files, but not enough of them to find the file-size threshold that triggers the BSOD. But if the apk file is larger than 100MB, the BSOD occurs.

After restarting, I investigated the system crash dump file with WinDbg; some important info is listed here:

0: kd> !analyze -v
SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot.  If the same bugcheck occurs again the stack trace will
identify the offending driver.
Arguments:
Arg1: 0000000000000302, Type of error.
Arg2: ffffb801275c0000
Arg3: 0000000000000000
Arg4: 00000000001275c0

<stripped>

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

<stripped>

STACK_TEXT:  
fffff802`0932c0c8 fffff802`0c03baf3 : 00000000`000000da 00000000`00000302 ffffb801`275c0000 00000000`00000000 : nt!KeBugCheckEx
fffff802`0932c0d0 fffff802`0be83c8f : 00000000`00000000 00000000`00000000 00000000`00000000 fffff6fb`7dbed000 : nt!MiReleasePtes+0x1b73d3
fffff802`0932c220 fffff802`0a252ba1 : ffff980d`ee0507b0 ffff980d`f5f6018b ffff980d`e782a520 00000000`00000000 : nt!MmUnlockPages+0x2ff
fffff802`0932c310 ffff980d`ee0507b0 : ffff980d`f5f6018b ffff980d`e782a520 00000000`00000000 ffff980d`ed948e53 : VBoxUSB+0x2ba1
fffff802`0932c318 ffff980d`f5f6018b : ffff980d`e782a520 00000000`00000000 ffff980d`ed948e53 fffff802`0be84ffe : 0xffff980d`ee0507b0

<stripped>

0: kd> !blackboxpnp
    PnpActivityId      : {00000000-0000-0000-0000-000000000000}
    PnpActivityTime    : 132885345313477917
    PnpEventInformation: 2
    PnpEventInProgress : 0
    PnpProblemCode     : 24
    PnpVetoType        : 0
    DeviceId           : USB\VID_2717&PID_FF48\23a40494 (**the Android device**)
    VetoString         : 

I'm not very good at kernel debugging, so if any more information is needed to help find the root cause, please let me know.

epiciskandar commented 2 years ago

It looks like a simple kernel buffer overflow issue happened? VBoxUSB.sys is signed by Oracle and verified with Windows Driver Verifier. So, who should I expect to solve this problem?

dorssel commented 2 years ago

@epiciskandar This is definitely a bug in the VBoxUsb.sys driver, but maybe we can work around it. In any case, it may be worthwhile to report this to VirtualBox as well (together with this analysis).

I found 2 occurrences of MmUnlockPages in the VBoxUsb code: 1) When a URB is created, but something fails half-way, then MmUnlockPages is part of the failure cleanup 2) When a URB is completed, after it was successfully sent.

The code for (2) looks OK: in that case the URB was fully created, sent, and completed. Nothing went wrong and the completion is very straightforward. Nothing weird there.

So (1) is the suspicious call. And I found something interesting: there is one code path where an MDL was successfully created, and probed (without exception), but the mapping fails:

From VBoxUsbRt.cpp

        /* For some reason, passing a MDL in the URB does not work reliably. Notably
         * the iPhone when used with iTunes fails.
         */
        PVOID pBuffer = MmGetSystemAddressForMdlSafe(pMdlBuf, NormalPagePriority);
        if (!pBuffer)
        {
            AssertMsgFailed((__FUNCTION__": MmGetSystemAddressForMdlSafe failed\n"));
            Status = STATUS_INSUFFICIENT_RESOURCES;
            break;
        }

And then later in the function, the failure cleanup is done with:

    if (pMdlBuf)
    {
        MmUnlockPages(pMdlBuf);
        IoFreeMdl(pMdlBuf);
    }

So, even if the MmGetSystemAddressForMdlSafe fails (and the comment section is already indicating that the developer thinks something fishy is going on...), then still MmUnlockPages is called. I think this call is the BSOD you are seeing.

The example code from Microsoft (https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/using-mdls):

VOID MyFreeMdl(PMDL Mdl)
{
    PMDL currentMdl, nextMdl;

    for (currentMdl = Mdl; currentMdl != NULL; currentMdl = nextMdl) 
    {
        nextMdl = currentMdl->Next;
        if (currentMdl->MdlFlags & MDL_PAGES_LOCKED) 
        {
            MmUnlockPages(currentMdl);
        }
        IoFreeMdl(currentMdl);
    }
} 

Clearly, you are only supposed to call MmUnlockPages if the pages were indeed locked...
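The guard in Microsoft's example can be modeled outside the kernel. What follows is a toy Python model, not driver code: the `Mdl` class, `unlock_pages`, and `free_mdl_chain` are hypothetical stand-ins, and treating an unlock of unlocked pages as a fatal error is an assumption of the model. Only the flag name and the order of the checks mirror the C snippet above; the value 0x0002 for `MDL_PAGES_LOCKED` is the one defined in wdm.h.

```python
MDL_PAGES_LOCKED = 0x0002  # flag value as defined in wdm.h


class Mdl:
    """Toy stand-in for a kernel MDL; tracks only the flags and the chain link."""

    def __init__(self, locked, next_mdl=None):
        self.flags = MDL_PAGES_LOCKED if locked else 0
        self.next = next_mdl
        self.freed = False


def unlock_pages(mdl):
    """Model of MmUnlockPages: in this model, unlocking unlocked pages is fatal."""
    if not mdl.flags & MDL_PAGES_LOCKED:
        raise RuntimeError("unlock of pages that were never locked")
    mdl.flags &= ~MDL_PAGES_LOCKED


def free_mdl_chain(mdl):
    """Mirrors MyFreeMdl: unlock only when the flag says the pages are locked."""
    freed = 0
    while mdl is not None:
        next_mdl = mdl.next          # read the link before releasing the node
        if mdl.flags & MDL_PAGES_LOCKED:
            unlock_pages(mdl)
        mdl.freed = True             # models IoFreeMdl
        freed += 1
        mdl = next_mdl
    return freed


# A two-element chain where only the first MDL has locked pages.
chain = Mdl(locked=True, next_mdl=Mdl(locked=False))
print(free_mdl_chain(chain))
```

Without the flag check, the second element of the chain would hit the error path in `unlock_pages`, which is the kind of unconditional unlock suspected in the cleanup code above.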

dorssel commented 2 years ago

@epiciskandar Now about what usbipd-win can do to avoid this. First thing is to figure out why there is a shortage of resources. I don't think a memory leak in user mode code can cause this (user mode memory is all pageable). I also don't think there is a memory leak in VBoxUsb; somebody should have noticed this before and I've done transfers of many gigabytes over USB.

What I think could be happening is "too many outstanding URBs". Maybe Linux is queuing URBs faster than they are completed. usbipd-win just forwards every URB to VBoxUsb, without limitation. If the completion rate is lower than the submission rate, then surely you will run out of some resource. This also matches your observation that bigger files cause it and smaller ones don't.

Fortunately, if you run usbipd server on the console, with debug logging, then it also logs the number of pending requests. Can you find out if this number is increasing (before the BSOD hits you)? Numbers up to 10 are normal, 20 is the maximum I have ever seen...
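The "too many outstanding URBs" hypothesis can be illustrated with a small simulation. This is a hedged Python sketch, not usbipd-win code: `forward_urb`, `MAX_IN_FLIGHT`, and the short sleep standing in for the driver completing a request are all hypothetical. It shows how a semaphore caps the number of pending requests when submissions arrive faster than completions.

```python
import asyncio

MAX_IN_FLIGHT = 8  # hypothetical cap, chosen only for illustration


async def forward_urb(seqnum, gate, stats):
    """Forward one request; the semaphore blocks new work once the cap is reached."""
    async with gate:
        stats["pending"] += 1
        stats["peak"] = max(stats["peak"], stats["pending"])
        await asyncio.sleep(0.001)  # stand-in for the driver completing the URB
        stats["pending"] -= 1
    return seqnum


async def run(total):
    gate = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"pending": 0, "peak": 0}
    # Submit everything at once, i.e. much faster than completions arrive.
    done = await asyncio.gather(*(forward_urb(n, gate, stats) for n in range(total)))
    return len(done), stats["peak"]


completed, peak = asyncio.run(run(100))
print(completed, peak)
```

Without the semaphore, the peak in this model would grow with the number of submitted requests, which is the unbounded behavior described above.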

epiciskandar commented 2 years ago

Wow, this is a really impressive analysis; I almost believe this is the real corruption point 😃. OK, I will try the suggestion and watch that request count.

epiciskandar commented 2 years ago

Yes, you are right: with a verbose-logging server instance, it does not crash to a BSOD anymore. Is that due to the time consumed by logging?

I'm attaching the last part of the logs, in case you can confirm something.

trce: UsbIpServer.AttachedClient[1001]
      actual: 8, requested: 512
trce: UsbIpServer.AttachedClient[1001]
      USBIP_CMD_SUBMIT, seqnum=1127, flags=512, length=512, ep=3
trce: UsbIpServer.AttachedClient[1001]
      actual: 24, requested: 512
trce: UsbIpServer.AttachedClient[1001]
      USBIP_CMD_SUBMIT, seqnum=1128, flags=0, length=24, ep=2
trce: UsbIpServer.AttachedClient[1001]
      actual: 24, requested: 24
trce: UsbIpServer.AttachedClient[1001]
      USBIP_CMD_SUBMIT, seqnum=1129, flags=512, length=512, ep=3
trce: UsbIpServer.AttachedClient[1001]
      USBIP_CMD_SUBMIT, seqnum=1130, flags=0, length=24, ep=2
trce: UsbIpServer.AttachedClient[1001]
      actual: 24, requested: 512
trce: UsbIpServer.AttachedClient[1001]
      actual: 24, requested: 24
trce: UsbIpServer.AttachedClient[1001]
      USBIP_CMD_SUBMIT, seqnum=1131, flags=512, length=512, ep=3

Now I will try again without the logging instance. It will very likely crash again, so I'm leaving this information here before starting.

Edit: without the logging server instance, the BSOD happened again.
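The pending-request question can also be checked against a saved trace. The sketch below is a rough Python helper under a loud assumption: it treats every `actual: …, requested: …` line as one completed request and every `USBIP_CMD_SUBMIT` line as one submitted request, which is inferred from the excerpt above rather than confirmed by the usbipd-win source. Because an excerpt starts mid-stream, it reports the spread of the running balance (a lower bound on the peak number of pending requests), not an absolute count; `pending_spread` is a hypothetical name.

```python
import re

SUBMIT = re.compile(r"USBIP_CMD_SUBMIT, seqnum=(\d+)")
COMPLETE = re.compile(r"actual: \d+, requested: \d+")


def pending_spread(log_text):
    """Return (submits, completions, spread); spread is max - min of the running balance."""
    balance = low = high = 0
    submits = completions = 0
    for line in log_text.splitlines():
        if SUBMIT.search(line):
            submits += 1
            balance += 1
        elif COMPLETE.search(line):   # assumption: one such line per completion
            completions += 1
            balance -= 1
        low = min(low, balance)
        high = max(high, balance)
    return submits, completions, high - low


# Last lines of the excerpt above, with the "trce:" header lines omitted.
excerpt = """\
      actual: 24, requested: 24
      USBIP_CMD_SUBMIT, seqnum=1129, flags=512, length=512, ep=3
      USBIP_CMD_SUBMIT, seqnum=1130, flags=0, length=24, ep=2
      actual: 24, requested: 512
      actual: 24, requested: 24
      USBIP_CMD_SUBMIT, seqnum=1131, flags=512, length=512, ep=3
"""
print(pending_spread(excerpt))
```

A spread that keeps growing over a long trace would support the queue-depth theory; a small, stable spread would argue against it.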

mi-hol commented 2 years ago

> This is definitely a bug in the VBoxUsb.sys driver

@dorssel are you planning to create a new issue for VBox's VBoxUsb.sys implementation?

dorssel commented 2 years ago

@mi-hol Eventually, yes. But from experience I know it is quicker to work around it. So, we'll do that first.

epiciskandar commented 2 years ago

> I know it is quicker to work around it

Any idea how to work around this? I can compile and test locally in my environment; this driver problem still annoys me sometimes, even when running with verbose logging.

dorssel commented 2 years ago

@epiciskandar To be honest, not really. I thought it was the queue depth. But since the latest master build still exhibits the problem, that cannot be it. The root cause is still unknown. You yourself had some success by running in debug mode. That changes the timing/performance, which seems to help. But any BSOD is always a driver problem; user mode software (like usbipd-win) cannot cause it, even if you try to (at least in theory...).

epiciskandar commented 2 years ago

> That changes the timing/performance, which seems to help

Following this theory, what if I keep adding delays to certain steps? Could that reduce the probability of a crash?

Currently the crash rate is about 20%; if it could be reduced to less than 5%, that would be a big improvement for me.

dorssel commented 2 years ago

@epiciskandar You know better than I do. So far, you're the only one that can reproduce this. I've tried, but I never got a BSOD. The timing is just a guess, since you reported that running in debug mode made it a little better. But the root cause is unclear. I found some suspicious code in VBoxUsb, but I cannot reverse engineer why/when/how that would be triggered. All my guessing is based on your reports...

epiciskandar commented 2 years ago

Well, that sounds like a bad situation for me... It looks like I eventually have no choice but to dig into this project and try to figure out what's really going on in there.

Edit: adding a 5ms sleep to SUPUSB_IOCTL.SEND_URB seems to work, but it dramatically lowers the performance. I know this does not solve the real problem, but it is acceptable to me for now. 🤔
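Why a 5 ms sleep per request is so costly can be estimated with simple arithmetic. The Python sketch below assumes, hypothetically, that the delay is applied serially to every submitted URB and that each URB carries a fixed payload; the 16 KiB payload is an illustrative guess, not a value measured in this thread.

```python
def max_throughput_bytes_per_s(delay_s, payload_bytes):
    """Upper bound on throughput when every request waits delay_s before being sent."""
    requests_per_s = 1.0 / delay_s
    return requests_per_s * payload_bytes


def min_transfer_time_s(total_bytes, delay_s, payload_bytes):
    """Lower bound on transfer time: number of requests times the per-request delay."""
    requests = -(-total_bytes // payload_bytes)  # ceiling division
    return requests * delay_s


DELAY_S = 0.005             # the 5 ms sleep mentioned above
PAYLOAD = 16 * 1024         # hypothetical bytes per URB
APK_BYTES = 100 * 1024**2   # roughly the 100 MB size reported to trigger the crash

print(max_throughput_bytes_per_s(DELAY_S, PAYLOAD) / 1024**2)  # MiB/s ceiling
print(min_transfer_time_s(APK_BYTES, DELAY_S, PAYLOAD))        # seconds, at best
```

Under these assumptions the sleep alone caps throughput at a few MiB/s; smaller payloads per URB would make the ceiling proportionally lower.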

yodamaster commented 2 years ago

BSOD +1. Just like @epiciskandar, every time I try to debug my app with Android Studio, a BSOD happens.

1: kd> !analyze -v
SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot.  If the same BugCheck occurs again the stack trace will
identify the offending driver.
Arguments:
Arg1: 0000000000000302, Type of error.
Arg2: ffffbb80d6f00000
Arg3: 0000000000000000
Arg4: 00000000000d6f00

Debugging Details:
------------------

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2093

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 6706

    Key  : Analysis.Init.CPU.mSec
    Value: 2140

    Key  : Analysis.Init.Elapsed.mSec
    Value: 74875

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 97

FILE_IN_CAB:  042122-7765-01.dmp

DUMP_FILE_ATTRIBUTES: 0x1808
  Kernel Generated Triage Dump

BUGCHECK_CODE:  da
BUGCHECK_P1: 302
BUGCHECK_P2: ffffbb80d6f00000
BUGCHECK_P3: 0
BUGCHECK_P4: d6f00
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXPNP: 1 (!blackboxpnp)

BLACKBOXWINLOGON: 1
CUSTOMER_CRASH_COUNT:  1
PROCESS_NAME:  usbipd.exe

DPC_STACK_BASE:  FFFFB80778A37FB0

STACK_TEXT:  
ffffb807`78a37318 fffff807`7ed3b10c     : 00000000`000000da 00000000`00000302 ffffbb80`d6f00000 00000000`00000000 : nt!KeBugCheckEx
ffffb807`78a37320 fffff807`7ed3aca9     : 00000000`00000d4e ffff868f`412133f0 00000000`00000000 ffff868f`4fba2e60 : nt!MiReleasePtes+0x3ec
ffffb807`78a37470 fffff807`7ed389a1     : 00000000`00000000 ffff868f`4fba2e60 00000000`00000000 fffff807`a3224854 : nt!MmUnmapLockedPages+0x179
ffffb807`78a374e0 fffff807`ffa72ba1     : ffff868f`4fba2e60 ffff868f`41213400 ffff868f`33230dd0 00000000`00000000 : nt!MmUnlockPages+0x71
ffffb807`78a37580 ffff868f`4fba2e60     : ffff868f`41213400 ffff868f`33230dd0 00000000`00000000 ffff868f`40ccf443 : VBoxUSB+0x2ba1
ffffb807`78a37588 ffff868f`41213400     : ffff868f`33230dd0 00000000`00000000 ffff868f`40ccf443 fffff807`7ed35db7 : 0xffff868f`4fba2e60
ffffb807`78a37590 ffff868f`33230dd0     : 00000000`00000000 ffff868f`40ccf443 fffff807`7ed35db7 ffff868f`3934c770 : 0xffff868f`41213400
ffffb807`78a37598 00000000`00000000     : ffff868f`40ccf443 fffff807`7ed35db7 ffff868f`3934c770 00000000`00000000 : 0xffff868f`33230dd0

SYMBOL_NAME:  VBoxUSB+2ba1
MODULE_NAME: VBoxUSB
IMAGE_NAME:  VBoxUSB.sys
STACK_COMMAND:  .cxr; .ecxr ; kb
BUCKET_ID_FUNC_OFFSET:  2ba1
FAILURE_BUCKET_ID:  0xDA_VBoxUSB!unknown_function
OSPLATFORM_TYPE:  x64
OSNAME:  Windows 10
FAILURE_ID_HASH:  {82481b05-1d94-979d-554d-84d1270c9edb}

Followup:     MachineOwner
---------

1: kd> !blackboxbsd
Version: 0xc0
Product type: 1

1: kd> !blackboxntfs

NTFS Blackbox Data

0 Slow I/O Timeout Records Found
0 Oplock Break Timeout Records Found
1: kd> !blackboxpnp
    PnpActivityId      : {00000000-0000-0000-0000-000000000000}
    PnpActivityTime    : 132950233952334558
    PnpEventInformation: 2
    PnpEventInProgress : 0
    PnpProblemCode     : 24
    PnpVetoType        : 0
    DeviceId           : USB\VID_22D9&PID_2772\532916e2
    VetoString         : 

1: kd> lmvm VBoxUSB
Browse full module list
start             end                 module name
fffff807`ffa70000 fffff807`ffaa5000   VBoxUSB  T (no symbols)           
    Loaded symbol image file: VBoxUSB.sys
    Image path: VBoxUSB.sys
    Image name: VBoxUSB.sys
    Browse all global symbols  functions  data
    Timestamp:        Tue Oct 19 01:50:33 2021 (616DB3E9)
    CheckSum:         0003A1FC
    ImageSize:        00035000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

yodamaster commented 2 years ago

After I replaced VBoxUSB with the one from VirtualBox-6.1.34-150636, the BSOD is gone.

dorssel commented 2 years ago

@yodamaster Thanks for investigating this! I will update the driver that usbipd-win ships with to this version.

Strangely enough, there were no code changes in VBoxUsb itself (see https://www.virtualbox.org/browser/vbox/trunk/src/VBox/HostDrivers/VBoxUSB/win/dev), but there may have been something in libusb that is linked in...

dorssel commented 2 years ago

@yodamaster @epiciskandar I've created PR #354 that updates the drivers in the installer. The installer is at https://github.com/dorssel/usbipd-win/actions/runs/2204188147. Could you please test if this solves the problem?

yodamaster commented 2 years ago

> @yodamaster @epiciskandar I've created PR #354 that updates the drivers in the installer. The installer is at https://github.com/dorssel/usbipd-win/actions/runs/2204188147. Could you please test if this solves the problem?

The new installer is much better; the BSOD happened only once today, so I can bear it for the moment. Anyway, thanks a lot!

epiciskandar commented 2 years ago

These days I noticed that the BSOD seems to be related only to adb debugging, not to the transfer itself; these two things always happen one after the other (the install, and then the debugger attaching). If I transfer and install the .apk file manually, no BSOD happens.

Anyway, I'm trying this new package.

dorssel commented 2 years ago

Now released in 2.3.0.

epiciskandar commented 2 years ago

I haven't been working with usbipd much these days, but it still happened once. I'll keep trying.

epiciskandar commented 2 years ago

It currently works fine and barely happens anymore; closing this issue.

dorssel commented 2 years ago

@klaus-vb For your information, please have a look at https://github.com/dorssel/usbipd-win/issues/248#issuecomment-1030681671

epiciskandar commented 2 years ago

The BSOD happened many times today. I've updated Windows 11 to the latest beta build; not sure if this is related. So, this problem apparently still exists. @dorssel 😔

dorssel commented 2 years ago

@epiciskandar That's sad... Can you confirm the post-mortem still points at VBoxUSB.sys? I have notified @klaus-vb from VirtualBox.

epiciskandar commented 2 years ago

> Can you confirm the post-mortem still points at VBoxUSB.sys?

Confirmed.

 # Child-SP          RetAddr               Call Site
00 fffff803`57f43b28 fffff803`5a09d54a     nt!KeBugCheckEx
01 fffff803`57f43b30 fffff803`5a09d2f1     nt!MiReleasePtes+0x20a
02 fffff803`57f43c80 fffff803`5a09c5eb     nt!MmUnmapLockedPages+0x191
03 fffff803`57f43cf0 fffff803`a0f62ba1     nt!MmUnlockPages+0x6b
04 fffff803`57f43d90 fffff803`5a09b5e4     VBoxUSB+0x2ba1
05 fffff803`57f43dc0 fffff803`5a09b417     nt!IopfCompleteRequest+0x1b4
06 fffff803`57f43eb0 fffff803`5e1c3cc6     nt!IofCompleteRequest+0x17
07 (Inline Function) --------`--------     Wdf01000!FxIrp::CompleteRequest+0x13 [minkernel\wdf\framework\shared\inc\private\km\FxIrpKm.hpp @ 75] 
08 fffff803`57f43ee0 fffff803`5e1c2031     Wdf01000!FxRequest::CompleteInternal+0x246 [minkernel\wdf\framework\shared\core\fxrequest.cpp @ 869] 
SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.

SzKPeter commented 2 years ago

I have a similar issue. I cannot connect a Logitech F710 joystick controller to WSL2. The joystick has a DirectInput mode and an XInput mode.

In DirectInput mode, Windows recognizes the joystick differently and I cannot navigate the basic Windows menu with it. In XInput mode, Windows recognizes the joystick as an Xbox 360 controller; I can navigate the Windows menu with it, and this online tester also recognizes it: https://gamepad-tester.com/ (it is not recognized by the online tester in DirectInput mode).

I can attach the joystick to WSL2 in DirectInput mode, but it does not work there, just as it does not work in Windows. When I switch to XInput mode, I cannot attach the joystick to WSL2; I get the following error: [image]

After a restart I ran usbipd wsl list in PowerShell, and the joystick was listed as an Xbox 360 controller, but it was not recognized by the online tester and I could not navigate the Windows menu with it. In this state I was able to attach the joystick to WSL2, and I could list the device on the WSL2 side with lsusb.

However, when I tried the online tester in WSL2, it did not recognize the joystick. I also tried jstest-gtk, which I installed in WSL2; still no luck.

After unplugging the receiver from my USB port and plugging it back in, Windows recognized the joystick, but then I could not attach it to WSL2 again. I got the same error as in the picture above.

I have the following system:

Dell laptop
Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz 2.60 GHz
NVIDIA GeForce 940MX

Windows 10 Enterprise 21H2
Build number: 19044.1706
WSL 2 Kernel version: 5.10.102.1
Ubuntu 20.04 inside WSL2
USBIPD : 2.3.0+42.Branch.master.Sha.3d9f5c5acc4e133ab8147684ad1463cbaec43240

Please let me know what I'm doing wrong, or is this an issue with USBIPD?

Update:

I also tried my laptop's integrated webcam and integrated Bluetooth (they both work without any issue on the Windows side when not attached). I can attach them without any issue, but they are not recognized by the WSL Ubuntu.

From what I have read about this, should I build a custom WSL kernel to make this work?

Uldiniad commented 2 years ago

I am reproducing the SYSTEM_PTE_MISUSE on W11 stable branch on WSL2, latest usbipd-win release. Same steps as OP

n-prat commented 2 years ago

Same here.
It happens pretty much every time I try to debug a big apk for Android (consisting of two 350+MB .so files).

usbipd-win 2.3.0
Microsoft Windows [Version 10.0.22000.739]

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot.  If the same BugCheck occurs again the stack trace will
identify the offending driver.
Arguments:
Arg1: 0000000000000302, Type of error.
Arg2: ffff8d0147e80000
Arg3: 0000000000000000
Arg4: 0000000000147e80

Debugging Details:
------------------

Page 13442c not present in the dump file. Type ".hh dbgerr004" for details
Page 13442e not present in the dump file. Type ".hh dbgerr004" for details
Page 132330 not present in the dump file. Type ".hh dbgerr004" for details

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2296

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 5037

    Key  : Analysis.Init.CPU.mSec
    Value: 405

    Key  : Analysis.Init.Elapsed.mSec
    Value: 29007

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 112

    Key  : Bugcheck.DumpVsMemoryMatch
    Value: True

    Key  : Dump.Attributes.AsUlong
    Value: 1800

    Key  : WER.OS.Branch
    Value: co_release

    Key  : WER.OS.Timestamp
    Value: 2021-06-04T16:28:00Z

    Key  : WER.OS.Version
    Value: 10.0.22000.1

FILE_IN_CAB:  MEMORY.DMP

DUMP_FILE_ATTRIBUTES: 0x1800

BUGCHECK_CODE:  da

BUGCHECK_P1: 302

BUGCHECK_P2: ffff8d0147e80000

BUGCHECK_P3: 0

BUGCHECK_P4: 147e80

BLACKBOXBSD: 1 (!blackboxbsd)

BLACKBOXNTFS: 1 (!blackboxntfs)

BLACKBOXPNP: 1 (!blackboxpnp)

BLACKBOXWINLOGON: 1

PROCESS_NAME:  System

STACK_TEXT:  
fffff805`3ae00f78 fffff805`3f08bc2c     : 00000000`000000da 00000000`00000302 ffff8d01`47e80000 00000000`00000000 : nt!KeBugCheckEx
fffff805`3ae00f80 fffff805`3f08b7c9     : 00000000`00000d4e ffffc305`275e41e0 00000000`00000000 ffffc305`24855c30 : nt!MiReleasePtes+0x3ec
fffff805`3ae010d0 fffff805`3f0894c1     : 00000000`00000000 ffffc305`24855c30 00000000`00000000 fffff805`3f0817fd : nt!MmUnmapLockedPages+0x179
fffff805`3ae01140 fffff805`52c52ba1     : ffffc305`24855c30 ffffc305`275e41f0 ffffc305`1598c440 00000000`00000000 : nt!MmUnlockPages+0x71
fffff805`3ae011e0 fffff805`3f0868d7     : ffffc305`265edce0 00000000`00000000 fffff805`3ae012b9 ffffc305`31238b3b : VBoxUSB+0x2ba1
fffff805`3ae01210 fffff805`3f086797     : ffffc305`31238750 00000000`00000000 ffffc305`24ad8a00 00000000`00000001 : nt!IopfCompleteRequest+0x127
fffff805`3ae01320 fffff805`40cc8ad0     : ffffc305`31238750 00000000`00000001 00000000`00000002 fffff805`3ae01400 : nt!IofCompleteRequest+0x17
fffff805`3ae01350 fffff805`40cc885f     : ffffc305`31238750 fffff805`40cd7240 ffffc305`159ab850 00000000`00000000 : Wdf01000!FxRequest::CompleteInternal+0x240 [minkernel\wdf\framework\shared\core\fxrequest.cpp @ 869] 
fffff805`3ae013e0 fffff805`5009e370     : 00000000`ffffff02 ffffc305`248ecab0 ffffc305`24ad8de0 ffffc305`24ad8de0 : Wdf01000!imp_WdfRequestComplete+0x8f [minkernel\wdf\framework\shared\core\fxrequestapi.cpp @ 436] 
fffff805`3ae01440 fffff805`5009e1b1     : ffffc305`248ecc50 00000000`00000000 ffffc305`248ecce0 fffff805`3ae01658 : USBXHCI!Bulk_Transfer_CompleteCancelable+0xc8
fffff805`3ae014a0 fffff805`5009dfa0     : 00000000`00000004 fffff805`3ae01610 00000000`00000000 ffffc305`24de0a30 : USBXHCI!Bulk_ProcessTransferEventWithED1+0x1fd
fffff805`3ae01550 fffff805`50093938     : 00000000`00000004 fffff805`3ae01628 00000000`00000008 fffff805`3ae01630 : USBXHCI!Bulk_EP_TransferEventHandler+0x10
fffff805`3ae01580 fffff805`50093188     : ffffc305`15721630 00000001`00000000 ffffc305`157f0df0 ffffc305`15721630 : USBXHCI!Endpoint_TransferEventHandler+0xa8
fffff805`3ae015e0 fffff805`50092b9c     : 00000000`00000000 00000000`00000000 0000013a`bf24f61c 00000000`00000000 : USBXHCI!Interrupter_DeferredWorkProcessor+0x5d8
fffff805`3ae016e0 fffff805`40cc25f5     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : USBXHCI!Interrupter_WdfEvtInterruptDpc+0xc
fffff805`3ae01710 fffff805`3f126f71     : fffff805`3ae01ac0 00000000`00000000 fffff805`3aa5f4c0 00000000`00000000 : Wdf01000!FxInterrupt::_InterruptDpcThunk+0xa5 [minkernel\wdf\framework\shared\irphandlers\pnp\km\interruptobjectkm.cpp @ 404] 
fffff805`3ae01750 fffff805`3f125f72     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiExecuteAllDpcs+0x491
fffff805`3ae01950 fffff805`3f21b79e     : 00000000`00000000 fffff805`3aa5c180 fffff805`3fb35bc0 ffffc305`1ed87080 : nt!KiRetireDpcList+0x2a2
fffff805`3ae01c00 00000000`00000000     : fffff805`3ae02000 fffff805`3adfb000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x9e

SYMBOL_NAME:  VBoxUSB+2ba1

MODULE_NAME: VBoxUSB

IMAGE_NAME:  VBoxUSB.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  2ba1

FAILURE_BUCKET_ID:  0xDA_VBoxUSB!unknown_function

OS_VERSION:  10.0.22000.1

BUILDLAB_STR:  co_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {82481b05-1d94-979d-554d-84d1270c9edb}

Followup:     MachineOwner

dorssel commented 2 years ago

@nathanprat Thanks for the debug info, this really helps!

Can you answer these (some questions seem "silly", but please just confirm them to rule it out):

1. Are there any USB filter warnings reported for usbipd list?
2. Is the device on a USB2 or USB3 port?
3. Is the device itself USB3 capable?
4. Are all the hubs in the chain to the device port using stock Microsoft drivers, or are there any vendor drivers involved?
5. What is the Linux driver type when accessing the device (serial? storage? other?)
6. You are writing to the device, correct? Or is it reading?
7. Is your system in any way low on memory or low on other resources right before the crash?
8. Can you try to grab a USB dump as described in https://github.com/dorssel/usbipd-win/wiki/Troubleshooting#usb-capture? I understand the crash itself will truncate/corrupt the dump, but hopefully the last few seconds before the crash may provide some information...

Your dump contains useful information that we didn't have before. Here is my analysis:

SYSTEM_PTE_MISUSE (da) ... Arg1: 0000000000000302, Type of error.

From https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0xda--system-pte-misuse:

The caller is attempting to release a system address that is not currently mapped.

And also:

... nt!MmUnlockPages+0x71 ... VBoxUSB+0x2ba1 ... nt!IopfCompleteRequest+0x127

This contradicts my earlier suspicion in https://github.com/dorssel/usbipd-win/issues/248#issuecomment-1030681671; instead it is now clear this is the MmUnlockPages during IRP completion, on line 1204 of VBoxUsbRt.cpp. From the code, it seems this can only be reached exactly once, after the IRP has indeed completed. There is a weird comment in the completion code:

       case ((USBD_STATUS)0xC0010000L): // USBD_STATUS_CANCELED - too bad usbdi.h and usb.h aren't consistent!
           /// @todo What the heck are we really supposed to do here?
           pUrbInfo->error = USBSUP_XFER_STALL;
           Status = STATUS_SUCCESS;
           break;

But this is in my opinion not related to the MDL lifetime. So, it looks like the MDL must have been wrong at creation time already. The MDL is created on line 1256, and that seems to be all correct. The only thing I can find wrong with the code is the PENDING part of the IRP. https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/example--simple-pass-through-dispatch-and-completion indicates that IoMarkIrpPending should not be called by the dispatcher, but instead conditionally by the completion routine.

What VBox does is:

What Microsoft says you should do:

The only other thing I can think of is resource limitation. Maybe there are too many queued/pending URBs for VBox. I don't really see a hard limit, but if there is one then going over that limit may corrupt the internal structures. A USB capture should show that...

I'll pass this on to VirtualBox.
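The frame-order argument above (which function called MmUnlockPages, and what called that) can be checked mechanically on a STACK_TEXT block. This is a minimal Python sketch, assuming the `!analyze -v` line layout seen in this thread (child SP, return address, four arguments, call site); `call_sites` and `callers_of` are hypothetical helper names, not part of any debugger tooling.

```python
import re

# child SP, return address, ":", four argument words, ":", then the call site
FRAME = re.compile(
    r"^[0-9a-f`]+\s+[0-9a-f`]+\s+:\s+(?:[0-9a-f`]+\s+){4}:\s+(\S+)"
)


def call_sites(stack_text):
    """Return the call sites in stack order (innermost frame first)."""
    sites = []
    for line in stack_text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            sites.append(m.group(1))
    return sites


def callers_of(sites, symbol, depth=2):
    """Return up to `depth` frames listed right after the first frame starting with symbol."""
    for i, site in enumerate(sites):
        if site.startswith(symbol):
            return sites[i + 1 : i + 1 + depth]
    return []


# Three frames copied from the dump earlier in this thread.
sample = """\
fffff805`3ae01140 fffff805`52c52ba1     : ffffc305`24855c30 ffffc305`275e41f0 ffffc305`1598c440 00000000`00000000 : nt!MmUnlockPages+0x71
fffff805`3ae011e0 fffff805`3f0868d7     : ffffc305`265edce0 00000000`00000000 fffff805`3ae012b9 ffffc305`31238b3b : VBoxUSB+0x2ba1
fffff805`3ae01210 fffff805`3f086797     : ffffc305`31238750 00000000`00000000 ffffc305`24ad8a00 00000000`00000001 : nt!IopfCompleteRequest+0x127
"""
print(callers_of(call_sites(sample), "nt!MmUnlockPages"))
```

On the sample, the frame after MmUnlockPages is VBoxUSB and the one after that is IopfCompleteRequest, which is the pattern that places the unlock in the IRP completion path rather than in the failure cleanup.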

n-prat commented 2 years ago

Thanks for the quick response! I will do my best to follow up.

1. Are there any USB filter warnings reported for `usbipd list`?

No, not as far as I can see. [attachment: usbipd_list]

2. Is the device on a USB2 or USB3 port?

The stacktrace in my first message was on a USB3 port, and the captures at the end of this message are with a USB2 port.

3. Is the device itself USB3 capable?

Not sure; USB Device Tree Viewer says:

Device maximum Speed     : High-Speed
Device Connection Speed  : High-Speed

(It’s a phone, so I would not be surprised if it is only USB2?)

4. Are all the hubs in the chain to the device port using stock Microsoft drivers, or are there any vendor drivers involved?

Not sure; see the USB Device Tree Viewer screenshots (usb2_tree, usb3_tree). Both attempts used the front panel ports of my desktop.

5. What is the Linux driver type when accessing the device (serial? storage? other?)

No idea. How can I know? Not sure if related but I am using adb directly; not MTP/PTP.

6. You are _writing_ to the device, correct? Or is it _reading_?

I used to think the problem occurred when writing, because it happens as soon as Android Studio shows "install...". But when trying to reproduce it, I did manage to upload the 350MB library using adb push without issue.

7. Is your system in any way low on memory or low on other resources right before the crash?

Not really. The original stacktrace was probably taken at around 90% memory use on the Windows side, and around 50% on the WSL2 side. But when reproducing I did a drop_caches just after building and before starting to debug, and I still got a BSOD. At the time of that crash Windows was at around 50-60% RAM usage.

8. Can you try to grab a USB dump as described in https://github.com/dorssel/usbipd-win/wiki/Troubleshooting#usb-capture? I understand the crash itself will truncate/corrupt the dump, but hopefully the last few seconds before the crash may provide some information...

Here

Hope that helps. Tell me if you need anything else.

d0n13 commented 2 years ago

@dorssel Did you get any response from VirtualBox on your findings above?

dorssel commented 2 years ago

@d0n13 Unfortunately, no. Not a single reaction...

d0n13 commented 2 years ago

@d0n13 Unfortunately, no. Not a single reaction...

Still no response. This bug has stopped us from using this, and I really hope it gets fixed. Is there anything else that could be done? How hard would it be to write this driver ourselves instead of depending on Oracle?

codesmithcode commented 2 years ago

Also seeing this with vboxusb referenced in the crash report. Is there a bug to follow on the Oracle side?

As an additional data point: I seem to be able to run our program in VirtualBox against USB without issue. It's just WSL2 + USBIP that causes a failure.

typeless commented 2 years ago

I encountered this problem when connecting remotely from an Ubuntu Server LTS 22.04 machine too. It works fine with fastboot, but I get a BSOD on the Windows side when using adb sideload.

d0n13 commented 2 years ago

@typeless Seems that when you send a lot of data across the link the problem arises. I was using adb and libmobiledevice without issues until I started to use sideloading too.

I wonder can we replace it with a different driver? How complex is it to write such a driver?

d0n13 commented 2 years ago

@dorssel can you email me at donie dot kelly at g mail dot com on this issue? We would like you to work on it and are willing to pay for your time. Possible?

d0n13 commented 2 years ago

Does anyone know how we can escalate this with somebody who might know how to fix the driver issue here? Seems like it should be straightforward to somebody who is familiar with the code as @dorssel has shown what the issue may be above.

somu1795 commented 1 year ago

This happens to me whenever I try to use adb sideload, and specifically adb sideload (I don't have issues pushing multi-GB files). Whenever I use adb sideload inside WSL2, it crashes with a BSOD (pte_misuse).

henrygab commented 1 year ago

I wonder can we replace it with a different driver? How complex is it to write such a driver?

@d0n13, I understand your frustration. Long ago, I was a kernel-mode developer in Windows. I've had my share of tracking backwards from bugchecks. I do have some recommendations. At the same time, please note my knowledge may no longer be 100% current...

  1. The bugcheck itself gave a recommendation. Has anyone followed those instructions first?
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot.  If the same BugCheck occurs again the stack trace will
identify the offending driver.
  2. If the above isn't enough, have you tried to enable driver verifier?
WARNING
Only mess with driver verifier if you are OK with your computer
being unavailable ... ensure you have another computer around,
in case you get stuck.  Oh, and backup your bitlocker recovery
key, if you're using bitlocker.

Driver verifier allows you to enable additional validation of how a driver behaves. It's built into Windows. It may make your machine crash more often, but the bonus is that, when it does, it often gives very specific information on what driver violated a rule, and the stack trace will often show the exact lines of code that are guilty.

See https://gist.github.com/henrygab/044400844e1a8f3cfa730a66cc306d94

Since the bugcheck here appears to repeatedly be that system PTEs are being corrupted, it's likely a bug in a driver, and driver verifier is very likely going to find exactly who is causing the problem, and where.
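For the first recommendation (the TrackPtes value named in the bugcheck text), here is a minimal sketch that renders a `.reg` file for that setting. The key path, value name, and DWORD value 3 come straight from the bugcheck message; the helper name `render_reg_file` is made up for illustration, and this is only one of several ways to set the value (regedit or `reg add` would work just as well).

```python
# Key path and value are taken from the SYSTEM_PTE_MISUSE bugcheck text.
KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\Session Manager\Memory Management"
)


def render_reg_file(key, name, dword_value):
    """Return the text of a .reg file that sets a single REG_DWORD value.

    Hypothetical helper, not part of usbipd-win or any Windows tooling.
    """
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        f'"{name}"=dword:{dword_value:08x}\r\n'
    )


reg_text = render_reg_file(KEY, "TrackPtes", 3)
print(reg_text)
```

The printed text can be saved to a `.reg` file and imported from an elevated prompt, followed by a reboot as the bugcheck message instructs. Since PTE tracking is a debugging aid, it probably makes sense to remove the value again once a dump has been captured.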

closing notes

While I was a kernel mode developer, my knowledge is likely out-of-date. I cannot commit to any help beyond providing these informational pointers. At the same time, it seems like @dorssel has the expertise to analyze memory dumps and the stack traces that !analyze -v (from WinDbg Preview debugger) shows ... so you seem to be in good hands!

epiciskandar commented 1 year ago

The bugcheck itself gave a recommendation. Has anyone followed those instructions first?

Yes, I tried that once after I first got the kernel dump information. I don't fully understand what it does, but I tried it. Unfortunately, it brought nothing new, or maybe I'm not good at WinDbg and missed the point.

If the above isn't enough, have you tried to enable driver verifier?

I missed this message; I will try that when convenient.

henrygab commented 1 year ago

Yes ... or maybe I'm not good at WinDbg

Even knowing what WinDbg is puts you in an elite field of experts. :)

If the above isn't enough, have you tried to enable driver verifier? I missed this message; I will try that when convenient.

Driver verifier is amazing. Sure, the overhead at times results in a less responsive computer, but when it finds a violation ... so much wasted debug effort avoided. For this purpose, especially as the issue is fairly reproducible, I think the first set of options (where the culprit driver is fairly likely to be known) are likely to bear fruit with a bugcheck that very specifically calls out what violation occurred, and what the code should have done.

If you dabble in "WDM" Windows drivers, then Driver Verifier needs to be part of every development execution of the driver and part of the standard test passes (imho, of course).

IIRC, both KMDF and UMDF also have similar verifier functionality available.

And of course, there's also an "AppVerifier" for user-mode applications.

Development on the Windows platform is made better by such tools.

(imho ... I was formerly a Windows kernel-mode dev, and am still employed there).

Tsuser1 commented 1 year ago

Figured I'd throw in my WinDbg analysis to see if it gives you anything new:

Bug Check 1

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot.  If the same BugCheck occurs again the stack trace will
identify the offending driver.
Arguments:
Arg1: 0000000000000302, Type of error.
Arg2: ffff96015cfc0000
Arg3: 0000000000000000
Arg4: 000000000015cfc0

Debugging Details:
------------------

Page 11e7d0 not present in the dump file. Type ".hh dbgerr004" for details
Page 122dd1 not present in the dump file. Type ".hh dbgerr004" for details

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 4858

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 9218

    Key  : Analysis.IO.Other.Mb
    Value: 6

    Key  : Analysis.IO.Read.Mb
    Value: 0

    Key  : Analysis.IO.Write.Mb
    Value: 32

    Key  : Analysis.Init.CPU.mSec
    Value: 734

    Key  : Analysis.Init.Elapsed.mSec
    Value: 24713

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 102

    Key  : Bugcheck.Code.DumpHeader
    Value: 0xda

    Key  : Bugcheck.Code.KiBugCheckData
    Value: 0xda

    Key  : Bugcheck.Code.Register
    Value: 0xda

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Timestamp
    Value: 2019-12-06T14:06:00Z

    Key  : WER.OS.Version
    Value: 10.0.19041.1

FILE_IN_CAB:  MEMORY.DMP

BUGCHECK_CODE:  da

BUGCHECK_P1: 302

BUGCHECK_P2: ffff96015cfc0000

BUGCHECK_P3: 0

BUGCHECK_P4: 15cfc0

PROCESS_NAME:  System

STACK_TEXT:
ffffce06`b050d1d8 fffff805`830279ad : 00000000`000000da 00000000`00000302 ffff9601`5cfc0000 00000000`00000000 : nt!KeBugCheckEx
ffffce06`b050d1e0 fffff805`83026b0f : ffff9601`3299e180 fffff805`8305aad6 00000000`00000000 00000000`00000000 : nt!MiReleasePtes+0x40d
ffffce06`b050d330 fffff806`91582ba1 : ffffcf03`b802e6f0 ffffcf03`c01f018b ffffcf03`bb71ed10 fffff805`84bd42f4 : nt!MmUnlockPages+0x2ff
ffffce06`b050d420 fffff805`8302816e : ffffcf03`b2fb8ce0 00000000`00000000 ffffce06`b050d4d9 ffffcf03`b64f03fb : VBoxUSB+0x2ba1
ffffce06`b050d450 fffff805`83028037 : 00000000`00000001 00000000`00000000 ffffcf03`c0736a30 00000000`00000002 : nt!IopfCompleteRequest+0x11e
ffffce06`b050d540 fffff805`84bd811a : 00000000`00000000 ffffcf03`b0c3add0 ffffcf03`b64f0010 ffffce06`b050d620 : nt!IofCompleteRequest+0x17
ffffce06`b050d570 fffff805`84bd5bbf : ffffcf03`b8a96c02 ffffcf03`b4ed1b20 ffffcf03`b64f0010 00000000`00000000 : Wdf01000!FxRequest::CompleteInternal+0x23a [minkernel\wdf\framework\shared\core\fxrequest.cpp @ 869]
ffffce06`b050d600 fffff805`9e55c1ed : 00000000`ffffff02 ffffcf03`b8a96b30 ffffcf03`c0736e10 ffffcf03`c0736e10 : Wdf01000!imp_WdfRequestComplete+0x8f [minkernel\wdf\framework\shared\core\fxrequestapi.cpp @ 436]
ffffce06`b050d660 fffff805`9e55c0b1 : ffffcf03`b8a96cd0 00000000`00000000 ffffcf03`b8a96d60 ffffce06`b050d878 : USBXHCI!Bulk_Transfer_CompleteCancelable+0xc9
ffffce06`b050d6c0 fffff805`9e55bea0 : 00000000`00000004 ffffce06`b050d830 00000000`00000000 ffffcf03`b4a51a60 : USBXHCI!Bulk_ProcessTransferEventWithED1+0x1fd
ffffce06`b050d770 fffff805`9e556911 : 00000000`00000004 ffffce06`b050d848 00000000`00000008 ffffce06`b050d850 : USBXHCI!Bulk_EP_TransferEventHandler+0x10
ffffce06`b050d7a0 fffff805`9e556445 : 00000000`00000780 00000000`00000000 ffffcf03`bf0ffdc0 ffffcf03`bb9bf430 : USBXHCI!Endpoint_TransferEventHandler+0xb1
ffffce06`b050d800 fffff805`9e58bf78 : ffffcf03`bbb6bc70 000030fc`459c5578 ffffcf03`bbb6bc70 00000000`00000000 : USBXHCI!Interrupter_DeferredWorkProcessor+0x315
ffffce06`b050d900 fffff805`84c59d61 : ffffcf03`bbb6ba70 ffffcf03`bbb6ba70 fffff805`84c595d0 fffff805`8301ad6f : USBXHCI!Interrupter_WdfEvtInterruptWorkItem+0x68
ffffce06`b050d930 fffff805`84c595d9 : ffffcf03`bbdf7430 fffff805`85f17101 ffffce06`b050d994 00000000`00000000 : Wdf01000!FxInterrupt::WorkItemHandler+0x101 [minkernel\wdf\framework\shared\irphandlers\pnp\km\interruptobjectkm.cpp @ 126]
ffffce06`b050d970 fffff805`84bd3b12 : 00000000`00000000 ffffcf03`b2a08920 ffffcf03`a94ef240 00000000`00000000 : Wdf01000!FxInterrupt::_InterruptWorkItemCallback+0x9 [minkernel\wdf\framework\shared\irphandlers\pnp\interruptobject.cpp @ 1764]
ffffce06`b050d9a0 fffff805`84bd3a41 : 00000000`00000000 ffffcf03`b2a08920 ffffcf03`ba27cac0 00000000`00000000 : Wdf01000!FxSystemWorkItem::WorkItemHandler+0xae [minkernel\wdf\framework\shared\core\fxsystemworkitem.cpp @ 264]
ffffce06`b050d9d0 fffff805`83007df5 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff805`84bdc5c0 : Wdf01000!FxSystemWorkItem::_WorkItemThunk+0x11 [minkernel\wdf\framework\shared\core\fxsystemworkitem.cpp @ 315]
ffffce06`b050da00 fffff805`8305b3d5 : ffffcf03`a94ef100 ffffcf03`a94ef100 fffff805`83007cc0 00000000`00000000 : nt!IopProcessWorkItem+0x135
ffffce06`b050da70 fffff805`831030e5 : ffffcf03`a94ef100 00000000`00000080 ffffcf03`a5ebb200 000fa56f`b19bbfff : nt!ExpWorkerThread+0x105
ffffce06`b050db10 fffff805`83202e08 : fffff805`7d621180 ffffcf03`a94ef100 fffff805`83103090 00000000`00000000 : nt!PspSystemThreadStartup+0x55
ffffce06`b050db60 00000000`00000000 : ffffce06`b050e000 ffffce06`b0507000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28

SYMBOL_NAME:  VBoxUSB+2ba1

MODULE_NAME: VBoxUSB

IMAGE_NAME:  VBoxUSB.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  2ba1

FAILURE_BUCKET_ID:  0xDA_VBoxUSB!unknown_function

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {82481b05-1d94-979d-554d-84d1270c9edb}

Followup:     MachineOwner
---------

Bug Check 2 (TrackPtes enabled)

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_PTE_MISUSE (da)
A driver has corrupted system PTEs.
Set HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes
to a DWORD 3 value and reboot.  If the same BugCheck occurs again the stack trace will
identify the offending driver.
Arguments:
Arg1: 0000000000000200, Type of error.
Arg2: ffff8a005b000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

Page 1198ed not present in the dump file. Type ".hh dbgerr004" for details

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 4702

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 4857

    Key  : Analysis.IO.Other.Mb
    Value: 0

    Key  : Analysis.IO.Read.Mb
    Value: 0

    Key  : Analysis.IO.Write.Mb
    Value: 0

    Key  : Analysis.Init.CPU.mSec
    Value: 967

    Key  : Analysis.Init.Elapsed.mSec
    Value: 7064

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 92

    Key  : Bugcheck.Code.DumpHeader
    Value: 0xda

    Key  : Bugcheck.Code.KiBugCheckData
    Value: 0xda

    Key  : Bugcheck.Code.Register
    Value: 0xda

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Timestamp
    Value: 2019-12-06T14:06:00Z

    Key  : WER.OS.Version
    Value: 10.0.19041.1

FILE_IN_CAB:  MEMORY.DMP

BUGCHECK_CODE:  da

BUGCHECK_P1: 200

BUGCHECK_P2: ffff8a005b000000

BUGCHECK_P3: 0

BUGCHECK_P4: 0

PROCESS_NAME:  usbipd.exe

STACK_TEXT:
ffffd60e`481af4c8 fffff807`75d51381 : 00000000`000000da 00000000`00000200 ffff8a00`5b000000 00000000`00000000 : nt!KeBugCheckEx
ffffd60e`481af4d0 fffff807`75c57f1a : 00000000`00000000 fffff807`7644edc0 fffff807`7644edc0 00000000`00000000 : nt!MiCheckPteReserve+0x55
ffffd60e`481af520 fffff807`75acaa84 : ffffb28f`6a9dca10 00000000`00000000 ffffb28f`833bbec0 ffffb28f`6a9dca38 : nt!MiReservePtes+0x18dc7a
ffffd60e`481af5f0 fffff807`c8482d09 : ffffb28f`6a9dca38 ffffb28f`00000000 00000000`00000002 ffffb28f`833bbec0 : nt!MmMapLockedPagesSpecifyCache+0xd4
ffffd60e`481af650 fffff807`c8481366 : ffffb28f`7ac85880 ffffb28f`7694a6e0 00000000`00000098 ffffb28f`7f4ce870 : VBoxUSB+0x2d09
ffffd60e`481af6d0 fffff807`75a329b5 : ffffb28f`7694a6e0 00000000`00000002 00000000`00000000 00000000`00000068 : VBoxUSB+0x1366
ffffd60e`481af700 fffff807`75e39bd8 : ffffb28f`7694a6e0 00000000`00000000 ffffb28f`7694a6e0 00000000`00000000 : nt!IofCallDriver+0x55
ffffd60e`481af740 fffff807`75e399d7 : 00000000`00000000 ffffd60e`481afa80 00000000`00040005 ffffd60e`481afa80 : nt!IopSynchronousServiceTail+0x1a8
ffffd60e`481af7e0 fffff807`75e38d56 : 00000001`00000000 00000000`00000000 00000000`00000000 0000018b`f1c868e0 : nt!IopXxxControlFile+0xc67
ffffd60e`481af920 fffff807`75c0d9f5 : ffffb28f`7df80080 00000063`25f3fdc8 ffffd60e`481af9a8 ffffb28f`826a61c0 : nt!NtDeviceIoControlFile+0x56
ffffd60e`481af990 00007ffa`5f72d1a4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25
00000063`25f3e898 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffa`5f72d1a4

SYMBOL_NAME:  VBoxUSB+2d09

MODULE_NAME: VBoxUSB

IMAGE_NAME:  VBoxUSB.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  2d09

FAILURE_BUCKET_ID:  0xDA_VBoxUSB!unknown_function

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {82481b05-1d94-979d-554d-84d1270c9edb}

Followup:     MachineOwner
---------

d0n13 commented 1 year ago

Figured I'd throw in my WinDbg analysis to see if it gives you anything new:

Bug Check 1 Bug Check 2 (TrackPtes enabled)

I hope this helps somebody figure this out. I have a question though. If this is figured out and a code change is required, where is the official repo for the drivers? Can this be built by anyone? Does it require signing etc.?

Thanks for your work @Tsuser1

henrygab commented 1 year ago

This definitely points the finger (whether correctly or not) at VBoxUSB as the culprit. Who owns that driver? Can you build it so you have debug symbols and source available?

Analysis of Dump

When this dump is loaded, the top few parameters help understand what the class of the problem is. Using the online list of [bugcheck codes](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0xda--system-pte-misuse), note that `Arg1` is `0x200`. Scrolling deep into that list, `0x200` means:

`The caller is attempting to reserve a mapping address space that contains no mappings.`

`Arg2` in such a case is the first mapping address.

My reduction of the stack shows the following:

```
STACK_TEXT:
nt!KeBugCheckEx
nt!MiCheckPteReserve+0x55
nt!MiReservePtes+0x18dc7a
nt!MmMapLockedPagesSpecifyCache+0xd4
VBoxUSB+0x2d09
VBoxUSB+0x1366
nt!IofCallDriver+0x55
nt!IopSynchronousServiceTail+0x1a8
nt!IopXxxControlFile+0xc67
nt!NtDeviceIoControlFile+0x56
```

Unfortunately, there are no debug symbols given for `VBoxUSB`, and therefore without a more comprehensive dump, it is difficult to directly correlate this to specific lines of code. However, the stack does give some hints:

1. `VBoxUSB` is likely in the DeviceIOControl dispatch handler, because this occurred from `nt!NtDeviceIoControlFile`.
2. The dispatch handler called a second function from (just prior to) `VBoxUSB+0x1366`.
3. That second function eventually calls `MmMapLockedPagesSpecifyCache` (with return address `VBoxUSB+0x2d09`).

IMO, the next thing needed is symbols for VBoxUSB, or the owner of that driver to do additional analysis. You could also look at the IRP that was in flight, and trace through by looking at the driver object to find the DeviceIOControl dispatch, and trace where that IRP would have gone, instruction by instruction, knowing it should have the above return addresses pushed to the stack.
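The stack reduction described above can also be done mechanically. The following is a rough sketch, assuming the raw `STACK_TEXT` lines use the usual WinDbg layout (child stack pointer, return address, four arguments, then the symbol); the function names are made up for illustration and are not part of WinDbg or usbipd-win. The sample lines are the first five frames of the TrackPtes-enabled dump posted earlier in this thread.

```python
import re

# One frame: child-SP, return address, " : ", four args, " : ", symbol.
FRAME_RE = re.compile(
    r"^[0-9a-f`]+\s+[0-9a-f`]+\s+:\s+(?:[0-9a-f`]+\s+){4}:\s+(\S+)",
    re.IGNORECASE,
)


def reduce_stack(stack_text):
    """Keep only the symbol (module!function+offset) of each frame."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return frames


def module_of(frame):
    """'nt!Foo+0x55' -> 'nt'; 'VBoxUSB+0x2d09' -> 'VBoxUSB'."""
    return re.split(r"[!+]", frame, maxsplit=1)[0]


def first_frame_outside(frames, kernel_modules=("nt",)):
    """Return the topmost frame that is not in one of the given modules."""
    for frame in frames:
        if module_of(frame) not in kernel_modules:
            return frame
    return None


SAMPLE = """
ffffd60e`481af4c8 fffff807`75d51381 : 00000000`000000da 00000000`00000200 ffff8a00`5b000000 00000000`00000000 : nt!KeBugCheckEx
ffffd60e`481af4d0 fffff807`75c57f1a : 00000000`00000000 fffff807`7644edc0 fffff807`7644edc0 00000000`00000000 : nt!MiCheckPteReserve+0x55
ffffd60e`481af520 fffff807`75acaa84 : ffffb28f`6a9dca10 00000000`00000000 ffffb28f`833bbec0 ffffb28f`6a9dca38 : nt!MiReservePtes+0x18dc7a
ffffd60e`481af5f0 fffff807`c8482d09 : ffffb28f`6a9dca38 ffffb28f`00000000 00000000`00000002 ffffb28f`833bbec0 : nt!MmMapLockedPagesSpecifyCache+0xd4
ffffd60e`481af650 fffff807`c8481366 : ffffb28f`7ac85880 ffffb28f`7694a6e0 00000000`00000098 ffffb28f`7f4ce870 : VBoxUSB+0x2d09
"""

frames = reduce_stack(SAMPLE)
print("\n".join(frames))
print("first frame outside nt:", first_frame_outside(frames))
```

This only automates the bookkeeping. The topmost non-`nt` frame is a hint about where to look, not proof of which driver is at fault, which is why symbols for VBoxUSB are still needed.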

klaus-vb commented 1 year ago

The driver is maintained by the Oracle VM VirtualBox dev team. It is open source, and in principle anyone can build it (but, I assume for numerous reasons, everyone in the audience is avoiding the big effort). That said, we've just applied the fix, but we're skeptical that it really addresses this issue (which we have never seen in the context of VirtualBox). So far there's no test build out yet which has the modified code (and our test builds do not include attestation signing of drivers by Microsoft, just to make this clear).

d0n13 commented 1 year ago

Hi @klaus-vb, nice work. Any idea if the fix will be reviewed by Oracle and included in a future build? Where is the repo?

klaus-vb commented 1 year ago

As I said, we've applied the fix already (that's what we changed: https://www.virtualbox.org/changeset/98327/vbox ) and it will be included in future builds, at least the 7.0.x ones. Sometime next week we might have a test build which should be usable for people who have secure boot disabled. The subversion repo of VirtualBox is at https://www.virtualbox.org/svn/vbox/trunk/ We appreciate contributions if it's reasonably clear what they achieve. As I wrote already, we're not sure if this changes anything, but on the other hand it also shouldn't do harm. The code should be buildable by others (I know that several people have built VirtualBox successfully, as a whole; we have no reason to invest time into making just the USB driver separately buildable without the big list of prerequisites for VirtualBox as a whole). Either way, building drivers needs a lot of knowledge about the signing bells and whistles. Assume it would take several weeks to get the necessary certs (for $$), with a very steep learning curve and several nervous breakdowns.

henrygab commented 1 year ago

we're not sure if this changes anything, but on the other hand it also shouldn't do harm.

Kudos to the VirtualBox team for fixing a bug that did not appear in their own testing.

I encourage the VirtualBox team, if they do not already do so, to use DriverVerifier.exe in its most pedantic modes, to further strengthen their driver.

Again, glad that (at least one of) the related bugs is being resolved! Hoping to hear, when the signed driver appears, that this fully resolves the issues....

d0n13 commented 1 year ago

@dorssel will you be able to release a version of usbip with the updated driver once it’s made available? Waiting eagerly to test :)