dorssel / usbipd-win

Windows software for sharing locally connected USB devices to other machines, including Hyper-V guests and WSL 2.
GNU General Public License v3.0

adb push via usbipd cause windows bluescreen #461

Closed harlenli2022 closed 1 year ago

harlenli2022 commented 1 year ago

I'm using Windows 10 with WSL 2, with usbipd deployed on both systems. During development I attach an adb device to WSL and use adb push to push files to the device. When a single file is large (about 1 GB), adb push fails and Windows bluescreens. Is there any limit on the size of a file transferred via usbipd?

dorssel commented 1 year ago

Duplicate of #248 and #410 (related).

You can help by adding to the debug information we have on this issue. What is the BSOD STOP-code? Which driver crashed? Is it USB2 or USB3? What is the (approximate) transfer rate?

harlenli2022 commented 1 year ago

Thanks for the reply. The STOP code is SYSTEM_PTE_MISUSE and I'm using USB 3.0. I didn't configure anything related to the transfer rate; the usual ADB maximum payload size should be 1M. I also tried pushing the same file with adb directly on Windows, and everything worked without a crash. What's more, smaller files can be pushed successfully in the same setup (WSL->usbipd->device).

harlenli2022 commented 1 year ago

Additional info:

usbipd version in Windows: 2.3.0+42.Branch.master.Sha.3d9f5c5acc4e133ab8147684ad1463cbaec43240

usbip version in WSL2: usbip (usbip-utils 2.0)

Could you please double-check whether these versions may cause this crash? Thanks.

dorssel commented 1 year ago

2.4.0 contains a newer USB driver. Please try that too. I don't think Oracle fixed anything in this regard, but you never know...

If you can: use a remote bare metal Linux (or perhaps a WSL instance on a different machine) and turn on USB tracing there. (See for example https://serverfault.com/questions/1104416/how-to-capture-usb-traffic-using-wireshark-in-linux-cli). This allows us to see the last USBIP communication right up to the point where the Windows host crashes.

maffiou commented 1 year ago

Seeing the same with:

C:\Windows\System32>usbipd
usbipd-win 2.4.1

:~$ usbip version
usbip (usbip-utils 2.0)

Works well enough for small files, but same blue screen above a certain size. Not too sure how to capture logs.

harlenli2022 commented 1 year ago

Here's my update for today:

  1. Upgraded VirtualBox to 6.1.36; the issue still reproduces 100% of the time.
  2. Upgraded usbipd-win to 2.4.0; the issue still reproduces 100% of the time.
  3. Slowed down the adb push speed with the setup below:

I hope the above info helps point toward a fix. @dorssel

dorssel commented 1 year ago

Unfortunately, I cannot reproduce the problem with my hardware (I don't have anything that uses adb). My flash drives can easily handle > 1GiB transfers, and I just tried a USB serial adapter:

frans@ubuntu:~$ sudo dd if=/dev/zero of=/dev/ttyUSB0 bs=16k status=progress
1002979328 bytes (1.0 GB, 957 MiB) copied, 2217 s, 452 kB/s

What type of device is this? Serial? Can you reproduce this problem also when attaching from a different physical machine? If so, can you get a USB capture from the Linux side (e.g., using tshark)? By attaching from a different machine we can then get logs right up to the moment where the host crashes.

harlenli2022 commented 1 year ago

Hi Dorssel,

My devices are embedded boards used in the automotive industry with adbd running on them. In WSL we use adb to connect to the device and adb pull/push to exchange data. The devices use SoCs from Qualcomm or Nvidia; I tried several kinds of devices and the same issue appeared on all of them.

Regarding your test method, I'm not clear on how to set it up. If I use a VM like VMware/VirtualBox to attach the device and flash, everything is OK and no BSOD is reported. BTW, I also raised a ticket in microsoft/WSL (https://github.com/microsoft/WSL/issues/9000); they suggest disabling Watson, but I have had no chance to verify this because of travel. Do you think it's related to the BSOD in my case?

I reviewed all the related issue comments, and there's one workaround that adds a 5ms sleep in SUPUSB_IOCTL.SEND_URB. Could you provide a draft patch, so that I can build it myself and check whether it works on my side? Lower performance is acceptable for me as long as it works in my case. Thanks a lot.

dorssel commented 1 year ago

@harlenli2022

they suggest to disable Watson

No, they suggest that you enable it (if it isn't already) and send the kernel dump to them for investigation. That is actually a good idea. It will provide a stack trace of the offending driver.

5ms sleep in SUPUSB_IOCTL.SEND_URB, could you support to give a draft patch

Yes, I will do that. I suspect it could be timing related. Later this weekend.

I'm not clear enough how to setup

The problem is that the host crashes, and with it also the WSL instance (of course). My suggestion is that you use 2 physical machines: one to host the device and another to host the client. So: 2 Windows machines. Machine A runs usbipd-win and has the physical USB device attached. Machine B runs Windows + WSL. Then attach the USB device on machine A to the WSL on machine B. This most likely will again crash machine A, but machine B should be unaffected. This allows us to get USB trace logging (from WSL on machine B) right up to the moment of failure.

harlenli2022 commented 1 year ago

@dorssel Thanks for your kind support with the workaround patch. Where can I get it? Is there a draft patch in a PR?

harlenli2022 commented 1 year ago

Let me finalize the setup:

Preparation:

Connection:

Expected Behaviors:

Is my understanding above correct? Could you check the questions above about attaching USB to a remote WSL? Thanks.

dorssel commented 1 year ago

Thanks for your kindly support for workaround patch, where could I get it? any draft patch in PR?

I'm sorry. Was a bit busy this weekend. Next opportunity will be Wednesday.

dorssel commented 1 year ago

Attach USB to remote WSL? (with command usbipd wsl attach? how to attach to remote WSL?)

Yes, but you will have to manually run usbipd bind -b <busid> on the Windows host A, followed by a manual usbip attach -r <host_A> -b <busid> from within WSL on host B (note the difference usbipd vs usbip). You cannot use the usbipd wsl convenience commands.

dorssel commented 1 year ago

@harlenli2022 A test version is available here: https://github.com/dorssel/usbipd-win/actions/runs/3381285117 It has 5ms delay between requests on the same (non-ISOC) endpoint.
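To make the idea behind that test build concrete, here is a minimal Python sketch of a per-endpoint minimum gap between submissions, with isochronous endpoints exempt. This is only an illustration of the technique: it is not the code in the test build, and the class name `EndpointThrottle`, its parameters, and the injected `clock`/`sleep` functions are all hypothetical.

```python
import time
from typing import Callable, Dict, Hashable


class EndpointThrottle:
    """Enforce a minimum gap between submissions on the same endpoint.

    Hypothetical illustration only; not taken from usbipd-win.
    """

    def __init__(
        self,
        min_gap_s: float = 0.005,  # the 5 ms delay mentioned in the thread
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_gap_s = min_gap_s
        self._clock = clock
        self._sleep = sleep
        self._last_submit: Dict[Hashable, float] = {}

    def wait_turn(self, endpoint: Hashable, isochronous: bool = False) -> float:
        """Block until `endpoint` may submit again; return the seconds slept."""
        if isochronous:
            # Isochronous transfers are time-critical, so they are never delayed.
            return 0.0
        now = self._clock()
        last = self._last_submit.get(endpoint)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_gap_s - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment this request is actually allowed to go out.
        self._last_submit[endpoint] = now + delay
        return delay
```

A caller would invoke `wait_turn(endpoint)` right before handing each request to the driver. Requests on different endpoints do not delay each other, which is why the state is keyed by endpoint.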

harlenli2022 commented 1 year ago

Thanks @dorssel, I will verify this once I have a device available and get back to you with the result.

harlenli2022 commented 1 year ago

@dorssel Unfortunately the version with the 5ms delay doesn't fix the BSOD on my machine, and it introduces a new transfer timeout issue. What's more, when I tried to follow your suggestion to set up two machines, I couldn't attach the remote USB device successfully.

  1. Connect USB to Machine A (10.175.116.82) and execute "usbipd bind -b 1-4"; the device has changed to Shared.

    PS C:\WINDOWS\system32> usbipd.exe list
    Connected:
    BUSID  VID:PID    DEVICE  STATE
    1-4    0955:7023  APX     Shared

  2. In the Machine B WSL terminal, executing "usbip list -r 10.175.116.82" shows the remote device:

    Exportable USB devices
    - 10.175.116.82
      1-4: NVIDIA Corp. : unknown product (0955:7023)
         : USB\VID_0955&PID_7023\5&1F007347&0&4
         : (Defined at Interface level) (00/00/00)
         : 0 - Vendor Specific Class / Vendor Specific Subclass / Vendor Specific Protocol (ff/ff/ff)

  3. Trying to attach the USB device with "usbip attach -r 10.175.116.82 -b 1-4" in WSL reports the error below:

    usbip: error: import device

harlenli2022 commented 1 year ago

BTW, regarding the tshark log, I'm trying to prepare the environment. It can capture the kind of log shown below; is this the expected log?

1160 187.391696325 Microsof_ee:df:8b → Microsof_ee:d4:6a ARP 42 Who has 172.19.240.1? Tell 172.19.241.234
1161 187.391954240 Microsof_ee:d4:6a → Microsof_ee:df:8b ARP 42 172.19.240.1 is at 00:15:5d:ee:d4:6a
1162 192.339207105 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1163 192.339226305 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453744257 TSecr=107237812 SLE=38705 SRE=38706
1164 202.345423789 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1165 202.345445941 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453754263 TSecr=107237812 SLE=38705 SRE=38706
1166 212.350095908 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1167 212.350125850 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453764268 TSecr=107237812 SLE=38705 SRE=38706
1168 222.362643533 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1169 222.362661427 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453774280 TSecr=107237812 SLE=38705 SRE=38706
1170 232.372595895 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1171 232.372615878 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453784290 TSecr=107237812 SLE=38705 SRE=38706
1172 237.391635599 Microsof_ee:df:8b → Microsof_ee:d4:6a ARP 42 Who has 172.19.240.1? Tell 172.19.241.234
1173 237.391901283 Microsof_ee:d4:6a → Microsof_ee:df:8b ARP 42 172.19.240.1 is at 00:15:5d:ee:d4:6a
1174 242.379876142 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1175 242.379912246 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453794298 TSecr=107237812 SLE=38705 SRE=38706
1176 252.380875689 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1177 252.380895560 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453804299 TSecr=107237812 SLE=38705 SRE=38706
1178 262.383391188 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1179 262.383432441 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453814301 TSecr=107237812 SLE=38705 SRE=38706
1180 272.386586921 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1181 272.386612270 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453824304 TSecr=107237812 SLE=38705 SRE=38706
1182 282.399690047 172.19.240.1 → 172.19.241.234 TCP 55 [TCP Keep-Alive] 3240 → 59614 [ACK] Seq=38705 Ack=698573 Win=8232 Len=1
1183 282.399745082 172.19.241.234 → 172.19.240.1 TCP 78 [TCP Keep-Alive ACK] 59614 → 3240 [ACK] Seq=698573 Ack=38706 Win=501 Len=0 TSval=3453834317 TSecr=107237812 SLE=38705 SRE=38706

dorssel commented 1 year ago

Try to attach USB with command "usbip attach -r 10.175.116.82 -b 1-4" in WSL, it report below error:

usbip: error: import device

What OS is this? What is the version of usbip (usbip version)?

dorssel commented 1 year ago

BTW, regarding the tshark log, I'm trying to prepare the environment, and it can capture below kinds of logs, is it the excepted log?

No, this is a capture of the network. For USB capture, see for example https://serverfault.com/questions/1104416/how-to-capture-usb-traffic-using-wireshark-in-linux-cli.

harlenli2022 commented 1 year ago

Try to attach USB with command "usbip attach -r 10.175.116.82 -b 1-4" in WSL, it report below error: usbip: error: import device

What OS is this? What is the version of usbip (usbip version)?

usbip (usbip-utils 2.0) is used in my WSL Ubuntu 20.04.

dorssel commented 1 year ago

Is it not simply that you forgot sudo?

somu1795 commented 1 year ago

This happens to me whenever I try to use adb sideload, and specifically adb sideload (I don't have issues pushing multi-GB files). Whenever I use adb sideload inside WSL 2, it crashes with a BSOD (pte_misuse).

harlenli2022 commented 1 year ago

@dorssel Sorry for the late response; it took me some time to prepare two computers and devices.

  1. On Machine A, the USB device is attached and force-bound with usbipd in order to share it with Machine B.
  2. In the Machine B WSL environment, usbip attach -r xxx -b xxx attaches the device to WSL.
  3. Then I trigger a flash via adb; after some data exchange, Machine A crashes with a BSOD (PTE_MISUSE).

The current issue is how to get the USB traffic log. I checked the link you provided and used the command "tshark -D" to list the available interfaces, but whether I attach or detach my device, the interface list does not change, so I'm not sure which interface to capture to get the expected traffic log.

tshark -D

  1. ciscodump (Cisco remote capture)
  2. dpauxmon (DisplayPort AUX channel monitor capture)
  3. randpkt (Random packet generator)
  4. sdjournal (systemd Journal Export)
  5. sshdump (SSH remote capture)
  6. udpdump (UDP Listener remote capture)

dorssel commented 1 year ago

@harlenli2022 Ah, yes. I use a full Linux installation in Hyper-V. WSL doesn't have CONFIG_USB_MON configured for the default kernel...

Could you compile a custom kernel with this enabled? If not, capture ethernet port 3240 (as you did before). Maybe it already provides enough hints.

harlenli2022 commented 1 year ago

@dorssel Yes, I compiled a custom kernel with the USB monitor enabled, and now I can see usbmon in tshark -D. Tomorrow I will work out which usbmon interface should be used for the capture and share the USB log with you. Thanks a lot for your kind support.

harlenli2022 commented 1 year ago

@dorssel Here's the usbmon log trace. Since there are 3 usbmon interfaces available and I'm not sure which one is the expected one, I collected all of them for analysis. You can get the detailed log files at https://github.com/harlenli2022/dummy

dorssel commented 1 year ago

@harlenli2022 Received in good order ... still analyzing ...

usb1 is the correct capture; it is for bus 1 (on the Linux side). Your device was attached as 1-4 (on the Linux side). usb2 is the USB3 virtual bus, but it has no virtual devices attached (so your device is USB1 or USB2). usb0 contains everything (== usb1 + usb2).
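The bus-to-capture mapping described here can be sketched as a small Python helper. It assumes the capture interfaces are named `usbmonN` as listed by `tshark -D` on a kernel with usbmon enabled, where `usbmon0` covers all buses and `usbmonN` covers bus N. The function name `usbmon_interface` and the constant are hypothetical and only for illustration.

```python
# Interface that captures traffic on every bus at once.
CATCH_ALL_INTERFACE = "usbmon0"


def usbmon_interface(busid: str) -> str:
    """Return the usbmon capture interface for a Linux bus ID such as '1-4'.

    The part before the first '-' is the bus number; the rest is the port path.
    """
    bus, sep, port_path = busid.partition("-")
    if not sep or not bus.isdigit() or not port_path:
        raise ValueError(f"not a bus ID of the form <bus>-<port>: {busid!r}")
    return f"usbmon{int(bus)}"
```

For the device in this thread, attached as 1-4 on the Linux side, `usbmon_interface("1-4")` gives `usbmon1`, which matches the capture identified above as the relevant one.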

harlenli2022 commented 1 year ago

@dorssel Thanks for confirmation, how is it going? Any findings?

dorssel commented 1 year ago

@harlenli2022 Not yet, will have some time during this weekend.

harlenli2022 commented 1 year ago

@dorssel Are there any interesting findings? If there is any potential solution to verify, don't hesitate to share a test build with me. Thanks.

harlenli2022 commented 1 year ago

@dorssel Any updates or findings?

harlenli2022 commented 1 year ago

@dorssel How is the USB log analysis progressing? Thanks a lot.

harlenli2022 commented 1 year ago

@dorssel Any update? Thanks

dorssel commented 1 year ago

Sry, been busy with work (and still am...). At first glance: no definitive answer. But it may be too many multiple outstanding write requests on the same pipe. As it looks like more than I see on other devices. Maybe the VirtualBox driver cannot handle that. Not sure though.

d0n13 commented 1 year ago

Any progress on this? Is there something we can do on our side to help?

harlenli2022 commented 1 year ago

Sry, been busy with work (and still am...). At first glance: no definitive answer. But it may be too many multiple outstanding write requests on the same pipe. As it looks like more than I see on other devices. Maybe the VirtualBox driver cannot handle that. Not sure though.

@dorssel I don't think so, because I created another Ubuntu VM in VirtualBox, attached the device USB to the VM, and did the same device flash via adb; the BSOD never happened. So I think the issue is still related to USBIP/USBIPD.

Detailed setup:

Case 1: Device USB => Windows => WSL (via usbip/usbipd), adb flash large file in WSL => BSOD

Case 2: Device USB => Windows => VirtualBox Ubuntu VM (attach USB in VirtualBox tool), adb flash large file in Ubuntu => NO BSOD

dorssel commented 1 year ago

@harlenli2022

I don't think so, because...

Could still be the case, though. Maybe VirtualBox uses internal queuing in "user mode", ensuring that they never send more than what the driver can handle (if that is indeed limited, not sure). usbipd-win simply passes every URB immediately to the driver, so the driver does all the queuing. Something in the driver code hints at a maximum queue depth. But that code also hints that going over that maximum should lead to a normal error return, not a crash... Nevertheless, I noticed from several PCAPs that a queue depth of 3 to 8 URBs per pipe is normal, and always works fine. And a lot of "my device does not work correctly" reports show something 8+. Maybe (just maybe) if it is really high, it may cause the crash.
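For anyone who wants to check their own capture against this hypothesis, the per-pipe queue depth can be estimated by counting URBs that have been submitted but not yet completed. The Python sketch below assumes the capture has already been reduced to a list of `(kind, endpoint, urb_id)` tuples, where `"S"` marks a submission and `"C"` a completion; that tuple layout and the function name `max_queue_depth` are hypothetical, and extracting such events from a PCAP is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (kind, endpoint address, URB id); kind is "S" (submit) or "C" (complete).
Event = Tuple[str, int, int]


def max_queue_depth(events: Iterable[Event]) -> Dict[int, int]:
    """Return the peak number of simultaneously outstanding URBs per endpoint."""
    outstanding: Dict[int, Set[int]] = defaultdict(set)
    peak: Dict[int, int] = defaultdict(int)
    for kind, endpoint, urb_id in events:
        if kind == "S":
            outstanding[endpoint].add(urb_id)
            peak[endpoint] = max(peak[endpoint], len(outstanding[endpoint]))
        elif kind == "C":
            # discard() tolerates completions whose submit was not captured.
            outstanding[endpoint].discard(urb_id)
    return dict(peak)
```

Endpoints whose peak is well above the 3 to 8 range observed in working captures would be the ones worth a closer look. That range is an observation from this thread, not a documented driver limit.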

dorssel commented 1 year ago

@harlenli2022 Could you try the current master build? It contains a fix that may very well solve this long-standing, nasty, BSOD.

MSI is here: https://github.com/dorssel/usbipd-win/actions/runs/5154025248

Installer is not signed, but the drivers are.

cmkf01 commented 1 year ago

Thank-you - your efforts are greatly appreciated. This update from 3.0.0 to 3.0.1 has resolved my BSOD issue. It was 100% reproducible when pushing large .apk to physical device. Happy to provide more detail if useful but my scenario seemed identical to other users commenting previously incl all the WinDbg output and it seems like you've cracked it. Thanks again.

dorssel commented 1 year ago

Version 3.1.0 (https://github.com/dorssel/usbipd-win/releases/tag/v3.1.0) was released. This should fix the BSOD. Can this be confirmed?

mancioshell commented 1 year ago

I can confirm it, on my machine.