Open danielk-98 opened 1 year ago
The only weird thing that I can see in this log is:
trce: Usbipd.AttachedClient[1001]
Unlinking 105, pending = True, pending count = 1
trce: Usbipd.AttachedClient[1001]
Aborting endpoint 130
dbug: Usbipd.AttachedClient[1000]
USBSUP_XFER_STALL -> EPIPE -> -32
This is indeed the last URB that is logged in the PCap as well. The Linux log on the other hand shows:
[ 2515.727820] vhci_hcd: unlink->seqnum 106
[ 2515.728077] vhci_hcd: the urb (seqnum 106) was already given back
Which is weird, because the last URB is 105; there never was a URB 106. What driver is being used on the Linux side?
Thanks for the help!
It looks like no specific driver is used for the NXT; It lists as a "Generic USB device". So nexttool
(and the other, similar programs) probably just use libusb to access the device directly, since most interactions are so simple.
Running sudo lspci -v
shows:
*-usbhost:0
product: USB/IP Virtual Host Controller
vendor: Linux 5.15.90.1-microsoft-standard-WSL2 vhci_hcd
physical id: 1
bus info: usb@1
logical name: usb1
version: 5.15
capabilities: usb-2.00
configuration: driver=hub slots=8 speed=480Mbit/s
*-usb UNCLAIMED
description: Generic USB device
product: Mindstorms NXT
vendor: Lego Group
physical id: 1
bus info: usb@1:1
version: 0.00
serial: 0016530C69D5
capabilities: usb-2.00
configuration: speed=12Mbit/s
Anything else I could try to collect more information?
I installed WireShark in a Virtualbox VM (Ubuntu 20.04) to see what a successful transaction looks like. Then I reran the test in WSL2 for comparison.
nexttool -deviceinfo
nexttool -deviceinfo
again (WSL fails at this point)The two logs show almost identical URBs during this test. Only differences I can see are:
GET DESCRIPTOR Request DEVICE
but Virtualbox only sends oneURB_BULK in
, fails with URB status: Broken pipe (-EPIPE) (-32)
(which corresponds to the weird server logs you pointed out) and looses the 33-byte payload that is supposed to be there.Not sure if this adds anything or not. Very confusing why the transaction only fails on the second run!
Here's the vbox pcap: virtualbox.zip
WSL always sends two GET DESCRIPTOR Request DEVICE but Virtualbox only sends one
This may be due to caching (which usbipd
does not do). First, the OS queries "what device is this?". Based on the answer, a certain driver is asked to handle the device. That driver probably can serve multiple devices, so it too first asks "what device is this?" Maybe VirtualBox caches the result of the static reply (the device descriptor never changes), and does not actually ask the device again but simply returns the same result for the second query. Since the second query is never actually sent to the device, it does not get captured. In any case, this difference is benign.
WSL's last URB 105,
URB_BULK in
, fails withURB status: Broken pipe (-EPIPE) (-32)
This is (of course) the problem. All I can think of is that during "SET CONFIGURATION", maybe more needs to be done than just that. Some sort of pipe reset, or so. Just a wild guess ... I will look into it.
Hi,
I have encounter similar problem and I followed the WSL installation instructions here: https://www.xda-developers.com/wsl-connect-usb-devices-windows-11/
There's a step that install the user space tools for USB/IP and a database of USB hardware identifiers... I'm not sure if that would related to the issues.
After all installation success and I connected my astronomical camera. The linux has a server running and the device was functioning probably (e.g. the driver can send commands to the device and responding), until the camera capture started and transfer the data back. At the data trasnfer via the USB, error shows up in dmesg.
From the error descriptions and the pcapng capture, the symptoms are very similar. The WLS URB encounter broken pipe with a URB# in the linux sub system but the same URB# did not appears in the WSL.
The dmesg is showing seqnum 527, 529, 531, etc but the WSL showing 528, 530, 532....
[ 110.868153] vhci_hcd: unlink->seqnum 527 [ 110.868155] vhci_hcd: the urb (seqnum 527) was already given back [ 111.419232] vhci_hcd: unlink->seqnum 529 [ 111.419250] vhci_hcd: the urb (seqnum 529) was already given back [ 111.970243] vhci_hcd: unlink->seqnum 531 [ 111.970246] vhci_hcd: the urb (seqnum 531) was already given back [ 112.521337] vhci_hcd: unlink->seqnum 533 [ 112.521340] vhci_hcd: the urb (seqnum 533) was already given back
dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 528, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4 dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 530, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4 dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 532, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4 dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 534, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4
Attached is the pcapng capture log, the linux dmesg and the wsl debug output
This may have been fixed with #684. Could you give that build a try? The installer is at https://github.com/dorssel/usbipd-win/actions/runs/6061183362.
Thank you. Unfortunately it's still not working for me if I'm correct the status -104 is ECONNRESET which is different from before I will try and capture more logs later today
Thanks again
[ 397.651145] vhci_hcd: unlink->seqnum 1893 [ 397.651148] vhci_hcd: urb->status -104 [ 398.202404] vhci_hcd: unlink->seqnum 1895 [ 398.202406] vhci_hcd: urb->status -104 [ 398.753862] vhci_hcd: unlink->seqnum 1897 [ 398.753864] vhci_hcd: urb->status -104 [ 399.305095] vhci_hcd: unlink->seqnum 1899 [ 399.305097] vhci_hcd: urb->status -104 [ 399.856204] vhci_hcd: unlink->seqnum 1901 [ 399.856207] vhci_hcd: urb->status -104 [ 400.407446] vhci_hcd: unlink->seqnum 1903
I have difficulties to get into more information after having the 3.1.1 msi installed. First WSL USB GUI reported some installation dependencies were missing and asking to re-install. After reinstall it keeps saying the same. Restart Windows and it seemed to work but then my device originally named "Atik Camera" is replaced as "USBIP Shared Device" (I can spot they in the same DEV ID). I tried using usbipd command to attached manually - attachment was successful but the device is not showing in the WSL guest Ubuntu. I will try to remove everything and try running via Visual Studio and see if I can get more information..... The only successful attempt I tried is that in VirtualBox Guest Linux, the device & the transfer works fine - so I can say the device do not have a problem working under usb/ip protocol... Yet I looked into the ECONNRESET status - it's a successful state for the urb unlink. So the problem may be caused by a different issue
WSL USB GUI reported some installation dependencies
3.1.1 is a dev version. I am not surprised it does not work with the external GUI tool. But you may want to report it to that project as well.
attachment was successful but the device is not showing in the WSL guest Ubuntu
That is weird.. What is the output of lsusb
and do you have a syslog?
Yet I looked into the ECONNRESET status - it's a successful state for the urb unlink. So the problem may be caused by a different issue
Yes, the issue seems not related.
I loaded the master branch into Visual Studio and have it started and running in the shell. I still see my device showing as "USBIP Shared Device" but there're no problems on attaching it now and showing OK in the WSL guest.
I started the camera capturing and found that although the dmesg print is not the same before but the problem symtom seemed to be the same, after checking the pcapng capture against the dmesg:
pcapng shows URB packets with URB ID (0a43=2627, 0a45=2629, etc) being submitted and completed but on the other hand dmesg shows none of those URB#. The dmesg showing unlink with URB ID 2626, 2628, 2630 but these numbers are not appearing in the URB trace pcapng
meanwhile the VS trace output shows USBSUP_XFER_STALL -> EPIPE -> -32
These all happens when the camera finished its capturing and trying to download the data from the camera. The camera driver showing BulkRead Failed. I checked at the exact time of the download starts, the URB packets started to slip... at the attached pcapng capstures, line#1796 the URB are good in sequence 09fc, 09fd, but after this point 09fe is missing and jumps to 09ff and 0a01 (0a00 is missing) and so on.... meanwhile these missing URB# are showing in the kernel unlink
I can try and trace and see what caused the USBSUP_XFER_STALL but I have very little experiences on the USB/IP. I will have to study and see how I can trap the packets and find out.
[ 1757.350935] vhci_hcd: unlink->seqnum 2624 [ 1757.350937] vhci_hcd: urb->status -104 [ 1757.902103] vhci_hcd: unlink->seqnum 2626 [ 1757.902106] vhci_hcd: urb->status -104 [ 1758.453133] vhci_hcd: unlink->seqnum 2628 [ 1758.453136] vhci_hcd: urb->status -104 [ 1759.004167] vhci_hcd: unlink->seqnum 2630
dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001]
The messages you see are normal for cancelation of a transfer. It looks like the client (WSL) is requesting data, which is then canceled again by the client. This leads to UNLINK of the outstanding read URB.
UNLINK
, Aborting endpoint
, USBSUP_XFER_STALL -> EPIPE -> -32
, and status -104
are all normal messages that belong to the cancelation by the client of an earlier read (or write) request. They are not errors.
OK. Thank you! How about the URB ID jumps on URB_BULK out? I assume the URB# should be in sequence under normal operations, correct? The jumping of the URB# may be an indication on missing chunks from the host to the device?
UNLINKs are not logged (they are technically not URBs, but instead the cancelation of an earlier URB). The "URB#" that is logged is really the USBIP request number. So:
0x9fd is an out request, i.e. an URB, so it is logged
0x9fe is the UNLINK (cancelation) request for 0x9fd (not logged in PCap, as it does not support UNLINK requests, there is no format for it).
0x9ff is another out request
0xa00 is another UNLINK, for 0x9ff
0xa01 is another out request
etc.
What is interesting is that at the start of the capture, apparently the status is read back (endpoint 6) alongside the output (endpoint 4). But from the failures on, only failing outputs, no more input requests. Also: the EPIPE completion after cancelation is always 0.5 seconds after the out request. So it appears to be a timeout issue: an out request is sent, which after 0.5 seconds gets canceled before it completes. Then (of course) it acknowledges cancelation with EPIPE (that's just how it works).
Questions are:
OK I see where the chunks started the 0.5s cancel and I do more captures including what's happening in the Windows host. I can see the URB_FUNCTION_ABORT_PIPE (0x0002) at the time the problem happens at the host computer. I do not have the source code of the driver so it's a little bit difficult to tell why it timed out.
I have also captured the camera operating under a guest virtualbox which the image download is successful, but only the USB traffic via usbpcap in Windows host. Also I capture the USB when I operate the camera directly in the Windows machine. I do see bigger chunks (>16Kb) frames for those successful attempts. So it seemed the transfer never started in the WSL...
From the localonly capture, focusing on the bulk endpoint (endpoint 8):
1768 18.591696 2.17.8 host USB 34 URB_BULK in
2451 22.709311 host 2.17.8 USB 27 URB_BULK in
2452 22.709394 2.17.8 host USB 34 URB_BULK in
2505 22.924759 host 2.17.8 USB 27 URB_BULK in
2506 22.924848 2.17.8 host USB 34 URB_BULK in
2647 24.433789 host 2.17.8 USB 27 URB_BULK in
2810 25.978284 2.17.8 host USB 2097179 URB_BULK in
2811 25.978310 host 2.17.8 USB 27 URB_BULK in
2864 27.168742 2.17.8 host USB 2097179 URB_BULK in
2865 27.168773 host 2.17.8 USB 27 URB_BULK in
2920 28.359342 2.17.8 host USB 2097179 URB_BULK in
2921 28.359372 host 2.17.8 USB 27 URB_BULK in
Around the same "logical" point, using usbipd-win
:
4216 34.938190 2.16.8 host USB 34 URB_BULK in
4321 35.940501 host 2.16.8 USB 27 URB_BULK in
4322 35.940554 2.16.8 host USB 34 URB_BULK in
4363 36.143332 host 2.16.8 USB 27 URB_BULK in
4364 36.143358 2.16.8 host USB 34 URB_BULK in
4435 37.232748 host 2.16.8 USB 27 URB_FUNCTION_ABORT_PIPE
4436 37.232750 2.16.8 host USB 27 URB_FUNCTION_ABORT_PIPE
It looks like the device wants to transfer 2097179 bytes (about 2 MB) per request. That is quite large (in fact, the largest I've ever seen, per transfer). Either this times out due to latency, or that's beyond some limit (unknown to me). In any case, the transfer does not complete within 0.5 seconds on usbipd-win
, after which the client (i.e., WSL) cancels the request. On the other hand, the transfers take between 1 and 1.5 seconds in the localonly case. So, it looks like "something" (don't know what, Linux USBIP driver?) enforces a timeout of 0.5 seconds, but the transfer really requires >1 seconds to complete.
Thanks for your analysis on the traffic. I certainly think that the Windows driver with its own implementation that support much larger transfer block size on the USB device if accessed locally.
About the timeout scenario, I was thinking for such case, I would probably see a larger block of transfer at the Windows host but do not see them in the guest. That's also why I captured using wireshark + UsbPcap to log the traffic in the host machine: usb_cap_usbipd_wsl2_host.pcapng.
However upon checking the two logs: the largest block I can find in the host capture was only 195 bytes and it was not even closed to the time when the transfer issue happens (the problem happened around 12:17:10 GMT+8). At the time of the problem happens, there were only chunks in sizes of 43, 27 bytes being transferred. So it seemed like the transfer may have never started (?).
** just an update if I'm not mistaken by filtering the endpoints to compare the traffic, there is one 27-byte URB request in Virtualbox VM traffic (#1627) would trigger the subsequence response for the 16Kb data coming back from the device. Comparing to the WSL guest captures, it seems that the usbipd never receives such URB request and time out after 30s? Am I interpreting this correctly?
USB Host traffic started with WSL2
No. Time Source Destination Protocol Length Info
10489 88.790947 host 2.16.8 USB 27 URB_BULK in
10490 88.790997 2.16.8 host USB 34 URB_BULK in
10499 89.793834 host 2.16.8 USB 27 URB_BULK in
10500 89.793873 2.16.8 host USB 34 URB_BULK in
10509 89.996358 host 2.16.8 USB 27 URB_BULK in
10510 89.996389 2.16.8 host USB 34 URB_BULK in
10533 90.998272 host 2.16.8 USB 27 URB_BULK in
10534 90.998306 2.16.8 host USB 34 URB_BULK in
10543 91.201730 host 2.16.8 USB 27 URB_BULK in
10544 91.201777 2.16.8 host USB 34 URB_BULK in
<-- missing URB here and then after 30 secs it timed out?
12687 121.952288 host 2.16.8 USB 27 URB_FUNCTION_ABORT_PIPE
12688 121.952289 2.16.8 host USB 27 URB_FUNCTION_ABORT_PIPE
Comparing to the chunks captured in Virtualbox VM transfer (same version of ubuntu guest, and I only capture the Windows Host traffic but not the Linux guest) - I spotted there're transfers of many sets of 16Kb blocks (to be exact 16411 bytes, that's 16384 + 27 bytes) - which are likely the URB that carries the 16Kb chunk image data with the URB headers and completed the transfer and each of them takes ~10ms. Not 100% sure about the mechanism of virtualbox USB passthrough - I just assume it's also an implementation of USB/IP.
USB Host traffic when running in VirtualBox VM
No. Time Source Destination Protocol Length Info
622 1.672115 2.18.8 host USB 34 URB_BULK in
943 2.689192 host 2.18.8 USB 27 URB_BULK in
944 2.689259 2.18.8 host USB 34 URB_BULK in
1029 2.910157 host 2.18.8 USB 27 URB_BULK in
1030 2.910185 2.18.8 host USB 34 URB_BULK in
1627 5.309898 host 2.18.8 USB 27 URB_BULK in <--
1708 5.709428 2.18.8 host USB 16411 URB_BULK in <--
1711 5.711395 host 2.18.8 USB 27 URB_BULK in
1712 5.719471 2.18.8 host USB 16411 URB_BULK in
1713 5.719552 host 2.18.8 USB 27 URB_BULK in
1716 5.728699 2.18.8 host USB 16411 URB_BULK in
1717 5.728893 host 2.18.8 USB 27 URB_BULK in
1720 5.738034 2.18.8 host USB 16411 URB_BULK in
1721 5.739345 host 2.18.8 USB 27 URB_BULK in
1724 5.747929 2.18.8 host USB 16411 URB_BULK in
1725 5.748022 host 2.18.8 USB 27 URB_BULK in
1728 5.757292 2.18.8 host USB 16411 URB_BULK in
1729 5.757415 host 2.18.8 USB 27 URB_BULK in
1732 5.766492 2.18.8 host USB 16411 URB_BULK in
I did the capture once again today with release 3.2.0 and there're some interesting findings. The overall process still failed. And I was able to locate the full image data chunks, but they only comes back, after the device has been re-initialized at a later time again and being configured: usbipd.zip
_dmesg_20230912_01.txt is the dmesg output indi_app_log.txt is the application log usbipd01.pcapng is the PcapNg capture
2023-09-12 20:37:14 Driver start to capture an image with 10sec exposure. So expect at 20:37:24 the image will be ready for download.
2023-09-12 20:37:25 Driver started reporting "ParDeviceLibUSB::In - BulkRead Failed!". The image data didn't come in as expected. At the same time dmesg started showing the unlinks: [Tue Sep 12 20:37:25 2023] vhci_hcd: unlink->seqnum 3842 [Tue Sep 12 20:37:25 2023] vhci_hcd: urb->status -104 [Tue Sep 12 20:37:26 2023] vhci_hcd: unlink->seqnum 3844 [Tue Sep 12 20:37:26 2023] vhci_hcd: urb->status -104
At this point, the PcapNg didn't really show anything for endpoint 8. I saw the time out occurs in endpoint 4.
2023-09-12 20:39:40 I manually abort the capture. The device takes a long time to stop as it seemed still running.
2023-09-12 20:39:58 Software stopped. Process terminated.
2023-09-12 20:40:14 Then I started a new instance of the server.
2023-09-12 20:40:21 Camera being initialized by the new process again and being setup, then:
At this particular time, there're a set of 19 x URB_BULK IN each size 1000065 bytes (around 976 Kb) , total around ~18Mb showing in the PcapNg. The 18Mb is also the usual image full file size. So it looks like these URB_BULK IN chunks were "delayed" until the device was being initialized the second time. At the same time, the drivers log showing a lots of "ParDeviceLibUSB::In - OK!! " which indicates the driver were able to "receive" these chunks but unable to process them because they're not associated with any capture request. I cannot be sure but seemed the initialization happened at the 2nd time release/flush those previous URB chunks queued somewhere maybe?
No. Time Source Destination Protocol Length Info
1682 2023-09-12 20:40:21.787241 6.4.8 host USB 67 URB_BULK in
1687 2023-09-12 20:40:21.838999 host 6.4.8 USB 64 URB_BULK in
1688 2023-09-12 20:40:22.407024 6.4.8 host USB 1000065 URB_BULK in
1691 2023-09-12 20:40:22.415025 host 6.4.8 USB 64 URB_BULK in
1692 2023-09-12 20:40:22.982857 6.4.8 host USB 1000065 URB_BULK in
1695 2023-09-12 20:40:22.987156 host 6.4.8 USB 64 URB_BULK in
1696 2023-09-12 20:40:23.554518 6.4.8 host USB 1000065 URB_BULK in
1699 2023-09-12 20:40:23.557446 host 6.4.8 USB 64 URB_BULK in
1700 2023-09-12 20:40:24.125093 6.4.8 host USB 1000065 URB_BULK in
1703 2023-09-12 20:40:24.128214 host 6.4.8 USB 64 URB_BULK in
1704 2023-09-12 20:40:24.695626 6.4.8 host USB 1000065 URB_BULK in
1707 2023-09-12 20:40:24.698106 host 6.4.8 USB 64 URB_BULK in
1708 2023-09-12 20:40:25.265646 6.4.8 host USB 1000065 URB_BULK in
1711 2023-09-12 20:40:25.269358 host 6.4.8 USB 64 URB_BULK in
1712 2023-09-12 20:40:25.836214 6.4.8 host USB 1000065 URB_BULK in
1715 2023-09-12 20:40:25.838712 host 6.4.8 USB 64 URB_BULK in
1716 2023-09-12 20:40:26.407745 6.4.8 host USB 1000065 URB_BULK in
Hello! First off, thanks so much for your work on this project. It's crazy how important this is for Windows and WSL, and it's a shame it's not receiving more direct support from Microsoft.
The problem: I have a microcontroller which successfully attaches to WSL. A program (based on libusb) connects to the device and requests some information. The program successfully transfers the data the first time it is run; running the program a second time fails.
In WSL, "dmesg" shows errors like:
Host: Windows 11 build 22000 WSL version: 1.2.0.0 Kernel version: 5.15.90.1 WSLg version: 1.0.51 MSRDC version: 1.2.3770 Direct3D version: 1.608.2-61064218 DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp Windows version: 10.0.22000.1817 USBIPD 2.4.1+1.Branch.master.Sha.90acf9456020ca8c6310ca62a71ee23cb6ca34ad
Guest: Ubuntu 20.04 Kernel: 5.15.90.1-microsoft-standard-WSL2
Procedure:
1. USBIPD server is started with logging:
2. Device is attached from Windows:
Server and "
dmesg
" logs immediately after connection:3. WSL shows the device is available at 1-4:
4. First transaction with the device is successful:
Server and dmesg:
5. Second transaction with the device fails:
Server and dmesg:
6. Device is detached from Windows:
Server and dmesg:
Things I've Tried
I have no problems on VirtualBox with Ubuntu 20.04, so it seems likely to be an issue with USBIPD or with the WSL kernel.
Could this be similar to 530? https://github.com/dorssel/usbipd-win/issues/530
I've attached the full server and dmesg logs, along with the PcapNg for this session.
Thank you for the help! dmesg.txt serverlog.txt pcapng.zip