URB_BULK transfers work only once, then never again until device is replugged.

dorssel / usbipd-win

Windows software for sharing locally connected USB devices to other machines, including Hyper-V guests and WSL 2.

GNU General Public License v3.0

3.8k stars 234 forks source link

URB_BULK transfers work only once, then never again until device is replugged. #581

Open danielk-98 opened 1 year ago

danielk-98 commented 1 year ago

Hello! First off, thanks so much for your work on this project. It's crazy how important this is for Windows and WSL, and it's a shame it's not receiving more direct support from Microsoft.

The problem: I have a microcontroller which successfully attaches to WSL. A program (based on libusb) connects to the device and requests some information. The program successfully transfers the data the first time it is run; running the program a second time fails.

In WSL, "dmesg" shows errors like:

vhci_hcd: vhci_device speed not set
vhci_hcd: the urb (seqnum 106) was already given back

Host: Windows 11 build 22000 WSL version: 1.2.0.0 Kernel version: 5.15.90.1 WSLg version: 1.0.51 MSRDC version: 1.2.3770 Direct3D version: 1.608.2-61064218 DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp Windows version: 10.0.22000.1817 USBIPD 2.4.1+1.Branch.master.Sha.90acf9456020ca8c6310ca62a71ee23cb6ca34ad

Guest: Ubuntu 20.04 Kernel: 5.15.90.1-microsoft-standard-WSL2

Procedure:

1. USBIPD server is started with logging:

usbipd server Logging:LogLevel:Default=Trace 'usbipd:PcapNg:Path=C:\Users\Daniel\Desktop\usbip.pcapng'

2. Device is attached from Windows:

usbipd wsl attach -d ubuntu20test -i 0694:0002

Server and "dmesg" logs immediately after connection:

dbug: Usbipd.Server[1000]
      new connection from 172.19.237.228
dbug: Usbipd.Server[1000]
      connection close: Attempted to read past the end of the stream.
dbug: Usbipd.Server[1000]
      connection closed
dbug: Usbipd.Server[1000]
      new connection from 172.19.237.228
dbug: Usbipd.ConnectedClient[1000]
      Received opcode: OP_REQ_IMPORT
info: Usbipd.ConnectedClient[1]
      Client 172.19.237.228 claimed device at 6-4 (USB\VID_0694&PID_0002\0016530C69D5).
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      Trapped SET_CONFIGURATION: 1

[ 2238.118580] vhci_hcd vhci_hcd.0: pdev(0) rhport(0) sockfd(3)
[ 2238.119114] vhci_hcd vhci_hcd.0: devid(393220) speed(2) speed_str(full-speed)
[ 2238.119497] vhci_hcd vhci_hcd.0: Device attached
[ 2238.391478] vhci_hcd: vhci_device speed not set
[ 2238.461381] usb 1-1: new full-speed USB device number 4 using vhci_hcd
[ 2238.551402] vhci_hcd: vhci_device speed not set
[ 2238.621366] usb 1-1: SetAddress Request (4) to port 0
[ 2238.661024] usb 1-1: New USB device found, idVendor=0694, idProduct=0002, bcdDevice= 0.00
[ 2238.661377] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=1
[ 2238.661633] usb 1-1: SerialNumber: 0016530C69D5

3. WSL shows the device is available at 1-4:

$ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0694:0002 Lego Group Mindstorms NXT
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

4. First transaction with the device is successful:

$ nexttool -deviceinfo
Brick name = NXT4
Bluetooth Address = 00:16:53:0C:69:D5
Bluetooth signal strength = 0,0,0,0
Free memory = 129504

Server and dmesg:

dbug: Usbipd.AttachedClient[1000]
      Trapped SET_CONFIGURATION: 1

[ 2415.941405] vhci_hcd: vhci_device speed not set
[ 2416.011386] usb 1-1: reset full-speed USB device number 4 using vhci_hcd
[ 2416.091365] vhci_hcd: vhci_device speed not set
[ 2416.161394] usb 1-1: SetAddress Request (4) to port 0

5. Second transaction with the device fails:

$ nexttool -deviceinfo

Server and dmesg:

dbug: Usbipd.AttachedClient[1000]
      Trapped SET_CONFIGURATION: 1
trce: Usbipd.AttachedClient[1001]
      Unlinking 105, pending = True, pending count = 1
trce: Usbipd.AttachedClient[1001]
      Aborting endpoint 130
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32

[ 2514.461372] vhci_hcd: vhci_device speed not set
[ 2514.531477] usb 1-1: reset full-speed USB device number 4 using vhci_hcd
[ 2514.611432] vhci_hcd: vhci_device speed not set
[ 2514.681471] usb 1-1: SetAddress Request (4) to port 0
[ 2515.727820] vhci_hcd: unlink->seqnum 106
[ 2515.728077] vhci_hcd: the urb (seqnum 106) was already given back

6. Device is detached from Windows:

usbipd wsl detach -i 0694:0002

Server and dmesg:

dbug: Usbipd.ConnectedClient[1000]
      Unbind or unplug while attached
info: Usbipd.ConnectedClient[2]
      Client 172.19.237.228 released device at 6-4 (USB\VID_0694&PID_0002\0016530C69D5).
dbug: Usbipd.Server[1000]
      connection close: The operation was canceled.
dbug: Usbipd.Server[1000]
      connection closed

[ 2676.358117] vhci_hcd: connection closed
[ 2676.358229] vhci_hcd: stop threads
[ 2676.358896] vhci_hcd: release socket
[ 2676.359092] vhci_hcd: disconnect device
[ 2676.359385] usb 1-1: USB disconnect, device number 4

Things I've Tried

Booting WSL2 with and without SystemD init.
Modifying UDEV rules.
Running "nexttool" with and without "sudo".
Running WSL2 from an Administrator terminal.
Restarting UDEV service

I have no problems on VirtualBox with Ubuntu 20.04, so it seems likely to be an issue with USBIPD or with the WSL kernel.

Could this be similar to 530? https://github.com/dorssel/usbipd-win/issues/530

I've attached the full server and dmesg logs, along with the PcapNg for this session.

Thank you for the help! dmesg.txt serverlog.txt pcapng.zip

dorssel commented 1 year ago

The only weird thing that I can see in this log is:

trce: Usbipd.AttachedClient[1001]
      Unlinking 105, pending = True, pending count = 1
trce: Usbipd.AttachedClient[1001]
      Aborting endpoint 130
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32

This is indeed the last URB that is logged in the PCap as well. The Linux log on the other hand shows:

[ 2515.727820] vhci_hcd: unlink->seqnum 106
[ 2515.728077] vhci_hcd: the urb (seqnum 106) was already given back

Which is weird, because the last URB is 105; there never was a URB 106. What driver is being used on the Linux side?

danielk-98 commented 1 year ago

Thanks for the help! It looks like no specific driver is used for the NXT; It lists as a "Generic USB device". So nexttool (and the other, similar programs) probably just use libusb to access the device directly, since most interactions are so simple.

Running sudo lspci -v shows:

  *-usbhost:0
       product: USB/IP Virtual Host Controller
       vendor: Linux 5.15.90.1-microsoft-standard-WSL2 vhci_hcd
       physical id: 1
       bus info: usb@1
       logical name: usb1
       version: 5.15
       capabilities: usb-2.00
       configuration: driver=hub slots=8 speed=480Mbit/s
     *-usb UNCLAIMED
          description: Generic USB device
          product: Mindstorms NXT
          vendor: Lego Group
          physical id: 1
          bus info: usb@1:1
          version: 0.00
          serial: 0016530C69D5
          capabilities: usb-2.00
          configuration: speed=12Mbit/s

Anything else I could try to collect more information?

danielk-98 commented 1 year ago

I installed WireShark in a Virtualbox VM (Ubuntu 20.04) to see what a successful transaction looks like. Then I reran the test in WSL2 for comparison.

Connect the NXT device to the VM.
Run nexttool -deviceinfo
Run nexttool -deviceinfo again (WSL fails at this point)
Disconnect the device and save the USB logs

The two logs show almost identical URBs during this test. Only differences I can see are:

WSL always sends two GET DESCRIPTOR Request DEVICE but Virtualbox only sends one
WSL's last URB 105, URB_BULK in, fails with URB status: Broken pipe (-EPIPE) (-32) (which corresponds to the weird server logs you pointed out) and looses the 33-byte payload that is supposed to be there.

Not sure if this adds anything or not. Very confusing why the transaction only fails on the second run!

Here's the vbox pcap: virtualbox.zip

dorssel commented 1 year ago

WSL always sends two GET DESCRIPTOR Request DEVICE but Virtualbox only sends one

This may be due to caching (which usbipd does not do). First, the OS queries "what device is this?". Based on the answer, a certain driver is asked to handle the device. That driver probably can serve multiple devices, so it too first asks "what device is this?" Maybe VirtualBox caches the result of the static reply (the device descriptor never changes), and does not actually ask the device again but simply returns the same result for the second query. Since the second query is never actually sent to the device, it does not get captured. In any case, this difference is benign.

WSL's last URB 105, URB_BULK in, fails with URB status: Broken pipe (-EPIPE) (-32)

This is (of course) the problem. All I can think of is that during "SET CONFIGURATION", maybe more needs to be done than just that. Some sort of pipe reset, or so. Just a wild guess ... I will look into it.

sywong2000 commented 1 year ago

Hi,

I have encounter similar problem and I followed the WSL installation instructions here: https://www.xda-developers.com/wsl-connect-usb-devices-windows-11/

There's a step that install the user space tools for USB/IP and a database of USB hardware identifiers... I'm not sure if that would related to the issues.

After all installation success and I connected my astronomical camera. The linux has a server running and the device was functioning probably (e.g. the driver can send commands to the device and responding), until the camera capture started and transfer the data back. At the data trasnfer via the USB, error shows up in dmesg.

From the error descriptions and the pcapng capture, the symptoms are very similar. The WLS URB encounter broken pipe with a URB# in the linux sub system but the same URB# did not appears in the WSL.

The dmesg is showing seqnum 527, 529, 531, etc but the WSL showing 528, 530, 532....

[ 110.868153] vhci_hcd: unlink->seqnum 527 [ 110.868155] vhci_hcd: the urb (seqnum 527) was already given back [ 111.419232] vhci_hcd: unlink->seqnum 529 [ 111.419250] vhci_hcd: the urb (seqnum 529) was already given back [ 111.970243] vhci_hcd: unlink->seqnum 531 [ 111.970246] vhci_hcd: the urb (seqnum 531) was already given back [ 112.521337] vhci_hcd: unlink->seqnum 533 [ 112.521340] vhci_hcd: the urb (seqnum 533) was already given back

dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 528, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4 dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 530, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4 dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 532, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4 dbug: Usbipd.AttachedClient[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: Usbipd.AttachedClient[1001] Unlinking 534, pending = True, pending count = 1 trce: Usbipd.AttachedClient[1001] Aborting endpoint 4

Attached is the pcapng capture log, the linux dmesg and the wsl debug output

ubuntu_dmesg.txt wsl_log.txt usbipd.zip

dorssel commented 1 year ago

This may have been fixed with #684. Could you give that build a try? The installer is at https://github.com/dorssel/usbipd-win/actions/runs/6061183362.

sywong2000 commented 1 year ago

Thank you. Unfortunately it's still not working for me if I'm correct the status -104 is ECONNRESET which is different from before I will try and capture more logs later today

Thanks again

[ 397.651145] vhci_hcd: unlink->seqnum 1893 [ 397.651148] vhci_hcd: urb->status -104 [ 398.202404] vhci_hcd: unlink->seqnum 1895 [ 398.202406] vhci_hcd: urb->status -104 [ 398.753862] vhci_hcd: unlink->seqnum 1897 [ 398.753864] vhci_hcd: urb->status -104 [ 399.305095] vhci_hcd: unlink->seqnum 1899 [ 399.305097] vhci_hcd: urb->status -104 [ 399.856204] vhci_hcd: unlink->seqnum 1901 [ 399.856207] vhci_hcd: urb->status -104 [ 400.407446] vhci_hcd: unlink->seqnum 1903

sywong2000 commented 1 year ago

I have difficulties to get into more information after having the 3.1.1 msi installed. First WSL USB GUI reported some installation dependencies were missing and asking to re-install. After reinstall it keeps saying the same. Restart Windows and it seemed to work but then my device originally named "Atik Camera" is replaced as "USBIP Shared Device" (I can spot they in the same DEV ID). I tried using usbipd command to attached manually - attachment was successful but the device is not showing in the WSL guest Ubuntu. I will try to remove everything and try running via Visual Studio and see if I can get more information..... The only successful attempt I tried is that in VirtualBox Guest Linux, the device & the transfer works fine - so I can say the device do not have a problem working under usb/ip protocol... Yet I looked into the ECONNRESET status - it's a successful state for the urb unlink. So the problem may be caused by a different issue

dorssel commented 1 year ago

WSL USB GUI reported some installation dependencies

3.1.1 is a dev version. I am not surprised it does not work with the external GUI tool. But you may want to report it to that project as well.

attachment was successful but the device is not showing in the WSL guest Ubuntu

That is weird.. What is the output of lsusb and do you have a syslog?

Yet I looked into the ECONNRESET status - it's a successful state for the urb unlink. So the problem may be caused by a different issue

Yes, the issue seems not related.

sywong2000 commented 1 year ago

I loaded the master branch into Visual Studio and have it started and running in the shell. I still see my device showing as "USBIP Shared Device" but there're no problems on attaching it now and showing OK in the WSL guest.

I started the camera capturing and found that although the dmesg print is not the same before but the problem symtom seemed to be the same, after checking the pcapng capture against the dmesg:

pcapng shows URB packets with URB ID (0a43=2627, 0a45=2629, etc) being submitted and completed but on the other hand dmesg shows none of those URB#. The dmesg showing unlink with URB ID 2626, 2628, 2630 but these numbers are not appearing in the URB trace pcapng

meanwhile the VS trace output shows USBSUP_XFER_STALL -> EPIPE -> -32

These all happens when the camera finished its capturing and trying to download the data from the camera. The camera driver showing BulkRead Failed. I checked at the exact time of the download starts, the URB packets started to slip... at the attached pcapng capstures, line#1796 the URB are good in sequence 09fc, 09fd, but after this point 09fe is missing and jumps to 09ff and 0a01 (0a00 is missing) and so on.... meanwhile these missing URB# are showing in the kernel unlink

I can try and trace and see what caused the USBSUP_XFER_STALL but I have very little experiences on the USB/IP. I will have to study and see how I can trap the packets and find out.

[ 1757.350935] vhci_hcd: unlink->seqnum 2624 [ 1757.350937] vhci_hcd: urb->status -104 [ 1757.902103] vhci_hcd: unlink->seqnum 2626 [ 1757.902106] vhci_hcd: urb->status -104 [ 1758.453133] vhci_hcd: unlink->seqnum 2628 [ 1758.453136] vhci_hcd: urb->status -104 [ 1759.004167] vhci_hcd: unlink->seqnum 2630

dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001] Unlinking: PendingSubmits=1, PendingUnlinks=1 trce: 7-3.4[1001] Aborting endpoint dbug: 7-3.4[1000] USBSUP_XFER_STALL -> EPIPE -> -32 trce: 7-3.4[1001]

usbipd.zip

dorssel commented 1 year ago

The messages you see are normal for cancelation of a transfer. It looks like the client (WSL) is requesting data, which is then canceled again by the client. This leads to UNLINK of the outstanding read URB.

UNLINK , Aborting endpoint, USBSUP_XFER_STALL -> EPIPE -> -32 , and status -104 are all normal messages that belong to the cancelation by the client of an earlier read (or write) request. They are not errors.

sywong2000 commented 1 year ago

OK. Thank you! How about the URB ID jumps on URB_BULK out? I assume the URB# should be in sequence under normal operations, correct? The jumping of the URB# may be an indication on missing chunks from the host to the device?

dorssel commented 1 year ago

UNLINKs are not logged (they are technically not URBs, but instead the cancelation of an earlier URB). The "URB#" that is logged is really the USBIP request number. So:

0x9fd is an out request, i.e. an URB, so it is logged
0x9fe is the UNLINK (cancelation) request for 0x9fd (not logged in PCap, as it does not support UNLINK requests, there is no format for it).
0x9ff is another out request
0xa00 is another UNLINK, for 0x9ff
0xa01 is another out request

etc.

What is interesting is that at the start of the capture, apparently the status is read back (endpoint 6) alongside the output (endpoint 4). But from the failures on, only failing outputs, no more input requests. Also: the EPIPE completion after cancelation is always 0.5 seconds after the out request. So it appears to be a timeout issue: an out request is sent, which after 0.5 seconds gets canceled before it completes. Then (of course) it acknowledges cancelation with EPIPE (that's just how it works).

Questions are:

why do the status read backs stop at the same point where outputs start failing
why does the device timeout. The earlier (successful) requests complete within 1 millisecond, And then, suddenly, they no longer complete within 500 ms.

sywong2000 commented 1 year ago

OK I see where the chunks started the 0.5s cancel and I do more captures including what's happening in the Windows host. I can see the URB_FUNCTION_ABORT_PIPE (0x0002) at the time the problem happens at the host computer. I do not have the source code of the driver so it's a little bit difficult to tell why it timed out.

I have also captured the camera operating under a guest virtualbox which the image download is successful, but only the USB traffic via usbpcap in Windows host. Also I capture the USB when I operate the camera directly in the Windows machine. I do see bigger chunks (>16Kb) frames for those successful attempts. So it seemed the transfer never started in the WSL...

usb_captures.zip

dorssel commented 1 year ago

From the localonly capture, focusing on the bulk endpoint (endpoint 8):

1768    18.591696   2.17.8  host    USB 34  URB_BULK in
2451    22.709311   host    2.17.8  USB 27  URB_BULK in
2452    22.709394   2.17.8  host    USB 34  URB_BULK in
2505    22.924759   host    2.17.8  USB 27  URB_BULK in
2506    22.924848   2.17.8  host    USB 34  URB_BULK in
2647    24.433789   host    2.17.8  USB 27  URB_BULK in
2810    25.978284   2.17.8  host    USB 2097179 URB_BULK in
2811    25.978310   host    2.17.8  USB 27  URB_BULK in
2864    27.168742   2.17.8  host    USB 2097179 URB_BULK in
2865    27.168773   host    2.17.8  USB 27  URB_BULK in
2920    28.359342   2.17.8  host    USB 2097179 URB_BULK in
2921    28.359372   host    2.17.8  USB 27  URB_BULK in

Around the same "logical" point, using usbipd-win:

4216    34.938190   2.16.8  host    USB 34  URB_BULK in
4321    35.940501   host    2.16.8  USB 27  URB_BULK in
4322    35.940554   2.16.8  host    USB 34  URB_BULK in
4363    36.143332   host    2.16.8  USB 27  URB_BULK in
4364    36.143358   2.16.8  host    USB 34  URB_BULK in
4435    37.232748   host    2.16.8  USB 27  URB_FUNCTION_ABORT_PIPE
4436    37.232750   2.16.8  host    USB 27  URB_FUNCTION_ABORT_PIPE

It looks like the device wants to transfer 2097179 bytes (about 2 MB) per request. That is quite large (in fact, the largest I've ever seen, per transfer). Either this times out due to latency, or that's beyond some limit (unknown to me). In any case, the transfer does not complete within 0.5 seconds on usbipd-win, after which the client (i.e., WSL) cancels the request. On the other hand, the transfers take between 1 and 1.5 seconds in the localonly case. So, it looks like "something" (don't know what, Linux USBIP driver?) enforces a timeout of 0.5 seconds, but the transfer really requires >1 seconds to complete.

sywong2000 commented 1 year ago

Thanks for your analysis on the traffic. I certainly think that the Windows driver with its own implementation that support much larger transfer block size on the USB device if accessed locally.

About the timeout scenario, I was thinking for such case, I would probably see a larger block of transfer at the Windows host but do not see them in the guest. That's also why I captured using wireshark + UsbPcap to log the traffic in the host machine: usb_cap_usbipd_wsl2_host.pcapng.

However upon checking the two logs: the largest block I can find in the host capture was only 195 bytes and it was not even closed to the time when the transfer issue happens (the problem happened around 12:17:10 GMT+8). At the time of the problem happens, there were only chunks in sizes of 43, 27 bytes being transferred. So it seemed like the transfer may have never started (?).

** just an update if I'm not mistaken by filtering the endpoints to compare the traffic, there is one 27-byte URB request in Virtualbox VM traffic (#1627) would trigger the subsequence response for the 16Kb data coming back from the device. Comparing to the WSL guest captures, it seems that the usbipd never receives such URB request and time out after 30s? Am I interpreting this correctly?

USB Host traffic started with WSL2

No. Time    Source  Destination Protocol    Length  Info
10489   88.790947   host    2.16.8  USB 27  URB_BULK in
10490   88.790997   2.16.8  host    USB 34  URB_BULK in
10499   89.793834   host    2.16.8  USB 27  URB_BULK in
10500   89.793873   2.16.8  host    USB 34  URB_BULK in
10509   89.996358   host    2.16.8  USB 27  URB_BULK in
10510   89.996389   2.16.8  host    USB 34  URB_BULK in
10533   90.998272   host    2.16.8  USB 27  URB_BULK in
10534   90.998306   2.16.8  host    USB 34  URB_BULK in
10543   91.201730   host    2.16.8  USB 27  URB_BULK in
10544   91.201777   2.16.8  host    USB 34  URB_BULK in

<-- missing URB here and then after 30 secs it timed out?

12687   121.952288  host    2.16.8  USB 27  URB_FUNCTION_ABORT_PIPE
12688   121.952289  2.16.8  host    USB 27  URB_FUNCTION_ABORT_PIPE

Comparing to the chunks captured in Virtualbox VM transfer (same version of ubuntu guest, and I only capture the Windows Host traffic but not the Linux guest) - I spotted there're transfers of many sets of 16Kb blocks (to be exact 16411 bytes, that's 16384 + 27 bytes) - which are likely the URB that carries the 16Kb chunk image data with the URB headers and completed the transfer and each of them takes ~10ms. Not 100% sure about the mechanism of virtualbox USB passthrough - I just assume it's also an implementation of USB/IP.

USB Host traffic when running in VirtualBox VM

No. Time    Source  Destination Protocol    Length  Info
622 1.672115    2.18.8  host    USB 34  URB_BULK in
943 2.689192    host    2.18.8  USB 27  URB_BULK in
944 2.689259    2.18.8  host    USB 34  URB_BULK in
1029    2.910157    host    2.18.8  USB 27  URB_BULK in
1030    2.910185    2.18.8  host    USB 34  URB_BULK in
1627    5.309898    host    2.18.8  USB 27  URB_BULK in  <--
1708    5.709428    2.18.8  host    USB 16411   URB_BULK in  <--
1711    5.711395    host    2.18.8  USB 27  URB_BULK in
1712    5.719471    2.18.8  host    USB 16411   URB_BULK in
1713    5.719552    host    2.18.8  USB 27  URB_BULK in
1716    5.728699    2.18.8  host    USB 16411   URB_BULK in
1717    5.728893    host    2.18.8  USB 27  URB_BULK in
1720    5.738034    2.18.8  host    USB 16411   URB_BULK in
1721    5.739345    host    2.18.8  USB 27  URB_BULK in
1724    5.747929    2.18.8  host    USB 16411   URB_BULK in
1725    5.748022    host    2.18.8  USB 27  URB_BULK in
1728    5.757292    2.18.8  host    USB 16411   URB_BULK in
1729    5.757415    host    2.18.8  USB 27  URB_BULK in
1732    5.766492    2.18.8  host    USB 16411   URB_BULK in

sywong2000 commented 1 year ago

I did the capture once again today with release 3.2.0 and there're some interesting findings. The overall process still failed. And I was able to locate the full image data chunks, but they only comes back, after the device has been re-initialized at a later time again and being configured: usbipd.zip

_dmesg_20230912_01.txt is the dmesg output indi_app_log.txt is the application log usbipd01.pcapng is the PcapNg capture

2023-09-12 20:37:14 Driver start to capture an image with 10sec exposure. So expect at 20:37:24 the image will be ready for download.
2023-09-12 20:37:25 Driver started reporting "ParDeviceLibUSB::In - BulkRead Failed!". The image data didn't come in as expected. At the same time dmesg started showing the unlinks: [Tue Sep 12 20:37:25 2023] vhci_hcd: unlink->seqnum 3842 [Tue Sep 12 20:37:25 2023] vhci_hcd: urb->status -104 [Tue Sep 12 20:37:26 2023] vhci_hcd: unlink->seqnum 3844 [Tue Sep 12 20:37:26 2023] vhci_hcd: urb->status -104
At this point, the PcapNg didn't really show anything for endpoint 8. I saw the time out occurs in endpoint 4.
2023-09-12 20:39:40 I manually abort the capture. The device takes a long time to stop as it seemed still running.
2023-09-12 20:39:58 Software stopped. Process terminated.
2023-09-12 20:40:14 Then I started a new instance of the server.
2023-09-12 20:40:21 Camera being initialized by the new process again and being setup, then:
At this particular time, there're a set of 19 x URB_BULK IN each size 1000065 bytes (around 976 Kb) , total around ~18Mb showing in the PcapNg. The 18Mb is also the usual image full file size. So it looks like these URB_BULK IN chunks were "delayed" until the device was being initialized the second time. At the same time, the drivers log showing a lots of "ParDeviceLibUSB::In - OK!! " which indicates the driver were able to "receive" these chunks but unable to process them because they're not associated with any capture request. I cannot be sure but seemed the initialization happened at the 2nd time release/flush those previous URB chunks queued somewhere maybe?

No. Time    Source  Destination Protocol    Length  Info
1682    2023-09-12 20:40:21.787241  6.4.8   host    USB 67  URB_BULK in
1687    2023-09-12 20:40:21.838999  host    6.4.8   USB 64  URB_BULK in
1688    2023-09-12 20:40:22.407024  6.4.8   host    USB 1000065 URB_BULK in
1691    2023-09-12 20:40:22.415025  host    6.4.8   USB 64  URB_BULK in
1692    2023-09-12 20:40:22.982857  6.4.8   host    USB 1000065 URB_BULK in
1695    2023-09-12 20:40:22.987156  host    6.4.8   USB 64  URB_BULK in
1696    2023-09-12 20:40:23.554518  6.4.8   host    USB 1000065 URB_BULK in
1699    2023-09-12 20:40:23.557446  host    6.4.8   USB 64  URB_BULK in
1700    2023-09-12 20:40:24.125093  6.4.8   host    USB 1000065 URB_BULK in
1703    2023-09-12 20:40:24.128214  host    6.4.8   USB 64  URB_BULK in
1704    2023-09-12 20:40:24.695626  6.4.8   host    USB 1000065 URB_BULK in
1707    2023-09-12 20:40:24.698106  host    6.4.8   USB 64  URB_BULK in
1708    2023-09-12 20:40:25.265646  6.4.8   host    USB 1000065 URB_BULK in
1711    2023-09-12 20:40:25.269358  host    6.4.8   USB 64  URB_BULK in
1712    2023-09-12 20:40:25.836214  6.4.8   host    USB 1000065 URB_BULK in
1715    2023-09-12 20:40:25.838712  host    6.4.8   USB 64  URB_BULK in
1716    2023-09-12 20:40:26.407745  6.4.8   host    USB 1000065 URB_BULK in