dorssel / usbipd-win

Windows software for sharing locally connected USB devices to other machines, including Hyper-V guests and WSL 2.
GNU General Public License v3.0
3.51k stars 225 forks source link

USBIPd lockup about once a month #729

Open rhermsen opened 11 months ago

rhermsen commented 11 months ago

I'm using usbipd between Windows-11 and WSL2-Ubuntu to connect a serial device to WSL2. For the most part it works great, but about once a month it locks up. I have a Python program running that obtains data from the USB connected device every 5 seconds. To recover from this lockup I have to detach/attach (usbipd wsl detach --busid 1-4 / usbipd wsl attach --busid 1-4). And than, after a restart of my program it works again. There is no need to powercycle the device with USB (serial) port. Dates seen a lockup so far (so often far apart):

2023-06-22 15:46
2023-06-26 23:48
2023-08-09 19:15
2023-09-17 15:34
2023-10-10 01:44

I looked at https://github.com/dorssel/usbipd-win/wiki/Troubleshooting and this time I started with trace level logging. I see that the logging is without timestamps, so I wonder if this logging would be helpful for my issue? Any additional advice how I could debug this issue?

Version details: Windows:

systeminfo | findstr /B /C:"OS Name" /B /C:"OS Version"
OS Name:                   Microsoft Windows 11 Pro
OS Version:                10.0.22000 N/A Build 22000
usbipd --version
3.2.0+38.Branch.master.Sha.7ad7b9904ac0db8d759e3a53a22b1126da01b2be

WSL2:

$ cat /etc/os-release | grep VERSION=
VERSION="22.04.3 LTS (Jammy Jellyfish)"
$ uname -r
5.15.90.1-microsoft-standard-WSL2
$ apt list --installed hwdata linux-tools*
Listing... Done
hwdata/jammy,now 0.357-1 all [installed]
linux-tools-5.15.0-60-generic/now 5.15.0-60.66 amd64 [installed,local]
linux-tools-5.15.0-60/now 5.15.0-60.66 amd64 [installed,local]
linux-tools-5.15.0-86-generic/jammy-updates,jammy-security,now 5.15.0-86.96 amd64 [installed,automatic]
linux-tools-5.15.0-86/jammy-updates,jammy-security,now 5.15.0-86.96 amd64 [installed,automatic]
linux-tools-common/jammy-updates,jammy-security,now 5.15.0-86.96 all [installed,automatic]
linux-tools-virtual/jammy-updates,jammy-security,now 5.15.0.86.83 amd64 [installed]
$ update-alternatives --display usbip
usbip - auto mode
  link best version is /usr/lib/linux-tools/5.15.0-60-generic/usbip
  link currently points to /usr/lib/linux-tools/5.15.0-60-generic/usbip
  link usbip is /usr/local/bin/usbip
/usr/lib/linux-tools/5.15.0-60-generic/usbip - priority 20

USB device/driver: A particulate matter meter (fijnstof meter). https://github.com/msillano/tuyaDAEMON/wiki/custom-device-'PM-detector':-case-study

$ lsusb -s 001:005
Bus 001 Device 005: ID 1a86:7523 QinHeng Electronics CH340 serial converter
dorssel commented 11 months ago

Any additional advice how I could debug this issue?

https://github.com/dorssel/usbipd-win/wiki/Troubleshooting also shows how you can create captures (PCAPs). That's the most detailed logging available. You can also capture on the Linux side (usbmon), but it's a bit harder to set up. syslog messages are also helpful.

dorssel commented 9 months ago

4.0.0 has been released which fixes a (very small) memory leak. This could explain your issue. Can you confirm?

rhermsen commented 9 months ago

Thanks for the update. I installed version 4.0.0 on 7-Dec. At that time 3.2.0 was running for 6 wks.

dorssel commented 7 months ago

Any news on this?

rhermsen commented 7 months ago

Hi, Uptime is now: 'active (running) since Wed 2024-01-03 13:45:54 CET; 1 month 1 day ago' (I did had to restart a few times after installing 4.0.0 for other reasons ...)

dorssel commented 7 months ago

Are you monitoring the memory usage of the usbipd service?

rhermsen commented 7 months ago

Hi, No I was not..., only overall memory usage (which goes up and down due to WSL2). Just added a graph to MRTG for usbipd.exe

rhermsen commented 6 months ago

Update on a likely unrelated occurrence.

On 2024-02-29 at 20:28 the PM-Monitor stopped working. This was after 1 month and 26 days uptime (started 2024-01-03 13:45:54). This time with a different failure-mode than for which I opened the ticket. I cannot exclude that this got raised due to a connection glitch (e.g. connector accidentally touched). The Win11 event log also show some logs at the time of the issue.

Memory usage looks to increase from 10MB to about 11MB at the moment the issue occurred (2024-02-29 20:26:48 CET). Because the CH340-based serial-USB device was detached from usbipd.exe the setup stopped (no auto restart). After attaching the CH340-based serial-USB device and restarting PM-Monitor process (2024-03-01 19:19:55 CET) I see another memory usage increase from 11MB to 12MB. I have a graph of this ~20% memory usage increase. usbipd.exe wasn't restarted, so maybe it is normal to have a bit more memory in use?

Because I didn't have to restart usbipd.exe for this occurrence I assume it to be a different issue than for which I opened the ticket.

PM-Monitor restart scripts

startUSBipD.bat

@echo off
echo "Starting usbipd..."

:search
tasklist | find "wsl.exe"
IF %ERRORLEVEL% == 0 GOTO :found
TIMEOUT /T 1
GOTO :search

:found
TIMEOUT /T 3
python C:\Users\rhermsen\usbipd_start.py
TIMEOUT /T 30
wsl.exe -u root service pm_monitor restart
echo "Done"

usbipd_start.py

# -*- coding: utf-8 -*-
"""
# execute usbipd at startup
#
# rhermsen
# version 2.0
# 
Created on Fri Jun 16 21:20:07 2023

Available commands upto 3.2.0:
    usbipd wsl list
    usbipd wsl
        gives usage info
    usbipd wsl list
    usbipd wsl attach -h
    usbipd wsl detach -h
Available commands from 4.0.0:
    usbipd list
    usbipd
        gives usage info
    usbipd list
    usbipd attach -h
    usbipd detach -h

@author: rhermsen

"""

import subprocess

def executeCliCmd(command):
    '''
    Parameters
    ----------
    command : string, The full command line, including the parameters to execute.

        DESCRIPTION.
        Executes a command with optional parameters. The resulting output is returned in a list of strings.

    Returns
    -------
    List of strings where each line is an entry in this list.
    '''
    p = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
    out, err = p.communicate()
    outString = str(out)
    # result = outString.split('\\r\\n')
    # print(result)
    return outString.split('\\r\\n')

def findBusidAndState(list, match='CH340'):
    attached = False
    busid = None
    for line in list:
        wordList = line.split()
        if match in wordList:
            busid = wordList[0]
            if "Attached" in wordList:
                attached = True
            # print("busid =", busid, "attached", attached)
    return busid, attached

list_command = 'usbipd list'
usb_list = executeCliCmd(list_command)

busid, attached = findBusidAndState(usb_list)
attach_command = f'usbipd attach --wsl --busid {busid}'
#print("attach_command:",attach_command)
if busid != None and attached == False:
    executeCliCmd(attach_command)
elif attached == True:
    print("PM-Monitor USB already attached to WSL2")
else:
    print("Could not attach PM-Monitor.")

MRTG USBIPd memory usage

Logging

WSL

Error messages in journalctl:

Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341-uart ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: connection reset by peer
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: stop threads
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: release socket
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: vhci_hcd: disconnect device
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: usb 1-1: USB disconnect, device number 2
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: usb 1-1: failed to send control message: -19
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341-uart ttyUSB0: ch341-uart converter now disconnected from ttyUSB0
Feb 29 20:26:48 DESKTOP-E6DGEM9 kernel: ch341 1-1:1.0: device disconnected

similar in dmesg:

[Thu Feb 29 20:27:01 CET 2024] ch341-uart ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: connection reset by peer
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: stop threads
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: release socket
[Thu Feb 29 20:27:01 CET 2024] vhci_hcd: disconnect device
[Thu Feb 29 20:27:01 CET 2024] usb 1-1: USB disconnect, device number 2
[Thu Feb 29 20:27:01 CET 2024] usb 1-1: failed to send control message: -19
[Thu Feb 29 20:27:01 CET 2024] ch341-uart ttyUSB0: ch341-uart converter now disconnected from ttyUSB0
[Thu Feb 29 20:27:01 CET 2024] ch341 1-1:1.0: device disconnected

Win11 event logs

(sometimes with a few events before the issue, and more after the issue): USBIPd.exe event 2024-02-29 2026.evtx: USBIPd.exe event 2024-02-29 2026.evtx.zip Win Application logs USBIPd.exe event 2024-02-29 2026.evtx: Win Application logs USBIPd.exe event 2024-02-29 2026.evtx.zip Win System logs USBIPd.exe event 2024-02-29 2026.evtx: Win System logs USBIPd.exe event 2024-02-29 2026.evtx.zip

dorssel commented 6 months ago

Thanks for the elaborate report.

Memory usage seems correct. usbipd-win is written in C#, a managed language using a garbage collector. It's normal for a new connection to have a temporary increase in memory consumption. It will take a while before the garbage collector will kick in.

The memory bug that I fixed earlier leaked 40 bytes per URB + a kernel event handle. I could see it in memory consumption clearly after about 100.000 URBs. I guess in your case it was the kernel handles that caused the original crash. I went up to 1 million URBs, but couldn't actually reproduce the crash. Can you guestimate your URB-rate? If that is indeed > 1 million per month it could explain the original crash.

With the current behavior, I think we can safely say the original problem is fixed.

rhermsen commented 6 months ago

Is it only in one direct this issue was present?

My stab at guestimate the URB-rate: I have a tcpdump capture and assuming each URB ends up in a separate TCP segment I count

dorssel commented 6 months ago

Is it only in one direct this issue was present?

Usually, for each URB there is one USBIP request and one USBIP reply. It also makes sense that a single "poll" from your software consists of several URBs. Of course, when either the request or the reply doesn't fit in the MTU there will be several TCP fragments. But as a "guestimate", the 4 million makes sense.

rhermsen commented 6 months ago

To comment on your last update: Indeed I see a single poll to be split in several URB's each in its own TCP segment. There is no fragmentation (largest packet is 146 MB only).

rhermsen commented 6 months ago

Unfortunately there was another similar occurrence as on 2024-02-29.

On 2024-03-07 11:04 the PM-Monitor stopped working in a similar way as on 2024-02-29 at 20:28. This was after only 5 days uptime (started 2024-03-01 19:19:55). This time nobody was near the device. The Win11 event log doesn't show much this time...

Category: Usbipd.ConnectedClient
EventId: 2

Client 192.168.188.87 released device at 1-4 (USB\VID_1A86&PID_7523\5&9BB4271&0&4).

Memory usage was the same as after last week issue, 12MB until I restarted USBIPd.exe. But after I restarted with debug and packet capture (pcap) enabled it jumped to 22MB. (I assume a bit this is due debug and/or a packet capture running).

MRTG USBIPd memory usage 1

I hope the capture isn't growing too fast, but from the 'guestimate' it probably will...

rhermsen commented 2 months ago

I observed another occurrence (2024-05-14, 23:11) after 1 month and 6 days. This time with capturing in place. I was still using usbipd 4.0.0 (capturing timestamps incorrect). This was just before my holiday, so I only had little time to investigate. Now I'm back from holiday.

At the moment the issue starts I see an 'address' change at the server side (windows) from 1.4.2 to 1.4.1 The requests coming from WSL2 (Ubuntu) are still going to 1.4.2. There is a response packet from 1.4.2 followed by 1.4.1 from the server side. All these packets are marked as 'URB transfer type' 'URB_INTERRUPT' in the IN direction. This repeats at every request coming from WSL2, every 5 seconds.

In the mean time I upgraded to USBIPd 4.2.0.

The complete capture is 1.19GB. I extracted the last 2 correct sequences, and 2 incorrect sequences (named: pm-monitor_20240514__2311__last_successful_and_failed2.pcapng.txt ):

packet 21 to 38 (=18), expected request response
packet 39 and 40 look a bit unexpected
packet 41 to 70 (=30, so much more packets) the request looks good, but response not.
packet 71 to 114 (=44 even more packets) the request looks good, but response not.
and this repeats every 5 seconds

Do you see what goes wrong here? pm-monitor_202405142311last_successful_and_failed2.pcapng.txt

dorssel commented 2 months ago

1.4.2 is a bulk endpoint. In Linux terminology, it acts like a block device. You request a read, and you get a reply, always. 1.4.1 is an interrupt endpoint. In Linux terminology, it acts like a character device. You request a read, and it blocks until there is actually something to read., which may be immediately or maybe never at all.

Normally, your device apparently uses 1.4.2. And every IN request completes successfully. Then, in your log line 39. It does not respond. This is the error. A bulk pipe should always respond immediately to a read request. But this time it doesn't. It ends with ETIME, which comes from VBoxUSB where the error code is called USBSUP_XFER_DNR, which it in turn got from Windows where the error code is called USBD_STATUS_DEV_NOT_RESPONDING.

Then your client side driver again submits an IN request for 1.4.2, but it never completes. Besides that, it starts sending requests to 1.4.1.

The root cause is the device not responding (ETIME) in the first place.

rhermsen commented 2 months ago

If I get it correct it the USB device (the PM-monitor) connected to the Windows machine that is 'misbehaving' by having a timeout. (this is seen from packet 39 having status ETIME (=Timer expired)). It is at the hardware side the timeout occurred, so I'm not able to solve it for future similar events.

For all previous events I never had to power-cycle the PM-Monitor to restore communication (required steps: usbipd detach/attach and restart of my python script). This time I'm not sure as I powered off all involved equipment. Is there a retry mechanism possible, or a workaround to initiate a restart e.g. from WSL2?

dorssel commented 2 months ago

If I get it correct it the USB device (the PM-monitor) connected to the Windows machine that is 'misbehaving' by having a timeout.

Looks like it. It is in any case the error that the VBoxUSB driver got from Windows. Normally, when devices get "stuck" it is called a STALL. And the error for that is EPIPE. Client drivers usually handle that by sending a "clear halt" (pipe reset) via channel 0 (the control channel). Maybe (just maybe) that may work for ETIME as well. And I have to admit, the translation for this error to ETIME is mine, based on https://www.kernel.org/doc/html/latest/driver-api/usb/error-codes.html. I/You/We could try to return EPIPE instead, hoping that the driver responds to that with a pipe reset, which just may work (or not)...

Translation happens here: https://github.com/dorssel/usbipd-win/blob/master/Usbipd/Tools.cs#L109

rhermsen commented 2 months ago

@dorssel If it is possible to have an build with EPIPE returned i.s.o. ETIME that would be great. (I see you did that sometime in the past https://github.com/dorssel/usbipd-win/actions/runs/7308106135?pr=812)

Btw, are there any instructions available to built usbipd from source?

dorssel commented 2 months ago

Btw, are there any instructions available to built usbipd from source?

Simply run the commands from the GitHub build script: https://github.com/dorssel/usbipd-win/blob/master/.github/workflows/build-installer.yml

rhermsen commented 2 months ago

First time I look at GitHub Actions...

I did manage to build an msi-installer, only it is for version 0.2.0 (usbipd-win_0.2.0.msi). Steps done:

But after this I get an exception regarding VBoxUsb driver (see debug log below). But now also after installing 4.2.0 I get the same exception message.

usbipd --version
0.2.0+1052.Branch.master.Sha.716112dce471b303d6de1ddd8ba7273823bd816e
C:\WINDOWS\system32>usbipd server Logging:LogLevel:Default=Trace "usbipd:PcapNg:Path=C:\Users\rhermsen\pm-monitor_20240706__0155.pcapng"
dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
dbug: Usbipd.PcapNg[1000]
      usbipd:PcapNg:Path = 'C:\Users\rhermsen\pm-monitor_20240706__0155.pcapng'
dbug: Usbipd.PcapNg[1000]
      usbipd:PcapNg:SnapLength = unlimited
dbug: Usbipd.Server[1000]
      usbipd:Port = 3240
dbug: Usbipd.Server[1000]
      0.2.0+1052.Branch.master.Sha.716112dce471b303d6de1ddd8ba7273823bd816e
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: C:\Program Files\usbipd-win\
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: Usbipd.PcapNg[1000]
      Flushing
dbug: Usbipd.Server[1000]
      new connection from 192.168.188.87
dbug: Usbipd.Server[1000]
      connection close: Unable to read beyond the end of the stream.
dbug: Usbipd.Server[1000]
      connection closed
dbug: Usbipd.Server[1000]
      new connection from 192.168.188.87
dbug: Usbipd.ConnectedClient[1000]
      Received opcode: OP_REQ_IMPORT
fail: Usbipd.ConnectedClient[3]
      An exception occurred while communicating with the client:
      System.IO.FileNotFoundException: Unable to find the specified file.
         at Usbipd.ConfigurationManager.GetVBoxDevice(BusId) + 0x128
         at Usbipd.VBoxUsb.<ClaimDeviceOnce>d__0.MoveNext() + 0x6d
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
         at Usbipd.VBoxUsb.<ClaimDevice>d__1.MoveNext() + 0x145
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at Usbipd.VBoxUsb.<ClaimDevice>d__1.MoveNext() + 0x202
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
         at Usbipd.ConnectedClient.<HandleRequestImportAsync>d__8.MoveNext() + 0xd99
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at Usbipd.ConnectedClient.<HandleRequestImportAsync>d__8.MoveNext() + 0x1823
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
         at Usbipd.ConnectedClient.<RunAsync>d__5.MoveNext() + 0x300
dbug: Usbipd.Server[1000]
      connection close: Unable to find the specified file.
dbug: Usbipd.Server[1000]
      connection closed
dorssel commented 2 months ago

The version number is inconsequential. It is automatically determined from the tags (using GitVersion). Your clone does not have the release tags, so it "restarts" from 0.x. That's not a problem.

Drivers have been bumped, and for development builds that causes issues. Simply: uninstall whatever usbipd-win you have installed now. And then install another version (your dev build, for example). And if that fails (like you reported), Do a repair install. That should fix any driver versions.

EDIT: And as always with Windows, when things go wrong with driver installs: a reboot in between steps helps ...

rhermsen commented 2 months ago

I uninstalled, repaired, downgraded/upgraded, rebooted usbipd-win multiple times, but only got it resolved after uninstalling a VirturalBox USB device via 'Device Manager': VirtualBox USB No drivers are installed

Now it is running with version 0.2.0

dorssel commented 2 months ago

Oh well, dev versions .... Good luck with this one.

dorssel commented 2 months ago

I guess you had it forced bound?

rhermsen commented 2 months ago

No, didn't use a 'force' option. Only used the 'usbipd bind --busid 1-4' and 'usbipd attach --wsl --busid 1-4'.

Started from a working usbipd 4.2.0 > tried install usbipd 0.2.0 but this failed due to lower version as the one already installed > uninstall 4.2.0 > (didn't reboot...) installed 0.2.0 (don't think I rebooted) and at this point ran into that issue until I found (after a lot of steps) the 'VituralBox USB' device in de 'Device Manager'.

rhermsen commented 1 week ago

There was a new occurrence yesterday. This was with the the changes in place discussed here https://github.com/dorssel/usbipd-win/issues/729#issuecomment-2198529063 It now failed with 'usb.urb_status == -32' (Broken pipe) a EPIPE timeout error (packet 37) after 15 seconds. What I don't see is a retry after the first timeout, is that indeed correct?

The last working sequence starts with packet 19 and ends with packet 36. Than i.s.o. a packet about 5 seconds after packet 36 from host to 1.4.2 (usbipd to local client), there is a packet from 1.4.2 to host (local USB client to usbipd) after 15 seconds.

In debug I see a

"dbug: Usbipd.ConnectedClient[1000]
      Unbind or unplug while attached"

Can this point to a connection instability (issue happened at 04:32, don't expect anyone to be around the PM-Monitor at that time)? And if so, is there any connection (in)stability dampening possible? Btw, is there an option to include timestamps in the debug logging.

Without restarting the usbipd server, or powercycling the serial device (PM-Monitor), I just re-attached the usb port again (usbipd attach --wsl --busid 1-4) and it worked.

P.S. I haven't updated my Windows system for quite a while. Before running usbipd for a long time again, I'm planning to do a Win11 update (Win11 23H2) and also would like to update WSL2 to the latest version (2.3.17) and also the Ubuntu distribution to 24.04.01.

usbipd --version
0.2.0+1052.Branch.master.Sha.716112dce471b303d6de1ddd8ba7273823bd816e

(similar to 4.2.0 with te epip patch)

USBIPd debug output:

C:\WINDOWS\system32>usbipd server Logging:LogLevel:Default=Trace "usbipd:PcapNg:Path=C:\Users\rhermsen\pm-monitor_20240710__1814.pcapng"
dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
dbug: Usbipd.PcapNg[1000]
      usbipd:PcapNg:Path = 'C:\Users\rhermsen\pm-monitor_20240710__1814.pcapng'
dbug: Usbipd.PcapNg[1000]
      usbipd:PcapNg:SnapLength = unlimited
dbug: Usbipd.Server[1000]
      usbipd:Port = 3240
dbug: Usbipd.Server[1000]
      0.2.0+1052.Branch.master.Sha.716112dce471b303d6de1ddd8ba7273823bd816e
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: C:\Program Files\usbipd-win\
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: Usbipd.PcapNg[1000]
      Flushing
dbug: Usbipd.Server[1000]
      new connection from 192.168.188.87
dbug: Usbipd.Server[1000]
      connection close: Unable to read beyond the end of the stream.
dbug: Usbipd.Server[1000]
      connection closed
dbug: Usbipd.Server[1000]
      new connection from 192.168.188.87
dbug: Usbipd.ConnectedClient[1000]
      Received opcode: OP_REQ_IMPORT
dbug: Usbipd.ConnectedClient[1000]
      Claiming took 230 ms
info: Usbipd.ConnectedClient[1]
      Client 192.168.188.87 claimed device at 1-4 (USB\VID_1A86&PID_7523\5&9BB4271&0&4).
dbug: 1-4.0[1000]
      Endpoint created
dbug: 1-4.0[1000]
      Endpoint created
dbug: 1-4.0[1000]
      Trapped SET_CONFIGURATION: 1
dbug: 1-4.0[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: 1-4.0[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.PcapNg[1000]
      Flushing
dbug: 1-4.1[1000]
      Endpoint created
dbug: 1-4.2[1000]
      Endpoint created
dbug: 1-4.2[1000]
      Endpoint created
[following messages seen 206 times]
dbug: Usbipd.PcapNg[1000]
      Flushing
[ 205 messages removed]
dbug: Usbipd.ConnectedClient[1000]
      Unbind or unplug while attached
dbug: 1-4.2[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: 1-4.2[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: 1-4.1[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
info: Usbipd.ConnectedClient[2]
      Client 192.168.188.87 released device at 1-4 (USB\VID_1A86&PID_7523\5&9BB4271&0&4).
fail: Usbipd.ConnectedClient[3]
      An exception occurred while communicating with the client:
      System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond..
       ---> System.Net.Sockets.SocketException (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
         --- End of inner exception stack trace ---
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError, CancellationToken) + 0x47
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16) + 0x58
         at System.IO.Stream.<ReadAtLeastAsyncCore>d__46.MoveNext() + 0x1bb
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.ThrowForFailedGetResult() + 0x13
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.GetResult(Int16) + 0x2b
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16) + 0x20
         at Usbipd.Tools.<ReadMessageAsync>d__0.MoveNext() + 0x1b6
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
         at Usbipd.Interop.UsbIp.<ReadUsbIpHeaderAsync>d__17.MoveNext() + 0x10f
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
         at Usbipd.AttachedClient.<RunAsync>d__12.MoveNext() + 0x36b
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
         at Usbipd.ConnectedClient.<HandleRequestImportAsync>d__8.MoveNext() + 0x1473
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at Usbipd.ConnectedClient.<HandleRequestImportAsync>d__8.MoveNext() + 0xff6
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at Usbipd.ConnectedClient.<HandleRequestImportAsync>d__8.MoveNext() + 0x1823
      --- End of stack trace from previous location ---
         at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x20
         at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xb2
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4b
         at Usbipd.ConnectedClient.<RunAsync>d__5.MoveNext() + 0x300
dbug: Usbipd.Server[1000]
      connection close: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond..
dbug: Usbipd.Server[1000]
      connection closed
dbug: Usbipd.PcapNg[1000]
      Flushing

pm-monitor_202407101814last2_successful_and_failed.pcapng.txt

dorssel commented 1 week ago

The first thing that happens is:

dbug: Usbipd.ConnectedClient[1000]
      Unbind or unplug while attached

Thats only shows up if the device is suddenly "gone", as in: the USB hub no longer reports it. That is either because the device was physically unplugged (not in your case, I guess), or the driver was changed (also not in your case), or the device "resets" internally, causing the HUB to "unplug" it, and then "replug" it. To Windows all of these look exactly the same as if the device was physically unplugged and then replugged.

Since usbipd-win registers for device change notifications, it gets notified by Windows that the device is "gone". And then it simply hangs up (as in: cancels all outstanding requests and closes the TCP connection). All other "symptoms" are just a result of this "cleanup". The real problem is still: the device locks up, and the USB hub does a hard port reset hoping to revive it.

rhermsen commented 1 week ago

Thanks for looking into the details.

I get that there are likely two different reasons for these interrupts.

I think the seconds failure mode, with the USB flap, should be detectable by software, and I should be able to re-attach the client. (does usbipd win offer any options to e.g. run a script on an event?) I'm curious if the first failure mode is improved with the code-change (ETIME > EPIPE).

Failure timestamp Suspected reason Signature Debug Signature dmesg Signature journalctl
2023-06-22 1546 Timeout no kernel messages
2023-06-26 2348
2023-08-09 1915
2023-09-17 1534 Timeout no kernel messages
2023-10-10 0144 Timeout no kernel messages
2024-02-29 2028 PM-Monitor reset USB disconnected USB disconnected
2024-03-07 1108 PM-Monitor reset USB disconnected USB disconnected
2024-03-31 0300 Timeout no kernel messages
2024-04-08 0048 PM-Monitor reset USB disconnected USB disconnected
2024-05-14 2311 Timeout none no kernel messages no kernel messages
2024-08-31 0432 PM-Monitor reset Unbind or unplug USB disconnected USB disconnected
dorssel commented 1 week ago

I'm curious if the first failure mode is improved with the code-change (ETIME > EPIPE).

Maybe (although I doubt it); you can give it a try.

From https://www.kernel.org/doc/html/v4.12/driver-api/usb/error-codes.html#error-codes-returned-by-in-urb-status-or-in-iso-frame-desc-n-status-for-iso it certainly seems that STALL should map to EPIPE.

Other people have tried to switch between ETIME and EPIPE when they were experiencing trouble (e.g., https://github.com/dorssel/usbipd-win/pull/812). It always turned out not to help and really was a hardware/firmware problem where the device itself was really stuck.

Error mapping is here: https://github.com/dorssel/usbipd-win/blob/master/Usbipd/Tools.cs#L113-L119. Feel free to play around.

dorssel commented 1 week ago

does usbipd win offer any options to e.g. run a script on an event

No, but Windows does. And usbipd-win logs every attach/detach to Windows Eventlog.

rhermsen commented 4 days ago

I'm curious if the first failure mode is improved with the code-change (ETIME > EPIPE).

Maybe (although I doubt it); you can give it a try.

From https://www.kernel.org/doc/html/v4.12/driver-api/usb/error-codes.html#error-codes-returned-by-in-urb-status-or-in-iso-frame-desc-n-status-for-iso it certainly seems that STALL should map to EPIPE.

Other people have tried to switch between ETIME and EPIPE when they were experiencing trouble (e.g., #812). It always turned out not to help and really was a hardware/firmware problem where the device itself was really stuck.

Error mapping is here: https://github.com/dorssel/usbipd-win/blob/master/Usbipd/Tools.cs#L113-L119. Feel free to play around.

I already have that change in place (see discussion between 3-July and 6-July in this thread), but have not see that failure-mode yet to see the behavior.

rhermsen commented 4 days ago

does usbipd win offer any options to e.g. run a script on an event

No, but Windows does. And usbipd-win logs every attach/detach to Windows Eventlog.

I found I should be able to detect that from within my Python script (running on WSL2), and trigger a script in Win11 to re-attach the USB-client. I do have to change some exception handling.