dorssel / usbipd-win

Windows software for sharing locally connected USB devices to other machines, including Hyper-V guests and WSL 2.
GNU General Public License v3.0

Read TCP Keep Alive settings from environment variables. #412

Closed kaplan2539 closed 2 years ago

kaplan2539 commented 2 years ago

Even with the latest version, 2.3.0, we are seeing random connection drops. We hope to increase stability by using a higher value for the TCP/IP keep-alive time parameter. Cezanne's usbip-win project allows configuring the TCP/IP keep-alive timeout via an environment variable (cf. https://github.com/cezanne/usbip-win/wiki/Deployment-Tips#handling-abortive-disconnection).

I'm planning to make a PR to add a similar feature to this project.

dorssel commented 2 years ago

What exactly is causing the drops? We already have this: https://github.com/dorssel/usbipd-win/blob/master/Usbipd/ConnectedClient.cs#L39-L42

You are the first to report such issues. Are you sure that TCP drops (connection reset) are the cause? Do you have a PCAP for analysis? Did you run with debugging messages as described here? https://github.com/dorssel/usbipd-win/wiki/Troubleshooting.

If this will ever be made configurable, then certainly not via an environment variable.

dorssel commented 2 years ago

I've noticed your PR. What values are you using to fix your problem? If those are better defaults than the ones we already have, then I would much rather use them than make this configurable at all.

kaplan2539 commented 2 years ago

Wow, thanks for the fast reply!

We are seeing the same symptoms as described in https://github.com/dorssel/usbipd-win/issues/300. The connection is closed on the Linux (WSL2) client side. Collecting logs and captures will take me some time.

We started with usbipd-win v2.1.0 and a custom-built kernel based on the official MS WSL2 5.4 kernel (only adding support for usbip and FTDI USB serial adapters, no other config changes). Moving to usbipd-win v2.3.0 and upgrading our kernel to 5.10 made the problem occur less often, but we still see it from time to time.

To be honest, I haven't thought much about the impact of making this configurable via environment variables - I've just copied the idea from https://github.com/cezanne/usbip-win/wiki/Deployment-Tips#handling-abortive-disconnection.

Regarding the default values: what we are trying to verify on our side is whether a keep-alive timeout of 30 seconds instead of 10 helps.

kaplan2539 commented 2 years ago

The exact cause of the drops is unknown. We have a quite complex setup with a device constantly streaming real-time measurement data via an FTDI USB serial interface. When reading it out with the exact same software on a native Linux host, we don't see any connection drops. But when running our software inside WSL2 and piping the data through usbipd-win / usbip, we see random drops.

kaplan2539 commented 2 years ago

Actually I found some old logs (still usbip v2.1 / Linux 5.4): log-usbipd.txt log.txt

No capture though...

dorssel commented 2 years ago

From log-usbipd.txt:

dbug: UsbIpServer.ConnectedClient[1000] Unbind or unplug while attached

This does not get logged by a simple TCP connection reset. This gets logged only if the device is physically unplugged (or its equivalent: the HUB no longer reports it as present, which could happen if the device is stuck), or if the user unbinds the device while it is attached (in which case the driver is reset to the "real" Windows driver instead of the proxy driver, which in turn looks to Windows as if the device is unplugged and another device is plugged in). In other words: the proxy device was suddenly gone from the hub. And then usbipd deliberately closed the connection.

kaplan2539 commented 2 years ago

Thank you for looking into this. So it is not the usbip client causing the connection to close, but the usbipd server on the Windows host side?

dorssel commented 2 years ago

From this log data: yes.

dorssel commented 2 years ago

I've just updated the KeepAlive settings slightly in PR #434 to fix Windows 2012 R2. This should not affect existing behavior on Windows 10 or newer (but you never know).

However, I think we already concluded that your issue was not caused by KeepAlive after all.

Also, your existing PR #413 is not how I prefer to see configuration settings applied (via the environment). That is hard to control for services. Configuration of service options can be done via the command line already, see https://github.com/dorssel/usbipd-win/blob/master/Usbipd/PcapNg.cs#L27 as an example.

Do you still want KeepAlive configurable? I certainly do welcome contributions, so I'll leave #413 open for you to amend.

If not, can we close this issue and the PR?

kaplan2539 commented 2 years ago

Hi Frans, I am still not 100% sure that the random connection drops are cured by higher TCP keep-alive timeout values. I am perfectly OK with not making these values configurable via environment variables, and I'll close my PR. But what do you think of increasing the value of tcpKeepAliveRetryCount from 5 to 10 and of tcpKeepAliveTime from 10 to 30?

dorssel commented 2 years ago

@kaplan2539 I've created a new PR #442, could you give that a try? Installer is at https://github.com/dorssel/usbipd-win/actions/runs/2951839137.

It allows you to specify usbipd:TcpKeepAliveTime and usbipd:TcpKeepAliveInterval, both in milliseconds.

TcpKeepAliveTime is the time after the last received packet to start sending keep-alives. The retry count is fixed at 10 times, to retain backward compatibility with Windows Server 2012 R2 and Windows 8.1. The interval between retries is TcpKeepAliveInterval. Effectively, the defaults will recognize a dead connection after 10,000 + 10 * 500 = 15,000 ms = 15 s.

I don't want to increase the defaults; they are mostly used to detect WSL shutdowns.
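
The arithmetic above can be checked with a small sketch. This is illustrative Python, not the usbipd-win code (which is C#); the function names `detection_time_ms` and `apply_keepalive` are hypothetical, and the default values (10,000 ms, 500 ms, 10 retries) are taken from the comment above. The socket part assumes the standard platform options: `SIO_KEEPALIVE_VALS` on Windows, `TCP_KEEPIDLE` / `TCP_KEEPINTVL` / `TCP_KEEPCNT` on Linux.

```python
import socket

RETRY_COUNT = 10  # fixed retry count, as stated in the comment above


def detection_time_ms(keepalive_time_ms: int, keepalive_interval_ms: int,
                      retry_count: int = RETRY_COUNT) -> int:
    """Idle time after which a silent peer is considered dead."""
    return keepalive_time_ms + retry_count * keepalive_interval_ms


def apply_keepalive(sock: socket.socket, keepalive_time_ms: int,
                    keepalive_interval_ms: int) -> None:
    """Enable TCP keep-alive on sock with whatever the platform exposes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, interval in ms).
        # The retry count cannot be set through this call.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, keepalive_time_ms, keepalive_interval_ms))
    elif hasattr(socket, "TCP_KEEPIDLE"):
        # Linux: values are whole seconds, so milliseconds are truncated
        # to seconds with a minimum of 1 s.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE,
                        max(1, keepalive_time_ms // 1000))
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL,
                        max(1, keepalive_interval_ms // 1000))
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, RETRY_COUNT)
    # On other platforms only SO_KEEPALIVE is enabled, with OS defaults.


default_ms = detection_time_ms(10_000, 500)    # defaults from this comment
proposed_ms = detection_time_ms(60_000, 1_000) # larger values, for comparison
print(default_ms, proposed_ms)
```

With the defaults this gives 15,000 ms; a time of 60,000 ms with a 1,000 ms interval would give 70,000 ms before a dead connection is recognized.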

kaplan2539 commented 1 year ago

Hi, and sorry for the radio silence. If I would like to try out different TcpKeepAliveTime settings, I guess I need to specify them in the Windows Registry? Where exactly? In \\HKLM\SOFTWARE\usbipd-win\TcpKeepAliveTime and \\HKLM\SOFTWARE\usbipd-win\TcpKeepAliveInterval? How do I know the values are getting picked up by usbipd-win? I can't find anything in the logs - do I need to enable debug logs, and how would I do that?

Sorry for the many questions - I've searched the Wiki and GitHub issues, and also tried to read the source, but I'm not familiar with Microsoft.Extensions.Configuration.

kaplan2539 commented 1 year ago

Ok, I believe I figured it out by staring at the code long enough. Is the following a valid example? usbipd server usbipd:TcpKeepAliveTime=60000 usbipd:TcpKeepAliveInterval=1000

dorssel commented 1 year ago

That's the correct syntax, yes. You can add Logging:LogLevel:Default=Trace to verify; the actual TCP keepalive settings are then logged at program start. See https://github.com/dorssel/usbipd-win/wiki/Troubleshooting#logging.
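
To make the shape of these arguments concrete, here is a minimal Python sketch of how `key=value` arguments with colon-separated keys could be collected into settings. It is only an illustration: it is not how usbipd-win or Microsoft.Extensions.Configuration is implemented, the helper `parse_config_args` is hypothetical, treating keys case-insensitively is an assumption of the sketch, and the fallback values are the defaults mentioned earlier in this thread.

```python
def parse_config_args(args):
    """Collect key=value arguments into a dict keyed by lower-cased key.

    Arguments without '=' (such as the 'server' subcommand) are skipped.
    """
    settings = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and key:
            settings[key.lower()] = value
    return settings


# Arguments as they would follow 'usbipd' on the command line.
args = [
    "server",
    "usbipd:TcpKeepAliveTime=60000",
    "usbipd:TcpKeepAliveInterval=1000",
    "Logging:LogLevel:Default=Trace",
]

settings = parse_config_args(args)

# Fall back to the defaults quoted earlier in the thread (10,000 ms / 500 ms).
keepalive_time_ms = int(settings.get("usbipd:tcpkeepalivetime", 10_000))
keepalive_interval_ms = int(settings.get("usbipd:tcpkeepaliveinterval", 500))
log_level = settings.get("logging:loglevel:default", "(not set)")

print(keepalive_time_ms, keepalive_interval_ms, log_level)
```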

kaplan2539 commented 1 year ago

Thanks Frans! We're still seeing connection drops. I'll open a new issue to discuss: #538