analogdevicesinc / libiio

A cross platform library for interfacing with local and remote Linux IIO devices
http://analogdevicesinc.github.io/libiio/
GNU Lesser General Public License v2.1
484 stars 312 forks source link

usb.c device discovery does not play nice with other libusb devices #586

Closed guruofquality closed 2 years ago

guruofquality commented 4 years ago

To whom it may concern. I maintain a small SDR ecosystem (SoapySDR) in which the PlutoSDR is one of the supported devices. Since I have added PlutoSDR to the ecosystem and some of the installers, I have had a number of users report issues that the non-libiio devices do not get discovered when the PlutoSDR support is installed.

The issue

We have narrowed down the issue to how libiio usb device discovery functions. To put it simply, libiio usb opens every device in libusb, even devices which are not IIO devices, and have nothing to do with IIO. When this happens, the libusb drivers for devices like RTL, USRP, HackRF, etc, cannot discover their devices, because IIO is holding the open handles.

Because IIO is also closing the device handles, this is effectively a race condition as to which thread will get to open the device first: IIO, or the proper driver?

Potential fix

RTL, USRP, HackRF, etc drivers use VID/PID to narrowly select which devices they are supposed to open. I don't really see that in the libiio usb.c code.

I think that one way to fix this would be to pass the expected VID/PID for the PlutoSDR into the iio scan context, so libiio usb support would only look for the specific devices rather than all of them.

I think that VID/PID support is probably an addition of some sort that would need to be added to usb.c, but I would be happy to hear otherwise.

This is the code in question that find the PlutoSDR and uses the libIIO API: https://github.com/pothosware/SoapyPlutoSDR/blob/master/PlutoSDR_Registration.cpp#L5

I'm not an IIO API expert, I dont have a PlutoSDR. I'm just trying to support and fix a problem in the community. So I hope the explanation of the problem makes sense, and I look forward to any advice, fixes, or work arounds. Thanks!

rgetz commented 4 years ago

The problem makes sense - thanks for the description - it's well thought out and stated.

libiio has nothing to do with the PlutoSDR - it's one of the many (hundreds) of IIO devices that is used with the linux kernel's IIO infrastructure. Pluto is just a popular one.

that doesn't mean we don't want to fix it, but we decided a long time ago, that USB PID/VIDs don't belong in a core library - so we would prefer not to do that. (we do the same thing with the libiio udev rule - there isn't a PID/VID list - since it's not a product library, it's an interface library).

the code you are talking about is here: https://github.com/analogdevicesinc/libiio/blob/master/usb.c#L486-L513 (which as you indicates - looks at all devices to find out if libusb_get_string_descriptor_ascii() returns "IIO" which is what Pluto, and all other generic IIO devices do).

Since this is a generic race condition (could happen by any code - not just libiio) - isn't the proper thing to do - to have the error handling code in the HackRF driver (or other) retry n times after m milliseconds on failure? That sounds like a more robust solution to the generic libusb problem (That it can't let more than one process open a device and get the device descriptor strings).

?

guruofquality commented 4 years ago

I think most of the discussion ended up here, but I wanted to to leave a response.

that doesn't mean we don't want to fix it, but we decided a long time ago, that USB PID/VIDs don't belong in a core library - so we would prefer not to do that. (we do the same thing with the libiio udev rule - there isn't a PID/VID list - since it's not a product library, it's an interface library).

I tend to agree. Since we know in particular device, we know the PID/VID, and we have control over the client code that uses libiio -- it seems like the best solution is to have the client code invoke libiio to only open from a select VID/PID or list of VID/PIDs that are specified by the client code. This operation would be remarkably similar to how other SDR devices use libusb.

Since this is a generic race condition (could happen by any code - not just libiio) - isn't the proper thing to do - to have the error handling code in the HackRF driver (or other) retry n times after m milliseconds on failure? That sounds like a more robust solution to the generic libusb problem (That it can't let more than one process open a device and get the device descriptor strings).

So if libusb_open() fails, the assumption is that some other process or context is holding the device handle open and using it. So its not normally a race, nor is it really a failure in this case. And most of these device drivers, if not all, wont report devices that are "in-use". -- I guess the issue is there is an unspoken rule that usb devices are only held open when they are supposed to be. And to that credit, there is an ecosystem of drivers that do this --- they place nice. Actually libiio is the first time I have seen something like this.

I dont think I can convince HackRF, RTLSDR, LimeSuite, USRP, and potentially un-enumerable others to put retry loops. I'm sure I would get pushback from respective devs wondering why libIIO was opening their device, when its not an "IIO" compatible device. -- But I think on a day-to-day basis, you almost never see someone developing with libiio messing up another unrelated USB device driver, although from the looks of it, it could happen.

So, this is my unique problem because SoapySDR lets you have a unified API for all of these diverse devices, and the fact that it can discover all of them in one API call is what exacerbates the race condition. Further, the different hardware support modules are discovered in parallel (multi-thread), or the operation would take too long in many cases. If the devices various APIs were called serially, this would not happen. But I cant do that, and I don't have a way to know that this one particular hardware support module for PlutoSDR needs to be "serialized" after the others.

Anyway, its an open support issue for me that not always easy to diagnose. And I'm sure even more people have just run into this problem and gave up with without even saying anything. So I have to do something. At the moment SoapyPlutoSDR.dll may have to go into some kind of optional install component to prevent the issue.

rgetz commented 4 years ago

this isn't a libiio specific issue - lsusb -vvv does the same thing.

https://github.com/gregkh/usbutils/blob/master/lsusb.c#L3612

There are many libraries & applications that do the same. It's pretty common.

We will add something that allows you to specify only specific vid/pid when you want...

-Robin

guruofquality commented 4 years ago

this isn't a libiio specific issue - lsusb -vvv does the same thing. There are many libraries & applications that do the same. It's pretty common.

I totally agree. Basically I found a way to reliably create the issue, and its outside of the user's control. Where as I hope that in other cases such as lsusb -vvv its less likely or at least the user can not run some command or daemon at the same time they are opening the SDR app.

Technically there's nothing wrong here. Its more like getting applications, drivers, and people to cooperate and agree on behaviour. And there isn't really a "correct" answer. :-)

We will add something that allows you to specify only specific vid/pid when you want...

Much appreciated, thanks! I am looking forward to it.

gozu42 commented 2 years ago

(taking that off the GSG side of things)

We will add something that allows you to specify only specific vid/pid when you want...

@rgetz did this ever happen?

rgetz commented 2 years ago

not yet - looking at it now.

rgetz commented 2 years ago

I have a PR now - will let you know how it gets reviewed.

rgetz commented 2 years ago

This should be fixed in #791 which was recently merged to master

StefanBruens commented 1 year ago

This is still very fragile and problematic:

  1. In case one has multiple devices with a matching VID:PID, but only one is in IIO mode (e.g the other one may be in a FW download mode), the other may be still matched with the rule
  2. The udev rule is run for every usb device and every usb interface which is added or removed. I.e. spawn a shell, fork iio_info, fork grep.

(2.) can be somewhat mitigated by adding an ACTION=="remove", GOTO="iio_end" match at the beginning, and also match with ENV{DEVTYPE}=="usb_interface".

Do IIO usb devices have a specific class/subclass/protocol, either on the device or interface level? If yes, this can be added as a match as well.

rgetz commented 2 days ago

In case one has multiple devices with a matching VID:PID, but only one is in IIO mode (e.g the other one may be in a FW download mode), the other may be still matched with the rule

That's not the way USB PID works - when you change mode - into DFU or anything else - you are required to change the PID. Otherwise you won't pass the USB test specs. Things like Pluto and M2k do that.

Pluto switches it’s normal USB PID (which is 0xB673) to 0xB674 (PlutoSDR DFU)

See: https://wiki.analog.com/university/tools/pluto/users/firmware#how_can_i_check_if_the_device_is_in_dfu_mode

So, this is a non-issue.

StefanBruens commented 2 days ago

In case one has multiple devices with a matching VID:PID, but only one is in IIO mode (e.g the other one may be in a FW download mode), the other may be still matched with the rule

That's not the way USB PID works - when you change mode - into DFU or anything else - you are required to change the PID. Otherwise you won't pass the USB test specs. Things like Pluto and M2k do that.

Can you point to an actual part of the USB specification where this requirement is written down? Changing VID:PID is a requirement for DFU (specifically written down for DFU, not because of USB, but "because some OSes are not able to do otherwise"). So "or anything else" is incorrect.

Interface altsettings are part of the standard, and thus interfaces may appear and disappear without changing VID:PID.

Devices may also be switched persistently (e.g. with a hardware switch, or with a config command) between modes, not necessarily providing an IIO capable interface in all modes.

You are too focused on the Pluto devices, IIO tries to be a vendor neutral standard interface.