airspy / airspyone_host

AirSpy's usemode driver and associated tools
http://airspy.com
245 stars 88 forks source link

Deadlock in airspy_close if device is unplugged while streaming #74

Closed johanhedin closed 3 years ago

johanhedin commented 3 years ago

If a device that is streaming is unplugged, libairspy sometimes deadlock when airspy_close is called to close the open, but now absent, device. As far as I can see, what is happening is that the conditional: https://github.com/airspy/airspyone_host/blob/229d5a5045b890621cd9cc200a19b1541c6021ae/libairspy/src/airspy.c#L356 is not released properly. And then, in airspy_close, the destruction of the conditional blocks since it is still "active": https://github.com/airspy/airspyone_host/blob/229d5a5045b890621cd9cc200a19b1541c6021ae/libairspy/src/airspy.c#L1000 Observed on Linux/Fedora 33 with a Airspy R2 and using latest libairspy from git.

Simple program to reproduce, deadlock.c:

#include <stdio.h>
#include <unistd.h>
#include <inttypes.h>

#include <airspy.h>

struct ctx {
    unsigned counter;
};

static int data_cb(airspy_transfer *transfer) {
    struct ctx *ctx = (struct ctx*)transfer->ctx;

    if (++ctx->counter == 20) {
        printf("Consumer thread: Data received...\n");
        ctx->counter = 0;
    }

    return 0;
}

int main(int argc, char **argv) {
    uint64_t              serial = 0;
    struct airspy_device *dev = NULL;
    int                   ret;
    struct ctx            ctx = { 0 };

    // Get serial if provided
    if (argc > 1) {
        ret = sscanf(argv[1], "%" SCNx64, &serial);
        if (ret != 1) serial = 0;
    }

    printf("Main thread: Running deadlock test using device %.16" PRIx64 "....\n", serial);

    ret = airspy_open_sn(&dev, serial);
    if (ret != AIRSPY_SUCCESS) {
        printf("Error: Unable to open device.\n");
        return 1;
    }

    airspy_start_rx(dev, data_cb, &ctx);

    printf("Main tread: Waiting for device to be unplugged...\n");
    while (airspy_is_streaming(dev) == AIRSPY_TRUE) {
        sleep(1);
    }

    airspy_stop_rx(dev);
    airspy_close(dev);

    return 0;
}

Build and run with:

$ gcc -Wall -I/usr/include/libairspy -o deadlock deadlock.c -lairspy -lusb-1.0 -lpthread
$ ./deadlock

and then unplug the device. Sometimes the program exits correctly, but sometimes it hangs in airspy_close.

bvernoux commented 3 years ago

I suspect this "issue" mainly comes from libusb which does not manage correctly hotplug by default

johanhedin commented 3 years ago

Full hotpluging looks nice, but I think than this issue is actually in the current libairspy implementation (I might be wrong ofcourse).

When the device is removed, all libusb_* functions that are blocking, like here: https://github.com/airspy/airspyone_host/blob/229d5a5045b890621cd9cc200a19b1541c6021ae/libairspy/src/airspy.c#L503 or called, like here: https://github.com/airspy/airspyone_host/blob/229d5a5045b890621cd9cc200a19b1541c6021ae/libairspy/src/airspy.c#L478 will return with some libusb error and libairspy will, as you mention, act on the error and set device->streaming = false which will stop streaming. But setting device->streaming = false without releasing the conditional will cause the consumer_threadproc to never exit.

The "cleanup release" of the conditional that is called during a "normal" stop here: https://github.com/airspy/airspyone_host/blob/229d5a5045b890621cd9cc200a19b1541c6021ae/libairspy/src/airspy.c#L524 is not called when streaming == false as in this error case.

I think the fix is as simple as making sure that the conditional is signaled from the transfer_threadproc when libusb_handle_events_timeout_completed returns an error and sets device->streaming = false.

I will create a pull request for you to look at.

bvernoux commented 3 years ago

Yes very interesting thanks for the analyze Such use case was not tested as it is not "normal process" to remove the USB cable when streaming anyway it is a good improvement if that can be correctly managed in libairspy. To be checked with your code which reproduce the issue more "easily" with a PR to test the fix. Note: About libusb hotplug support it seems still not supported for Windows see https://github.com/libusb/libusb/issues/86

johanhedin commented 3 years ago

You can easily run your own "hotplug" with the current libairspy (if we disregard the current deadlock) by simply checking airspy_is_streaming. If that function returns error, it means that the device disappeared somehow and you can close the device and try opening it again. And just repeat until the device is inserted.

My use case is using an Airspy from a program in deamon mode so I like to have a robust implementation that can handle if the device is pulled for some reason.

BTW, there are memory leaks in libairspy as well when pulling a device that are streaming (if not valgrind is messing up something here...). I think I have some leads, but some way of dealing with the deadlock seem like priority one.

bvernoux commented 3 years ago

Thanks for your contribution it is fixed with https://github.com/airspy/airspyone_host/pull/76

johanhedin commented 3 years ago

Thank you for the quick response!

touil commented 3 years ago

There is still one problem. If one transfer fails, the entire streaming will be stopped. This could happen more often than you think depending on the USB host hardware and the quality of the transmission line. That's why the buffer sizes are aligned to specific values so the streaming can resume without corrupting the upcoming data. Losing one transfer is less dramatic than stopping the entire streaming session. In some scenarios, stopping the stream cannot be tolerated. A better solution would be to detect if the device is still alive using a time stamp which is updated on successful receive, and only stop the streaming if no data is reaching the host after some timeout.

johanhedin commented 3 years ago

There is still one problem. If one transfer fails, the entire streaming will be stopped.

Point taken. But as far as I can se, this was already the case before #76. Right?

touil commented 3 years ago

Correct. The line in question is commented in other implementations (like airspy_adsb.)

bvernoux commented 3 years ago

This issue shall be fixed with https://github.com/airspy/airspyone_host/pull/78

bvernoux commented 3 years ago

This issue is now fixed with merge of #78