airspy / airspyone_host

AirSpy's usemode driver and associated tools
http://airspy.com
245 stars 88 forks source link

Use of uninitialized variables after commit 099256f #93

Closed johanhedin closed 8 months ago

johanhedin commented 8 months ago

The https://github.com/airspy/airspyone_host/commit/099256fd5e3ec29a3513fb047e041a4ec3257f95 commit cause use of uninitialized variables in some cases and introduce potential thread synchronization issues.

Explanation:

A call to airspy_close() will call airspy_stop_rx(). airspy_stop_rx() calls kill_io_threads() and the code in kill_io_threads() will join the two internal worker threads. The problem is that the threads are only created by a call to airspy_start_rx(). So a use case like this:

airspy_open_sn()
airspy_version_string_read()
airspy_close()

(i.e. using the lib without start streaming) will cause the use of uninitialized variables here: https://github.com/airspy/airspyone_host/blob/d309c71c9085e8fc3b452f7a557c27985a82dc33/libairspy/src/airspy.c#L532-L533

On for example a Raspberry Pi 4 with latest Raspberry Pi OS (Bookworm) the calls to pthread_join() with uninitialized thread handles does not cause anything to happen but on for example a x86 machine with Fedora 39 pthread_join() will segfault the program.

Before https://github.com/airspy/airspyone_host/commit/099256fd5e3ec29a3513fb047e041a4ec3257f95 device->streaming guarded against this but the changes by the commit alters the behavior.

I also have a strong feeling (the closest to "knowing" when it comes to thread synchronization...) that the change from device->stop_requested = true to device->streaming = false in the worker threads can/will cause other race like problems.

The old code had a clean separation between what thread set what variable. device->stop_requested was only set from inside the worker threads to indicate a USB problem or stop by request from callback and device->streaming was only set from the thread calling the high level api (airspy_{open,start_rx,stop_rx...}) to intentionally start/stop streaming.

I tried to come up with a simple pull request to fix this but it felt like going down several threaded rabbit holes...

I don't know what parts of the commit that fixed the Windows libpthread issue and what parts that cleaned up the streaming logic (or if they are the same), but I would suggest going back to the old way regarding device->stop_requested/device->streaming to fix this if possible.

touil commented 8 months ago

Did you read the actual code and go through the "rabbit hole" or do you just "think" it should be "better" this way? More precisely, did you read this line?

https://github.com/airspy/airspyone_host/blob/master/libairspy/src/airspy.c#L832

johanhedin commented 8 months ago

Did you read the actual code and go through the "rabbit hole" or do you just "think" it should be "better" this way? More precisely, did you read this line?

https://github.com/airspy/airspyone_host/blob/master/libairspy/src/airspy.c#L832

Realized the wording "uninitialized" is wrong on my side. I should have expressed it as "calling ptherad_jonin() with a thread id of zero causes segfault".

touil commented 8 months ago

I am yet to see a segfault with that code. pthread_join returns ESRCH with thread_id 0, which is perfectly fine.

johanhedin commented 8 months ago

The following program segfaults on CentOS 6, CentOS 7 and Fedora 39 but works on Raspberry Pi OS (Bookworm and Buster). So not all pthread implementations on all platforms handles a thread id of 0 to pthread_join() as ESRCH.

#include <pthread.h>

int main(int argc, char **argv) {
    pthread_t thread = 0;

    pthread_join(thread, NULL);

    return 0;
}

Compiled with:

$ gcc -o out segfault.c -lpthread
touil commented 8 months ago

Can't reproduce on any up to date Linux on any CPU I tested (x86, x86_64, ARMHF, AARCH64.) Maybe you just highlighted a problem in your distro? I'm sure you can find more broken Linux stuff if you bring a distro from the 90s. Yet, the "fix" you proposed still doesn't work because of the many possible ways libusb can fail. Someone has to go through that rabbit hole and do the actual investigation work. Maybe a simple check can work for the fossilized distros. Or maybe we should exclude them to keep a sane code base.

Try it: https://onlinegdb.com/VQarzj8G-

johanhedin commented 8 months ago

The reference to the old CentOS distros was just to show that the behavior of pthread_join() has been consistent for a very long time in the RHEL-line of distros. Fedora 39 is a recent distro and the issue I'm raising is not about fossilized distros.

As far as I can read, passing anything other than a joinable thread id as the first argument to pthread_join() is undefined behavior. Distros/implementations have apparently made different choices how to handle this (Debian/Ubuntu: return ESRCH, RHEL-line: segfault) . Nothing odd about that but code using pthread should be written with this in mind. In this case not calling pthread_join() with a invalid/NULL thread id or a thread id that was previously joined.

I'll create a PR with a check.

touil commented 8 months ago

This should be it.

johanhedin commented 8 months ago

Tnx.