f4exb / sdrangel

SDR Rx/Tx software for Airspy, Airspy HF+, BladeRF, HackRF, LimeSDR, PlutoSDR, RTL-SDR, SDRplay and FunCube
GNU General Public License v3.0
2.97k stars 447 forks source link

FileInputThread can be destroyed while ::tick() is executing #562

Closed Vort closed 4 years ago

Vort commented 4 years ago

FileInput::stop() calls m_fileInputThread->stopWork(). But despite its name, stopWork() just disconnects tick() slot. Which can't prevent tick() function from continuing of its execution. Then delete m_fileInputThread; is called and this pointer inside tick() becomes invalid. Which means crash.

FileInput plugin looks overcomplicated, so it, most likely, deserve rewrite. But for now it may be sufficient to patch most annoying problems.

Here is my attempt to fix the problem: ```patch diff --git a/plugins/samplesource/fileinput/fileinput.cpp b/plugins/samplesource/fileinput/fileinput.cpp index 2eafef0e8..9f12c5121 100644 --- a/plugins/samplesource/fileinput/fileinput.cpp +++ b/plugins/samplesource/fileinput/fileinput.cpp @@ -373,26 +373,30 @@ bool FileInput::handleMessage(const Message& message) else if (FileInputThread::MsgReportEOF::match(message)) { qDebug() << "FileInput::handleMessage: MsgReportEOF"; - m_fileInputThread->stopWork(); - if (getMessageQueueToGUI()) + if (m_fileInputThread != 0) { - MsgReportFileInputStreamTiming *report = MsgReportFileInputStreamTiming::create(m_fileInputThread->getSamplesCount()); - getMessageQueueToGUI()->push(report); - } + m_fileInputThread->stopWork(); - if (m_settings.m_loop) - { - seekFileStream(0); - m_fileInputThread->startWork(); - } - else - { if (getMessageQueueToGUI()) { - MsgPlayPause *report = MsgPlayPause::create(false); + MsgReportFileInputStreamTiming *report = MsgReportFileInputStreamTiming::create(m_fileInputThread->getSamplesCount()); getMessageQueueToGUI()->push(report); } + + if (m_settings.m_loop) + { + seekFileStream(0); + m_fileInputThread->startWork(); + } + else + { + if (getMessageQueueToGUI()) + { + MsgPlayPause *report = MsgPlayPause::create(false); + getMessageQueueToGUI()->push(report); + } + } } return true; diff --git a/plugins/samplesource/fileinput/fileinputthread.cpp b/plugins/samplesource/fileinput/fileinputthread.cpp index 95842e44d..869678a55 100644 --- a/plugins/samplesource/fileinput/fileinputthread.cpp +++ b/plugins/samplesource/fileinput/fileinputthread.cpp @@ -93,6 +93,7 @@ void FileInputThread::stopWork() qDebug() << "FileInputThread::stopWork"; disconnect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); m_running = false; + QMutexLocker lock(&m_tickMutex); wait(); } @@ -176,6 +177,8 @@ void FileInputThread::tick() { if (m_running) { + QMutexLocker lock(&m_tickMutex); + qint64 throttlems = m_elapsedTimer.restart(); if (throttlems != m_throttlems) diff --git a/plugins/samplesource/fileinput/fileinputthread.h b/plugins/samplesource/fileinput/fileinputthread.h index c1cefefd6..ab143f3b4 100644 --- a/plugins/samplesource/fileinput/fileinputthread.h +++ b/plugins/samplesource/fileinput/fileinputthread.h @@ -70,6 +70,7 @@ public: private: QMutex m_startWaitMutex; + QMutex m_tickMutex; QWaitCondition m_startWaiter; volatile bool m_running; ```
f4exb commented 4 years ago

FileInput plugin looks overcomplicated

Running smoothly at the proper sample rate is complicated. This is not just reading the file and dumping it to the DSP chain. I do not see the need for a rewrite.

Having said that I agree on the (Edit: possible) issue.

f4exb commented 4 years ago

Threading is always complicated and Qt despite their good will do not make things much easier. One one hand I am thinking that the code in FileInputThread is running in "the" thread so execution cannot be in more than one place. However with the tick() signal this may be different and indeed since the thread loop just checks evey second if the thread should be running and then goes back to sleep this defeats the idea that this portion of code would run in sequence with the tick{} handler.

So let's assume this is possible and then indeed disconnecting the handler prevents new ticks to be handled but not the current one if it is running. Maybe we should build up on this idea of letting the current tick finish and then stop.

Edit: I've tried a few things and was never able to see the tick() running past stopWork() so we may be chasing non issues there. I have never seen FileInput throwing a segfault for this reason either. However perhaps the design could be made nicer and at least apparently safer so one would feel more comfortable looking at this code.

Vort commented 4 years ago

so we may be chasing non issues there.

Even theoretical crashes are important. But this one is real: fi_crash

was never able to see the tick() running past stopWork()

It happens when system is overloaded with something. To simulate this problem you can add QThread::msleep(30); or so right after the start of FileInputThread::tick().

f4exb commented 4 years ago

OK got it. What I suggest is first move the connect and disconnect inside the run() function. This does not fix the problem this is just for clarity.

Then obviously the issue is with tick() taking too long and QThread::msleep() can indeed simulate that although I had to push it a bit harder to 300 ms to get only occasional crashes. Of course at 6 times the nominal tick period it displays shit but I agree that even though it shouldn't crash.

One possibility as you show in your code is to use a mutex and tie run() and tick() so that run() will not exit before tick() is done nor tick() will start if run() goes past the mutex lock. This is nice but I am always a bit reluctant to use mutexes in time critical parts of the code because of their overhead.

Another possibility offered by Qt methods on threads is to use deleteLater() instead of a plain delete in the caller of the thread (FileInput). The Qt doc says deleteLater() will perform the delete only after the control has returned to the event loop. This is a bit mysterious but in fact it does exactly what we want because it will "return to the event loop" only after tick() is finished. With this arrangement I could not get any crash after many attempts.

Edit: this only needs very few changes. I think I could also get rid of the if (m_running) that becomes useless. I have kept the msleep commented out for documentary purposes it can also be removed later:

Using deleteLater() to fix the problem: ```diff diff --git a/plugins/samplesource/fileinput/fileinput.cpp b/plugins/samplesource/fileinput/fileinput.cpp index 2eafef0e8..cd83a2f32 100644 --- a/plugins/samplesource/fileinput/fileinput.cpp +++ b/plugins/samplesource/fileinput/fileinput.cpp @@ -218,7 +218,7 @@ void FileInput::stop() if (m_fileInputThread) { m_fileInputThread->stopWork(); - delete m_fileInputThread; + m_fileInputThread->deleteLater(); m_fileInputThread = nullptr; } diff --git a/plugins/samplesource/fileinput/fileinputthread.cpp b/plugins/samplesource/fileinput/fileinputthread.cpp index 95842e44d..959e0f3a8 100644 --- a/plugins/samplesource/fileinput/fileinputthread.cpp +++ b/plugins/samplesource/fileinput/fileinputthread.cpp @@ -80,7 +80,6 @@ void FileInputThread::startWork() while(!m_running) m_startWaiter.wait(&m_startWaitMutex, 100); m_startWaitMutex.unlock(); - connect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); } else { @@ -91,7 +90,6 @@ void FileInputThread::startWork() void FileInputThread::stopWork() { qDebug() << "FileInputThread::stopWork"; - disconnect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); m_running = false; wait(); } @@ -163,12 +161,14 @@ void FileInputThread::run() { m_running = true; m_startWaiter.wakeAll(); + connect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); while(m_running) // actual work is in the tick() function { sleep(1); } + disconnect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); m_running = false; } @@ -176,6 +176,7 @@ void FileInputThread::tick() { if (m_running) { +// QThread::msleep(300); qint64 throttlems = m_elapsedTimer.restart(); if (throttlems != m_throttlems) ```
Vort commented 4 years ago

It fixes the crash. But in fact FileInput::stop() transforms to "stopLater()". And it relies on caller's return to event loop. So it may cause problems in future. Or maybe not. Can't predict for sure.

It would be nice if you also get rid of sleep(1). It causes hiccup when you stop AM decoding from file. But if you want to limit this issue to a single problem, then I will create separate report.

f4exb commented 4 years ago

It would be nice if you also get rid of sleep(1).

So... just a forever tight loop like while (m_running) {} ? Would this work nicely because since it is a thread? Or will it consume all of the CPU thread for just nothing? I might try and see but for now I don't know the answer.

f4exb commented 4 years ago

And it relies on caller's return to event loop

Is there anything better to do? The mutex approach will overcome that only if assorted with a timeout and what happens when the time is out? Depending on where it is in the tick() code it may crash anyway,

Vort commented 4 years ago

So... just a forever tight loop like while (m_running) {} ?

I mean maybe thread is not needed here at all.

But that is a discussion for another report.

f4exb commented 4 years ago

But that is a discussion for another report.

OK I'm open to new ideas but I don't see yet how you can do that.

Vort commented 4 years ago

I don't see yet how you can do that.

Like this: ```diff diff --git a/plugins/samplesource/fileinput/fileinputthread.cpp b/plugins/samplesource/fileinput/fileinputthread.cpp index 95842e44d..190d8c9d3 100644 --- a/plugins/samplesource/fileinput/fileinputthread.cpp +++ b/plugins/samplesource/fileinput/fileinputthread.cpp @@ -32,7 +32,7 @@ FileInputThread::FileInputThread(std::ifstream *samplesStream, const QTimer& timer, MessageQueue *fileInputMessageQueue, QObject* parent) : - QThread(parent), + QObject(parent), m_running(false), m_ifstream(samplesStream), m_fileBuf(0), @@ -71,17 +71,17 @@ void FileInputThread::startWork() { qDebug() << "FileInputThread::startWork: "; + QMutexLocker lock(&m_tickMutex); if (m_ifstream->is_open()) { qDebug() << "FileInputThread::startWork: file stream open, starting..."; - m_startWaitMutex.lock(); - m_elapsedTimer.start(); - start(); - while(!m_running) - m_startWaiter.wait(&m_startWaitMutex, 100); - m_startWaitMutex.unlock(); - connect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); - } + if (!m_running) + { + m_running = true; + m_elapsedTimer.start(); + connect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); + } + } else { qDebug() << "FileInputThread::startWork: file stream closed, not starting."; @@ -91,9 +91,13 @@ void FileInputThread::startWork() void FileInputThread::stopWork() { qDebug() << "FileInputThread::stopWork"; - disconnect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); - m_running = false; - wait(); + + QMutexLocker lock(&m_tickMutex); + if (m_running) + { + m_running = false; + disconnect(&m_timer, SIGNAL(timeout()), this, SLOT(tick())); + } } void FileInputThread::setSampleRateAndSize(int samplerate, quint32 samplesize) @@ -159,21 +163,10 @@ void FileInputThread::setBuffers(std::size_t chunksize) } } -void FileInputThread::run() -{ - m_running = true; - m_startWaiter.wakeAll(); - - while(m_running) // actual work is in the tick() function - { - sleep(1); - } - - m_running = false; -} - void FileInputThread::tick() { + QMutexLocker lock(&m_tickMutex); + if (m_running) { qint64 throttlems = m_elapsedTimer.restart(); diff --git a/plugins/samplesource/fileinput/fileinputthread.h b/plugins/samplesource/fileinput/fileinputthread.h index c1cefefd6..8458ebba8 100644 --- a/plugins/samplesource/fileinput/fileinputthread.h +++ b/plugins/samplesource/fileinput/fileinputthread.h @@ -35,7 +35,7 @@ class SampleSinkFifo; class MessageQueue; -class FileInputThread : public QThread { +class FileInputThread : public QObject { Q_OBJECT public: @@ -69,8 +69,7 @@ public: void setSamplesCount(quint64 samplesCount) { m_samplesCount = samplesCount; } private: - QMutex m_startWaitMutex; - QWaitCondition m_startWaiter; + QMutex m_tickMutex; volatile bool m_running; std::ifstream* m_ifstream; @@ -90,7 +89,6 @@ private: QElapsedTimer m_elapsedTimer; bool m_throttleToggle; - void run(); //void decimate1(SampleVector::iterator* it, const qint16* buf, qint32 len); void writeToSampleFifo(const quint8* buf, qint32 nbBytes); ```
Vort commented 4 years ago

I will try to make another solution soon.

f4exb commented 4 years ago

Of course you don't need to be a QThread to handle events as soon as you inherit of QObject then you are able to do that. However this solution puts the timer handler (the tick() method) in the main thread. This has to be experimented with a bit more. Does it work nicely if many FileInputs are running in parallel and for a variety possibly large sample rates? One question may be why would you like to do that? Answer: if this works with threads but not with all handlers in the main thread then this is an issue.

A very simple solution still using threads can be to reduce the "sleep" time to match the nominal timer time (~50ms). In fact a tight loop ({}) may be an issue but as soon as some delay even short is introduced this could be enough to break the "CPU burn".

Vort commented 4 years ago

However this solution puts the timer handler (the tick() method) in the main thread.

It was not in FileInputThread anyway. I will test another option and tell the results.

Vort commented 4 years ago

Here is the solution, which is more complex, but I like it more: 53b1db1c18d7f091125708336b08bc1b7e9376d1. In this commit, *Thread was renamed into *Worker, like it was done for kiwisdr. Looks like it is the only correct way to bind slots to thread.

It of course needs polishing, but there should be no race conditions possible with such approach. FileInput::stopWorker() shuts down event loop of m_fileInputWorkerThread completely. No tick() can slip through.

And here is the proof that read task is having its own thread now (9908): sdra_worker Compare it with previous screenshot, where reads are mixed with various other tasks.

f4exb commented 4 years ago

Yes looks good. It is a recommendation anyway not to derive from QThread but instead moving object to thread. the FileInput plugin is a bit old and was not designed that way.

f4exb commented 4 years ago

It of course needs polishing

Looks already good for me like this. Some extra polishing would have been needed on the original code anyway that can be done in a second pass.

Vort commented 4 years ago

Ok. I did very little testing for it, but if you think that it is ok, then it is fine. I can always create another bugreport :)

BTW, it may be that other plugins requires the same change, which may make they operation more smooth. But it should be tested of course, code may be not ready for different threads situation.

f4exb commented 4 years ago

But it should be tested of course

Yes np I can do that.

BTW, it may be that other plugins requires the same change

I spotted LocalSink that has the exact same structure with a sleep for one second in the run() method. All the other device plugins have their threads subclassing QThread which is not nice but may not be so detrimental as they do something useful in their run() method. The channel plugins were refactored recently and apply the good practice with threads.

Vort commented 4 years ago

It should be fixed with c75c35acad703dc92f8967d78b9de1c808942297. If problems arise with this commit, either this report should be reopened or new one created.

f4exb commented 4 years ago

Done in v4.14.16 and v5.7.10