NatronGitHub / Natron

Open-source video compositing software. Node-graph based. Similar in functionalities to Adobe After Effects and Nuke by The Foundry.
http://NatronGitHub.github.io
GNU General Public License v2.0
4.54k stars 332 forks source link

Fix DeleterThread and CacheCleanerThread locking. #877

Closed acolwell closed 1 year ago

acolwell commented 1 year ago

Thanks for submitting a pull request! Please provide enough information so that others can review your pull request. Additionally, make sure you've done all of these things:

PR Description

What type of PR is this? (Check one of the boxes below)

What does this pull request do?

This change fixes double unlock that was causing crashes when the Tests binary would exit on Windows.

This change replaces the explicit unlock() on the mutex with a call to unlock() on the QMutexLocker holding the lock. This allows to QMutexLocker to avoid calling unlock() in its destructor when the mutex has already been unlocked.

The QMutexLocker that is declared right after the unlock() was also given a unique name in the function scope. This has no functional effect, but could help avoid accidentally calling the wrong object if this code is modified in the future.

Have you tested your changes (if applicable)? If so, how?

Yes. Tests.exe was crashing on Windows in the lock code before this change. It no longer crashes with this change applied.

devernay commented 1 year ago

Out of curiosity, @acolwell how did you find this? I spent a long time re-reading this code, knowing there was a bug here

acolwell commented 1 year ago

Out of curiosity, @acolwell how did you find this? I spent a long time re-reading this code, knowing there was a bug here

I was getting reliable crashing in the Tests.exe binary. When I ran the code in GDB, the debug info said it was crashing at the closing brace of the block containing the scope lock. I've done a ton of multithreaded bug fixing over my career and a crash like that often indicates a lock/unlock mismatch so I started looking at all the locking code. I noticed that the mutex was being passed to the scope lock and then being directly manipulated afterwards. This is usually a no-no and if you do it you need to make sure you put the mutex back into the right state before the scope lock destructor runs. When I saw that this wasn't happening I started looking for solutions. I noticed Qt's scope lock implementation allows you to explicitly tell it that the mutex needed to be unlocked, so I just decided to use that instead of relocking the mutex before the return call.

None of this is particularly tricky. I have just seen this type of issue a lot and so I have a lot of practice looking for the common issues.

FWIW, I'm still a little suspicious of the mustQuit logic in these classes. It looks like this is trying to simulate something like pthread_join(), but it isn't actually guaranteeing that the thread is truly dead before returning from quitThread(). I believe there is a tiny window between the k2 scope lock unlocking the mutex and the thread actually terminating where problems could occur. This is especially true if the QThread implementation modifies class state after run() returns. I don't really know the history of this code, but do you know why the mustQuitCond loop is being used instead of just calling the QThread::wait() function? It seems like wait() is in a better position to make sure the thread is dead before returning.