christophhart / HISE

The open source framework for sample based instruments
http://hise.audio
Other
1.03k stars 118 forks source link

SampleThreadPool thread safety #106

Closed martinfinke closed 5 years ago

martinfinke commented 6 years ago

In ~SampleThreadPool(), the pimpl is deleted before stopping the background thread. I could be missing some external synchronization/safety here, but it looks like a race condition:

  1. ~SampleThreadPool() does pimpl = nullptr;
  2. The thread could still be running, so run() calls pimpl->jobQueue.peek() for example, which is a null pointer dereference.
  3. ~SampleThreadPool() calls stopThread. After that has returned, we know that the thread isn't running anymore.

I get crash reports in SampleThreadPool::run(), while the main thread is inside ~SampleThreadPool().

A solution could be to use atomic shared_ptr, keeping a local shared_ptr<Pimpl> inside run()'s while loop.

martinfinke commented 6 years ago

In the comment for SampleLoader, it says:

add an instance of this to a ThreadPool (but don't delete it!)

So it seems the preferred way is to reuse StreamingSamplerVoice instances. As an example, juce::Synthesiser can reuse a voice as soon as StreamingSamplerVoice::resetVoice() has returned. Is it really safe to reuse at this point, or could the SampleThreadPool background thread still be in the middle of calling SampleLoader::fillInactiveBuffer(), filling the buffers with some sample data that belongs to the old sound?

martinfinke commented 6 years ago

it looks like a race condition

The crash can be reproduced reliably by adding a sleep:

SampleThreadPool::~SampleThreadPool()
{
    pimpl = nullptr;

    Thread::sleep(3000);

    stopThread(300);
}

It then crashes consistently in SampleThreadPool::run() because pimpl is nullptr. Of course, the destructor "never" takes 3 seconds, it just shows that you can't rely on the background thread not accessing pimpl there.

I suggest doing stopThread first in the destructor, and then resetting pimpl (or not resetting it by hand).

christoph-hart commented 6 years ago

Alright, thanks. What if you simply exchange the lines?

stopThread(1000); // more is better
pimpl = nullptr;

EDIT: You already suggested that :)

martinfinke commented 6 years ago

There's another race here. Calling ReaderWriterQueue::peek() is thread-safe by itself, but calling WeakReference<Job>::get() isn't.

A thread-safe approach would be to use ReaderWriterQueue::try_dequeue() to get a local copy of the WeakReference<Job>, but then you don't keep Jobs in the queue until jobHasFinished. Is it important that the Job stays at the front of the queue, or could we push it to the end of the queue again? (that would need a multi-producer queue)

christoph-hart commented 5 years ago

Sorry, took me a while to get back here. I fixed the order in the destructor, so it shouldn't crash anymore now.

You're right about the try_dequeue being more safe than peeking the pointer (I can imagine that this race condition is a bit theoretical, but if it makes Thread Sanitizer happy, then I am fine with the change). I've changed this too. There shouldn't be any issues with pushing it back to the queue if it needs running again (if it checks if it's currently being executed, it uses one of the atomic flags) - the only thing is that the reexecution has to wait until the other jobs are done, but in this scenario, you'll be in trouble anyway.

RE: the resetVoice(): if you're using monoliths (which I know you do), there is no deferred action launched on the streaming thread so you can definitely reuse the voices after calling StreamingSamplerVoice::resetVoice()- however if you use single audio files, it might happen that the file handles need to be closed (because funky macOS can't open more than ~242 files in certain sandboxed hosts) and this can't be done on the audio thread. In this case, it will launch Unmapper::runJob(), which closes the file handles (which is like any other OS call non-deterministic in its execution time).

This is the commit that fixes everything addressed here:

https://github.com/christophhart/HISE/commit/8ac9de96d0b11945ec31ab7da5bac4f7989b4241

I am going to close the issue, feel free to reopen if there is something left.

martinfinke commented 5 years ago

No problem, and thanks for all the details!

Yeah, the point with that race condition is to keep Thread Sanitizer clean, and maybe not having to remember / comment that it's not-critical, when reading the code again in a few months.