bkaradzic / bgfx

Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library.
https://bkaradzic.github.io/bgfx/overview.html
BSD 2-Clause "Simplified" License
14.94k stars 1.94k forks source link

Data race conditions with multithreading enabled #1186

Open volcoma opened 7 years ago

volcoma commented 7 years ago

I've found several race conditions with the clang thread sanitizer in the library. I will point some of them out but you should probably run the thread sanitizer yourself to maybe detect even more issues.

  1. Context::flip and Context::swap are executed on different threads and possibly touching the same memory as the sanitizer pointed out "m_render" member variable is read by flip while bx::xchg(m_render, m_submit); is happening.

  2. bgfx_p.h reset function called from main thread is writing to "m_flipAfterRender" while Context::renderFrame is reading it on the render thread

  3. "m_singleThreaded" in bgfx::init is set after the thread is started and the "apiSemWait" is potentially reading the "m_singleThreaded" is being writed to in the main thread. Here simply moving the "m_singleThreaded = false" before the thread.init should do the trick.

  4. I was under the impression that bgfx::getStats is thread safe which it turns out it isn't and calling it from the main thread can cause issues because the render thread can be writing those values while you read them.

All these race conditions are potential issues and crashes. Again I suggest you run the clang's thread sanitizer and dig even deeper. It is pretty easy to use.

bkaradzic commented 7 years ago

All those are false positive, or your usage error... For example when you call getStats you get data from previous frame. Only way to get race condition is to hold to pointer for multiple frames.

  1. it doesn't really matter... behavior is relaxed.
volcoma commented 7 years ago
Write of size 8 at 0x7fe00d9cb640 by main thread:
    #0 void bx::xchg<bgfx::Frame*>(bgfx::Frame*&, bgfx::Frame*&) /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bx/include/bx/inline/bx.inl:32 (editor+0xe4ddcc)
    #1 bgfx::Context::swap() /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx.cpp:1652 (editor+0xe4ddcc)

Previous read of size 8 at 0x7fe00d9cb640 by thread T1:
    #0 bgfx::Context::flip() /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx.cpp:1688 (editor+0xe49c6c)
    #1 bgfx::Context::renderFrame(int) /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx.cpp:1708 (editor+0xe49c6c)

SUMMARY: ThreadSanitizer: data race /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bx/include/bx/inline/bx.inl:32 in void bx::xchg<bgfx::Frame*>(bgfx::Frame*&, bgfx::Frame*&)

--------------------------------------------------------

 Read of size 1 at 0x7fe00dadace2 by thread T1:
    #0 bgfx::Context::renderFrame(int) /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx.cpp:1724 (editor+0xe49e82)
    #1 bgfx::renderFrame() /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx.cpp:1044 (editor+0xe49a55)
    #2 bgfx::Context::renderThread(void*) /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx_p.h:2327 (discriminator 1) (editor+0xe5fb14)
    #3 bx::Thread::entry() /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bx/src/thread.cpp:254 (editor+0x128935b)
    #4 bx::ThreadInternal::threadFunc(void*) /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bx/src/thread.cpp:70 (editor+0x128935b)

  Previous write of size 1 at 0x7fe00dadace2 by main thread:
    [failed to restore the stack]

SUMMARY: ThreadSanitizer: data race /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx.cpp:1724 in bgfx::Context::renderFrame(int)

-------------------------------------------------------------------------
Read of size 8 at 0x7f2752d66be0 by main thread:
//HERE I JUST CALL getStats()
    #0 show_statistics(unsigned int) /home/default/Desktop/Work/EtherealEngine/editor/interface/docks/scene_dock.cpp:43 (editor+0x59384b)

Previous write of size 8 at 0x7f2752d66be0 by thread T1:
    #0 bgfx::Context::apiSemWait(int) /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx_p.h:4239 (editor+0xe49da8)
    #1 bgfx::Context::renderFrame(int) /home/default/Desktop/Work/EtherealEngine/3rdparty/bgfx/bgfx/src/bgfx.cpp:1711 (editor+0xe49da8)

As you can see both threads are writing/reading the same addresses at the same time so I don't see how they are all false positives. The sanitizer tells you exactly what size is written/read from that address as it happens. Also I don't see how it s a wrong usage since all my work with bgfx is on the main thread. Correct me if I am wrong.

bkaradzic commented 7 years ago

On API thread (https://github.com/bkaradzic/bgfx/blob/master/src/bgfx.cpp#L1616):

renderSemWait(); <- wait render thread to finish
frameNoRenderWait(); <- do swap, at this point render thread is is 
       between renderSemPost and apiSemWait on snipped below, 
       and nothing is touching `m_frame`, nor `m_render`.

On render thread (https://github.com/bkaradzic/bgfx/blob/master/src/bgfx.cpp#L1711):

if (apiSemWait(_msecs) ) <- wait for API thread to finish
{
    rendererExecCommands(m_render->m_cmdPre);
    if (m_rendererInitialized)
    {
        BGFX_PROFILER_SCOPE(bgfx, render_submit, 0xff2040ff);
        m_renderCtx->submit(m_render, m_clearQuad, m_textVideoMemBlitter);
        m_flipped = false;
    }
    rendererExecCommands(m_render->m_cmdPost);

    renderSemPost(); <- signal that render thread is done

You should look bgfx internals yourself, rather than blindly trusting clang analyzer that doesn't have enough context to understand what's going on. It's not clang fault really, just things are more complex and more code that to clang looks unrelated needs to be analyzed.

If you still think there is some bug, you should figure out how to create repro case to force of crash, race condition, etc. and submit bug. Static analysis report above is just false positive.

volcoma commented 7 years ago

First of all I am not using the clang static analysis as I pointed out. I used clang thread sanitizer which does runtime detection, as race conditions happen and tells you where they happened so... here is your repo. Every time I start my app the console is flooded by reports of race conditions from the bgfx library and I have looked into them and I believe all the issues that I posted first are valid. Those memory blocks/variables are read/written by the both threads at the same time and are not guarded. Maybe look into the clang thread sanitizer... not the static analysis. The getStats race was actually on the "waitSubmit" member of the struct.

bkaradzic commented 7 years ago

Look code at those call stacks and you'll see that data is not guarded but there is nothing that could go wrong. Anyhow do you know if there is clang annotation to shut it up?

volcoma commented 7 years ago

I believe this is : attribute((no_sanitize("thread"))) https://clang.llvm.org/docs/ThreadSanitizer.html

Well it is "undefined behavior" and not safe :) as it can mess up the memory with the possibility of a crash down the line. Although one of the data races fix is just moving the "m_singleThreaded = false" before the thread.init.

The one I'm worried most about is this one. Context::flip and Context::swap are executed on different threads and possibly touching the same memory as the sanitizer pointed out "m_render" member variable is read by flip while bx::xchg(m_render, m_submit); is happening.

You should try out the clang thread sanitizer as it catches other things like usage of heap after delete or accessing out of bounds memory. Maybe run the examples with it as they cover most of the usages of bgfx. The sanitizer is pretty easy to setup - it's just a flag to the compiler. there is a cmake that checks availability - https://github.com/KDE/extra-cmake-modules/blob/master/modules/ECMEnableSanitizers.cmake

https://api.kde.org/ecm/module/ECMEnableSanitizers.html