PixarAnimationStudios / OpenUSD

Universal Scene Description
http://www.openusd.org
Other
6.05k stars 1.2k forks source link

Segmentation fault in usdview when removing a prim from another thread #3358

Open nukep opened 1 day ago

nukep commented 1 day ago

Description of Issue

If a prim is created and removed on a layer from another thread in a very short amount of time, this can cause a segmentation fault in usdview. I've observed this happening for longer durations as well, but it's harder to reproduce.

I believe this is a bug because this page suggests that writes to a stage are thread-safe: https://openusd.org/dev/api/_usd__page__multi_threading.html

Depending on chance, I sometimes see the following message logged in the terminal:

ERROR: Usdview encountered an error while rendering.Used null prim - UsdExpiredPrimAccessError thrown:
 -> Usd_ThrowExpiredPrimAccessError at /home/dan/VFXPlatform/repos/openusd-v24.08/pxr/usd/usd/primData.cpp:264 from
     #0   0x00007f98fe26e322 in pxrInternal_v0_24__pxrReserved__::Usd_ThrowExpiredPrimAccessError(pxrInternal_v0_24__pxrReserved__::Usd_PrimData const*)+0xc2
     #1   0x00007f98b9188c83 in pxrInternal_v0_24__pxrReserved__::UsdImagingDelegate::_AdapterLookup(pxrInternal_v0_24__pxrReserved__::UsdPrim const&, bool)+0x143
     #2   0x00007f98b91dfdbd in pxrInternal_v0_24__pxrReserved__::UsdImagingIndexProxy::_AddHdPrimInfo(pxrInternal_v0_24__pxrReserved__::SdfPath const&, pxrInternal_v0_24__pxrReserved__::UsdPrim const&, shared_ptr<pxrInternal_v0_24__pxrReserved__::UsdImagingPrimAdapter> const&)+0x96d
     #3   0x00007f98b91e0825 in pxrInternal_v0_24__pxrReserved__::UsdImagingIndexProxy::InsertRprim(pxrInternal_v0_24__pxrReserved__::TfToken const&, pxrInternal_v0_24__pxrReserved__::SdfPath const&, pxrInternal_v0_24__pxrReserved__::UsdPrim const&, shared_ptr<pxrInternal_v0_24__pxrReserved__::UsdImagingPrimAdapter>)+0x25
     #4   0x00007f98b9246407 in pxrInternal_v0_24__pxrReserved__::UsdImagingGprimAdapter::_AddRprim(pxrInternal_v0_24__pxrReserved__::TfToken const&, pxrInternal_v0_24__pxrReserved__::UsdPrim const&, pxrInternal_v0_24__pxrReserved__::UsdImagingIndexProxy*, pxrInternal_v0_24__pxrReserved__::SdfPath const&, pxrInternal_v0_24__pxrReserved__::UsdImagingInstancerContext const*)+0xf7
     #5   0x00007f98b922bd57 in pxrInternal_v0_24__pxrReserved__::UsdImagingCubeAdapter::Populate(pxrInternal_v0_24__pxrReserved__::UsdPrim const&, pxrInternal_v0_24__pxrReserved__::UsdImagingIndexProxy*, pxrInternal_v0_24__pxrReserved__::UsdImagingInstancerContext const*)+0x67
     #6   0x00007f98b918b890 in pxrInternal_v0_24__pxrReserved__::UsdImagingDelegate::_Populate(pxrInternal_v0_24__pxrReserved__::UsdImagingIndexProxy*)+0xac0
     #7   0x00007f98b918cec7 in pxrInternal_v0_24__pxrReserved__::UsdImagingDelegate::ApplyPendingUpdates()+0x507
     #8   0x00007f98e2bc46ba in pxrInternal_v0_24__pxrReserved__::UsdImagingGLEngine::_PreSetTime(pxrInternal_v0_24__pxrReserved__::UsdImagingGLRenderParams const&)+0x22a
     #9   0x00007f98e2bc931e in pxrInternal_v0_24__pxrReserved__::UsdImagingGLEngine::PrepareBatch(pxrInternal_v0_24__pxrReserved__::UsdPrim const&, pxrInternal_v0_24__pxrReserved__::UsdImagingGLRenderParams const&)+0x19e
     #10  0x00007f98e2bcbd9d in pxrInternal_v0_24__pxrReserved__::UsdImagingGLEngine::Render(pxrInternal_v0_24__pxrReserved__::UsdPrim const&, pxrInternal_v0_24__pxrReserved__::UsdImagingGLRenderParams const&)+0x3d
     #11  0x00007f98bccdf362 in boost::python::detail::caller_arity<3u>::impl<void (pxrInternal_v0_24__pxrReserved__::UsdImagingGLEngine::*)(pxrInternal_v0_24__pxrReserved__::UsdPrim const&, pxrInternal_v0_24__pxrReserved__::UsdImagingGLRenderParams const&), boost::python::default_call_policies, boost::mpl::vector4<void, pxrInternal_v0_24__pxrReserved__::UsdImagingGLEngine&, pxrInternal_v0_24__pxrReserved__::UsdPrim const&, pxrInternal_v0_24__pxrReserved__::UsdImagingGLRenderParams const&> >::operator()(_object*, _object*)+0xf2
     #12  0x00007f990cfe2565 in boost::python::objects::function::call(_object*, _object*) const+0x275
     #13  0x00007f990cfe752b in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const+0x4b
     #14  0x00007f98ff32b498 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<pxrInternal_v0_24__pxrReserved__::TfBaseException, void (*)(pxrInternal_v0_24__pxrReserved__::TfBaseException const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(pxrInternal_v0_24__pxrReserved__::TfBaseException const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&)+0x18
     #15  0x00007f990cfe746f in boost::python::handle_exception_impl(boost::function0<void>)+0x2f
    ... 35 more frames
Segmentation fault (core dumped)

Steps to Reproduce

  1. Open usdview with an empty .usda file (i.e. just "#usda 1.0")
  2. In the interpreter, run the following:
from threading import Thread
from time import sleep

layer = usdviewApi.stage.GetEditTarget().GetLayer()

def threaded_function():
    for i in range(500):
        # Create a cube, e.g. /foo0, /foo1, foo2...
        primspec = Sdf.CreatePrimInLayer(layer, f'/foo{i}')
        primspec.typeName = 'Cube'
        primspec.specifier = Sdf.SpecifierDef
        sleep(0.02)
        # Delete the prim
        del layer.rootPrims[f'foo{i}']

# Spawn another thread
Thread(target = threaded_function).start()
  1. After a number of iterations, a segmentation fault occurs.

System Information (OS, Hardware)

Rocky Linux 9.4 (Blue Onyx)

$ uname -a
Linux hobbes 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Sep 25 11:51:41 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Package Versions

OpenUSD v24.08: https://github.com/PixarAnimationStudios/OpenUSD/tree/v24.08

Build Flags

cd build
cmake <flags> ..
cmake --build . --config Release -j4

Cmake flags:

-DCMAKE_EXPORT_COMPILE_COMMANDS=1
-DPXR_ENABLE_OSL_SUPPORT=OFF
  -DPXR_ENABLE_PTEX_SUPPORT=ON
  -DPXR_BUILD_OPENIMAGEIO_PLUGIN=ON
  -DPXR_BUILD_OPENCOLORIO_PLUGIN=ON
  -DPXR_ENABLE_OPENVDB_SUPPORT=ON
  -DPXR_BUILD_PRMAN_PLUGIN=ON
  -DPXR_ENABLE_MATERIALX_SUPPORT=ON
  -DRENDERMAN_LOCATION=$RMANTREE
  -DPXR_BUILD_EXAMPLES=ON
jesschimein commented 1 day ago

Filed as internal issue #USD-10295

asluk commented 1 day ago

To clarify the doc you linked, it's not safe to modify the SAME stage from a child thread -- see https://openusd.org/release/api/_usd__page__multi_threading.html

Although it is not possible for multiple threads to simultaneously write "to the same stage", it is safe for different threads to write simultaneously to different stages.

nukep commented 19 hours ago

@asluk In this particular case, only one thread is writing to the layer and nothing else. Because it's not "multiple threads", I believe this is adequate for thread-safety, unless there's some other nuance I'm missing.

I interpreted this document as saying "reading a layer from one thread and writing a layer from another is thread-safe". Is this true?

spiffmon commented 18 hours ago

That is an incorrect interpretation. Just as the stl threading model, even if only a single thread is modifying a (e.g.) std::vector, no other thread can even be reading from that vector concurrently. That's what the "multiple readers or single writer" is meant to convey - we're definitely interested in improving the language if that will help.

In usdview, the main thread may be doing imaging or GUI updating, which in turn will interrogate the stage, thus reading from the layer-being-mutated.

nukep commented 17 hours ago

Thanks for clarifying @spiffmon, that makes sense. I must have interpreted the "multiple readers or single writer" claim as a logical or (multiple_readers || single_writer) instead of a mutually exclusive or. :laughing:

As far as languages goes, I'd maybe suggest describing the restrictive relationship of concurrent read/writes, so something like "You cannot write to a stage or layer while another thread is reading the same stage or layer". And "Multiple threads can read from the same stage or layer as long as no other thread is writing to it".

nukep commented 16 hours ago

To clarify why I filed this issue in the first place: My use case is that I'm interested in authoring the stage asynchronously when a result is "done", started by outside events not triggered by the user.

In usdview I think the UI thread is the rendering thread, so maybe I could put an event in the Qt event loop or something. I suspect there isn't a client-agnostic way of achieving this (for non-Qt DCCs), but if there is, that'd be of interest to me!

meshula commented 16 hours ago

There was a nice presentation at DigiPro this year where Animal Logic tackled the same problem in their Filament application. I am not suggesting that you undertake a similar project in usdview, but I thought their article is a helpful read about the issues involved in keeping a UI responsive under the single-writer/multi-reader paradigm. https://dl.acm.org/doi/abs/10.1145/3665320.3670992

nukep commented 16 hours ago

@meshula Interesting! Thanks, I'll take a look