RenderKit / ospray

An Open, Scalable, Portable, Ray Tracing Based Rendering Engine for High-Fidelity Visualization
http://ospray.org
Apache License 2.0

Segfault when re-rendering #368

Closed paulmelis closed 4 years ago

paulmelis commented 4 years ago

While trying to debug #367 I hit upon a segfault triggered by ospRenderFrame re-using the same world and framebuffer. Maybe I'm doing something wrong, but I don't see it.

Relevant address sanitizer report:

AddressSanitizer:DEADLYSIGNAL
=================================================================
AddressSanitizer:DEADLYSIGNAL
AddressSanitizerAddressSanitizer:DEADLYSIGNAL
:DEADLYSIGNAL
==9265==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000000e0 (pc 0x7f52d549cf42 bp 0x000000000000 sp 0x7f52cb34cbc0 T7)
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
==9265==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
==9265==Hint: address points to the zero page.
    #0 0x7f52d549cf41 in rtcIntersect8 (/home/paulm/software/ospray-superbuild-git/lib/libembree3.so.3+0xb6f41)
    #1 0x7f52d856f6e1 in rtcIntersectV___un_3C_s_5B_unRTCSceneTy_5D__3E_un_3C_s_5B_unRTCIntersectContext_5D__3E_un_3C_s_5B_vyRTCRayHit_5D__3E_avx2 /home/paulm/software/ospray-superbuild-git/include/embree3//rtcore_scene.isph:138
    #2 0x7f52d856f6e1 in traceRay___un_3C_s_5B__c_unWorld_5D__3E_REFs_5B_vyRay_5D_un_3C_unv_3E_avx2 /home/paulm/c/ospray-git/ospray/common//World.ih:41
    #3 0x7f52d856f6e1 in traceRay___un_3C_s_5B__c_unWorld_5D__3E_REFs_5B_vyRay_5D_avx2 /home/paulm/c/ospray-git/ospray/common//World.ih:48
    #4 0x7f52d856f6e1 in SciVis_renderSample___un_3C_s_5B_unRenderer_5D__3E_un_3C_s_5B_unFrameBuffer_5D__3E_un_3C_s_5B_unWorld_5D__3E_un_3C_unv_3E_REFs_5B_vyScreenSample_5D_avx2 /home/paulm/c/ospray-git/ospray/render/scivis//SciVis.ispc:48

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/paulm/software/ospray-superbuild-git/lib/libembree3.so.3+0xb6f41) in rtcIntersect8
Thread T7 created by T2 here:
    #0 0x7f52dc66b367 in __interceptor_pthread_create /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cc:208
    #1 0x7f52dc07a2a0 in tbb::internal::rml::private_server::wake_some(int) (/home/paulm/software/ospray-superbuild-git/lib/libtbb.so+0x232a0)
    #2 0x7f52cce90e7f  (<unknown module>)

Thread T2 created by T0 here:
    #0 0x7f52dc66b367 in __interceptor_pthread_create /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cc:208
    #1 0x7f52dc07a2a0 in tbb::internal::rml::private_server::wake_some(int) (/home/paulm/software/ospray-superbuild-git/lib/libtbb.so+0x232a0)
    #2 0x7f52cce9107f  (<unknown module>)

==9265==ABORTING

Here's the file based on ospTutorial.c: code.zip

jeffamstutz commented 4 years ago

Related to our discussion over in #369, the main issue is that you created a world and passed it to ospRenderFrame without committing it first. Embree scene creation is done on commit, not on construction.
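To illustrate why an uncommitted world can crash inside rtcIntersect8, here is a minimal toy model in C. It is not OSPRay or Embree code: every name in it (ToyWorld, ToyScene, toy_world_commit, and so on) is hypothetical, and it only assumes what is stated above, namely that the underlying scene is built on commit rather than on construction. The real renderer dereferences the scene without a check, which is consistent with the read near the zero page in the report; the toy version returns an error instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the acceleration structure; not an Embree type. */
typedef struct ToyScene {
    int num_geometries;
} ToyScene;

/* Hypothetical stand-in for a world handle. */
typedef struct ToyWorld {
    int       pending_geometries; /* parameter set by the application */
    ToyScene *scene;              /* stays NULL until toy_world_commit() */
} ToyWorld;

/* Construction only allocates the handle; no scene exists yet. */
ToyWorld *toy_world_new(void) {
    return calloc(1, sizeof(ToyWorld));
}

/* Commit (re)builds the scene from the current parameters. Returns 0 on success. */
int toy_world_commit(ToyWorld *w) {
    if (w == NULL) return -1;
    ToyScene *s = malloc(sizeof *s);
    if (s == NULL) return -1;
    s->num_geometries = w->pending_geometries;
    free(w->scene); /* drop the scene from any earlier commit */
    w->scene = s;
    return 0;
}

/* Returns the number of geometries in the scene, or -1 if never committed. */
int toy_render_frame(const ToyWorld *w) {
    if (w == NULL || w->scene == NULL) return -1;
    return w->scene->num_geometries;
}

void toy_world_release(ToyWorld *w) {
    if (w != NULL) {
        free(w->scene);
        free(w);
    }
}

/* Walks through the failing and the fixed call order. Returns 0 on success. */
int toy_commit_demo(void) {
    ToyWorld *w = toy_world_new();
    if (w == NULL) return -1;
    assert(toy_render_frame(w) == -1); /* rendering before commit: no scene */
    assert(toy_world_commit(w) == 0);  /* commit even with no parameters set */
    assert(toy_render_frame(w) == 0);  /* an empty scene now exists */
    toy_world_release(w);
    return 0;
}
```

In terms of the real API, the corresponding fix is to call ospCommit on the new world before passing it to ospRenderFrame, even when no parameters were set on it.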

Some other notes (more relevant to real apps, these may not actually produce issues in the version you posted):

jeffamstutz commented 4 years ago

I should amend my first bullet: ospCancel is not guaranteed to synchronize with the future...though in some implementations (e.g. MPI) it might.

paulmelis commented 4 years ago

the main issue is that you created a world and passed it to ospRenderFrame without committing it first

Right, so I should just always commit all new objects and I got lucky so far in cases where I wasn't setting any parameters after creating. Got it.

ospCancel does not synchronize with the future, thus you should use ospWait to guarantee that the task has completed

So what exactly does ospCancel guarantee? That the running render has been signaled to cancel, but not that it has been canceled? That could explain the issues I'm seeing, as the render thread will stay active when I was not expecting it to. The comment in ospray.h is then also a bit misleading, as it says:

Cancel the given task (may block calling thread)

which to me signals that the render thread will have been canceled when ospCancel returns, as it might involve waiting for it to cancel.

paulmelis commented 4 years ago

And would ospWait for OSP_TASK_FINISHED be the right event after a cancelation? Or might it not get there?

jeffamstutz commented 4 years ago

So what exactly does ospCancel guarantee? That the running render has been signaled to cancel, but not that it has been canceled?

You are correct: ospCancel is not a status query (it returns void, not bool), unlike calls such as ospIsReady. These are exactly the same semantics as the old return value of the progressCallback from OSPRay v1.x, which was used to signal frame cancellation. I think the better mental model is that cancellation is just a shortcut to get the running render task to finish before it completes, not that a frame being cancelled is an explicit state that the application should track. In other words, ospCancel is for stopping a long-running frame because the application will no longer be interested in the finished result.

The comment "(may block calling thread)" came from previous MPI implementations we had over the summer that just blocked no matter what. We can certainly update the comment to be more clear!

jeffamstutz commented 4 years ago

And would ospWait for OSP_TASK_FINISHED be the right event after a cancelation? Or might it not get there?

Yes: essentially OSP_TASK_FINISHED is our sentinel value for "everything is finished no matter what". In the future, we may add more asynchronous APIs, where OSPFuture may be a handle to other types of async tasks (ospCommit is what we are thinking about here)...so OSP_TASK_FINISHED will always be the most conservative thing to sync with (and is the C++ default value on that parameter if you are not using C).
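The "cancel only signals, wait synchronizes" semantics discussed above can be sketched with a single-threaded toy model in C. This is not OSPRay code and does not use real threads: ToyTask and the toy_task_* functions are hypothetical names, and the assumption that the task checks its cancel flag between units of work ("tiles") is made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an asynchronous render task (a "future"). */
typedef struct ToyTask {
    int  tiles_total;
    int  tiles_done;
    bool cancel_requested;
    bool finished;
} ToyTask;

void toy_task_start(ToyTask *t, int tiles_total) {
    t->tiles_total = tiles_total;
    t->tiles_done = 0;
    t->cancel_requested = false;
    t->finished = (tiles_total <= 0); /* nothing to do: finished at once */
}

/* One unit of progress; the cancel flag is only observed here. */
void toy_task_step(ToyTask *t) {
    if (t->finished) return;
    if (t->cancel_requested) {
        t->finished = true; /* finish early, leaving the frame incomplete */
        return;
    }
    t->tiles_done++;
    if (t->tiles_done >= t->tiles_total) t->finished = true;
}

/* Cancel only raises a flag; it returns nothing and does not wait. */
void toy_task_cancel(ToyTask *t) {
    t->cancel_requested = true;
}

/* Status query, in the spirit of an "is ready" call. */
bool toy_task_is_ready(const ToyTask *t) {
    return t->finished;
}

/* Wait drives the task until it is finished, cancelled or not. */
void toy_task_wait(ToyTask *t) {
    while (!t->finished) toy_task_step(t);
}

/* Cancels after two tiles and returns how many tiles were rendered. */
int toy_cancel_demo(void) {
    ToyTask t;
    toy_task_start(&t, 100);
    toy_task_step(&t);
    toy_task_step(&t);
    toy_task_cancel(&t);
    assert(!toy_task_is_ready(&t)); /* cancel returned, task not yet finished */
    toy_task_wait(&t);
    assert(toy_task_is_ready(&t));  /* only the wait guarantees completion */
    return t.tiles_done;
}
```

Mapped back to the real API, this corresponds to calling ospCancel on the future and then ospWait on it with OSP_TASK_FINISHED before touching the framebuffer or starting another frame.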

paulmelis commented 4 years ago

I guess we can close this one