RenderKit / ospray

An Open, Scalable, Portable, Ray Tracing Based Rendering Engine for High-Fidelity Visualization
http://ospray.org
Apache License 2.0

intermittent crashes on program termination, cleaning up static shared_ptrs #355

Closed demarle closed 4 years ago

demarle commented 5 years ago

We are seeing intermittent crashes on application exit, within ospray cleanup, that look like this:

pvpython: ../nptl/pthread_mutex_lock.c:433: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.

Loguru caught a signal: SIGABRT
Stack trace:
12 0x7f32a076d437 /home/kitware/misc/root/ospray-1.8.4/lib64/libospray.so.0(+0x13437) [0x7f32a076d437]
11 0x7f32a7582c07 __cxa_finalize + 247
10 0x7f32a077cd6f std::shared_ptr::~shared_ptr() + 79
9 0x7f328a944009 ospray::api::ISPCDevice::~ISPCDevice() + 9
8 0x7f328a943e6e ospray::api::ISPCDevice::~ISPCDevice() + 46
7 0x7f3288f4435e rtcReleaseDevice + 30
6 0x7f328a0d95be /home/kitware/misc/root/ospray-1.8.4/lib64/libembree3.so.3(+0x12295be) [0x7f328a0d95be]
5 0x7f32a29afd12 /lib64/libpthread.so.0(+0xad12) [0x7f32a29afd12]
4 0x7f32a7578566 /lib64/libc.so.6(+0x30566) [0x7f32a7578566]
3 0x7f32a756a769 /lib64/libc.so.6(+0x22769) [0x7f32a756a769]
2 0x7f32a756a895 abort + 295
1 0x7f32a757fe75 gsignal + 325
0 0x7f32a757ff00 /lib64/libc.so.6(+0x37f00) [0x7f32a757ff00]
( 1.284s) [main thread ] :0 FATL| Signal: SIGABRT

@utkarshayachit is pretty sure that the issue is the use of static shared pointers to a derived class instance here: https://github.com/ospray/ospray/blob/master/ospray/api/Device.h#L36. He has seen the same pattern cause problems in other code.

The compiler in question is gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), which is on ParaView's vall regression test machine.

I am looking into other ways to implement the current device ptr and am open to any tips or suggestions.
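One commonly used alternative, sketched here under assumptions, is to keep the current-device handle in a heap-allocated holder that is never destroyed and to release the device through an explicit shutdown call. That way no destructor for the handle runs during static teardown, when dependent libraries may already be finalized. `FakeDevice`, `currentDevice`, `setCurrentDevice`, and `shutdownDevice` are hypothetical names for illustration, not OSPRay's API.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a device class; not OSPRay's actual type.
struct FakeDevice {
  static int liveCount;  // number of live instances, for demonstration only
  FakeDevice() { ++liveCount; }
  virtual ~FakeDevice() { --liveCount; }
};
int FakeDevice::liveCount = 0;

// The holder is allocated once and intentionally never deleted, so the
// runtime has no static destructor to call for it at process exit.
std::shared_ptr<FakeDevice> &currentDevice() {
  static auto *holder = new std::shared_ptr<FakeDevice>();
  return *holder;
}

void setCurrentDevice(std::shared_ptr<FakeDevice> d) {
  currentDevice() = std::move(d);
}

// Called explicitly by the application before it exits, while every
// library the device depends on is still loaded and usable.
void shutdownDevice() {
  currentDevice().reset();
  assert(currentDevice() == nullptr);
}
```

The trade-off is that an application which never calls `shutdownDevice()` skips the device destructor entirely, so this only fits cleanup work that the OS can safely reclaim at exit.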

jeffamstutz commented 4 years ago

I think the easiest strategy is for us to use ospcommon ref counted pointers (the same ones we use for other OSPRay objects) instead of std::shared_ptr<>. We don't need any of the atomic guarantees that std::shared_ptr<> implementations provide here (and which may be what is causing the issue?).
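For readers unfamiliar with the idea, here is a minimal sketch of an intrusive reference-counted pointer, where the count lives inside the object rather than in a separate control block. It is not the ospcommon implementation: `RefCounted`, `Ref`, and `CountedDevice` are hypothetical names, and the counter is a plain `int` on the assumption that handles are only touched from one thread.

```cpp
#include <cassert>
#include <utility>

// Hypothetical intrusive base class: each object carries its own count.
class RefCounted {
 public:
  RefCounted() = default;
  RefCounted(const RefCounted &) = delete;
  RefCounted &operator=(const RefCounted &) = delete;
  virtual ~RefCounted() = default;

  void refInc() { ++count_; }
  void refDec() {
    assert(count_ > 0);
    if (--count_ == 0) delete this;  // last handle released
  }
  int useCount() const { return count_; }

 private:
  int count_ = 0;  // assumption: single-threaded use, so no atomics
};

// Handle that increments on copy and decrements on destruction.
template <typename T>
class Ref {
 public:
  Ref() = default;
  explicit Ref(T *p) : ptr_(p) { if (ptr_) ptr_->refInc(); }
  Ref(const Ref &o) : ptr_(o.ptr_) { if (ptr_) ptr_->refInc(); }
  Ref(Ref &&o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
  Ref &operator=(Ref o) {  // by-value parameter covers copy and move
    std::swap(ptr_, o.ptr_);
    return *this;
  }
  ~Ref() { if (ptr_) ptr_->refDec(); }

  T *get() const { return ptr_; }
  T *operator->() const { return ptr_; }
  explicit operator bool() const { return ptr_ != nullptr; }

 private:
  T *ptr_ = nullptr;
};

// Hypothetical device type used only to observe construction/destruction.
struct CountedDevice : RefCounted {
  static int liveCount;
  CountedDevice() { ++liveCount; }
  ~CountedDevice() override { --liveCount; }
};
int CountedDevice::liveCount = 0;
```

Note that this only illustrates the pointer type. A static `Ref` would still be destroyed during static teardown, so the sketch does not by itself change when the device destructor runs at exit.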

I can get this on release-2.0.x today.

jeffamstutz commented 4 years ago

I switched us over to the ospcommon pointer types in 01a4e8490, which should solve this (I wasn't able to reproduce the crash locally). Please let us know if you continue to experience any issues.

jeffamstutz commented 4 years ago

BTW, that's on release-2.0.x.

tachyon-john commented 4 years ago

I'm seeing a similar termination-time crash in VMD with OSPRay 2.1.1, using the Intel precompiled libs, compiled on CentOS 8:

Thread 1 "vmd_LINUXAMD64" received signal SIGSEGV, Segmentation fault.
0x00007f4da2ce357c in std::_Sp_counted_ptr<openvkl::api::Driver*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /usr/local/lib/vmdtest2/libopenvkl.so.0
(gdb) where
#0 0x00007f4da2ce357c in std::_Sp_counted_ptr<openvkl::api::Driver*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /usr/local/lib/vmdtest2/libopenvkl.so.0
#1 0x00007f4da2ce86d9 in std::shared_ptr::~shared_ptr() () from /usr/local/lib/vmdtest2/libopenvkl.so.0
#2 0x00007f4dec00313c in __run_exit_handlers () from /lib64/libc.so.6
#3 0x00007f4dec003270 in exit () from /lib64/libc.so.6
#4 0x00007f4debfec87a in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000505c5e in _start ()