RenderKit / ospray

An Open, Scalable, Portable, Ray Tracing Based Rendering Engine for High-Fidelity Visualization
http://ospray.org
Apache License 2.0
997 stars 182 forks source link

Segfault when getting device version number more than once #400

Closed paulmelis closed 4 years ago

paulmelis commented 4 years ago

Found while adding some version info calls to ospray-python:

#include <ospray/ospray_cpp.h>

void f()
{
    ospray::cpp::Device device(ospGetCurrentDevice());

    printf("%d %d %d\n", 
        ospDeviceGetProperty(device.handle(), OSP_DEVICE_VERSION_MAJOR),
        ospDeviceGetProperty(device.handle(), OSP_DEVICE_VERSION_MINOR),
        ospDeviceGetProperty(device.handle(), OSP_DEVICE_VERSION_PATCH));
}

int main(int argc, const char *argv[])
{
    ospInit(&argc, argv);

    f();
    f();
}
melis@juggle 21:14:~/concepts/ospray-python$ ./t_device_version 
2 0 1
Segmentation fault (core dumped)
melis@juggle 21:14:~/concepts/ospray-python$ gdb ./t_device_version core.8692 
GNU gdb (GDB) 8.3.1
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `./t_device_version'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f909596aa96 in ospDeviceGetProperty (_device=0x55fbf8b84420, _deviceProperty=OSP_DEVICE_VERSION_PATCH) at /home/melis/c/ospray-git/ospray/api/API.cpp:165
165   return device->getProperty(_deviceProperty);
/usr/lib/../share/gcc-9.2.0/python/libstdcxx/v6/xmethods.py:731: SyntaxWarning: list indices must be integers or slices, not str; perhaps you missed a comma?
  refcounts = ['_M_refcount']['_M_pi']
(gdb) bt
#0  0x00007f909596aa96 in ospDeviceGetProperty (_device=0x55fbf8b84420, _deviceProperty=OSP_DEVICE_VERSION_PATCH) at /home/melis/c/ospray-git/ospray/api/API.cpp:165
#1  0x000055fbf73a4203 in f() ()
#2  0x000055fbf73a42c7 in main ()
(gdb) 
jeffamstutz commented 4 years ago

This one is subtle, which I'm not surprised as soon as I saw it: construction of a C++ wrapper from a handle is a "move" operation, which enables applications to progressively move from the C API to the C++ API. However, this can be surprising when you can't get the current device from the C++ API! :)

Thus the problem is that the cpp::Device going out of scope releases the underlying handle, which then ends up freeing the device. The workaround for now is to ospRetain() the device handle before handing it to the C++ wrapper. The long term solution is to have C++ coverage of getting the current device without needing to use the C API (probably with something like cpp::Device::current()).

paulmelis commented 4 years ago

I had a feeling it was something like that, but then I don't understand that changing main() to this doesn't avoid the segfault (as this was an earlier test I did before submitting this issue):

int main(int argc, const char *argv[])
{
    ospInit(&argc, argv);

    ospray::cpp::Device device_in_main(ospGetCurrentDevice());

    f();
    f();
}

Shouldn't the extra C++ wrapper of the current device work the same as an extra ospRetain(), as device_in_main will outlive the temporary references used in the function calls? The segfault still happens in one of the function calls:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f7ba96 in ospDeviceGetProperty (_device=0x555555580420, _deviceProperty=OSP_DEVICE_VERSION_PATCH) at /home/melis/c/ospray-git/ospray/api/API.cpp:165
165   return device->getProperty(_deviceProperty);
(gdb) bt
#0  0x00007ffff7f7ba96 in ospDeviceGetProperty (_device=0x555555580420, _deviceProperty=OSP_DEVICE_VERSION_PATCH) at /home/melis/c/ospray-git/ospray/api/API.cpp:165
#1  0x0000555555555203 in f () at t_device_version.cpp:7
#2  0x00005555555552ee in main (argc=1, argv=0x7fffffffe218) at t_device_version.cpp:20
jeffamstutz commented 4 years ago

Your reply helps with clarity of what the right answer should be: ospGetCurrentDevice() should do an ospDeviceRetain() on the returned device handle. Here's why...

Devices are special objects in that there is a global ref pointer holding the "current" device, which after ospInit() get's called ends up being a single instance of a device that has a ref count of 1. When this ref pointer goes out of scope (app exit) or gets cleared with ospShutdown(), everything is fine: the ref count is decremented and the 0 triggers the object to be freed. However, ospGetCurrentDevice() gets you the handle, but does not participate in the object lifetime (i.e. doesn't mutate the ref count). Thus when the C++ wrapper takes "ownership" of the handle, the ref count remains 1, but now both the C++ wrapper and global current device ref pointer both think they need to decrement this count, which the 2nd one ends up doing so on freed memory. (doh!)

The C++ wrapper interface is incomplete because it assumes that it either consumes a C created handle (via ospNewDevice()) or creates the device itself via the string constructor. Thus there's a gap when you want to use the C++ wrapper with ospInit()/ospGetCurrentDevice(), which I think should trigger a ref count increment to make sure that device handle remains valid as long as the application is holding on to it (like all other handles).

Whew! Anyway, at least the answer is straightforward to put into v2.0.1. :)

jeffamstutz commented 4 years ago

FYI, I implemented this in 79f1ccf4b. (I also immediately fixed the doc typo too, it should read ospDeviceRelease(), not ospRelease())

paulmelis commented 4 years ago

construction of a C++ wrapper from a handle is a "move" operation, which enables applications to progressively move from the C API to the C++ API

Wow, I fully expected the C++ wrappers to always do an incref on the handle passed to them when being constructed. Reason for that is that I noticed the ospRelease in ~ManagedObject(). But I did not notice the absence of ospRetain() in ManagedObject(HANDLE_T object). That asymmetry is a bit nasty, although I get the point of transitioning from C to C++ API. But then again, adding an extra ospRelease() after constructing a C++ wrapper would not be so bad either, as a temporary measure. And this only really applies to manually wrapping a reference with a C++ object, right? Regular object construction in C++, e.g. ospray::cpp::Camera("perspective"), hides all of the reference machinery anyway.

What would also help is to clarify in the docs what kind of reference certain functions return. The Python C API docs are pretty good in this respect, as they describe the differences between new references (that one gets ownership of and should decref when done) versus borrowed references (that you should not do a decref on). So ospGetCurrentDevice() could be flagged as returning a borrowed reference, while e.g. ospNewCamera() returns a new reference, etc.

Ah, I now see you added exactly such a description in 79f1ccf :smile:

jeffamstutz commented 4 years ago

Wow, I fully expected the C++ wrappers to always do an incref on the handle passed to them when being constructed. ... And this only really applies to manually wrapping a reference with a C++ object, right?

Yes, this is true when you copy construct from an existing C++ wrapper, but not when constructing the wrapper from a C handle. Mixing the C and C++ APIs has gotchas either way: leaks or extra frees. We should probably write more explicit documentation on the C++ wrappers intent and expected usage.

You are correct, though, that more comprehensive documentation about handle lifetimes should exist throughout our docs. Specifically, the missing items are:

jeffamstutz commented 4 years ago

I think having wrappers do an ospRetain() on construction from an existing handle is growing on me, especially since you point out that it would only be temporary anyway.

paulmelis commented 4 years ago

I think having wrappers do an ospRetain() on construction from an existing handle is growing on me, especially since you point out that it would only be temporary anyway.

Well, that's an assumption on my part: that the C++ API will become the main one to use, with the C API built on top of it (somewhat flipping the current situation). But if (parts of) the C API will need to be used in ospray applications for certain operations for the foreesable future then "temporary" can be a long time...

jeffamstutz commented 4 years ago

The full intention is to have everything the C API can do exposed in the C++ API (except for ospInit/ospShutdown), so the lack of a static cpp::Device::current() method is an anomaly. Given that it's a feature, we will add the missing method to v2.1.

I'll still think about it more, but at least for now (and v2.0.1) the current semantics will remain.