RenderKit / oidn

Intel® Open Image Denoise library
https://www.openimagedenoise.org/
Apache License 2.0

Unknown error "driver shutting down" raised when exiting application #176

Closed ColinChargyBentley closed 9 months ago

ColinChargyBentley commented 9 months ago

Hi, when my software uses OIDN, an unknown error with the message "driver shutting down" is raised on exit, during the call to oidnReleaseDevice. Why is that? Regards, Colin Chargy

atafra commented 9 months ago

Hi Colin,

What I can say for sure is that this error does not originate inside OIDN, it just passes it along (that's why it's an unknown error). More information is necessary to be able to investigate this further: what kind of device are you releasing (CPU, GPU, which type?), the OS, the system configuration (CPU, GPUs, etc.), driver versions, etc.

Also, what is the frequency of this error? Does it happen every time or only occasionally? Does it happen only on this machine or other similar machines too?

Regards, Attila

ColinChargyBentley commented 9 months ago

Hi Attila, I'm running OIDN under CUDA on an NVIDIA GeForce RTX 2070 (driver version 31.0.15.3734). The CPU is an Intel(R) Core(TM) i9-9920X with 32 of RAM installed. The OS is:

Edition: Windows 11 Enterprise
Version: 22H2
Installed on: 6/30/2023
OS build: 22621.2283
Experience: Windows Feature Experience Pack 1000.22662.1000.0

Do you need anything else? Regards, Colin

atafra commented 9 months ago

Thanks for the details. The error message seems to come from the CUDA runtime. An important detail is that the issue happens upon exiting. Are you calling oidnReleaseDevice when a static variable/object is being destroyed? If that's the case, I think I know what the problem is.

oidnReleaseDevice calls some CUDA runtime functions to clean up resources, but it seems that by this time the CUDA runtime has already been unloaded. OIDN has no control over this. A frustrating issue with shared libraries in general is that the unloading order is undefined, so if you call functions of another library in a static variable's destructor, it's not guaranteed that the library will still be loaded at that time. There are basically two robust solutions to this problem:

There might be a third option too: with some tricks/hacks it may be possible to change the library unloading order for your application, but it would still rely on undefined behavior, and correctness wouldn't be guaranteed.
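To illustrate the static-destruction hazard described above, here is a minimal C++ sketch of releasing a handle explicitly before the application exits, so that nothing is left for a static destructor to clean up. It does not use OIDN or CUDA: MockDevice, g_liveDevices, appStartup, appShutdown and runApp are hypothetical names invented for this example, and the counter merely stands in for the real device resources.

```cpp
#include <cassert>

// Hypothetical counter standing in for live device resources.
static int g_liveDevices = 0;

// Hypothetical stand-in for a device handle; NOT the OIDN API.
struct MockDevice {
    bool held = false;

    void acquire() { if (!held) { held = true; ++g_liveDevices; } }

    // Idempotent: calling it twice is harmless.
    void release() { if (held) { held = false; --g_liveDevices; } }

    // With a real library, doing the cleanup here for a static object is the
    // risky path: the library may already be unloaded when this runs at exit.
    ~MockDevice() { release(); }
};

// Static handle: its destructor runs during static destruction, after main.
static MockDevice g_device;

void appStartup()  { g_device.acquire(); }

// Called while the process is still fully alive, before main returns.
void appShutdown() { g_device.release(); }

// Returns the number of devices still alive after an orderly shutdown.
int runApp() {
    appStartup();
    // ... the application would do its denoising work here ...
    appShutdown();
    return g_liveDevices;
}
```

After appShutdown() the static destructor still runs at exit, but it finds nothing left to release, so it no longer needs to call into another library.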

ColinChargyBentley commented 9 months ago

Hi Attila, I will delete the DeviceRef before closing the app. Could you add a release method to oidn::DeviceRef, to avoid dirty tricks for releasing a device when needed (such as device.~DeviceRef() or oidn::DeviceRef deviceCopy = device; UNUSED(deviceCopy);) or having to store a DeviceRef in another smart pointer? Thanks for the explanation. Regards, Colin

atafra commented 9 months ago

Maybe I misunderstood something but it seems you need both a release() and a retain() in DeviceRef, right?

release() would be roughly equivalent to device.~DeviceRef(), but you can already release it in a cleaner and safer way by simply assigning device = nullptr.

Is the intent of oidn::DeviceRef deviceCopy = device; UNUSED(deviceCopy); to call oidnRetainDevice(), to prevent the device from being released?

ColinChargyBentley commented 9 months ago

This is a mistake, the correct code (other than device = nullptr;) would be oidn::DeviceRef deviceCopy = std::move(device); UNUSED(deviceCopy);. Thanks

atafra commented 9 months ago

I could still add release() as an alternative to device = nullptr because standard smart pointers also have this functionality. But you don't need anything else, right?

ColinChargyBentley commented 9 months ago

I'm all right, I wasn't aware of the = nullptr trick.
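For readers comparing the options discussed in this thread, the sketch below shows why assigning nullptr or moving the handle into a short-lived temporary drops the last reference, while copying it does not. HandleRef is a hypothetical reference-counted handle written for this example; it is not oidn::DeviceRef, and the only thing assumed about the real class is what the comments above state, namely that device = nullptr releases it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

static int g_destroyed = 0;  // counts how many underlying objects were destroyed

struct Counted { int refs = 1; };

// Hypothetical ref-counted handle; NOT the OIDN API.
class HandleRef {
    Counted* p_ = nullptr;
public:
    HandleRef() = default;
    HandleRef(std::nullptr_t) {}  // allows "handle = nullptr"
    static HandleRef make() { HandleRef h; h.p_ = new Counted; return h; }
    HandleRef(const HandleRef& o) : p_(o.p_) { if (p_) ++p_->refs; }  // retain
    HandleRef(HandleRef&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }  // steal
    // By-value parameter: the old reference is dropped when 'o' is destroyed.
    HandleRef& operator=(HandleRef o) noexcept { std::swap(p_, o.p_); return *this; }
    ~HandleRef() { if (p_ && --p_->refs == 0) { delete p_; ++g_destroyed; } }
    bool valid() const { return p_ != nullptr; }
};

// Assigning nullptr drops the reference immediately.
int releaseByNullAssignment() {
    HandleRef device = HandleRef::make();
    int before = g_destroyed;
    device = nullptr;
    assert(!device.valid());
    return g_destroyed - before;
}

// Moving into a scoped temporary also drops it, at the end of the inner scope.
int releaseByMove() {
    HandleRef device = HandleRef::make();
    int before = g_destroyed;
    { HandleRef tmp = std::move(device); }
    assert(!device.valid());
    return g_destroyed - before;
}

// A plain copy retains and then releases: the original still owns the object.
int releaseByCopy() {
    HandleRef device = HandleRef::make();
    int before = g_destroyed;
    { HandleRef copy = device; }
    assert(device.valid());
    return g_destroyed - before;
}
```

In this mock, the first two functions destroy the object before returning, while the copy variant leaves it alive until device itself goes out of scope, which matches the correction made above from a copy to std::move.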