RenderKit / oidn

Intel® Open Image Denoise library
https://www.openimagedenoise.org/
Apache License 2.0

Internal amdhip64.dll crash on oidnGetNumPhysicalDevices #174

Status: Closed (closed by Nielsbishere 1 year ago)

Nielsbishere commented 1 year ago

Hi, when testing on a system with an AMD CPU with an integrated GPU and an NVIDIA GPU, we're seeing a crash in oidnGetNumPhysicalDevices. The crash is still present in v2.0.1; I was hoping the changelog entry about a fix for AMD GPUs would cover our case too.

In production this shows up as a crash in the function that creates a device from a LUID, but when I tried to dig deeper I hit a roadblock at oidnGetNumPhysicalDevices, because it initializes the context, which is where the crash occurs. The function ends up calling into amdhip64.dll, which crashes. I built this OIDN version from source to get symbols, but the same crash also happens with the release binaries.

amdhip64.dll!00007ffd495a8281()    Unknown     
amdhip64.dll!00007ffd492cd29c()    Unknown     
amdhip64.dll!00007ffd490e3f19()    Unknown     
amdhip64.dll!00007ffd4912ad4d()    Unknown     
amdhip64.dll!00007ffd490e3239()    Unknown     
amdhip64.dll!00007ffd492597d4()    Unknown     
amdhip64.dll!00007ffd490ee496()    Unknown     
amdhip64.dll!00007ffd49110e96()    Unknown

OpenImageDenoise_device_hip.dll!oidn::HIPDevice::getPhysicalDevices() Line 53
    at C:\oidn-2.0.1\devices\hip\hip_device.cpp(53)
OpenImageDenoise_device_hip.dll!oidn_init_module_device_hip_v20001() Line 33
    at C:\oidn-2.0.1\devices\hip\hip_module.cpp(33)
OpenImageDenoise_core.dll!oidn::ModuleLoader::load(const std::string & name) Line 77
    at C:\oidn-2.0.1\core\module.cpp(77)
[Inline Frame] OpenImageDenoise_core.dll!oidn::Context::init::<lambda_0>::operator()() Line 35
    at C:\oidn-2.0.1\core\context.cpp(35)
[Inline Frame] OpenImageDenoise_core.dll!std::invoke(oidn::Context::init::<lambda_0> && _Obj) Line 1752
    at C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\include\type_traits(1752)
[Inline Frame] OpenImageDenoise_core.dll!std::call_once(std::once_flag & _Once, oidn::Context::init::<lambda_0> && _Fx) Line 626
    at C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\include\mutex(626)
OpenImageDenoise_core.dll!oidn::Context::init() Line 16
    at C:\oidn-2.0.1\core\context.cpp(16)
[Inline Frame] OpenImageDenoise.dll!`anonymous namespace'::initContext() Line 59
    at C:\oidn-2.0.1\api\api.cpp(59)
OpenImageDenoise.dll!oidnGetNumPhysicalDevices() Line 120

Do you know how to obtain amdhip64.dll and amdhip64.pdb, or is this something that AMD only ships as part of Windows? I tried building HIP manually and it produced no DLL, though the build succeeded and produced a lib (I also tried toggling their static libraries option). It also produced the OIDN HIP DLL. My guess is that the installed amdhip64.dll is incompatible, but without supplying our own amdhip64.dll we can't prevent fallback to the default (crashing) version. If this can't be fixed in OIDN, I'll have to contact AMD about fixing it in HIP instead.

In our release we will delete OpenImageDenoise_device_hip.dll, since HIP itself is not properly supported on Windows anyway. Once AMD confirms that LUID is supported, we will enable it again.

Thanks for the help.

atafra commented 1 year ago

Hi,

amdhip64.dll is part of the AMD GPU driver, so it cannot and should not be shipped separately. You could try updating the AMD driver; maybe that will fix it. Could you share which driver version and AMD CPU you are using?

OIDN doesn't actually support any integrated AMD GPUs, so it doesn't even try to use them. It only queries the list of HIP devices, and this seems to be where HIP crashes. The issue seems to be a variation of the HIP driver bug that was causing problems before v2.0.1. If you could share the exact AMD CPU model you have, maybe we could add another workaround, but this is a fundamental bug in the HIP driver which should be fixed there. So it would probably be best if you could contact AMD about this.

Nielsbishere commented 1 year ago

Hi, it seems I didn't double-check with our QA whether the driver of the integrated GPU was up to date. It works once the AMD driver is updated manually (the one shipped with Windows doesn't work). The CPU is a Ryzen 5 Pro 5650G, and the driver that didn't work was from 30 March 2023 (31.0.12027.9001). With the latest driver (31.0.21029.1006, released on 14 Aug 2023) it works again. Thanks for the heads up 👍
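For anyone who wants to guard against the old driver in their own application, a version check along these lines might help. This is a sketch under assumptions: the names parseDriverVersion and isKnownGoodDriver are hypothetical, obtaining the installed driver version string is left to the caller, and the threshold is simply the version reported as working in this thread. The first driver version that actually contains the fix is not known, so the check is conservative.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Four-part driver version, e.g. 31.0.21029.1006.
using DriverVersion = std::array<std::uint32_t, 4>;

// Parses "a.b.c.d"; returns std::nullopt on malformed input.
std::optional<DriverVersion> parseDriverVersion(const std::string& text)
{
  DriverVersion version{};
  std::istringstream stream(text);
  for (std::size_t i = 0; i < version.size(); ++i)
  {
    // Every component after the first must be preceded by a dot.
    if (i > 0 && stream.get() != '.')
      return std::nullopt;
    // Require a digit so that signs and whitespace are rejected.
    const int next = stream.peek();
    if (next == std::char_traits<char>::eof() || !std::isdigit(next))
      return std::nullopt;
    if (!(stream >> version[i]))
      return std::nullopt;
  }
  // Reject trailing characters after the fourth component.
  if (stream.peek() != std::char_traits<char>::eof())
    return std::nullopt;
  return version;
}

// Assumption: 31.0.21029.1006 is used as the minimum because it is the
// version reported as working here; 31.0.12027.9001 is reported as crashing.
bool isKnownGoodDriver(const std::string& installed)
{
  const DriverVersion minimum{31, 0, 21029, 1006};
  const auto parsed = parseDriverVersion(installed);
  // std::array compares lexicographically, component by component.
  return parsed.has_value() && *parsed >= minimum;
}
```

An application could call isKnownGoodDriver with the installed driver version and skip the HIP device when it returns false. Unparseable strings are treated as not known good.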