RenderKit / oidn

Intel® Open Image Denoise library
https://www.openimagedenoise.org/
Apache License 2.0
1.78k stars 164 forks source link

Unable to use buffers malloced with HIP with an HIP device #224

Closed TomClabault closed 2 months ago

TomClabault commented 3 months ago

I'm trying to use a buffer allocated with HIP as the input to my RT filter. However, doing so throws a

Exception thrown at 0x00007FFEC8B4543C in PathTracer.exe: Microsoft C++ exception: oidn::Exception at memory location 0x0000008644AFE660.

exception when calling setImage("color", hipMalloced, ...);

oidn::DeviceRef device;
device = oidn::newDevice(oidn::DeviceType::HIP);
device.commit();

hipDeviceptr_t hipMallocedInput;
hipDeviceptr_t hipMallocedOutput;
hipMalloc(&hipMallocedInput, sizeof(ColorRGB) * 1280 * 720);
hipMalloc(&hipMallocedOutput, sizeof(ColorRGB) * 1280 * 720);

oidn::FilterRef filter = device.newFilter("RT");
filter .setImage("color", hipMallocedInput, oidn::Format::Float3, 1280, 720);
filter .setImage("output", hipMallocedOutput, oidn::Format::Float3, 1280, 720);
filter .commit();

Allocating the HIP buffers with hipMallocManagedyields the same kind of exception.

The only way I found to have denoising work on an HIP device is by using oidn::Buffer for both the input and output to the filter. This however requires twice the amount of VRAM: once for the oidn::Buffer and once for the HIP buffer that use in my application. I also have to copy the data from my HIP buffer to the oidn::Buffer before denoising which is expensive.

Am I missing something in the usage of the API?

I'm running Windows 11 on an AMD RX 7900XTX (Adrenalin drivers 24.5.1) using the 2.3.0 beta release binaries (same issue with the 2.2.0).

atafra commented 3 months ago

This is very strange for two reasons.

First, I don't understand how an oidn::Exception isn't handled, regardless of the reason. Exceptions are thrown only inside OIDN and are also caught internally, so this should never happen. oidn::Exception is derived from std::exception. Could you try to catch it and print the exception string? But again, no exception should leave OIDN.

Second, passing pointers allocated with hipMalloc should work fine. I tried it on Linux and it worked without issues. I haven't tried it on Windows yet.

Could you please share the full code of a minimal reproducer?

atafra commented 3 months ago

The error message you get is also a bit strange to me. It's Exception thrown but if I have an unhandled exception I get an Unhandled exception error. Did you set up the debugger to break on any exception, even if it's handled?

You should let the exception handled and query the error reported by OIDN to find out what's the issue.

TomClabault commented 3 months ago

This is my minimal reproducer code:

#include <iostream>

#include <Orochi/Orochi.h>
#include <OpenImageDenoise/oidn.hpp>

struct ColorRGB
{
    float r, g, b;
};

int main(int argc, char* argv[])
{
    oroInitialize((oroApi)(ORO_API_HIP), 0);

    oidn::DeviceRef device = oidn::newDevice(oidn::DeviceType::HIP);
    device.commit();

    hipDeviceptr_t hipMallocedInput;
    hipDeviceptr_t hipMallocedOutput;
    hipMallocManaged(&hipMallocedInput, sizeof(ColorRGB) * 1280 * 720, hipMemAttachGlobal);
    hipMallocManaged(&hipMallocedOutput, sizeof(ColorRGB) * 1280 * 720, hipMemAttachGlobal);

    oidn::FilterRef m_beauty_filter = device.newFilter("RT");
    m_beauty_filter.setImage("color", hipMallocedInput, oidn::Format::Float3, 1280, 720);
    m_beauty_filter.setImage("output", hipMallocedOutput, oidn::Format::Float3, 1280, 720);
    m_beauty_filter.commit();

    const char* errorMessage;
    if (device.getError(errorMessage) != oidn::Error::None)
        std::cout << errorMessage << std::endl;

    return 0;
}

The printed device error is an oidn::Error::InvalidArgument with message image data not accessible by the device, please use OIDNBuffer or device allocator for storage.

The important point is that it uses Orochi to load HIP dynamically at runtime. I cannot reproduce my issue in a standard HIP setup.

Even though this seems to be an Orochi issue, do you perhaps have some insights on what could go wrong? What could be the issue with dynamically loaded binaries vs. linking at compile time?

The Exception thrown message comes from the output window of Visual Studio. The exception is indeed caught in OIDN and does not break the debugger.

image

atafra commented 3 months ago

Thanks, this is very useful. I will give it a try but it doesn't seem that Orochi should make any difference.

Do you have a single AMD GPU in the machine (so no AMD iGPU, only the discrete one)?

TomClabault commented 3 months ago

Do you have a single AMD GPU in the machine (so no AMD iGPU, only the discrete one)?

Correct.

I also tried the code I posted above on Ubuntu 22.04 and got the same error as on Windows.

atafra commented 3 months ago

I tried your code on Ubuntu 22.04, Radeon RX 7900 XT, ROCm 5.7.3, latest Orochi main, and OIDN 2.3.0-beta official binaries but unfortunately I cannot reproduce the issue. There are no errors reported.

The error which you get suggests that HIP cannot identify the pointer you passed as a device pointer.

atafra commented 3 months ago

What ROCm version are you using?

TomClabault commented 3 months ago

I tried again with a fresh clone of Orochi main and could still reproduce the issue on both Windows (ROCm 5.7) and Ubuntu 22.04 (ROCm 6.1.1). Maybe the error is in my workflow?

I cloned Orochi and modified Orochi/UnitTest/premake5.lua by adding the includedirs { "../", "../OIDN/include" } and libdirs { "../OIDN/lib" } directives. The OIDN folder is at the root of the cloned Orochi repo and contains the lib and include folders from the 2.3.0 beta release binaries.

I then pasted my "minimal reproducer code" above in Orochi/UnitTest/main.cpp and configured/compiled the project from the Orochi root folder:

./tools/premake5/linux/premake5 gmake
make -j
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/tom/Desktop/Orochi/OIDN/lib ./dist/bin/Debug/Unittest64D

>

image data not accessible by the device, please use OIDNBuffer or device allocator for storage

What can I try next?

atafra commented 3 months ago

I managed to track down the issue: it is caused by using two different HIP runtimes in the same application: version 5 and 6. OIDN is currently built with ROCm 5 and if you build your application also with ROCm 5, there is no error. However, Orochi loads HIP runtime v6 by default, which it seems cannot be mixed with v5 in the same application. Since this is a limitation of HIP, nothing can be done on the OIDN side but I see two options:

  1. Rebuild OIDN with ROCm 6. Unfortunately this is currently possible only on Linux because there is still no HIP SDK 6 for Windows.
  2. Modify Orochi to always load HIP runtime v5 instead of v6. You need to comment out some lines here: https://github.com/GPUOpen-LibrariesAndSDKs/Orochi/blob/58db9c18b952f5148f0e4465a86a66e487d50525/contrib/hipew/src/hipew.cpp#L655 https://github.com/GPUOpen-LibrariesAndSDKs/Orochi/blob/58db9c18b952f5148f0e4465a86a66e487d50525/contrib/hipew/src/hipew.cpp#L673 https://github.com/GPUOpen-LibrariesAndSDKs/Orochi/blob/58db9c18b952f5148f0e4465a86a66e487d50525/contrib/hipew/src/hipew.cpp#L631 https://github.com/GPUOpen-LibrariesAndSDKs/Orochi/blob/58db9c18b952f5148f0e4465a86a66e487d50525/contrib/hipew/src/hipew.cpp#L635 I confirmed that this works.
TomClabault commented 3 months ago

And I can also confirm that this works! Forcing the loading of HIP runtime v5 by commenting the lines you gave did the job and works fine on both Windows and Linux.

Thanks for your time!