NVIDIAGameWorks / RayTracingDenoiser

NVIDIA Ray Tracing Denoiser
Other
504 stars 46 forks source link

Does NRDIntegration.hpp require using NRI for the entire engine? #39

Closed code-monkey-101 closed 2 years ago

code-monkey-101 commented 2 years ago

NrdIntegration::Denoise asserts because the given nri::CommandBuffer is not in recording state.

Based on the source code, it seems that you have to call NRI::BeginCommandBuffer to start recording (passing your own descriptor pool, because the one in NrdIntegration is not publicly available). However, if I do that, I'm getting another assert because the wrapped native ID3D12GraphicsCommandList is not closed (since it already has previous commands of my engine in it).

I looked at the sample, which uses NRI for everything.

Does this mean, you can't use NRDIntegration.hpp if you don't want to base your entire engine on NRI?

code-monkey-101 commented 2 years ago

Also, according to the documentation you don't need a command allocator for NRD but CreateCommandBufferD3D12 asserts if you don't assign one.

// Wrap the command buffer
nri::CommandBufferD3D12Desc cmdDesc = {};
cmdDesc.d3d12CommandList = (ID3D12GraphicsCommandList*)&d3d12CommandList;
cmdDesc.d3d12CommandAllocator = nullptr; // Not needed for NRD integration layer

nri::CommandBuffer* cmdBuffer = nullptr;
NRI.CreateCommandBufferD3D12(*nriDevice, cmdDesc, cmdBuffer);
code-monkey-101 commented 2 years ago

There is also an sneaky pitfall that should be mentioned in the documentation which leads to a barrier mismatch:

NrdIntegration::Denoise has a default parameter called enableDescriptorCaching which is true by default. This is a problem when you are recreating the wrapped textures (e.g. when the screen size has changed). The newly wrapped textures can sometimes get the same memory address as one of the older textures. If enableDescriptorCaching is true, NRD will use a cached descriptor for a texture pointer that is now pointing to a different texture.

This can be avoided by passing false for enableDescriptorCaching for the first frame after recreating the wrapped texture.

This pitfall can be reproduced easily (within a couple of frames) if you recreate the wrapped textures every frame. That's probably something that many people will do in the beginning - especially since it's not clear which of the 7 steps have to happen per frame and which have to happen only once.

code-monkey-101 commented 2 years ago

Unless I've made a mistake somewhere, NRD or NRI is also not releasing all references when shutting down.

DXGI WARNING: Live IDXGIFactory at 0x000001EBBB394960, Refcount: 2 [ STATE_CREATION WARNING #0: ]
DXGI WARNING:   Live IDXGIAdapter at 0x000001EBBB39AAB0, Refcount: 1 [ STATE_CREATION WARNING #0: ]
DXGI WARNING: Live                   IDXGIAdapter :      1 [ STATE_CREATION WARNING #0: ]
D3D12 WARNING: Live ID3D12Device at 0x000001EBC61CB460, Refcount: 4 [ STATE_CREATION WARNING #274: LIVE_DEVICE]
D3D12 WARNING:  Live ID3D12CommandQueue at 0x000001EBC6293DB0, Refcount: 1, IntRef: 0 [ STATE_CREATION WARNING #570: LIVE_COMMANDQUEUE]
D3D12 WARNING:  Live ID3D12DescriptorHeap at 0x000001EBC8144440, Refcount: 1, IntRef: 0 [ STATE_CREATION WARNING #576: LIVE_DESCRIPTORHEAP]
D3D12 WARNING: Live             ID3D12CommandQueue :      1 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live           ID3D12DescriptorHeap :      1 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
dzhdanNV commented 2 years ago

Does this mean, you can't use NRDIntegration.hpp if you don't want to base your entire engine on NRI?

No. It seems to me that you are trying to use NRDIntegration to render everything. NRDIntegration only "appends" Dispatch calls to already opened CommandList. It doesn't "manage" CommandList by its own.

Also, according to the documentation you don't need a command allocator for NRD but CreateCommandBufferD3D12 asserts if you don't assign one.

Thanks. I will clarify it. It's really not needed, but NRI validation layer requires something assigned.

NrdIntegration::Denoise has a default parameter called enableDescriptorCaching which is true by default. This is a problem when you are recreating the wrapped textures (e.g. when the screen size has changed)

NRD doc says:

NRD doesn’t have a "resize" functionality. On resolution change the old denoiser needs to be destroyed and a new one needs to be created with new parameters.

Unless I've made a mistake somewhere, NRD or NRI is also not releasing all references when shutting down.

You have made a mistake, unfortunately:

code-monkey-101 commented 2 years ago

Thank you for the quick response dzhdanNV!

Let me reply to your statements separately (have to do a bit of investigation, so it might take me a while).

NrdIntegration::Denoise has a default parameter called enableDescriptorCaching which is true by default. This is a problem when you are recreating the wrapped textures (e.g. when the screen size has changed)

NRD doc says:

NRD doesn’t have a "resize" functionality. On resolution change the old denoiser needs to be destroyed and a new one needs to be created with new parameters.

I did see that but for some reason I remembered it as "it's fine to call Initialize multiple times" which is clearly not what it says.

So it's partly my fault but you will still run into this if you recreate the texture wrappers every frame. It would be great if you could at least clarify which of the seven steps must be done per frame and which ones are one-time only. It's also not clear if you have to wrap the command buffer every frame. You need at least as many command allocators as there are frames in flight, so usually, you need at least 2 or 3 wrapped command buffers. On the other hand, the command allocator is ignored, so maybe a single one is fine?

code-monkey-101 commented 2 years ago

Does this mean, you can't use NRDIntegration.hpp if you don't want to base your entire engine on NRI?

No. It seems to me that you are trying to use NRDIntegration to render everything.

I'm not trying to do that.

NRDIntegration only "appends" Dispatch calls to already opened CommandList. It doesn't "manage" CommandList by its own.

Yes, I understand. Maybe this is more of a NRI question than a NRD question.

I'm getting [nri(D3D12)::ERROR] -- Can't set descriptor pool: the command buffer must be in the recording state.

This is the call stack

    KernelBase.dll!00007ffb2c6c90f2()   Unknown
>   NRI.dll!AbortExecution(void * userArg) Line 446 C++
    NRI.dll!Log::ReportMessage(nri::Message message, const char * format, ...) Line 63  C++
    NRI.dll!nri::CommandBufferVal::SetDescriptorPool(const nri::DescriptorPool & descriptorPool) Line 263   C++
    NRI.dll!CmdSetDescriptorPool(nri::CommandBuffer & commandBuffer, const nri::DescriptorPool & descriptorPool) Line 46    C++
    CoreLib3.dll!NrdIntegration::Denoise(unsigned int consecutiveFrameIndex, nri::CommandBuffer & commandBuffer, const nrd::CommonSettings & commonSettings, const std::array<NrdIntegrationTexture,26> & userPool, bool enableDescriptorCaching) Line 384  C++
    CoreLib3.dll!NRDComponentImpl::RenderCallback(Renderer & renderer) Line 481 C++
    CoreLib3.dll!`NRDComponentImpl::Render'::`2'::<lambda_1>::operator()(Renderer & renderer) Line 422  C++
    CoreLib3.dll!std::invoke<`NRDComponentImpl::Render'::`2'::<lambda_1> &,Renderer &>(NRDComponentImpl::Render::__l2::<lambda_1> & _Obj, Renderer & _Arg1) Line 1503   C++
    CoreLib3.dll!std::_Invoker_ret<void,1>::_Call<`NRDComponentImpl::Render'::`2'::<lambda_1> &,Renderer &>(NRDComponentImpl::Render::__l2::<lambda_1> & _Func, Renderer & <_Vals_0>) Line 665  C++
    CoreLib3.dll!std::_Func_impl_no_alloc<`NRDComponentImpl::Render'::`2'::<lambda_1>,void,Renderer &>::_Do_call(Renderer & <_Args_0>) Line 834 C++
    CoreLib3.dll!std::_Func_class<void,Renderer &>::operator()(Renderer & <_Args_0>) Line 881   C++
    CoreLib3.dll!SimpleRenderer::Execute(const CommandList::CallbackEntry & command) Line 1042  C++
    CoreLib3.dll!SimpleRenderer::Execute(const std::shared_ptr<CommandList> & commandList) Line 154 C++
    RaytracingInSixteenMs.exe!GraphicsEngine::OnMessage(const Render & msg) Line 566    C++
    RaytracingInSixteenMs.exe!std::invoke<bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,Render const &>(bool(GraphicsEngine::*)(const Render &) & _Obj, GraphicsEngine * & _Arg1, const Render & <_Args2_0>) Line 1503   C++
    RaytracingInSixteenMs.exe!std::_Invoker_ret<std::_Unforced,0>::_Call<bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,Render const &>(bool(GraphicsEngine::*)(const Render &) & _Func, GraphicsEngine * & <_Vals_0>, const Render & <_Vals_1>) Line 684 C++
    RaytracingInSixteenMs.exe!std::_Call_binder<std::_Unforced,0,1,bool (__cdecl GraphicsEngine::*)(Render const &),std::tuple<GraphicsEngine *,std::_Ph<1>>,std::tuple<Render const &>>(std::_Invoker_ret<std::_Unforced,0> __formal, std::integer_sequence<unsigned __int64,0,1> __formal, bool(GraphicsEngine::*)(const Render &) & _Obj, std::tuple<GraphicsEngine *,std::_Ph<1>> & _Tpl, std::tuple<Render const &> && _Ut) Line 1962  C++
    RaytracingInSixteenMs.exe!std::_Binder<std::_Unforced,bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,std::_Ph<1> const &>::operator()<Render const &>(const Render & <_Unbargs_0>) Line 1999  C++
    RaytracingInSixteenMs.exe!std::invoke<std::_Binder<std::_Unforced,bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,std::_Ph<1> const &> &,Render const &>(std::_Binder<std::_Unforced,bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,std::_Ph<1> const &> & _Obj, const Render & _Arg1) Line 1503 C++
    RaytracingInSixteenMs.exe!std::_Invoker_ret<bool,0>::_Call<std::_Binder<std::_Unforced,bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,std::_Ph<1> const &> &,Render const &>(std::_Binder<std::_Unforced,bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,std::_Ph<1> const &> & _Func, const Render & <_Vals_0>) Line 674    C++
    RaytracingInSixteenMs.exe!std::_Func_impl_no_alloc<std::_Binder<std::_Unforced,bool (__cdecl GraphicsEngine::*&)(Render const &),GraphicsEngine * &,std::_Ph<1> const &>,bool,Render const &>::_Do_call(const Render & <_Args_0>) Line 834  C++
    RaytracingInSixteenMs.exe!std::_Func_class<bool,Render const &>::operator()(const Render & <_Args_0>) Line 881  C++
    RaytracingInSixteenMs.exe!Messaging::Register::__l2::<lambda>(const Message & message) Line 107 C++
    RaytracingInSixteenMs.exe!std::invoke<bool <lambda>(const Message &) &,Message const &>(Messaging::Register::__l2::bool <lambda>(const Message &) & _Obj, const Message & _Arg1) Line 1503  C++
    RaytracingInSixteenMs.exe!std::_Invoker_ret<bool,0>::_Call<bool <lambda>(const Message &) &,Message const &>(Messaging::Register::__l2::bool <lambda>(const Message &) & _Func, const Message & <_Vals_0>) Line 674 C++
    RaytracingInSixteenMs.exe!std::_Func_impl_no_alloc<bool <lambda>(const Message &),bool,Message const &>::_Do_call(const Message & <_Args_0>) Line 834   C++
    CoreLib3.dll!std::_Func_class<bool,Message const &>::operator()(const Message & <_Args_0>) Line 881 C++
    CoreLib3.dll!Messaging::Send<Render>(const Render & message) Line 74    C++
    CoreLib3.dll!Window::RunMessagePump() Line 421  C++
    RaytracingInSixteenMs.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 43 C++
    RaytracingInSixteenMs.exe!invoke_main() Line 107    C++
    RaytracingInSixteenMs.exe!__scrt_common_main_seh() Line 288 C++
    RaytracingInSixteenMs.exe!__scrt_common_main() Line 331 C++
    RaytracingInSixteenMs.exe!WinMainCRTStartup(void * __formal) Line 17    C++
    kernel32.dll!00007ffb2d1754e0() Unknown
    ntdll.dll!00007ffb2f0c485b()    Unknown

This is my function which calls NRD::Denoise

void NRDComponentImpl::RenderCallback(Renderer& renderer) const
{
    SimpleRenderer* simpleRenderer = dynamic_cast<SimpleRenderer*>(&renderer);
    if (simpleRenderer == nullptr)
        return;

    const auto renderingWindow = UseRenderingWindow::GetCurrentRenderingWindow();

    // Wrap the command buffer
    nri::CommandBufferD3D12Desc cmdDesc = {};
    cmdDesc.d3d12CommandList = simpleRenderer->GetNativeCommandList().Get();
    cmdDesc.d3d12CommandAllocator = renderingWindow->GetCommandAllocator().Get();   // for the current frame

    // TODO don't do this every frame as soon as simpleRenderer has been fixed to not create a new command list every frame
    nri::CommandBuffer* cmdBuffer = nullptr;
    nri.CreateCommandBufferD3D12(*nriDevice, cmdDesc, cmdBuffer);

    // Specify the current state of the resource here, after denoising NRD can modify
    // this state. Application must continue state tracking from this point.
    // Useful information:
    //    SRV = nri::AccessBits::SHADER_RESOURCE, nri::TextureLayout::SHADER_RESOURCE
    //    UAV = nri::AccessBits::SHADER_RESOURCE_STORAGE, nri::TextureLayout::GENERAL
    mvEntryDesc.nextAccess = ConvertResourceStateToAccessBits(motionVectors);
    mvEntryDesc.nextLayout = ConvertResourceStateToLayout(motionVectors);
    normalRoughnessEntryDesc.nextAccess = ConvertResourceStateToAccessBits(normalRoughness);
    normalRoughnessEntryDesc.nextLayout = ConvertResourceStateToLayout(normalRoughness);
    viewZEntryDesc.nextAccess = ConvertResourceStateToAccessBits(viewZ);
    viewZEntryDesc.nextLayout = ConvertResourceStateToLayout(viewZ);
    diffRadianceHitDistEntryDesc.nextAccess = ConvertResourceStateToAccessBits(diffRadianceHitDist);
    diffRadianceHitDistEntryDesc.nextLayout = ConvertResourceStateToLayout(diffRadianceHitDist);
    outDiffRadianceHitDistEntryDesc.nextAccess = ConvertResourceStateToAccessBits(outDiffRadianceHitDist);
    outDiffRadianceHitDistEntryDesc.nextLayout = ConvertResourceStateToLayout(outDiffRadianceHitDist);

    // Populate common settings
    //  - for the first time use defaults
    //  - currently NRD supports only the following view space: X - right, Y - top, Z - forward or backward
    nrd::CommonSettings commonSettings = {};
    CopyTransposed(nonJitteredProjectionMatrix, commonSettings.viewToClipMatrix);
    CopyTransposed(isCameraCut ? nonJitteredProjectionMatrix : lastNonJitteredProjectionMatrix, commonSettings.viewToClipMatrixPrev);
    CopyTransposed(viewMatrix, commonSettings.worldToViewMatrix);
    CopyTransposed(isCameraCut ? viewMatrix : lastViewMatrix, commonSettings.worldToViewMatrixPrev);
    commonSettings.cameraJitter[0] = jitter.x;
    commonSettings.cameraJitter[1] = jitter.y;
    commonSettings.timeDeltaBetweenFrames = deltaTimeMs;
    commonSettings.frameIndex = static_cast<uint32_t>(renderingWindow->GetCurrentFrameIndex());
    commonSettings.accumulationMode = isCameraCut ? nrd::AccumulationMode::RESTART : nrd::AccumulationMode::CONTINUE;
    //commonSettings.isMotionVectorInWorldSpace = false;

    // Set settings for each denoiser
    nrd::ReblurSettings settings = {};
    nrd->SetMethodSettings(nrd::Method::REBLUR_DIFFUSE, &settings);

    // This is where the denoising happens
    nrd->Denoise(commonSettings.frameIndex, *cmdBuffer, commonSettings, userPool);

    // NRD integration layer binds its own descriptor pool, don't forget to re-bind back your own descriptor pool (heap)
    motionVectors->SetResourceState(ConvertAccessBitsToResourceState(mvEntryDesc.nextAccess));
    normalRoughness->SetResourceState(ConvertAccessBitsToResourceState(normalRoughnessEntryDesc.nextAccess));
    viewZ->SetResourceState(ConvertAccessBitsToResourceState(viewZEntryDesc.nextAccess));
    diffRadianceHitDist->SetResourceState(ConvertAccessBitsToResourceState(diffRadianceHitDistEntryDesc.nextAccess));
    outDiffRadianceHitDist->SetResourceState(ConvertAccessBitsToResourceState(outDiffRadianceHitDistEntryDesc.nextAccess));
    simpleRenderer->SetCurrentStateToUnknown();
    lastViewMatrix = viewMatrix;
    lastNonJitteredProjectionMatrix = nonJitteredProjectionMatrix;

    // TODO don't do this every frame as soon as simpleRenderer has been fixed to not create a new command list every frame
    nri.DestroyCommandBuffer(*cmdBuffer);
}

It makes sense that this assert is happening because NRI::BeginCommandBuffer is not called anywhere but I don't understand how this is meant to work without using NRI for everything. I've modified NRI to make sure m_IsRecordingStarted is set to true when wrapping a command buffer but there must be a better way.

I've attached my NRD integration if you want to have a look. nrdcomponent.zip

code-monkey-101 commented 2 years ago

Unless I've made a mistake somewhere, NRD or NRI is also not releasing all references when shutting down.

You have made a mistake, unfortunately:

* NRD is just a collection of shaders

* NRD integration releases resources in _Destroy_

* NRD sample is validation errors clean (D3D12 and VK)

* a bug in NRI is possible, but very unlikely

Found it. I was missing the nri::DestroyDevice(*nriDevice); call.

The readme is missing it too!

dzhdanNV commented 2 years ago

I'm getting [nri(D3D12)::ERROR] -- Can't set descriptor pool: the command buffer must be in the recording state.

Finally, I got it :D Most likely due to recent changes in NRI, NRI validation has become "smarter" and validates "is recording started?" state. Obviously, it doesn't make sense for wrapped Command Buffers. Please, as a workaround try to run without NRI validation (GAPI validation is fine).

I was missing the nri::DestroyDevice(*nriDevice) call. The readme is missing it too!

Thanks. I will try to improve the doc.

dzhdanNV commented 2 years ago

I have improved the doc based on your feedback. Please, review. What do you think?

code-monkey-101 commented 2 years ago

It's still missing the information that you have to pass false for enableDescriptorCaching when you are recreating the texture wrappers every frame but other than that, I'm happy ;)

Thank you very much for your help. Appreciate it.

dzhdanNV commented 2 years ago

Sorry. I will clarify it in the next release.

dzhdanNV commented 2 years ago

NRI validation layer has been fixed.