crud89 / LiteFX

Modern, flexible computer graphics and rendering engine, written in C++23 with support for Vulkan 🌋 and DirectX 12 ❎.
https://litefx.crudolph.io/
MIT License

Release builds break D3D12 ray tracing samples. #136

Open crud89 opened 3 months ago

crud89 commented 3 months ago

Describe the Bug

When building with the release profile, the BLAS in the D3D12 ray-tracing samples appears to be broken. This might be caused by compaction, and it only affects the DirectX 12 backend.

Tractorou24 commented 2 months ago

It looks like an MSVC bug, since clang compiles the same code without issues:
error C2146: syntax error: missing '>' before identifier '__A0'
error C2511: 'const LiteFX::Rendering::IImage &LiteFX::Rendering::Backends::VulkanFrameBuffer::operator [](unknown-type) const': overloaded member function not found in 'LiteFX::Rendering::Backends::VulkanFrameBuffer'

__A0 is a reserved identifier (names starting with a double underscore are reserved for the implementation), presumably used internally by the compiler. It may be fixed in a later update, and if we find a small repro, we can also report it.

crud89 commented 2 months ago

Maybe I should have added a bit more context to this issue, but I guess those observations are different from what I was experiencing. How did you get those compiler errors?

The problem I am facing with this issue is that ray tracing under the DirectX backend breaks in different ways, but only in release builds. When I first encountered this problem, the BLAS geometry appeared broken, so I created this issue with the intent to look into it later. A few days ago, I tried to debug the RT samples by comparing the recorded commands between release and debug builds using NSight and PIX. Weirdly enough, the RT sample started working again (without any changes), but in the ray queries sample, no geometry is rendered. The recorded commands in NSight are the same in debug and release builds, and I am able to view the whole TLAS/BLAS structure. Debugging the pixel shader in PIX, however, always reports a miss right away (i.e., no hit is detected at all).

This is where I left it, but with the clang support you provided in the PR, I can see that release and debug builds both work. I am still unsure why this happens (again, this only happens with the DirectX backend), but I will try to run this through an MSVC release build with your changes to see if it gets fixed. If not, my current hope is to catch this when writing RT tests for #129.

Tractorou24 commented 2 months ago

I got those errors when compiling with MSVC in release mode. I tested it with the latest Visual Studio 17.11 preview.

crud89 commented 2 months ago

I've looked into the issue, but it appears to be unrelated. It is fixed by removing the covariance from the return type of the operator[] and image functions in the FrameBuffer implementations (only the overloads that take a StringView are affected):

// Covariant return type, which fails to compile with MSVC 17.11:
//inline const IVulkanImage& operator[](StringView renderTargetName) const override {
// Workaround: return the base interface type instead.
inline const IImage& operator[](StringView renderTargetName) const override {
    return this->resolveImage(hash(renderTargetName));
}

// Same change for image():
//inline const IVulkanImage& image(StringView renderTargetName) const override {
inline const IImage& image(StringView renderTargetName) const override {
    return this->resolveImage(hash(renderTargetName));
}

This does indeed look like a compiler regression. I will see if I can create a minimal reproducible example (MRE) and report it. As for the engine, I tried moving this to the base class to see whether that resolves the issue, but it doesn't.
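As a possible starting point for an MRE, the pattern above can be reduced to a small, self-contained sketch. Everything here is a hypothetical stand-in: IImage, IVulkanImage, resolveImage and the hash-based lookup only mirror the names in the snippet, StringView is assumed to behave like std::string_view, and IFrameBuffer, ImageStore and the two frame buffer classes are invented for illustration. It is plain C++17 showing both the covariant override and the workaround; whether it actually triggers the MSVC 17.11 C2146/C2511 errors has not been confirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <string_view>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the engine interfaces named in the thread.
struct IImage {
    virtual ~IImage() = default;
    virtual std::string name() const = 0;
};

struct IVulkanImage : IImage {
    virtual int backendId() const = 0; // hypothetical backend-specific member
};

struct VulkanImage final : IVulkanImage {
    explicit VulkanImage(std::string name) : m_name(std::move(name)) {}
    std::string name() const override { return m_name; }
    int backendId() const override { return 1; }
private:
    std::string m_name;
};

// Hypothetical base interface declaring the non-covariant signature.
struct IFrameBuffer {
    virtual ~IFrameBuffer() = default;
    virtual const IImage& operator[](std::string_view renderTargetName) const = 0;
};

// Shared storage: images are looked up by the hash of their render target name.
class ImageStore {
public:
    void add(std::string name) {
        const auto key = std::hash<std::string_view>{}(name);
        m_images.emplace(key, std::make_unique<VulkanImage>(std::move(name)));
    }
protected:
    const IVulkanImage& resolveImage(std::size_t hash) const {
        auto it = m_images.find(hash);
        if (it == m_images.end())
            throw std::out_of_range("unknown render target");
        return *it->second;
    }
private:
    std::unordered_map<std::size_t, std::unique_ptr<VulkanImage>> m_images;
};

// Variant A: covariant return type (the form reported to break with MSVC 17.11).
class CovariantFrameBuffer : public IFrameBuffer, public ImageStore {
public:
    const IVulkanImage& operator[](std::string_view renderTargetName) const override {
        return resolveImage(std::hash<std::string_view>{}(renderTargetName));
    }
};

// Variant B: the workaround, returning the base interface type.
class BaseReturnFrameBuffer : public IFrameBuffer, public ImageStore {
public:
    const IImage& operator[](std::string_view renderTargetName) const override {
        return resolveImage(std::hash<std::string_view>{}(renderTargetName));
    }
};

// Only variant A exposes the derived image type to callers of the concrete class.
static_assert(std::is_same_v<decltype(std::declval<const CovariantFrameBuffer&>()["x"]), const IVulkanImage&>);
static_assert(std::is_same_v<decltype(std::declval<const BaseReturnFrameBuffer&>()["x"]), const IImage&>);
```

Both classes work through an IFrameBuffer reference. What the workaround gives up is that callers holding the concrete type no longer get IVulkanImage members (here the hypothetical backendId()) without a cast.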

Update: This was indeed a separate issue, introduced by Visual Studio 17.11 and fixed in 17.12 or 17.11.5.

Nevertheless, ray queries are still broken. I've managed to create two identical PIX captures, one from the release build and one from the debug build. Both use the same shaders (the debug versions, as they retain the PDBs). I've spent quite some time looking for differences, but couldn't find any. I've tested the samples on a 2080 Ti, where ray tracing (RT) works, but ray queries (RQs) always return COMMITTED_NOTHING. On my 4080, however, both ray tracing samples show invalid geometry. I suspect this is somehow related to how acceleration structures are built in the D3D backend.

Interestingly, RelWithDebInfo also works, so only the MSVC Release configuration is broken. I've also tested RT validation with NVAPI, but nothing is reported there either. I suspect we're hitting some undefined behavior (UB) here. My guess would be a lambda capture, as I have run into this before. On the other hand, the PIX captures look unremarkable in this regard.
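To illustrate the lambda-capture theory, here is a generic sketch (not LiteFX code) of how a by-reference capture can become undefined behavior that only shows up in some build configurations. GeometryDesc, DeferredQueue and recordGeometries are hypothetical names; the point is only that work recorded now but executed later must not capture loop- or function-local state by reference.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-geometry build data.
struct GeometryDesc {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Collects jobs that are executed later, after recording has finished.
class DeferredQueue {
public:
    void enqueue(std::function<std::uint32_t()> job) { m_jobs.push_back(std::move(job)); }

    // Runs all recorded jobs and sums their results.
    std::uint32_t flush() const {
        std::uint32_t total = 0;
        for (const auto& job : m_jobs)
            total += job();
        return total;
    }

private:
    std::vector<std::function<std::uint32_t()>> m_jobs;
};

void recordGeometries(DeferredQueue& queue, const std::vector<std::uint32_t>& counts) {
    for (std::uint32_t count : counts) {
        GeometryDesc desc{ count, count * 3 };

        // Dangerous variant (do NOT do this): `desc` is destroyed at the end of
        // each iteration, so a by-reference capture dangles once flush() runs.
        //   queue.enqueue([&desc]() { return desc.indexCount; });

        // Safe variant: capture by value so the job owns its own copy of `desc`.
        queue.enqueue([desc]() { return desc.indexCount; });
    }
}
```

With the commented-out by-reference capture, flush() would read destroyed objects. A debug build may still happen to see the old stack values, while an optimized build is free to reuse that storage, which could explain configuration-dependent behavior; nothing here confirms that this is what happens in the D3D12 backend.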