RobertBeckebans / RBDOOM-3-BFG

Doom 3 BFG Edition source port with updated DX12 / Vulkan renderer and modern game engine features
https://www.moddb.com/mods/rbdoom-3-bfg
GNU General Public License v3.0
1.47k stars 253 forks source link

HiZ Buffer issues on Integrated GPUs (Intel, Apple Silicon) #940

Open SRSaunders opened 1 week ago

SRSaunders commented 1 week ago

When the Hierarchical Depth Buffer is enabled (either for the old SSAO, or the new Filmic Post FX passes), I see problems on Integrated GPUs as follows:

  1. On the master branch when running Vulkan on Linux + Intel iGPU: Game crashes after loading a level with a device lost error.
  2. On the push constants branch when running Vulkan on M1 Apple Silicon GPU: Game crashes after loading a level with device out of memory followed by device lost errors. I suspect this one does not show up on the master branch since the M1 is far too slow over there (< 10 fps). To get a reassonable frame rate I have to test on the push constants branch.

If I set r_useHierarchicalDepthBuffers = 0 the problem goes away for both cases above. I have looked at the mipmapgen shader code and played with the parameters a bit, but I don't see anything obviously wrong, other than potential memory issues. I am wondering whether there is a subtle sync + resource release delay problem that is exposed on iGPUs that are slower and have more limited memory.

If you don't want to research this, I can easily patch it by adding the following code to DeviceManager_VK.cpp:

    if( !otherGPUs.empty() )
    {
--->        //FIXME: On integrated GPUs disable HiZ buffer to avoid device out of memory/lost errors
--->        r_useHierarchicalDepthBuffer.SetBool( false );

        m_VulkanPhysicalDevice = otherGPUs[0];
        return true;
    }

Please let me know what you want to do with this.

RobertBeckebans commented 1 week ago

Maybe we can add an enum to the device manager where we scan for Nvidia, AMD and Intel vendor devices and then have a cvar like in former times r_useIntelHacks wrapped around by #if defined(linux).

SRSaunders commented 1 week ago

I remember those old r_skipIntelWorkarounds cvars. However, they were used to influence the setting of booleans for feature availability (e.g. timer queries). In this case, the HiZ depth buffer feature is already controlled by the cvar r_useHierarchicalDepthBuffer. So using one cvar to set another seems over-complex unless there are other GPU-specific cases which I am not aware of at this point.

I would rather use conditional #if defined(linux, mac) logic combined with integrated GPU detection to directly control the r_useHierarchicalDepthBuffer cvar. Note my original suggestion above avoids both of these options and just disables it for iGPUs on all platforms. While required on linux and macos, I realize this might be unnecessary on Windows. However, in the past I have seen memory problems which cause crashes on linux be more general in nature, and while they don't crash on Windows there is indeed something wrong underneath. Linux can act like a "canary in the coal mine" for subtle issues. That's why I asked earlier if you want to look at this to see if you could see something wrong with mipmapgen (e.g. sync, memory, etc) that I have missed.

RobertBeckebans commented 6 days ago

I can look into the mipmap gen pass but it's pretty much just copied from the Donut framework and it does not trigger any validation errors so it is more likely we have a driver bug on Linux. Is it the same behaviour on macOS?

SRSaunders commented 5 days ago

Is it the same behaviour on macOS?

Yes, when running the master branch on Apple Silicon/arm64 (my M1 laptop), the HiZ buffer feature can cause an out-of-device-memory crash. It does not happen on my x86_64 machine with a discrete AMD 6600XT GPU. And I can't test using macOS with an Intel iGPU since my ancient Apple Intel laptop is too old and won't run the game - but that combo is mostly irrelevant these days in any case.

I am suspicious that this is not a driver bug since it happens in two different environments: linux and macOS. And a clue when running on macOS with Apple Silicon: when I enable either new SSAO or TAA or both, the crash does not appear with HiZ enabled on the master branch. When both new SSAO and TAA are disabled, the crash happens immediately with HiZ enabled. Without HiZ no crash occurs in any situation. That's what makes me suspicious about a compute shader resource or sync issue with HiZ enabled.