baldurk / renderdoc

RenderDoc is a stand-alone graphics debugging tool.
https://renderdoc.org
MIT License

[Vulkan] Image barriers cause some images to become incorrectly trashed #3360

Closed: JuanDiegoMontoya closed this issue 3 months ago.

JuanDiegoMontoya commented 3 months ago

Description

Despite not transitioning to or from LAYOUT_UNDEFINED, some images become "trashed" (filled with the text "UNDEFINED"), but only in RenderDoc. For example, this image (gDepth) is perfectly fine before the start of the pass "HZB Build Pass" (pardon the gaudy marker colors):

[Screenshot: gDepth before "HZB Build Pass"]

At the beginning of HZB Build Pass there is a pipeline barrier (two, actually: the first is just a global VkMemoryBarrier) to ensure gDepth is no longer being used and is in the correct layout (READ_ONLY_OPTIMAL) before, well, building the HZB. Note that the transition is from READ_ONLY_OPTIMAL to READ_ONLY_OPTIMAL, so the image contents should be preserved. However, this is how the image looks after clicking on "HZB Build Pass" (the image is read in the first dispatch and unused by the rest of the pass):

[Screenshot: gDepth after selecting "HZB Build Pass", now trashed]

The barrier is visible in the API inspector.

A similar issue happens to another image, "VSM Physical Pages":

[Two screenshots of "VSM Physical Pages"]

For synchronization, I use only constructs from VK_KHR_synchronization2: vkCmdPipelineBarrier2 and VkImageMemoryBarrier2. Every stage mask is ALL_COMMANDS_BIT and every access mask is MEMORY_READ_BIT | MEMORY_WRITE_BIT, i.e. every barrier is a "gigabarrier", so the only thing I have to worry about is image layout transitions. The validation layers do not report any synchronization or API usage issues.

Steps to reproduce

capture.zip

To reproduce the issue, open the capture and go to the resource inspector. Search for "gDepthPrev" (this image swaps with "gDepth" each frame), open the image view in the texture viewer, then browse the event timeline for "HZB Build Pass". Observe that the image has been trashed.

The issue can be reproduced for the image "VSM Physical Pages" in the same way: search for its image view, open it, then browse to "Virtual Shadow Maps" -> "VSM Enqueue and Clear Dirty Pages" and alternate between actions 38 and 39. The image becomes trashed at action 39, even though the only layout transition there is GENERAL -> GENERAL.

The application can be run locally by compiling and running my renderer at this commit: https://github.com/JuanDiegoMontoya/Frogfood/commit/ab0bcbc6d3644d4be83ffefc39f8ebafcc46f7e1. Compilation should be fairly straightforward with CMake as dependencies are automatically fetched and a simple test model (the one shown in the images) is included.

Environment

Miscellaneous

I suspect this may be an AMD-only issue, but I don't have an NVIDIA GPU to test with right now.

baldurk commented 3 months ago

I believe this is an AMD driver bug I've seen reported before. The patterns you are seeing are not produced (directly) by RenderDoc, which is why they are inconsistent: rather than the image being fully and clearly filled with UNDEFINED, you get a trashed version of that pattern (see the bottom half of the image, for example).

Could you run a test: take any barrier that has the same layout before and after, and replace it with two barriers that go A -> B and B -> A, where B is any other layout that would be valid for the image? If this is the same bug, it only happens when using sync2 with identical-layout transitions. I was going to try to compile your program to test that myself, but it doesn't build for me.
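As an illustration of that test, here is a minimal C sketch that splits a same-layout barrier A -> A into A -> B followed by B -> A. The `Layout` and `ImageBarrier` types and the `split_same_layout_barrier` function are hypothetical stand-ins for `VkImageLayout` and `VkImageMemoryBarrier2`, not part of Vulkan, RenderDoc, or the reporter's renderer; a real version would also copy the stage/access masks and subresource range into both barriers. The caller chooses the intermediate layout B, which is assumed to be valid for the image's usage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VkImageLayout and VkImageMemoryBarrier2.
 * Real code would use the Vulkan types and also fill in the stage/access
 * masks, subresource range, and queue family indices. */
typedef enum {
    LAYOUT_GENERAL,
    LAYOUT_READ_ONLY_OPTIMAL,
    LAYOUT_TRANSFER_SRC_OPTIMAL
} Layout;

typedef struct {
    int image;          /* hypothetical image handle */
    Layout old_layout;
    Layout new_layout;
} ImageBarrier;

/* Writes the barriers to record into out[0..1] and returns how many.
 * A same-layout barrier A -> A is split into A -> B followed by B -> A,
 * where B (intermediate) must be a different layout valid for the image.
 * Any other barrier is passed through unchanged. */
size_t split_same_layout_barrier(const ImageBarrier *in, Layout intermediate,
                                 ImageBarrier out[2])
{
    if (in->old_layout != in->new_layout) {
        out[0] = *in;
        return 1;
    }
    assert(intermediate != in->old_layout);
    out[0] = (ImageBarrier){ in->image, in->old_layout, intermediate };
    out[1] = (ImageBarrier){ in->image, intermediate, in->new_layout };
    return 2;
}
```

For example, the READ_ONLY_OPTIMAL -> READ_ONLY_OPTIMAL barrier on gDepth could become READ_ONLY_OPTIMAL -> GENERAL -> READ_ONLY_OPTIMAL. Because neither barrier uses UNDEFINED as its old layout, the image contents are preserved across both transitions.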

JuanDiegoMontoya commented 3 months ago

Thanks for the quick response, and apologies for the app not building.

I tried your suggestion and it worked for both the cases I showed. Transitioning to a different layout and back makes the images not lose their contents. I suppose I can work around this in my abstraction by simply emitting a VkMemoryBarrier2 instead of VkImageMemoryBarrier2 when the old and new layouts are the same.
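A minimal sketch of that workaround in C, using hypothetical types: `Layout`, `BarrierKind`, `Barrier`, and `make_barrier` are illustrative names, not part of Vulkan or the Frogfood renderer. When the old and new layouts match, the abstraction records a global memory barrier (the role of `VkMemoryBarrier2`) instead of an image barrier (`VkImageMemoryBarrier2`). Assuming no queue family ownership transfer is involved, a global memory barrier with the same stage and access masks covers all memory, including the image, so no dependency is lost by dropping the image barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VkImageLayout. */
typedef enum {
    LAYOUT_GENERAL,
    LAYOUT_READ_ONLY_OPTIMAL
} Layout;

/* Which Vulkan struct the abstraction would fill in:
 * BARRIER_MEMORY ~ VkMemoryBarrier2, BARRIER_IMAGE ~ VkImageMemoryBarrier2. */
typedef enum {
    BARRIER_MEMORY,
    BARRIER_IMAGE
} BarrierKind;

typedef struct {
    BarrierKind kind;
    int image;          /* hypothetical image handle; ignored for BARRIER_MEMORY */
    Layout old_layout;  /* ignored for BARRIER_MEMORY */
    Layout new_layout;  /* ignored for BARRIER_MEMORY */
} Barrier;

/* Decides which kind of barrier to record for one image access.
 * Stage/access masks are omitted: in the reporter's setup they are always
 * ALL_COMMANDS and MEMORY_READ | MEMORY_WRITE. */
void make_barrier(int image, Layout old_layout, Layout new_layout, Barrier *out)
{
    assert(out != NULL);
    out->image = image;
    out->old_layout = old_layout;
    out->new_layout = new_layout;
    /* Same layout: no transition is needed, so a global memory barrier gives
     * the required dependency and avoids the same-layout image barrier. */
    out->kind = (old_layout == new_layout) ? BARRIER_MEMORY : BARRIER_IMAGE;
}
```

Compared with splitting the barrier in two, this avoids an extra layout transition entirely, at the cost of a dependency that is broader than the single image. With gigabarriers everywhere, that broader scope changes nothing in practice.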

baldurk commented 3 months ago

Thanks for verifying; it sounds like this is indeed the same AMD driver bug I've seen before. Unfortunately, I'm not aware of any feasible workaround on RD's side, so this needs to be fixed on their side.