Open Onhi opened 3 weeks ago
Hi @Onhi, Can you provide compilable source code that reproduces this problem? Anything else you can provide would also be helpful. Thanks, Owen
I'll get to reducing the engine to a simple case demonstrating the issue. In the meantime, here's a link to a renderdoc capture (compatible with an RX 7900XTX) and a reduced API dump of the issue. In the following, the resource "camera_depth" is the one showcasing the issue. https://drive.google.com/drive/folders/13M_uPHq1WdBmEf0FVYVS0QKMV86O7pkb?usp=drive_link
I got a version of the code ready, where should I put it?
@Onhi You may create a repository in your github space https://github.com/Onhi?tab=repositories to put your code
I can't put the code on GitHub. I've created a RAR file in the Google Drive folder I linked just above so you can retrieve the code.
Thanks @Onhi, once I get access to the folder I will create an internal ticket for this issue.
@owenzhangzhengzhong you should be able to access the folder now.
Hey @Onhi, I was able to access the folder, compile the code, and see the issue in the editor on my local system. I've created an internal ticket to track this issue, and we'll look into it. In the meantime, is it possible for you to simplify the test case for this issue? There's a lot of source to go through; if you can reduce it as much as possible to just the rendering with the vkCmdPipelineBarrier that's causing the corruption, that would make it much easier for us to investigate. Thanks, Owen
Cool, thanks Owen, I'll do my best to reduce the code to a minimum. Might take some time but I'll get to it during the weekend.
Hi, I'm having a hard time reducing the engine further than what I have provided while still keeping a solid repro case (we need to move a camera to change the range of captured information and/or resize images, so it's not trivial to isolate).
Doing more tests, I see irregular behavior around depth/stencil images and layout transition barriers. On some captures, it's the depth that is blocky and bugged (like in the RDC capture I provided); other times it's the stencil aspect of the image that is destroyed.
Full disclosure: in my effort to reduce the size of the code, I noticed an issue with one of the barrier pairs in the code I provided. Around the OIT pass, I wrongly removed the pair of barriers that were converting from shader to depth stencil and vice versa. Still, fixing it had no impact on the issue.
I understand that this is a foreign code base so I would like to offer help debugging the issue if needed. (via video conf. or otherwise).
Hi @Onhi, A cursory look with synchronization validation enabled shows some errors, SYNC-HAZARD-READ-AFTER-WRITE and SYNC-HAZARD-WRITE-AFTER-WRITE, associated with camera_depth and camera_image_view. Can you look at how you're setting up the pipeline barriers associated with those command submissions? See the attached full log of validation errors: Editor.txt. Meanwhile we'll look further as well, Owen
Fixed the validation errors by using store op none (instead of store) on the read-only usage of the camera_depth buffer in the OIT pass.
Sadly it didn't fix the main issue.
Did you fix all the validation issues? Can you post that code as well? With details on which lines you updated?
Hi, I've uploaded a v2 of the reduced engine with fixes for validation errors and details of lines changed.
I've updated the version again with more fixes (V3). This version passes queue synchronization validation and the synchronization2 layer without errors or warnings.
The problem with depth is still unresolved. :(
Hi @Onhi,
Brief update: the issue goes away when I trigger a CmdBarrier call after every command on the driver side:
Before:
After:
(This is after resizing the window for a bit.)
After narrowing it down a bit, it seems related to missing barriers after commands associated with direct dispatch and transfer copies involving an image. Still narrowing it down further.
Wow! Very nice progress! Thanks for the update.
Hi @Onhi,
After narrowing it down further I see the issue resolved by adding this barrier:
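VkMemoryBarrier mem_barrier{};
mem_barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
mem_barrier.srcAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_READ_BIT;
mem_barrier.dstAccessMask = VK_ACCESS_2_NONE;
vkCmdPipelineBarrier(command_buffer,
                     VK_PIPELINE_STAGE_NONE,                  // srcStageMask
                     VK_PIPELINE_STAGE_2_TOP_OF_PIPE_BIT_KHR, // dstStageMask
                     0,                                       // dependencyFlags
                     1,                                       // memoryBarrierCount
                     &mem_barrier,                            // pMemoryBarriers
                     0,                                       // bufferMemoryBarrierCount
                     nullptr,                                 // pBufferMemoryBarriers
                     0,                                       // imageMemoryBarrierCount
                     nullptr);                                // pImageMemoryBarriers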
In the file AMDReproEngine\Oasis\code\Engine\Vulkan\Vulkan.Interface.h, in the following functions, after the Vulkan API calls:
void copyImage(... After: vkCmdCopyImage2(command_buffer, &copy_image_info);
void copyBufferToImage(... After: vkCmdCopyBufferToImage2(command_buffer, &copy_buffer_to_image_info);
void copyImageToBuffer(... After: vkCmdCopyImageToBuffer2(command_buffer, &copy_image_to_buffer_info);
void blitImage(... After: vkCmdBlitImage2(command_buffer, &blit_image_info);
void dispatch(... After: vkCmdDispatch(command_buffer, p_group_count_x, p_group_count_y, p_group_count_z);
I see the issue mostly resolved after adding the barrier in copyImage and dispatch. I suspect further optimizations can be made.
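For anyone who wants to repeat this narrowing-down on the application side, here is a minimal sketch of the idea, not the code I actually used: keep a bitmask of which wrappers get the extra barrier and flip bits until the corruption comes back. All names here (`Cmd`, `Recorder`, `replay`, `self_check`) are hypothetical and not part of the repro engine, and strings stand in for the real vkCmd* calls so the sketch runs without a Vulkan device.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-ins for the five wrappers listed above. A real engine
// would call the matching vkCmd* function instead of logging a string.
enum class Cmd : std::uint32_t {
    CopyImage         = 1u << 0,
    CopyBufferToImage = 1u << 1,
    CopyImageToBuffer = 1u << 2,
    BlitImage         = 1u << 3,
    Dispatch          = 1u << 4,
};

constexpr std::uint32_t bit(Cmd c) { return static_cast<std::uint32_t>(c); }

inline const char* name(Cmd c) {
    switch (c) {
        case Cmd::CopyImage:         return "copyImage";
        case Cmd::CopyBufferToImage: return "copyBufferToImage";
        case Cmd::CopyImageToBuffer: return "copyImageToBuffer";
        case Cmd::BlitImage:         return "blitImage";
        case Cmd::Dispatch:          return "dispatch";
    }
    return "unknown";
}

// Records commands in order and appends a "barrier" entry after every
// command whose bit is set in barrier_mask.
struct Recorder {
    std::uint32_t barrier_mask = 0;
    std::vector<std::string> log;

    void record(Cmd c) {
        log.emplace_back(name(c));
        if (barrier_mask & bit(c)) {
            // Assumption: in the engine, vkCmdPipelineBarrier would go here.
            log.emplace_back("barrier");
        }
    }
};

// Replays one fixed, made-up sequence of the five commands with the mask.
inline std::vector<std::string> replay(std::uint32_t barrier_mask) {
    Recorder r;
    r.barrier_mask = barrier_mask;
    for (Cmd c : {Cmd::CopyBufferToImage, Cmd::CopyImage, Cmd::Dispatch,
                  Cmd::BlitImage, Cmd::CopyImageToBuffer}) {
        r.record(c);
    }
    return r.log;
}

// Usage: enabling two of the five bits adds exactly two barrier entries.
inline void self_check() {
    assert(replay(bit(Cmd::CopyImage) | bit(Cmd::Dispatch)).size() == 7);
}
```

Starting with all five bits set and clearing them one at a time is one way to arrive at a reduced set like copyImage plus dispatch; the command order in `replay` is illustrative only.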
Something is up with vkCmdPipelineBarrier... It's messing up the content of depth buffers.
GPU: RX 7900XTX
Driver Version: 31.0.24033.1003
OS: Windows 11 Pro (10.0.22000 Build 22000)
Here's an example:
I know this looks like a problem with the application, but I'm almost certain it is not. Vulkan validation layers are not showing any warnings/errors, and RenderDoc can capture and replay the issue. The same code runs flawlessly on team green hardware.
I can help find the issue by providing time, RenderDoc captures, Vulkan API dumps, code (possible but tricky), etc.
Thanks!