Closed HuYuxin closed 1 week ago
There's likely some relationship with https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/3605
Basically, the range of a buffer resource is not taken into account in syncval (it seems)
I wonder if https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/3605 was fixed by https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/8141, we indeed did not compute offsets correctly for dynamic descriptors (but I did not look closely at these issues yet, so could be a wrong suggestion). This issue should be something different because it is based on the latest code.
I wonder if #3605 was fixed by #8141, we indeed did not compute offsets correctly for dynamic descriptors (but I did not look closely at these issues yet, so could be a wrong suggestion). This issue should be something different because it is based on the latest code.
Yes, looks like the dynamic offset part is fixed: https://chromium-review.googlesource.com/c/angle/angle/+/5873529
We're still getting false positives because the vertex buffer ranges are not taken into account (it seems)
Should be fixed by https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/8674
It doesn't look like this was about indirect buffers only though. I tried removing this suppression and we still get VVL failures: https://chromium-review.googlesource.com/c/angle/angle/+/5937422
Typically, the failures we see are in the form of vertex-attribute usage reported not synced with transfer, but those are happening to different ranges of the buffer.
It doesn't look like this was about indirect buffers only though
The error message in this issue is exactly what did not work (indirect command that accesses index or vertex buffer which are suballocated) but I might miss something.
[ SYNC-HAZARD-WRITE-AFTER-READ ] Validation Error: [ SYNC-HAZARD-WRITE-AFTER-READ ] Object 0: handle = 0x5557db026030, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x200000000020, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x376bc9df | vkCmdCopyImageToBuffer(): Hazard WRITE_AFTER_READ for dstBuffer VkBuffer 0x200000000020[], region 0. Access info (usage: SYNC_COPY_TRANSFER_WRITE, prior_usage: SYNC_INDEX_INPUT_INDEX_READ, read_barriers: VK_PIPELINE_STAGE_2_BOTTOM_OF_PIPE_BIT, command: vkCmdDrawIndexedIndirect, seq_no: 8, reset_no: 3, resource: VkBuffer 0x200000000020[]).
@ShabbyX I'm not sure where I need to look for the errors in the above link, could you help to locate it?
I'm going to use angle directly for testing to be sure. Will be looking into this tomorrow, hopefully we will fix this.
@ShabbyX could you confirm that out/LinuxDebug/angle_end2end_tests --gtest_filter="LineLoopIndirectTest.UShortIndexIndirectBuffer/ES3_1_Vulkan" --verbose --local-output
command line still produces the errors reported by this issue when the corresponding section in vk_renderer.cpp is uncommented?
Also should I apply any patches to the latest angle code? My impression that the first patch (https://chromium-review.googlesource.com/c/angle/angle/+/5832709/4) is already merged but not the second (https://chromium-review.googlesource.com/c/angle/angle/+/5832710/10).
That test is fixed, I can confirm. Unsuppressing related VVL errors in https://chromium-review.googlesource.com/c/angle/angle/+/5937422, there are other failures, not related to indirect buffers. You don't need to cherry-pick anything, it has nothing to do with https://chromium-review.googlesource.com/c/angle/angle/+/5832710 and the issue is in fact much older than that.
Try taking that first change (unsuppresses VVL errors) and run this test: --gtest_filter=BufferDataTestES3.CopyBufferSubDataDraw/ES3_Vulkan
. I get this:
vk_renderer.cpp:777 (DebugUtilsMessenger): [ SYNC-HAZARD-READ-AFTER-WRITE ] Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x240000000024, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4d96472 | vkCmdDrawIndexed(): Hazard READ_AFTER_WRITE for vertex VkBuffer 0x240000000024[] in VkCommandBuffer 0x55df9d7ffb90[]. Access info (usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: 0, command: vkCmdCopyBuffer, seq_no: 2, reset_no: 7, resource: VkBuffer 0x240000000024[]).
Object: 0x240000000024 (type = Buffer(9))
Ooooh you know what, I think actually I know where this is coming from. vkCmdBindVertexBuffers
doesn't take the size of the buffer, only the offset. So syncval has no way to know how much of the buffer is actually used as vertex input (well, with some gymnastics it might, at least when geometry/tessellation/indirect are not involved).
If you're confident that syncval is taking buffer ranges into account, the VVL issue probably goes away for us if we use vkCmdBindVertexBuffers2
and pass in the sizes.
I remember now we had the same problem with robustness, where we couldn't rely on robustness with vertex data in conjunction with suballocation because the driver couldn't tell where the end of the data is.
Feel free to re-close for now.
I can reproduce error with --gtest_filter=BufferDataTestES3.CopyBufferSubDataDraw/ES3_Vulkan
on my machine it's about index buffer:
[ SYNC-HAZARD-READ-AFTER-WRITE ] Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x280000000028, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4d96472 | vkCmdDrawIndexed(): Hazard READ_AFTER_WRITE for vertex VkBuffer 0x280000000028[] in VkCommandBuffer 0x2309d9c33a0[]. Access info (usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: 0, command: vkCmdCopyBuffer, seq_no: 2, reset_no: 9, resource: VkBuffer 0x280000000028[]).
I can see a potential problem here (and the same with vertex attributes access). The error says data was written by a copy command and then read by the vertex pipeline but sync validation did not detect a barrier that ensures writes can be used by the following read access (write_barriers: 0). Could it be a missing/incomplete barrier in angle? If you think everything is okay then I can debug sync validation to confirm false positive and if that's the case then it should be a separate issue.
No like I said I'm pretty sure this is just a matter of vkCmdBindVertexBuffers
implicitly assuming the vertex data is read from the offset to the end of the buffer (because there's no size). So in a way it's a false positive, but it's Vulkan's fault (vkCmdBindVertexBuffers2
fixes it).
You can close this, when we get a chance to use vkCmdBindVertexBuffers2
and give the size, we can check if this continues to be reported.
So in a way it's a false positive
We are trying to follow the direction now that syncval does not report false positives. Ether to disable specific validation or to make it optional (disabled by default). Something I need to look at.
No like I said I'm pretty sure this is just a matter of vkCmdBindVertexBuffers implicitly assuming the vertex data is read from the offset to the end of the buffer
Makes sense, my brain was somewhere else
We are trying to follow the direction now that syncval does not report false positives. Ether to disable specific validation or to make it optional (disabled by default). Something I need to look at.
Sounds good. I guess an easy way to still have some validation for vertex attributes but also remove false positives is to implicitly assume size == 1
when vkCmdBindVertexBuffers
is used (or vkCmdBindVertexBuffers2
with pSizes == nullptr
).
You won't catch bugs with overlapping areas used for different purposes, but that should be very rare. You would correctly catch syncval errors for the usual usage of a buffer range being used as one entity (because one byte represents the entire thing)
Environment:
Describe the Issue
To reproduce:
gn args out/LinuxDebug
autoninja -C out/LinuxDebug angle_end2end_tests
out/LinuxDebug/angle_end2end_tests --gtest_filter="LineLoopIndirectTest.UShortIndexIndirectBuffer/ES3_1_Vulkan" --verbose --local-output
Observe that the test failed with this validation error:
Expected behavior
Test should pass without the SYNC-HAZARD-WRITE-AFTER-READ validation error.
Valid Usage ID If applicable, please include the validation messages encountered leading up to the issue
Additional context
The renderdoc capture shows that VVL thinks there is a missing execution barrier on the VkBuffer:
VkBuffer used as index buffer for vkCmdDrawIndexed() --- VVL thinks there is a missing execution barrier--- VkBuffer used as dest buffer for vkCmdCopyImageToBuffer()
However, the two buffers are suballocated. Even they belong to the same big VkBuffer, they are actually pointing to different memories.
We think that using
VK_KHR_Synchronization2
exposed this issue in VVL. Before we useVK_KHR_Synchronization2
, thesrcStageMask
anddstStageMask
are shared betweenVkMemoryBarrier
,VkBufferMemoryBarier
, andVkImageMemoryBarrier
. in thevkCmdPipelineBarrier ()
call. Between thevkCmdDrawIndex()
andvkCmdCopyImageToBuffer()
, even if only image barrier is needed, thesrcStageMask
anddstStageMask
values also apply to theVkMemoryBarrier
andVkBufferMemoryBarrier
, and it fulfills VVL's validation on buffer execution barrier (even there is no execution barrier needed on buffer because they are suballocated and point to different memories). After switching toVK_KHR_Synchronization2
, thesrcStageMask
anddstStageMask
are packed within theVkMemoryBarrier2
,VkBufferMemoryBarier2
,VkImageMemoryBarrier2
. So now ifVkMemoryBarrier2
is null, there is no execution barrier applied to VkBuffer, and VVL incorrectly thinks that there is a missing execution barrier on the buffer.