KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html

VK_ERROR_DEVICE_LOST when enabling descriptors indexing validation #8377

Open Trider12 opened 1 month ago

Trider12 commented 1 month ago

Environment:

Describe the Issue

I have two graphics pipelines, A and B. Pipeline A uses descriptor sets X and Y; pipeline B uses only set X. X and Y are bound at set numbers 0 and 1, respectively. Here's the code:

VkDescriptorSetLayout setLayouts[] { setLayoutX, setLayoutY };
VkPipelineLayoutCreateInfo pipelineLayoutCreateInfo {};
pipelineLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutCreateInfo.pSetLayouts = setLayouts;
pipelineLayoutCreateInfo.setLayoutCount = 2; // layout A: set 0 (X) and set 1 (Y)
vkCreatePipelineLayout(device, &pipelineLayoutCreateInfo, nullptr, &pipelineLayoutA);
// create pipeline A using layout A

pipelineLayoutCreateInfo.setLayoutCount = 1; // layout B: set 0 (X) only
vkCreatePipelineLayout(device, &pipelineLayoutCreateInfo, nullptr, &pipelineLayoutB);
// create pipeline B using layout B

// bind both sets once, using layout A
VkDescriptorSet descriptorSets[] { setX, setY };
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayoutA, 0, 2, descriptorSets, 0, nullptr);

vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineA);
vkCmdDraw(...); // uses sets 0 and 1

vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineB);
vkCmdDraw(...); // uses set 0

Submitting this command buffer results in VK_ERROR_DEVICE_LOST when descriptor indexing validation is enabled. There is no error when it is disabled. The error can also be avoided by creating pipeline B with pipeline layout A.

According to the spec my usage is fine, because layouts A and B are compatible for set 0:

Two pipeline layouts are defined to be “compatible for push constants” if they were created with identical push constant ranges. Two pipeline layouts are defined to be “compatible for set N” if they were created with identically defined descriptor set layouts for sets zero through N, and if they were created with identical push constant ranges.

When binding a descriptor set (see Descriptor Set Binding) to set number N, a previously bound descriptor set bound with lower index M than N is disturbed if the pipeline layouts for set M and N are not compatible for set M. Otherwise, the bound descriptor set in M is not disturbed.

If, additionally, the previously bound descriptor set for set N was bound using a pipeline layout not compatible for set N, then all bindings in sets numbered greater than N are disturbed.

When binding a pipeline, the pipeline can correctly access any previously bound descriptor set N if it was bound with compatible pipeline layout for set N, and it was not disturbed.

Layout compatibility means that descriptor sets can be bound to a command buffer for use by any pipeline created with a compatible pipeline layout, and without having bound a particular pipeline first. It also means that descriptor sets can remain valid across a pipeline change, and the same resources will be accessible to the newly bound pipeline.

When a descriptor set is disturbed by binding descriptor sets, the disturbed set is considered to contain undefined descriptors bound with the same pipeline layout as the disturbing descriptor set.
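The "compatible for set N" rule quoted above can be sketched as a small standalone check. This is only an illustrative model, not Vulkan API code and not how the validation layers implement it: `PipelineLayoutModel`, `PushConstantRange`, `compatibleForSet`, and `checkIssueScenario` are hypothetical names, each descriptor set layout is reduced to an integer id (equal ids stand in for "identically defined"), and push constant ranges are compared in order as a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a push constant range (same fields as VkPushConstantRange).
struct PushConstantRange {
    std::uint32_t stageFlags;
    std::uint32_t offset;
    std::uint32_t size;
};

inline bool operator==(const PushConstantRange& a, const PushConstantRange& b) {
    return a.stageFlags == b.stageFlags && a.offset == b.offset && a.size == b.size;
}

// Hypothetical model of a pipeline layout. Assumption: two set layouts are
// "identically defined" exactly when their integer ids are equal.
struct PipelineLayoutModel {
    std::vector<int> setLayoutIds;
    std::vector<PushConstantRange> pushConstantRanges;
};

// "Compatible for push constants": identical push constant ranges.
// Simplification: the ranges must also appear in the same order.
bool compatibleForPushConstants(const PipelineLayoutModel& a, const PipelineLayoutModel& b) {
    return a.pushConstantRanges == b.pushConstantRanges;
}

// "Compatible for set N": identical set layouts for sets 0..N and
// identical push constant ranges.
bool compatibleForSet(const PipelineLayoutModel& a, const PipelineLayoutModel& b, std::size_t n) {
    if (!compatibleForPushConstants(a, b)) return false;
    // Both layouts must actually define set n.
    if (n >= a.setLayoutIds.size() || n >= b.setLayoutIds.size()) return false;
    const auto count = static_cast<std::ptrdiff_t>(n + 1);
    return std::equal(a.setLayoutIds.begin(), a.setLayoutIds.begin() + count,
                      b.setLayoutIds.begin());
}

// The scenario from this issue: layout A = {X, Y}, layout B = {X}, no push constants.
void checkIssueScenario() {
    const int kSetLayoutX = 1;
    const int kSetLayoutY = 2;
    const PipelineLayoutModel layoutA{{kSetLayoutX, kSetLayoutY}, {}};
    const PipelineLayoutModel layoutB{{kSetLayoutX}, {}};
    assert(compatibleForSet(layoutA, layoutB, 0));   // set 0 stays usable by pipeline B
    assert(!compatibleForSet(layoutA, layoutB, 1));  // layout B has no set 1
}
```

Under this model, layouts A and B are compatible for set 0, which matches the claim that pipeline B should be able to read set X after it was bound through layout A.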

Expected behavior

Device not being lost.

Additional context

There are no validation errors prior to VK_ERROR_DEVICE_LOST.

spencer-lunarg commented 1 month ago

@Trider12 thanks for reporting this. We are currently doing heavy work on GPU-AV and fixing it. Once I get the new descriptor indexing validation set up, I will come back and take a look, but hopefully it will "just be fixed" by then.

spencer-lunarg commented 1 week ago

@Trider12 so I was able to reproduce the crash in https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/8535 (thanks for the simple breakdown of the tests)

So I see what is happening: we are mismatching the pipeline layout that GPU-AV uses underneath and creating an invalid Vulkan flow, which causes the crash. I will try hard to get a fix in before the next SDK, which is coming soon!