KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html

vkUpdateDescriptorSets crashes for VkCopyDescriptorSet in ValidateImageUpdate calls #8274

Closed: Nimanf closed this 1 month ago

Nimanf commented 2 months ago

Environment:

Describe the Issue

We create two descriptor pools of different sizes and allocate two duplicate descriptor sets. We have an unbounded array of 10,000 descriptors of type VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, used for an array of textures such as:

    [[vk::binding(0,0)]] Texture2D<float4> Float4Texture2Ds[] : register(t0, space1);

Some of the descriptors are left undefined and the others are set to textures (the shader only accesses the descriptors that have been set). When we call vkUpdateDescriptorSets() with a VkCopyDescriptorSet to copy all 10k descriptors from descriptor-set1 to descriptor-set2, the validation layer crashes in ValidateImageUpdate with this callstack:

    VkLayer_khronos_validation.dll!CoreChecks::ValidateImageUpdate(VkImageView_T * image_view, VkImageLayout image_layout, VkDescriptorType type, const Location & image_info_loc) Line 1136    C++
>   VkLayer_khronos_validation.dll!CoreChecks::VerifyCopyUpdateContents(const VkCopyDescriptorSet & update, const vvl::DescriptorSet & src_set, VkDescriptorType src_type, unsigned int dst_set, const vvl::DescriptorSet & dst_type, VkDescriptorType copy_loc, unsigned int) Line 1734    C++
    VkLayer_khronos_validation.dll!CoreChecks::ValidateCopyUpdate(const VkCopyDescriptorSet & update, const Location & copy_loc) Line 1120  C++
    VkLayer_khronos_validation.dll!CoreChecks::ValidateUpdateDescriptorSets(unsigned int descriptorWriteCount, const VkWriteDescriptorSet * pDescriptorWrites, unsigned int descriptorCopyCount, const VkCopyDescriptorSet * pDescriptorCopies, const Location & loc) Line 1431 C++
    VkLayer_khronos_validation.dll!CoreChecks::PreCallValidateUpdateDescriptorSets(VkDevice_T * device, unsigned int descriptorWriteCount, const VkWriteDescriptorSet * pDescriptorWrites, unsigned int descriptorCopyCount, const VkCopyDescriptorSet * pDescriptorCopies, const ErrorObject & error_obj) Line 3481    C++
    VkLayer_khronos_validation.dll!vulkan_layer_chassis::UpdateDescriptorSets(VkDevice_T * device, unsigned int descriptorWriteCount, const VkWriteDescriptorSet * pDescriptorWrites, unsigned int descriptorCopyCount, const VkCopyDescriptorSet * pDescriptorCopies) Line 2700    C++

The validation layer shows:

    stage_flags    2147483647
    binding_flags  5
    count          10000

Expected behavior

In a large descriptor array used for ray tracing, some descriptors may be unset, and the validation layer seems to mistakenly call ValidateImageUpdate on some of these undefined descriptors and crash. Our assumption is that the copy transfers everything as-is, so undefined/null descriptors are copied as-is too.

Note: unfortunately, I can't provide the app to reproduce this crash, so I would appreciate any help based on the information above.

spencer-lunarg commented 2 months ago

Thanks for bringing this up. This should be fixed in the next SDK release.

I don't have the various commits on hand (there were 4 or 5), but in them we went through the code, removed a lot of pointers that were not being null-checked, passed things in by reference instead, and overall tried to reduce this kind of crash.

If you want to try it now, the bottom of https://github.com/KhronosGroup/Vulkan-ValidationLayers/actions/runs/9898500971 has release binaries of the current latest Validation Layers, which should include the fix.

Nimanf commented 2 months ago

Awesome, thanks. Nice to hear it is fixed.

The link you provided only contains the VkLayer_khronos_validation.dll binary. I copied it over the one in the existing Vulkan SDK, and it crashed in msvcp140.dll!mtx_do_lock during Vulkan instance creation.

Is there another way you would recommend to test the latest build early?

spencer-lunarg commented 2 months ago

@Nimanf two LunarG engineers tried it and said the VkLayer_khronos_validation.dll worked for them (and no one has ever seen this msvcp140.dll!mtx_do_lock error either).

My suggestions are:

  1. Wait until the next SDK comes out (should be about 2 weeks from now)
  2. Build from source (what everyone hates)
  3. Try setting VK_LOADER_DEBUG=all and see whether it gives more information about what is going on before the crash in vkCreateInstance

Nimanf commented 2 months ago

Thanks for the info. We will wait for the next release.

Unfortunately, VK_LOADER_DEBUG didn't provide any info pointing to the root cause of the instance crash.

>> Build from source (what everyone hates)

The update_deps.py step is mostly where we struggle to build the Validation Layers cleanly on dev machines. We have several Python versions installed on developer machines, and they are not on PATH (to avoid conflicts). Building from the Vulkan SDK still fails at the update_deps.py step even after Python is on PATH:

    Could not run update_deps.py which is necessary to download dependencies.

If we run update_deps.py directly with Python, we get:

    TypeError: __init__() got an unexpected keyword argument 'capture_output'

spencer-lunarg commented 1 month ago

@Nimanf the 1.3.290 SDK is out now. Could you grab it and confirm that this issue is fixed?

Nimanf commented 1 month ago

Thanks. I tried 1.3.290 yesterday and it worked without crashing.

Closing the ticket as resolved. Thanks again.