KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html
Other
750 stars 402 forks source link

Validation layer crash due to mishandle of vkDestroySwapchainKHR #8584

Open Mr-222 opened 5 days ago

Mr-222 commented 5 days ago

Environment:

Describe the Issue

This issue heavily depends on driver implementation, I only observe this issue in NVIDIA GPU.

When VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT is enabled, which means I manually disable the handle wrapping. At this time, when recreating a swapchain by calling vkCreateSwapchainKHR with oldswapchain parameter, the validation layer will ignore situations like VKImage from different swapchains uses same handles. So we may get same image handles used by old swapchains when calling vkGetSwapchainImagesKHR, which the validation layer is not properly handled. Here's a simple example:

  1. Create swapchain1
  2. Create swapchain2
  3. Destroy swapchain1

When destroy the swapchain1, the validation layer would also delete the swapchain's images in the StateMap and ObjectTable without checking if newly created swapchain is using those image handles. However, swapchain2 is still using those images. As a result, the program would crash afterwards because it can't find corresponding image handle when creating image view and framebuffer for the new swapchain.

A simple reproduce

You can use Vulkan sample to reproduce this bug crash easily, just add the VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT extension. What I did is change the Instance.cpp in Vulkan-Samples/framework/core, change the constructor of Instance, add the extension:

Instance::Instance(const std::string                            &application_name,
                   const std::unordered_map<const char *, bool> &required_extensions,
                   const std::vector<const char *>              &required_validation_layers,
                   const std::vector<VkLayerSettingEXT>         &required_layer_settings,
                   bool                                          headless,
                   uint32_t                                      api_version)
{
  ...

    // Some of the specialized layers need to be enabled explicitly
#ifdef USE_VALIDATION_LAYER_FEATURES
    VkValidationFeaturesEXT                   validation_features_info = {VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT};
    std::vector<VkValidationFeatureEnableEXT> enable_features{};
    if (validation_features)
    {
#   if defined(VKB_VALIDATION_LAYERS_GPU_ASSISTED)
        enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_RESERVE_BINDING_SLOT_EXT);
        enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT);
#   endif
#   if defined(VKB_VALIDATION_LAYERS_BEST_PRACTICES)
        enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT);
#   endif
#   if defined(VKB_VALIDATION_LAYERS_SYNCHRONIZATION)
        enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT);
#   endif

        validation_features_info.enabledValidationFeatureCount = static_cast<uint32_t>(enable_features.size());
        validation_features_info.pEnabledValidationFeatures    = enable_features.data();
        validation_features_info.pNext                         = instance_info.pNext;
        instance_info.pNext                                    = &validation_features_info;
    }
#endif

    // Enable VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT
    VkValidationFeaturesEXT validation_features_info = {VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT};
    std::vector<VkValidationFeatureDisableEXT> disable_features{};
    disable_features.push_back(VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT);
    validation_features_info.disabledValidationFeatureCount = static_cast<uint32_t>(disable_features.size());
    validation_features_info.pDisabledValidationFeatures    = disable_features.data();
    instance_info.pNext                                     = &validation_features_info;

    VkLayerSettingsCreateInfoEXT layerSettingsCreateInfo{VK_STRUCTURE_TYPE_LAYER_SETTINGS_CREATE_INFO_EXT};

    // If layer settings extension enabled by sample, then activate layer settings during instance creation
    if (std::find(enabled_extensions.begin(), enabled_extensions.end(), VK_EXT_LAYER_SETTINGS_EXTENSION_NAME) != enabled_extensions.end())
    {
        layerSettingsCreateInfo.settingCount = static_cast<uint32_t>(required_layer_settings.size());
        layerSettingsCreateInfo.pSettings    = required_layer_settings.data();
        layerSettingsCreateInfo.pNext        = instance_info.pNext;
        instance_info.pNext                  = &layerSettingsCreateInfo;
    }

    // Create the Vulkan instance
    VkResult result = vkCreateInstance(&instance_info, nullptr, &handle);

  ...
}

Also add the instance extension in ray_tracing_basic.cpp:

    RaytracingBasic::RaytracingBasic()
    {
        ...

        add_instance_extension(VK_EXT_VALIDATION_FEATURES_EXTENSION_NAME, true);
    }

After that, run ray_tracing_basic sample, it may crash. It may also crashes when you resize the window. Note that there's a probability to this thing, because the GPU drive doesn't always choose to reuse image handles when recreating a swapchain. According to my experiments, this bug happens four times out of ten, and it only happens when using NVIDIA GPU. So if you don't see this bug crash at the first time, try resize the window multiple times or relaunch the sample several times.

Possible cause

There are 3 places in the code that contributes this bug, the validation layer doesn't check if the image handle is being used by newly created swapchain.

First is in image_state.cpp, line 605, this would cause validation layer can't create its image view object.

void Swapchain::Destroy() {
    for (auto &swapchain_image : images) {
        ...
        dev_data.Destroy<vvl::Image>(swapchain_image.image_state->VkHandle());
        // NOTE: We don't have access to dev_data.fake_memory.Free() here, but it is currently a no-op
    }
    ...
}

The second one lies in object_tracker_utils, line 820:

void ObjectLifetimes::PreCallRecordDestroySwapchainKHR(VkDevice device, VkSwapchainKHR swapchain,
                                                       const VkAllocationCallbacks *pAllocator, const RecordObject &record_obj) {
    ...

    auto snapshot = swapchain_image_map.snapshot(
        [swapchain](const std::shared_ptr<ObjTrackState> &pNode) { return pNode->parent_object == HandleToUint64(swapchain); });
    for (const auto &itr : snapshot) {
        swapchain_image_map.erase(itr.first);
    }
}

The last one is in thread_safety_validation.cpp, line 476:

void ThreadSafety::PostCallRecordDestroySwapchainKHR(VkDevice device, VkSwapchainKHR swapchain,
                                                     const VkAllocationCallbacks* pAllocator, const RecordObject& record_obj) {
    ...
    // Host access to swapchain must be externally synchronized
    auto lock = WriteLockGuard(thread_safety_lock);
    for (auto& image_handle : swapchain_wrapped_image_handle_map[swapchain]) {
        FinishWriteObject(image_handle, record_obj.location);
        DestroyObject(image_handle);
    }
    ...
}
spencer-lunarg commented 5 days ago

Thanks for reporting this!

I guess my first question is why you needed VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT (this is me grasping how people use the VVL settings)

We are currently getting things ready for the upcoming SDK, so will be a while until I can look at this, but thanks for deep investigation you have done so far

Mr-222 commented 4 days ago

Thank you spencer, take your time! For the question, I need the extension to disable the handle wrapping feature so that I can access the “real” object allocated by the driver for some backdoor APIs. Also, I'm a little curious about the driver behavior, do you have any clue about what decides if the driver would reuse old swapchain's image handle or not?

Mr-222 commented 4 days ago

Also reformatted the codeblock in my description to make it clear, sorry for the inexperience of issuing if you think the codeblock is messy.

spencer-lunarg commented 4 days ago

do you have any clue about what decides if the driver would reuse old swapchain's image handle or not?

It took me years and I still feel I don't understand the Linux Windowing system... Windows is worse as I have no idea how drivers interact with WSI on Windows

Mr-222 commented 4 days ago

lol, got it!