unique_id_mapping lifetime not tied to instance lifetime

Environment:

OS: Arch Linux
GPU and driver version: NVidia 550.90.07
SDK or header version if building from repo: 1.3.275
Options enabled (synchronization, best practices, etc.):

Describe the Issue

In layers/vulkan/generated/chassis.cpp you have a global vl_concurrent_unordered_map unique_id_mapping that I assume tracks unique ids of Vulkan objects. Among other places, it is used in layers/vulkan/generated/layer_chassis_dispatch.cpp:822:

void DispatchDestroySemaphore(VkDevice device, VkSemaphore semaphore, const VkAllocationCallbacks* pAllocator) {
    auto layer_data = GetLayerDataPtr(get_dispatch_key(device), layer_data_map);
    if (!wrap_handles) return layer_data->device_dispatch_table.DestroySemaphore(device, semaphore, pAllocator);

    uint64_t semaphore_id = CastToUint64(semaphore);
    auto iter = unique_id_mapping.pop(semaphore_id);
    if (iter != unique_id_mapping.end()) {
        semaphore = (VkSemaphore)iter->second;
    } else { 
        semaphore = (VkSemaphore)0;
    }
    layer_data->device_dispatch_table.DestroySemaphore(device, semaphore, pAllocator);
}

I use a singleton application object in whose destructor I stop the device, cleanup any objects, and destroy the instance. At this point, the only objects still alive are the timeline semaphores used to track queue submissions, stored in an array from which they are recycled as needed. Before the device is destroyed, these semaphores must be destroyed.

When running the app in valgrind, I get the following use-after-free error:

Invalid write of size 1
   at 0x108EBA45: shiftDown (robin_hood.h:1410)
   by 0x108EBA45: erase (robin_hood.h:2038)
   by 0x108EBA45: vl_concurrent_unordered_map<unsigned long, unsigned long, 4, HashedUint64>::pop(unsigned long const&) (vk_layer_utils.h:715)
   by 0x108CB96A: DispatchDestroySemaphore(VkDevice_T*, VkSemaphore_T*, VkAllocationCallbacks const*) (layer_chassis_dispatch.cpp:822)
   by 0x107A148E: vulkan_layer_chassis::DestroySemaphore(VkDevice_T*, VkSemaphore_T*, VkAllocationCallbacks const*) (chassis.cpp:1871)
   by 0x1C2CFD: vk::Device::Semaphore::destroy() (device.cc:656)
   by 0x1BDF59: vk::Device::Semaphore::~Semaphore() (device.h:138)
   by 0x1C4931: void std::destroy_at<vk::Device::Semaphore>(vk::Device::Semaphore*) (cxxabi.h:144)
   by 0x1C43B5: void std::destroy<vk::Device::Semaphore*>(vk::Device::Semaphore*, vk::Device::Semaphore*) (cxxabi.h:152)
   by 0x1C4117: std::array<vk::Device::Semaphore>::destroy_all() (array.h:29)
   by 0x1C32ED: std::array<vk::Device::Semaphore>::~array() (array.h:41)
   by 0x1BE951: vk::Device::~Device() (device.cc:35)
   by 0x19C6DA: VApp::~VApp() (app.cc:53)
   by 0x5117FA0: __run_exit_handlers (exit.c:108)
 Address 0x11d21a21 is 225 bytes inside a block of size 246 free'd
   at 0x48458CF: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x1083E51C: destroy (robin_hood.h:2479)
   by 0x1083E51C: destroy (robin_hood.h:2464)
   by 0x1083E51C: ~Table (robin_hood.h:1705)
   by 0x1083E51C: vl_concurrent_unordered_map<unsigned long, unsigned long, 4, HashedUint64>::~vl_concurrent_unordered_map() (vk_layer_utils.h:634)
   by 0x5117FA0: __run_exit_handlers (exit.c:108)
   by 0x511806D: exit (exit.c:138)
   by 0x50FEC8E: (below main) (libc_start_call_main.h:74)
 Block was alloc'd at
   at 0x4842788: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x108EC451: initData (robin_hood.h:2330)
   by 0x108EC451: increase_size (robin_hood.h:2431)
   by 0x108EC451: std::pair<unsigned long, robin_hood::detail::Table<true, 80ul, unsigned long, unsigned long, HashedUint64, std::equal_to<unsigned long>
   by 0x108EC681: operator[]<> (robin_hood.h:1731)
   by 0x108EC681: void vl_concurrent_unordered_map<unsigned long, unsigned long, 4, HashedUint64>::insert_or_assign<unsigned long>(unsigned long const&,
   by 0x108CC271: WrapNew<VkBuffer_T*> (chassis.h:2441)
   by 0x108CC271: DispatchCreateBuffer(VkDevice_T*, VkBufferCreateInfo const*, VkAllocationCallbacks const*, VkBuffer_T**) (layer_chassis_dispatch.cpp:93
   by 0x10798239: vulkan_layer_chassis::CreateBuffer(VkDevice_T*, VkBufferCreateInfo const*, VkAllocationCallbacks const*, VkBuffer_T**) (chassis.cpp:976
   by 0x1A1F59: vk::Buffer::create(vk::Device&, std::string_view const&, cwiclui::IWindow::Asset::Info, unsigned long, unsigned int) (buffer.cc:46)
   by 0x1A7ACD: VkResult vk::Resources::List<vk::Buffer>::create<cwiclui::IWindow::Asset::Info, unsigned int&>(std::string_view const&, cwiclui::IWindow:
   by 0x1A4820: vk::CommandBuffer::TempBuf::consolidate() (cbuf.cc:173)
   by 0x1A42AF: vk::CommandBuffer::TempBuf::TempBuf(cwiclui::IWindow::Asset::Info, vk::CommandBuffer&, unsigned int) (cbuf.cc:111)
   by 0x1A49B8: auto std::construct_at<vk::CommandBuffer::TempBuf, cwiclui::IWindow::Asset::Info&, vk::CommandBuffer&, unsigned int&>(vk::CommandBuffer::
   by 0x1A4A3E: auto std::array<vk::CommandBuffer::TempBuf>::emplace<cwiclui::IWindow::Asset::Info&, vk::CommandBuffer&, unsigned int&>(vk::CommandBuffer
   by 0x1A4AB7: auto& std::array<vk::CommandBuffer::TempBuf>::emplace_back<cwiclui::IWindow::Asset::Info&, vk::CommandBuffer&, unsigned int&>(cwiclui::IW

What appears to be happening is that your unique_id_mapping is destroyed at exit. Its destruction is ordered before the destruction of my app object because it was created after the app object, when the app object created the Vulkan instance and with it loaded the validation layers. Because my Vulkan objects, the device, and the instance are deleted in the app object destructor, your unique_id_mapping has already been destroyed, and many use-after-free errors result from code such as DispatchDestroySemaphore which accesses it.

I do not currently experience crashes at exit, only these valgrind errors, because reuse of those freed memory blocks is unlikely in the short time the destructor runs, but obviously crashes, memory corruption, and security problems remain a possibility.

Expected behavior

Because the layers are loaded by the Vulkan instance object, I would expect that the lifetime of these layers be tied to the lifetime of the instance. That is, I would expect the layers to stay in a valid state until vkDestroyInstance is called.

The use of a global app object is a common C++ design pattern, and it is likely many people would not expect to have to implement unusual countermeasures to prevent corruption errors from your layer. What they would all need to do is install an atexit handler after creating the instance, and then do all the cleanup in there. This is not obvious.

Now, I don't quite understand how you would fix this, since you have many global objects and the code appears designed to have them. If it is possible, one solution might be to unregister the layer and all its hooks in an additional atexit handler. Otherwise, a prominent warning needs to exist in the layer documentation explaining that all Vulkan objects must be cleaned up before entering the exit handlers.

KhronosGroup / Vulkan-ValidationLayers

unique_id_mapping lifetime not tied to instance lifetime #8159