Open msharov opened 3 months ago
took a quick look into this, you seem right and thanks for the details in the issue! Unfortunately I have a lot of other things taking my priority/focus right now and won't be able to do a deep look into this for a while
Environment:
Describe the Issue
In layers/vulkan/generated/chassis.cpp you have a global vl_concurrent_unordered_map unique_id_mapping that I assume tracks unique ids of Vulkan objects. Among other places, it is used in layers/vulkan/generated/layer_chassis_dispatch.cpp:822:
I use a singleton application object in whose destructor I stop the device, cleanup any objects, and destroy the instance. At this point, the only objects still alive are the timeline semaphores used to track queue submissions, stored in an array from which they are recycled as needed. Before the device is destroyed, these semaphores must be destroyed.
When running the app in valgrind, I get the following use-after-free error:
What appears to be happening is that your unique_id_mapping is destroyed at exit. Its destruction is ordered before the destruction of my app object because it was created after the app object, when the app object created the Vulkan instance and with it loaded the validation layers. Because my Vulkan objects, the device, and the instance are deleted in the app object destructor, your unique_id_mapping has already been destroyed, and many use-after-free errors result from code such as DispatchDestroySemaphore which accesses it.
I do not currently experience crashes at exit, only these valgrind errors, because reuse of those freed memory blocks is unlikely in the short time the destructor runs, but obviously crashes, memory corruption, and security problems remain a possibility.
Expected behavior
Because the layers are loaded by the Vulkan instance object, I would expect that the lifetime of these layers be tied to the lifetime of the instance. That is, I would expect the layers to stay in a valid state until vkDestroyInstance is called.
The use of a global app object is a common C++ design pattern, and it is likely many people would not expect to have to implement unusual countermeasures to prevent corruption errors from your layer. What they would all need to do is install an atexit handler after creating the instance, and then do all the cleanup in there. This is not obvious.
Now, I don't quite understand how you would fix this, since you have many global objects and the code appears designed to have them. If it is possible, one solution might be to unregister the layer and all its hooks in an additional atexit handler. Otherwise, a prominent warning needs to exist in the layer documentation explaining that all Vulkan objects must be cleaned up before entering the exit handlers.