GPUOpen-Archive / Anvil

Anvil is a cross-platform framework for Vulkan
MIT License

Queue::submit with blocking-flag set does not guarantee command buffer completion? #133

Open Silverlan opened 5 years ago

Silverlan commented 5 years ago

So, this is a bit of an odd one, and I'm not at all sure this even has anything to do with Anvil. Occasionally I get an error from the validation layers when I try to destroy a command buffer I've submitted, even though I've set the in_should_block parameter to true:

[VK] Attempt to free command buffer (0x2101289c100) which is in use. The Vulkan spec states: All elements of pCommandBuffers must not be in the pending state (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkFreeCommandBuffers-pCommandBuffers-00047)

I've had this happen, for example, in the last else-branch of Image::change_image_layout:

[...]
else
{
    Anvil::CommandBufferMGPUSubmission cmd_buffer_submission;
    const Anvil::MGPUDevice*           mgpu_device_ptr(dynamic_cast<const Anvil::MGPUDevice*>(m_device_ptr) );

    cmd_buffer_submission.cmd_buffer_ptr = transition_command_buffer_ptr.get      ();
    cmd_buffer_submission.device_mask    = (1 << mgpu_device_ptr->get_n_physical_devices()) - 1;

    /* TODO */
    anvil_assert(in_opt_n_set_semaphores  == 0);
    anvil_assert(in_opt_n_wait_semaphores == 0);

    in_queue_ptr->submit(
        Anvil::SubmitInfo::create_execute(&cmd_buffer_submission,
                                            1, /* in_n_command_buffer_submissions */
                                            true /* should_block */)
    );
}

(The error triggers when transition_command_buffer_ptr goes out of scope and is destroyed.) As far as I can tell the error is nonsense: the command buffer is allocated in the same function and only used for that one purpose, so there's no way it could be in use anywhere else. I've only had this happen with my Nvidia GTX 650 Ti BOOST, so this might just be a driver bug (or a false positive from the validation layer?).
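For readers unfamiliar with the device_mask line in the snippet above: `(1 << n) - 1` sets the low n bits, so the submission targets every physical device in the group. Below is a minimal standalone sketch of that computation. The helper name `all_devices_mask` is hypothetical and not part of Anvil; the separate branch for 32 devices is an added precaution, since shifting a 32-bit value by 32 is undefined behavior in C++.

```cpp
#include <cstdint>

// Hypothetical helper (not an Anvil API): returns a mask with the low
// n_physical_devices bits set, i.e. one bit per physical device in the group.
constexpr std::uint32_t all_devices_mask(std::uint32_t n_physical_devices)
{
    // A shift count equal to or larger than the width of the type is
    // undefined behavior, so the full-width case is handled explicitly.
    return (n_physical_devices >= 32u) ? 0xFFFFFFFFu
                                       : ((1u << n_physical_devices) - 1u);
}

// Compile-time checks of the bit pattern.
static_assert(all_devices_mask(1) == 0x1u, "single device -> bit 0 only");
static_assert(all_devices_mask(2) == 0x3u, "two devices -> bits 0 and 1");
```

With a single physical device the mask is 0x1, so the submission in the snippet would address only that device.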

DominikWitczakAMD commented 5 years ago

Looking at Anvil's source code, I can't fathom what could possibly lead the validation layers into thinking the cmd buffer is in flight.

It's quite unlikely this issue is vendor-specific, as the error is reported by the core validation layer.

Are you using multiple threads? If so, it could be some sort of thread race happening in the layer.

Are you only seeing this error when using mGPU devices? Device groups are not used very often, so chances are the layers might not be tracking cmd buffer usage for logical devices with >1 physical device.

I suppose the most productive way to make progress here would be to raise the problem at https://github.com/KhronosGroup/Vulkan-ValidationLayers . As an Anvil dev, I've run out of ideas as to what the culprit could be.