KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html
Other
778 stars 407 forks source link

False positive vkQueueSubmit VUID-vkQueueSubmit-pCommandBuffers-00071 #8880

Closed fridenmf closed 1 week ago

fridenmf commented 1 week ago

Environment:

Describe the Issue

vkQueueSubmit trigger a false positive VUID-vkQueueSubmit-pCommandBuffers-00071 validation error:

Validation Error: [ VUID-vkQueueSubmit-pCommandBuffers-00071 ] | MessageID = 0x2e2f4d65 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x1417c843360[CommandBuffer0] is already in use and is not marked for simultaneous use.

The Vulkan spec states: If any element of the pCommandBuffers member of any element of pSubmits was not recorded with the VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, it must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-vkQueueSubmit-pCommandBuffers-00071)

The application create one command buffer for each swapchain image outside the main loop, responsible for all rendering that swapchain index, and re-submits a command buffer each frame, choosing which one depending on the index acquired by vkAcquireNextImageKHR.

Each vkAcquireNextImageKHR is followed by a vkWaitForFences to make sure that the presentation engine is done presenting that particular image the previous time it was used for presentation, following the principles of the swapchain_recreation sample in Vulkan-Samples:

In this sample, a workaround is implemented where a fence signaled by vkAcquireNextImageKHR is used to determine when the previous present job involving the same image index has been completed. This is often much later than the point where the present resources can be freed.

The bug is present in all available Vulkan SDK installers tested since at least a year back:

Expected behavior

The command buffer should no longer be in use after waiting on the fence used with vkAcquireNextImageKHR in frame 3 (referring to Full_trace.txt under Additional context) since the previous present job involving the swapchain image 0 has been completed, and there should be no VUID-vkQueueSubmit-pCommandBuffers-00071 validation error.

Valid Usage ID

Validation Error: [ VUID-vkQueueSubmit-pCommandBuffers-00071 ] | MessageID = 0x2e2f4d65 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x1417c843360[CommandBuffer0] is already in use and is not marked for simultaneous use.

The Vulkan spec states: If any element of the pCommandBuffers member of any element of pSubmits was not recorded with the VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, it must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-vkQueueSubmit-pCommandBuffers-00071)

Additional context

Trimmed trace showing parts of frame 0, frame 3, and the validation error during the frame 3 vkQueueSubmit: TLDR_trace.txt

Full trace: Full_trace.txt

fridenmf commented 1 week ago

Similarly I get a VUID-vkResetCommandPool-commandPool-00040 validation error if I re-record the command buffer instead of re-submitting it, same scenario as above (after re-acquire and wait fence on a previously used swapchain image index):

Validation Error: [ VUID-vkResetCommandPool-commandPool-00040 ] Object 0: handle = 0x1ca3802bce0, name = CommandBuffer0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xc4f3070000000021, type = VK_OBJECT_TYPE_COMMAND_POOL; | MessageID = 0xb53e2331 | vkResetCommandPool():  (VkCommandBuffer 0x1ca3802bce0[CommandBuffer0]) is in use.
The Vulkan spec states: All VkCommandBuffer objects allocated from commandPool must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-vkResetCommandPool-commandPool-00040)

If I instead just create new command pools and command buffers each frame without destroying them, then I get past frame 3 but instead get a VUID-vkAcquireNextImageKHR-semaphore-01779 validation error when trying to re-signal a semaphore in vkAcquireNextImageKHR in frame 4, a semaphore that is no longer in use as of frame 3 where the swapchain image index for when it was last used was returned the second time:

Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0xb9181f0000000029, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR():  Semaphore must not have any pending operations.
The Vulkan spec states: If semaphore is not VK_NULL_HANDLE, it must not have any uncompleted signal or wait operations pending (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)

Do you want a separate issues for these or will they be fixed here as well? They seem kind of related.

(I had to edit this comment a few times to get the details right for all three issues)

artem-lunarg commented 1 week ago

Do you want a separate issues for these or will they be fixed here as well? They seem kind of related.

Yes, I think it's related. Will try to investigate during this week.

artem-lunarg commented 1 week ago

If I instead just create new command pools and command buffers each frame without destroying them, then I get past frame 3 but instead get a VUID-vkAcquireNextImageKHR-semaphore-01779 validation error when trying to re-signal a semaphore in vkAcquireNextImageKHR in frame 4

That's a good experiment, by excluding command buffers from equation this might suggest it is the issue about tracking acquire semaphore. ~The above mentioned that semaphore was used in vkAcquireNextImageKHR in frame 4. In which frame and in which function this semaphore was used the last time before frame 4?~ Nevermind, I checked full API dump, it's possible we do not handle this kind of synchronization. Will try to come up with a test that reproduces this.

fridenmf commented 1 week ago

I made a trace of a run that excludes the command pool and command buffer by re-creating them every frame and without destructing them, if it helps to look through:

sem_trace.txt

My reasoning is:

The validation error:

Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0x908683000000001d, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR():  Semaphore must not have any pending operations.

The Vulkan spec states: If semaphore is not VK_NULL_HANDLE, it must not have any uncompleted signal or wait operations pending (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)

... claims the semaphore with handle 0x908683000000001d has pending operations during frame 4 in vkAcquireNextImageKHR:

Thread 0, Frame 4, Time 172277 us:
vkAcquireNextImageKHR(device, swapchain, timeout, semaphore, fence, pImageIndex) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 000001CD40CC0780
    swapchain:                      VkSwapchainKHR = D897D90000000016
    timeout:                        uint64_t = 18446744073709551615
    semaphore:                      VkSemaphore = 908683000000001D
    fence:                          VkFence = CFCDA0000000001E
    pImageIndex:                    uint32_t* = 1

The last time this was used was as a wait semaphore in frame 0 for a vkQueueSubmit call that wants to render into swapchain image 0:

Thread 0, Frame 0, Time 169358 us:
vkQueueSubmit(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 000001CD3C937B60
    submitCount:                    uint32_t = 1
    pSubmits:                       const VkSubmitInfo* = 000000C01D35C220
        pSubmits[0]:                    const VkSubmitInfo = 000000C01D35C220:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO (4)
            pNext:                          const void* = NULL
            waitSemaphoreCount:             uint32_t = 1
            pWaitSemaphores:                const VkSemaphore* = 000001CD3E5166F8
                pWaitSemaphores[0]:             const VkSemaphore = 908683000000001D
            pWaitDstStageMask:              const VkPipelineStageFlags* = 000000C01D35C1F4
                pWaitDstStageMask[0]:           const VkPipelineStageFlags = 1024 (VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT)
            commandBufferCount:             uint32_t = 1
            pCommandBuffers:                const VkCommandBuffer* = 000001CD3E516B58
                pCommandBuffers[0]:             const VkCommandBuffer = 000001CD4AF17ED0
            signalSemaphoreCount:           uint32_t = 1
            pSignalSemaphores:              const VkSemaphore* = 000001CD3E516748
                pSignalSemaphores[0]:           const VkSemaphore = 301E6C0000000022
    fence:                          VkFence = 0000000000000000

Before that it was used as a signal semaphore in frame 0 for vkAcquireNextImageKHR of swapchain image index 0:

Thread 0, Frame 0, Time 168821 us:
vkAcquireNextImageKHR(device, swapchain, timeout, semaphore, fence, pImageIndex) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 000001CD40CC0780
    swapchain:                      VkSwapchainKHR = D897D90000000016
    timeout:                        uint64_t = 18446744073709551615
    semaphore:                      VkSemaphore = 908683000000001D
    fence:                          VkFence = CFCDA0000000001E
    pImageIndex:                    uint32_t* = 0

Frame 3 has acquired swapchain image index 0 the second time, and waited on the fence for that vkAcquireNextImageKHR call:

Thread 0, Frame 3, Time 170693 us:
vkAcquireNextImageKHR(device, swapchain, timeout, semaphore, fence, pImageIndex) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 000001CD40CC0780
    swapchain:                      VkSwapchainKHR = D897D90000000016
    timeout:                        uint64_t = 18446744073709551615
    semaphore:                      VkSemaphore = 3FBCD60000000028
    fence:                          VkFence = B9181F0000000029
    pImageIndex:                    uint32_t* = 0

Thread 0, Frame 3, Time 171381 us:
vkWaitForFences(device, fenceCount, pFences, waitAll, timeout) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 000001CD40CC0780
    fenceCount:                     uint32_t = 1
    pFences:                        const VkFence* = 000001CD3E516F18
        pFences[0]:                     const VkFence = B9181F0000000029
    waitAll:                        VkBool32 = 1
    timeout:                        uint64_t = 18446744073709551615

This implies that the presentation engine does not have any ongoing jobs for swapchain image 0 as it was returned a second time. ... which implies that vkQueuePresentKHR of frame 0 is done and it's wait semaphore (301E6C0000000022) is signaled:

Thread 0, Frame 0, Time 169523 us:
vkQueuePresentKHR(queue, pPresentInfo) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 000001CD3C937B60
    pPresentInfo:                   const VkPresentInfoKHR* = 000000C01D35C1E0:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR (1000001001)
        pNext:                          const void* = NULL
        waitSemaphoreCount:             uint32_t = 1
        pWaitSemaphores:                const VkSemaphore* = 000001CD3E516748
            pWaitSemaphores[0]:             const VkSemaphore = 301E6C0000000022
        swapchainCount:                 uint32_t = 1
        pSwapchains:                    const VkSwapchainKHR* = 000001CD48C2D300
            pSwapchains[0]:                 const VkSwapchainKHR = D897D90000000016
        pImageIndices:                  const uint32_t* = 000001CD4AE71CC8
            pImageIndices[0]:               const uint32_t = 0
        pResults:                       VkResult* = NULL

... which implies that vkQueueSubmit of in frame 0 is done and its wait semaphore (908683000000001D) is also signaled as that's a prerequisite for signaling 301E6C0000000022:

Thread 0, Frame 0, Time 169358 us:
vkQueueSubmit(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 000001CD3C937B60
    submitCount:                    uint32_t = 1
    pSubmits:                       const VkSubmitInfo* = 000000C01D35C220
        pSubmits[0]:                    const VkSubmitInfo = 000000C01D35C220:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO (4)
            pNext:                          const void* = NULL
            waitSemaphoreCount:             uint32_t = 1
            pWaitSemaphores:                const VkSemaphore* = 000001CD3E5166F8
                pWaitSemaphores[0]:             const VkSemaphore = 908683000000001D
            pWaitDstStageMask:              const VkPipelineStageFlags* = 000000C01D35C1F4
                pWaitDstStageMask[0]:           const VkPipelineStageFlags = 1024 (VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT)
            commandBufferCount:             uint32_t = 1
            pCommandBuffers:                const VkCommandBuffer* = 000001CD3E516B58
                pCommandBuffers[0]:             const VkCommandBuffer = 000001CD4AF17ED0
            signalSemaphoreCount:           uint32_t = 1
            pSignalSemaphores:              const VkSemaphore* = 000001CD3E516748
                pSignalSemaphores[0]:           const VkSemaphore = 301E6C0000000022
    fence:                          VkFence = 0000000000000000

Thus 908683000000001D is no longer in use and should be able to be signaled by vkAcquireNextImageKHR in frame 4.

artem-lunarg commented 1 week ago

@fridenmf I just finished with a test that I guess repeats you original trace (I started with it so it's easy for me to follow that). If you think I missed something important in the test please let me know. My swapchain has two images. I get errors in the third frame (frame index = 2), that's when we start re-using resources for the first time. It reports error in AcquireNextImage:

Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0xcfef35000000000a, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR(): Semaphore must not have any pending operations. The Vulkan spec states: If semaphore is not VK_NULL_HANDLE, it must not have any uncompleted signal or wait operations pending (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)

TEST_F(PositiveWsi, Test) {
    AddSurfaceExtension();
    RETURN_IF_SKIP(Init());
    RETURN_IF_SKIP(InitSwapchain());
    const auto swapchain_images = m_swapchain.GetImages();
    for (auto image : swapchain_images) {
        SetImageLayoutPresentSrc(image);
    }

    std::vector<vkt::Semaphore> acquire_semaphores;
    std::vector<vkt::Fence> acquire_fences;
    std::vector<vkt::CommandBuffer> command_buffers;
    std::vector<vkt::Semaphore> submit_semaphores;
    for (size_t i = 0; i < swapchain_images.size(); i++) {
        acquire_semaphores.emplace_back(*m_device);
        acquire_fences.emplace_back(*m_device, VK_FENCE_CREATE_SIGNALED_BIT);
        command_buffers.emplace_back(*m_device, m_command_pool);
        submit_semaphores.emplace_back(*m_device);
    }

    const int frame_count = 10;
    for (int i = 0; i < frame_count; i++) {
        const uint32_t frame_index = uint32_t(i % swapchain_images.size());

        auto &acquire_semaphore = acquire_semaphores[frame_index];
        auto &acquire_fence = acquire_fences[frame_index];
        auto &command_buffer = command_buffers[frame_index];
        auto &submit_semaphore = submit_semaphores[frame_index];

        acquire_fence.Reset();
        uint32_t image_index = 0;
        vk::AcquireNextImageKHR(device(), m_swapchain, kWaitTimeout, acquire_semaphore, acquire_fence, &image_index);
        acquire_fence.Wait(kWaitTimeout);

        command_buffer.Begin();
        command_buffer.End();

        m_default_queue->Submit(command_buffer, vkt::Wait(acquire_semaphore), vkt::Signal(submit_semaphore));
        m_default_queue->Present(m_swapchain, image_index, submit_semaphore);
    }
    m_default_queue->Wait();
}
fridenmf commented 1 week ago

I don't think you are allowed to do this:

const uint32_t frame_index = uint32_t(i % swapchain_images.size());

auto &acquire_semaphore = acquire_semaphores[frame_index];

AcquireNextImageKHR is not guaranteed to return the indices in order, it could return 0,1,2,1,2,1,2,1,2, and you don't really know that the semaphores used for swapchain image index 0 is done until you get back index 0 from AcquireNextImageKHR.

Since you don't know which swapchain image index you're going to get until you have called AcquireNextImageKHR, you'd have to reuse a semaphore and fence from a pool of "unused semaphores and resources". Only once you've gotten index N from AcquireNextImageKHR, you can put resources back to that pool from swapchain index frames that you have tracked which resources you used them in.

fridenmf commented 1 week ago

Easiest way to modify your example is to add this outside the frame loop:

vkt::Semaphore unused_semaphore(*m_device);
vkt::Fence unused_fence(*m_device, VK_FENCE_CREATE_SIGNALED_BIT);

Replace this:

auto &acquire_semaphore = acquire_semaphores[frame_index];
auto &acquire_fence = acquire_fences[frame_index];

with this:

auto &acquire_semaphore = unused_semaphore;
auto &acquire_fence = unused_semaphore;

in your frame loop.

And after Present, swap in the unused resources:

std::swap(acquire_semaphores[image_index], acquire_semaphore);
std::swap(acquire_fences[image_index], acquire_fence);

Remove the frame_index variable, and index the other resources after acquire with image_index instead.

(I've not compiled this, just based it on your code example, but use it as a guide)

artem-lunarg commented 1 week ago

don't think you are allowed to do this:

Right, I was aware of this. Just in this test the order was correct and it reproduces the issue, so it's enough to have confirmation and work on solution. I will update the test based on your suggestions.

artem-lunarg commented 1 week ago

Updated the test (thanks for the guide!). Can get VUID-vkQueueSubmit-pCommandBuffers-00071.

TEST_F(PositiveWsi, Test) {
    TEST_DESCRIPTION("Acquire fence can be used to determine that submission from previous frame finished.");
    AddSurfaceExtension();
    RETURN_IF_SKIP(Init());
    RETURN_IF_SKIP(InitSwapchain());
    const auto swapchain_images = m_swapchain.GetImages();
    for (auto image : swapchain_images) {
        SetImageLayoutPresentSrc(image);
    }

    vkt::Semaphore unused_semaphore(*m_device);
    vkt::Fence unused_fence(*m_device, VK_FENCE_CREATE_SIGNALED_BIT);

    std::vector<vkt::Semaphore> acquire_semaphores;
    std::vector<vkt::Fence> acquire_fences;
    std::vector<vkt::CommandBuffer> command_buffers;
    std::vector<vkt::Semaphore> submit_semaphores;
    for (size_t i = 0; i < swapchain_images.size(); i++) {
        acquire_semaphores.emplace_back(*m_device);
        acquire_fences.emplace_back(*m_device, VK_FENCE_CREATE_SIGNALED_BIT);
        command_buffers.emplace_back(*m_device, m_command_pool);
        submit_semaphores.emplace_back(*m_device);
        command_buffers[i].Begin();
        command_buffers[i].End();
    }

    const int frame_count = 10;
    for (int i = 0; i < frame_count; i++) {
        auto &acquire_semaphore = unused_semaphore;
        auto &acquire_fence = unused_fence;

        acquire_fence.Reset();
        uint32_t image_index = 0;
        vk::AcquireNextImageKHR(device(), m_swapchain, kWaitTimeout, acquire_semaphore, acquire_fence, &image_index);
        acquire_fence.Wait(kWaitTimeout);

        auto &command_buffer = command_buffers[image_index];
        auto &submit_semaphore = submit_semaphores[image_index];

        m_default_queue->Submit(command_buffer, vkt::Wait(acquire_semaphore), vkt::Signal(submit_semaphore));
        m_default_queue->Present(m_swapchain, image_index, submit_semaphore);

        std::swap(acquire_semaphores[image_index], acquire_semaphore);
        std::swap(acquire_fences[image_index], acquire_fence);
    }
    m_default_queue->Wait();
}
fridenmf commented 1 week ago

Interesting, I would have expected a validation error that you need to reset your command buffer first (if VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT is set), or have a separate pool per swapchain image index and reset that, before you can try to re-record it in your frame loop?

edit: Saw your last edit, now it looks like my example!

artem-lunarg commented 1 week ago

Interesting, I would have expected a validation error that you need to reset your command buffer first

Yeah, that's because our test framework creates command pool with VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag

artem-lunarg commented 1 week ago

@fridenmf I can see what does not work in the current setup, but until this works you can use a slightly modified solution which works: because we wait on the fence immediately after AcquireNextImageKHR, a single fence is sufficient (instead one per swapchain), and then the code hits code path that handles synchronization correctly (tried this).

Here is modified test that shows this. We can reset fence immediately after wait, also no need to create fence in signaled state in this case. Acquire semaphore is also not important, since we wait on fence on the host before submit:

TEST_F(PositiveWsi, WorkaroundUntilFix) {
    TEST_DESCRIPTION("Acquire fence can be used to determine that submission from previous frame finished.");
    AddSurfaceExtension();
    RETURN_IF_SKIP(Init());
    RETURN_IF_SKIP(InitSwapchain());
    const auto swapchain_images = m_swapchain.GetImages();
    for (auto image : swapchain_images) {
        SetImageLayoutPresentSrc(image);
    }

    std::vector<vkt::CommandBuffer> command_buffers;
    std::vector<vkt::Semaphore> submit_semaphores;
    for (size_t i = 0; i < swapchain_images.size(); i++) {
        command_buffers.emplace_back(*m_device, m_command_pool);
        command_buffers[i].Begin();
        command_buffers[i].End();
        submit_semaphores.emplace_back(*m_device);
    }

    vkt::Fence fence(*m_device);

    const int frame_count = 10;
    for (int i = 0; i < frame_count; i++) {
        auto &acquire_fence = fence;

        uint32_t image_index = 0;
        vk::AcquireNextImageKHR(device(), m_swapchain, kWaitTimeout, VK_NULL_HANDLE, acquire_fence, &image_index);
        acquire_fence.Wait(kWaitTimeout);
        acquire_fence.Reset();

        auto &command_buffer = command_buffers[image_index];
        auto &submit_semaphore = submit_semaphores[image_index];

        m_default_queue->Submit(command_buffer, vkt::Signal(submit_semaphore));
        m_default_queue->Present(m_swapchain, image_index, submit_semaphore);
    }
    m_default_queue->Wait();
}
fridenmf commented 1 week ago

Thanks, this works as a workaround for now!

Interestingly enough though, creating the fence in a signaled state and then doing:

acquire_fence.Reset();
vk::AcquireNextImageKHR(device(), m_swapchain, kWaitTimeout, VK_NULL_HANDLE, acquire_fence, &image_index);
acquire_fence.Wait(kWaitTimeout);

Still triggers:

Validation Error: [ VUID-vkQueueSubmit-pCommandBuffers-00071 ] | MessageID = 0x2e2f4d65 | vkQueueSubmit(): pSubmits[0].pCommandBuffers[0] VkCommandBuffer 0x25034264b30[] is already in use and is not marked for simultaneous use.
The Vulkan spec states: If any element of the pCommandBuffers member of any element of pSubmits was not recorded with the VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, it must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.3.296.0/windows/1.3-extensions/vkspec.html#VUID-vkQueueSubmit-pCommandBuffers-00071)

But just changing the fence to be non signaled from creation and to your ordering:

vk::AcquireNextImageKHR(device(), m_swapchain, kWaitTimeout, VK_NULL_HANDLE, acquire_fence, &image_index);
acquire_fence.Wait(kWaitTimeout);
acquire_fence.Reset();

And there is no validation errors. I would argue that creating the fence in a signaled state and reseting before vkAcquireNextImageKHR should work as well?

artem-lunarg commented 1 week ago

And there is no validation errors. I would argue that creating the fence in a signaled state and reseting before vkAcquireNextImageKHR should work as well?

Yes