vkQuake does not handle VK_ERROR_OUT_OF_DATE_KHR

damienleone commented 7 years ago

Several Vulkan APIs can return that error, such as vkAcquireNextImageKHR() and vkQueuePresentKHR().

When this error is returned, the application is expected to teardown and re-create its swapchain.

Novum commented 7 years ago

On what platform/driver does this actually happen?

Novum commented 7 years ago

I'm not asking this because I don't want to fix it, it's just hard to make sure something works when I cannot reproduce it.

nref-dan commented 7 years ago

It should happen whenever you re-size/update the window in Windows.

VkResult acquireResult = vkAcquireNextImageKHR(
    logicalDevice,
    swapChain,
    (std::numeric_limits<std::uint64_t>::max)(),
    imageAvailableSemaphore,
    VK_NULL_HANDLE,
    &imageIndex);

if (VK_ERROR_OUT_OF_DATE_KHR == acquireResult || VK_SUBOPTIMAL_KHR == acquireResult) {
    recreateSwapchain();
    return;
}
else if (VK_SUCCESS != acquireResult) {
    logger->error("Failed to acquire next image from swap chain: {0}", acquireResult);
    throw std::runtime_error("failed to present the swapchain image");
}

If you are running single-threaded and you already re-create the swapchain and pipeline after resizing, then it's possible you won't ever see this error message under normal operation.

Novum commented 7 years ago

I get that. But I still can't test it (because it never happens on Windows on any driver) and I won't submit untested code.

damienleone commented 7 years ago

Sorry for taking so long to get back at you. I had to wait for our new drivers to be released so that you could reproduce the issue.

Using the NVIDIA 387.12 drivers on Linux, anytime vkquake enters/leaves fullscreen will cause this issue.

Examples:

Start vkquake windowed, enable fullscreen in the options
Start vkquake fullscreen, do an alt-tab

It will usually fail with: "QUAKE ERROR: Couldn't acquire next image"

Novum commented 7 years ago

The problem I'm always facing with this is, that there is already a CB that needs to complete somehow (otherwise fences will deadlock). I can't just easily recreate the swap chain in the middle of the frame, because there are render passes depending on it and other images have to have the same size as it etc.

The only option that would consistently work is to abort recording operations in the CB that affect the swap chain, but then it messes up the image layouts and you have to track that. It's a rabbit hole.

To be honest, I really don't like drivers returning VK_ERROR_OUT_OF_DATE_KHR. I might never support it. VK_SUBOPTIMAL_KHR is fine, because you can at least continue and recreate next frame before starting a new CB.

Blaming the apps by hand waving "they don't handle VK_ERROR_OUT_OF_DATE" isn't helpful btw. It's not just 5 lines of code as you maybe expect. If you realize in the middle of a frame that the memory was removed from under your feet, that's a big problem.

In vkQuake I might be able to implement correct handling, but if the app is multithreaded and has to deal with multiple command buffers this becomes even more impossible to reason about. You would have to discard and re-record command buffers after realizing that the acquire failed.

Novum commented 7 years ago

2e9ebbcc7f5070fa60bad976dcc94fb35c22479d should fix this, but again, I really don't like the implications of this error code.

sezero commented 7 years ago

2e9ebbc should fix this

gl_draw.c: In function `GL_Set2D': gl_draw.c:1020: warning: control reaches end of non-void function

A return true; statement needs adding to the end of GL_Set2D():


diff --git a/Quake/gl_draw.c b/Quake/gl_draw.c
index e26dabc..b60ddc9 100644
--- a/Quake/gl_draw.c
+++ b/Quake/gl_draw.c
@@ -1017,4 +1017,6 @@ qboolean GL_Set2D (void)

        vkCmdBeginRenderPass(vulkan_globals.command_buffer, &vulkan_globals.ui_render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE);
        render_pass_index = 1;
+
+       return true;
 }

Novum commented 7 years ago

Fixed d416c20cc48fc1d693785ff3247a500ebf9f1962

damienleone commented 7 years ago

Hi. Thanks for addressing this issue.

It appears that vkAcquireNextImageKHR() now checks for OUT_OF_DATE and SUBOPTIMAL, but not vkQueuePresentKHR(). Therefore, I was not able to confirm the fix.

cubanismo commented 7 years ago

@Novum The memory doesn't get pulled out from under the application. The images themselves and the memory backing them are still valid until the swapchain is destroyed. Presentation will fail, so you can't get them on screen, and acquiring will fail, so you can't get access to images currently held by the presentation engine, but the app is free to keep rendering to any images it has acquired as long as it needs to in order to clear its pipeline before re-creating the swapchain.

For future iterations of the Vulkan WSI extensions, it might be interesting to discuss whether AcquireNextImage() actually needs to return this error code. As I mention, the images should still be valid, so it's just a matter of whether we can guarantee the application can regain access to them without messing up some other portion of the window system, and ensure that works across all platforms and all corner cases. Might not be possible, but maybe it is, and I see why this case is harder to handle than QueuePresent() failing, so it's worth discussing. Another option for applications in the meantime if the cost of backing out an in-flight frame is too high when acquiring a presentation image fails is to add an extra image or two to the swapchain to and acquire it up-front. This doesn't add any latency or bubbles to the pipeline, but does consume a bit more memory.

We're not "hand-waving" or trying to confer blame when we point out issues in apps. We're pointing out a misuse of the Vulkan API that we realize hasn't been a problem so far and hence probably comes as a surprise. We're trying to work with you and other application developers to make fixes and hence maintain the integrity and purpose of the Vulkan spec: To keep Vulkan a minimal layer on top of the hardware, and in the case of WSI, underlying window system presentation support. I realize it's not a simple fix, but the alternative would have been an API where we couldn't give applications access to static handles for swapchain images at all and they would have had to re-build their command buffers from scratch on every frame. Like many things in Vulkan, this design was a compromise that allows applications to bake things in for the steady-state in return for enduring some pain during periods of setup and transition.

As usual feel, free to bring any issues/complaints/ideas you have for the Vulkan spec to the attention of Khronos by filing an issue against Vulkan-Docs @KhronosGroup or any other venue you prefer.

Novum commented 7 years ago

Thanks for the longer response. It doesn't really come as a surprise. I have seen this error before and actually choose to ignore it, because it's a mess to implement the correct handling on application side.

Rebuilding the CB every frame for final swap chain output is a way smaller problem than this. In fact, I have never found real world applications for prerecorded CBs anyways.

I tell you, this will cause way more problems down the line, because people choose to ignore this. It's basically DX9 "surface lost" all over again.

Novum commented 7 years ago

Anyway, closing this, we can continue the conversation, but it's resolved at least for vkQuake.

Novum commented 6 years ago

There were more lingering bugs that have been fixed with 652a092a4b78f7b2361e176e3554574488d654d8

Release 0.97.0 also includes all the changes.

Novum / vkQuake

vkQuake does not handle VK_ERROR_OUT_OF_DATE_KHR #114