Overv / VulkanTutorial

Tutorial for the Vulkan graphics and compute API
https://vulkan-tutorial.com
Creative Commons Attribution Share Alike 4.0 International
3.08k stars 513 forks source link

imagesInFlight implementation frequently does not achieve the MAX_FRAMES_IN_FLIGHT #226

Open DavidEGrayson opened 3 years ago

DavidEGrayson commented 3 years ago

Hello. I am very new to Vulkan, but I was reading the Frames in flight section and I think I have found something that could be improved. (Thanks for the really useful tutorial, by the way! It already helped me immensely to get my first triangle drawn and understand what I'm doing.)

The tutorial makes it sound like we want to submit multiple frames at once when possible in order to increase the frame rate of our application. So it defines MAX_FRAMES_IN_FLIGHT as 2.

On my laptop, which has Windows 10 and an Nvidia GPU, the code in this tutorial ends up using 3 swapchain images, and the order of images returned by vkAcquireNextImageKHR seems to be 0, 1, 2, 0, 1, 2, etc. By design, after the tutorial submits frame 0 for the second time, the next image it acquires is image 0, and so it has to wait for imagesInFlight[0], which happens to be the same fence that it just submitted frame 0 with. So at that time, it ends up waiting for frame 0 to be fully done before it attempts to submit frame 1, which defeats the point of trying to maintain 2 frames in flight.

I compiled 15_hello_triangle.cpp with some extra debugging code to analyze what is going on with the fences and prove that the problem really does happen. (I compiled it in MSYS2 with 64-bit MinGW GCC and linked against the Vulkan loader from VulkanSDK 1.2.162.0.) Here is the output I got, which shows how the problem happens, step by step:

Frame 0: waited for fence 6269300000000015, acquired image 0, waited for fence 0000000000000000, submitted with fence 6269300000000015
Frame 1: waited for fence 892cdd0000000018, acquired image 1, waited for fence 0000000000000000, submitted with fence 892cdd0000000018
Frame 0: waited for fence 6269300000000015, acquired image 2, waited for fence 0000000000000000, submitted with fence 6269300000000015
Frame 1: waited for fence 892cdd0000000018, acquired image 0, waited for fence 6269300000000015, submitted with fence 892cdd0000000018

If we look carefully at the last 2 lines, we see that frame 0 was submitted with fence 62..., but then we had to wait for that same fence before submitting frame 1. The same type of thing actually happens on every single frame submission after that one too, so there is almost never more than 1 frame in flight.

Before I propose solutions, does this seem like an issue to anyone else with more Vulkan experience?

jstroh commented 2 years ago

I agree! I’ve been staring at this bit of code for a couple days.

https://github.com/SaschaWillems/Vulkan/blob/07211790038d0ab0d97058a8d0fb08df0d171fac/examples/triangle/triangle.cpp#L338

This makes more sense to me. Have an array of fences equal to the number of swapchain images and wait after acquiring an image.

skalldri commented 2 years ago

@DavidEGrayson @jstroh I agree with both your conclusions! I believe this chunk of code:

// Check if a previous frame is using this image (i.e. there is its fence to wait on)
if (imagesInFlight[imageIndex] != VK_NULL_HANDLE) {
    vkWaitForFences(device, 1, &imagesInFlight[imageIndex], VK_TRUE, UINT64_MAX);
}
// Mark the image as now being in use by this frame
imagesInFlight[imageIndex] = inFlightFences[currentFrame];

is incorrect. It can be completely removed from the rendering engine and things continue to operate as expected (I've tested this experimentally as well, and see no problems in my rendering engine).

Consider the following pseudocode.

Let Frame be an object that represents a set of imageAvailableSemaphore, renderFinishedSemaphore, and an inFlightFence. With MAX_FRAMES_IN_FLIGHT = 2, we have two instances of the Frame object. Let us also assume that our Vulkan implementation wants us to use at least 2 images in our swapchain, so in our example we will allocate 3 swapchain framebuffers and 3 command buffers. Our imagesInFlight variable also contains 3 slots to track the state of each of the 3 framebuffer + command buffer pairs:

Frame frame0, frame1;
VkCommandBuffer commandBuffers[3] = ...;
Frame* imagesInFlight[3] = {NULL}

Our initial sequence of frames looks like this:

// No wait, imagesInFlight[0] == NULL
vkQueueSubmit(commandBuffers[0]);
imagesInFlight[0] = &frame0;

// No wait, imagesInFlight[1] == NULL
vkQueueSubmit(commandBuffers[1]);
imagesInFlight[1] = &frame1;

// No wait, imagesInFlight[2] == NULL
vkQueueSubmit(commandBuffers[2]);
imagesInFlight[2] = &frame0;

waitFor(frame0); // imagesInFlight[0] == frame0
// We didn't need to wait: commandBuffer[0] cannot be in use anymore, since we submitted frame0 with commandBuffer[2] after submitting with commandBuffer[0], and we waited on the frame0 fence before submitting commandBuffer[2].
vkQueueSubmit(commandBuffers[0]);
imagesInFlight[0] = &frame1;

waitFor(frame1); // imagesInFlight[1] == frame1
// We didn't need to wait: commandBuffer[1] cannot be in use anymore, since we submitted frame1 with commandBuffer[2] after submitting with commandBuffer[1], and we waited on the frame1 fence before submitting commandBuffer[1].
vkQueueSubmit(commandBuffers[1]);
imagesInFlight[1] = &frame0;

waitFor(frame0); // imagesInFlight[2] == frame0
// We didn't need to wait: commandBuffer[2] cannot be in use anymore, since we submitted frame0 with commandBuffer[1] after submitting with commandBuffer[2], and we waited on the frame0 fence before submitting commandBuffer[2].
vkQueueSubmit(commandBuffers[2]);
imagesInFlight[2] = &frame1;

// Etc...

Ultimately, this logic seems to be replicating what is accomplished by the imageAvailableSemaphores[]: we instruct the rendering pipeline that it needs to wait for the imageAvailableSemaphore to be signaled before starting to execute the command buffers. That semaphore is raised automatically when the image is available for use: we don't need to synchronize it externally. All we need to do is ensure we don't exceed our MAX_FRAMES_IN_FLIGHT value when submitting work, which is accomplished by checking that inFlightFences[currentFrame] has posted before submitting work for this frame.