andy-thomason / Vookoo

A set of utilities for taking the pain out of Vulkan in header only modern C++
MIT License
521 stars 52 forks source link

Make the vku::framework and vku::Window more flexible #30

Open FunMiles opened 4 years ago

FunMiles commented 4 years ago

I realize that the vku_framework.hpp was intended as a way to easily get started with the examples. However, I've been using it in a new project in which I want to be able to run more than one Vulkan driven window. Doing so, I discovered some weaknesses that for the most part can be remedied with a fairly limited amount of work. I am sure I will discover more issues to be worked on but here are the first things that come to mind:

As of now, it is the first point that is important to my use case and I have already tested the wrapper solution. Is that something of interest to other users?

andy-thomason commented 4 years ago

I'm happy for you to customise it as you wish. Real use cases will always drive software development. Best of all, develop a game engine or 3D application.

FunMiles commented 4 years ago

@andy-thomason I am definitely using the framework for my new code. I am discovering quite a few things to adapt and generalize. By the way, I had stated that I had a working threading but eventually discovered it was buggy. This led me to stackoverflow where a patient soul helped figure out the issue. I will be submitting a pull request with a threaded demo and will update all the demos to cache the queue, rather than asking for it on each draw call. It will take me some time though, as I need to address the framework singleton aspect. I am also working on generalizing the buffer system to easily adapt/use the popular Vulkan Memory Allocator without forcing to use it as I'm sure @lhog wouldn't like that! ;P .

FunMiles commented 4 years ago

After discovering the issue for the queue and threading, I am left wondering what approach to take. Here are my notes:

So the question becomes: (:heavy_check_mark: indicates current choices)

Comments/opinions are welcome.

Something totally off-topic but that probably will benefit from multi-threaded rendering: I discovered ImGui and have started exploring it with Vulkan for the rendering. I think I may do an open-source demo app for image processing using Vookoo for rendering and ImGui on top for the GUI.

FunMiles commented 4 years ago

I have a fairly complete system for synchronized queue access at https://github.com/FunMiles/Vookoo/tree/lock_guard_queues There is one example opening two windows: 06-parallelTriangles. I am ready to do a PR. I have two pending PRs that have not been merged into the mainline though. Should merge them into my code and have a single full PR?

andy-thomason commented 4 years ago

That looks very good. The example is relatively short, which is very good.

It is worth timing the calls with rdtsc as mutexes can be expensive if implemented badly. At best they are a spin lock on an atomic boolean - at worst an O/S call. One of the downfalls of OpenGL was the need for a mutex on every call.

On Thu, Apr 23, 2020 at 4:04 PM FunMiles notifications@github.com wrote:

I have a fairly complete system for synchronized queue access at https://github.com/FunMiles/Vookoo/tree/lock_guard_queues There is one example opening two windows: 00-parallelTriangles. There is no cleanup for closing the windows. I need to do that before submitting a PR.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/andy-thomason/Vookoo/issues/30#issuecomment-618448863, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAL36XEWG44BF227WG7R7NLROBKHTANCNFSM4MKEOL5A .

FunMiles commented 4 years ago

I do not have experience measuring mutex costs on Windows yet, but I've dealt with them a lot on Unix based system and they are cheap. The important thing to remember is that they should cost only if there is contention. And a good implementation should spin-lock for a bit before going to the OS. The Vulkan layers for example actually use mutexes to synchronize access to maps that record access to the Vulkan objects from various threads. They did two things to optimize:

One thing you may notice in the last update I made is that rather than submit the static and dynamic command buffers to the queue separately, I've merged them into a single submission to reduce the number of lock/unlock of the queue mutex.

On my Mac, there are three queue families, all capable of doing graphics/compute/transfer. Each only has one queue at most. One could try to better pick queue families/queues and try to create multiple queue to assign to various cases to minimize contention likelihood. A round-robin distribution maybe?

FunMiles commented 4 years ago

I have started doing rdtsc timings and on the Mac. Preliminary timings. They are a measurement of the whole timing for a call to the queue submit routine. Not a timing of the cost of just the mutex itself as this is something fairly ill-defined since a mutex can only be acquired when it has been released by any other user.

The Mac timings are very strange. I may have to raise that with the MoltenVK development team. Some other messages about swapchain image request misuse is puzzling me when i replace the argument to submitKHR with a pointer rather than the reference, I get:

00000008 debugCallback: [ VUID-vkAcquireNextImageKHR-swapchain-01802 ] Object: 0x40000000004 (Type = 27) | vkAcquireNextImageKHR: Application has already previously acquired 3 images from swapchain. Only 2 are available to be acquired using a timeout of UINT64_MAX (given the swapchain has 3, and VkSurfaceCapabilitiesKHR::minImageCount is 2). The Vulkan spec states: If the number of currently acquired images is greater than the difference between the number of images in swapchain and the value of VkSurfaceCapabilitiesKHR::minImageCount as returned by a call to vkGetPhysicalDeviceSurfaceCapabilities2KHR with the surface used to create swapchain, timeout must not be UINT64_MAX (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkAcquireNextImageKHR-swapchain-01802)