Handle a missed case where a swapchain is recreated. If a new swapchain becomes the low latency chain, it inherits the device-global mode.
Add some extra debug logging when latency debug is enabled.
Fix some odd scenarios with the internal frame latency handle combined with low latency. If the GPU queue is deemed to be too deep and low latency is enabled, don't try to sleep ourselves, as that seems to throw LL2 off to the point where it starts thinking it shouldn't sleep.
The heuristic here is pretty simple, but works well in practice. Query the blit timeline semaphore. If the number of requested submits is significantly larger than the completed counter, the GPU queue is deep.
There are some scenarios where we still trigger the latency limiter ourselves:
- We are FIFO bound. Low latency (at least NV's implementation of it) will generally not kick in here, so we should use latency fences. When the GPU drains the queue faster than the display can, the completed counter will be pretty close to the submitted counter. If drivers actually do FIFO-aware low latency at some point, then we'll never end up blocking in our internal wait handles anyway.
- We are CPU bound. It is irrelevant what we do, since we are implicitly low latency at this point.
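
A minimal sketch of the queue-depth heuristic described above, for illustration only. The function names and the threshold value are hypothetical and not taken from the actual change. It assumes the submitted counter is tracked on the CPU side and the completed counter has already been read from the blit timeline semaphore (for example with vkGetSemaphoreCounterValue).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: how far the submitted counter may run ahead
 * of the completed counter before the GPU queue is considered deep.
 * The real value is a tuning choice and is an assumption here. */
#define GPU_QUEUE_DEEP_THRESHOLD 2u

/* Returns true if the number of requested submits is significantly
 * larger than the completed counter. completed_count is assumed to have
 * been queried from the blit timeline semaphore by the caller. */
static bool gpu_queue_is_deep(uint64_t submitted_count, uint64_t completed_count)
{
    /* Guard against a stale submitted counter to avoid unsigned wrap-around. */
    if (completed_count >= submitted_count)
        return false;
    return submitted_count - completed_count > GPU_QUEUE_DEEP_THRESHOLD;
}

/* Decides whether to block on the internal frame latency handle.
 * With low latency enabled and a deep GPU queue, sleeping is left to the
 * low latency implementation. In the FIFO-bound and CPU-bound cases the
 * counters stay close together, so the internal limiter is still used. */
static bool should_wait_on_internal_latency_handle(bool low_latency_enabled,
        uint64_t submitted_count, uint64_t completed_count)
{
    if (low_latency_enabled && gpu_queue_is_deep(submitted_count, completed_count))
        return false;
    return true;
}
```

With these assumed values, a submitted counter of 10 against a completed counter of 7 counts as deep, so the internal wait is skipped when low latency is enabled. A completed counter of 9 does not count as deep, and the internal wait is used as before.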