godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Forward Plus Renderer causes frameskips/jitter/judder/stutter, GPU frames aren't sorted correctly on Windows #84137

Open Cyangmou opened 8 months ago

Cyangmou commented 8 months ago

Godot version

4.1.3

System information

Windows 11, NVIDIA GeForce GTX 1070, i7-7700K CPU 4.20GHz, 1920x1080 60Hz IPS monitor & a 2560x1440 adaptive 100Hz monitor

Issue description

The forward plus renderer causes jitter in movement. It's especially visible when V-Sync is enabled. This happens in both 2D and 3D. The jitter is more or less pronounced depending on which screen you use; it might not be very visible on a 60Hz screen, but it is still there.


When I talk about "jitter", I mean the effect shown in the video above, which is also clearly observable at 5, 8, 11 and 13 seconds in this older video: Video

The problem seems to be that the renderer has a cache of 4 GPU frames and that the ordering of those frames gets jumbled. So by design we seem to get a recurring order of 0, 1, 3, 2, 3, ... 4, 5, 8, 7, 8... This means some frames don't get shown, others get shown twice, and the judder is caused by a step back.

Currently it's not clear whether it's a problem with the way the frames are put together, or a driver issue related to memory. It could be 2 bugs, as comments in the thread show other bugs play into this.

Steps to reproduce

Can this be circumvented?

No. It's a very critical and deep-seated bug. With the forward plus renderer there is no way to circumvent this, and it will happen on any hardware. The higher the resolution, the less obvious the bug is; however, it's always there. The forward plus renderer is simply broken and can't be used in its current state.

You could use the gl_compatibility renderer, which doesn't have this problem, but this is not a good solution either.

Minimal reproduction project:

I made a simple project in which the character can only run left and right per key input. That's literally all the code we need to test the issue properly, the setup described above is key for making it visible.

Conclusion

  1. Only enabling the gl_compatibility renderer seems to solve the jitter. How visible the problem is and how intensely the skips happen, however, differ depending on the settings.
  2. It happens in 2D and 3D
  3. Disabling V-Sync with the forward plus renderer makes the issue less pronounced than with V-Sync enabled...
  4. ...especially on a 60Hz screen on Windows, disabling V-Sync leads to far less noticeable jitter, though it's still there.
  5. The problem happens with both the process function and the physics_process() function.

Hypothesis

I think the core of the problem lies with the forward plus renderer (there might, however, be additional problems with V-Sync, Camera or Physics).

Minimal reproduction project

Project Download This includes a small platformer project. Project settings need to be adjusted according to my tests to reproduce the issues.

thygrrr commented 8 months ago

> Thanks for carrying out this testing! I recommend repeating the same tests with #80566. There are known issues with how the swapchain is managed in master, and that PR fixes this issue.

@Calinou I tested it and the PR version does NOT fix the jitter issue. But it's a good starting point for me to look into.

thygrrr commented 8 months ago

I have tried ruling out the swapchain as the problem (and perhaps the issue being in RenderingServerDefault), but there is exactly 1 render target being blitted, from exactly 1 render thread.

thygrrr commented 8 months ago

jittery-vsync-bug-repro.zip

Here's an update of the test project, now with a frame counter. It will also show a very obvious indicator if the engine frame count ever deviates from the one counted by _process (it never does, however).

thygrrr commented 8 months ago

Stab in the dark:

There's a super weird cache for render targets whose purpose I don't understand (apparently it's pretty clutch...):

RID TextureStorage::RenderTarget::get_framebuffer() {
    // Note that if we're using an overridden color buffer, we're likely cycling through a texture chain.
    // this is where our framebuffer cache comes in clutch..

    if (msaa != RS::VIEWPORT_MSAA_DISABLED) {
        return FramebufferCacheRD::get_singleton()->get_cache_multiview(view_count, color_multisample, overridden.color.is_valid() ? overridden.color : color);
    } else {
        return FramebufferCacheRD::get_singleton()->get_cache_multiview(view_count, overridden.color.is_valid() ? overridden.color : color);
    }
}

...and there's also some resizing code that appears to overwrite the index/id of the target:

void TextureStorage::render_target_set_size(RID p_render_target, int p_width, int p_height, uint32_t p_view_count) {
    RenderTarget *rt = render_target_owner.get_or_null(p_render_target);
    ERR_FAIL_NULL(rt);
    if (rt->size.x != p_width || rt->size.y != p_height || rt->view_count != p_view_count) {
        rt->size.x = p_width;
        rt->size.y = p_height;
        rt->view_count = p_view_count;
        _update_render_target(rt);
    }
}

I'm checking whether that hash function has a collision, and next I will check that this weird resize code doesn't clobber some of our targets together.

Because I can actually see repeat hashes when I dump the resolved table entries for each draw to screen operation.
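A minimal C++ sketch of such a collision check, assuming a hypothetical cache key made of texture IDs and an illustrative FNV-1a hash (neither is Godot's actual FramebufferCacheRD code). The point it encodes: a repeated hash is only a collision if it comes from a different key, so the same render target being looked up every frame will produce repeat hashes without any bug.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical cache key: the texture IDs a framebuffer is built from.
using FramebufferKey = std::vector<std::uint64_t>;

// Illustrative 32-bit FNV-1a over the bytes of each ID (an assumption,
// not the hash Godot uses).
std::uint32_t hash_key(const FramebufferKey &key) {
    std::uint32_t h = 2166136261u;
    for (std::uint64_t id : key) {
        for (int byte = 0; byte < 8; ++byte) {
            h ^= static_cast<std::uint32_t>((id >> (byte * 8)) & 0xFF);
            h *= 16777619u;
        }
    }
    return h;
}

// Returns true only if two *different* keys map to the same hash.
// Identical keys repeating (same target every frame) are not collisions.
template <typename HashFn>
bool has_collision(const std::vector<FramebufferKey> &keys, HashFn hash) {
    std::unordered_map<std::uint32_t, FramebufferKey> seen;
    for (const FramebufferKey &key : keys) {
        auto result = seen.emplace(hash(key), key);
        if (!result.second && result.first->second != key) {
            return true; // same hash, different key
        }
    }
    return false;
}
```

Passing the hash function as a parameter makes it easy to plug in the real one, or a deliberately degenerate one to confirm the check itself fires.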

thygrrr commented 8 months ago

All that hashing seems excessive, but it is fine: no collisions, and no different behaviour for prime number offsets (I was fearing that 0 was an issue).

There MAY be a relation somewhere (because the actual frame buffer index will often NOT be identical to the frame index), but the main theory still stands: 3 sets of indices to juggle semaphores, fences, frame buffers, command buffers, render targets, and swap chain images seems like a likely cause for the frame shuffle. I'm definitely at the end of my understanding of the engine, though. :)

darksylinc commented 8 months ago

My attention was drawn to this issue.

First I want to explain this:

  • FIRST index wraps at 2 (FRAME_LAG)
  • SECOND index wraps at 4 (frame_count = swapchain_images + 1)
  • THIRD doesn't wrap, but is in [0..3[ (or minimum swapchain image count of Vulkan)

Yeah this is messed up and fixed by my PR. But some of that actually makes sense:

  1. Godot writes data to a ring buffer. When FRAME_LAG = 2, that means that while GPU is reading from [region 0] the CPU is writing to [region 1]. And then the CPU is supposed to wait until GPU is done reading from [region 0], so that the CPU can start writing to it, while the GPU moves over to read from [region 1]. If FRAME_LAG = 3, the same cycle happens but they keep looping in [region 0], [region 1] and [region 2].
    • Without my PR Godot is actually using x2 the amount of regions, so the cycle is [region 0] [region 1] [region 2] and [region 3] and never actually waits to see if the GPU is done. My PR fixes that.
  2. The number of swapchains is independent of this. Think FRAME_LAG = 2 as "number of kitchens" and swapchain_count = 3 as "number of pizzas generated". When the GPU is cooking a pizza in [kitchen 0], the CPU is filling [kitchen 1] with all the ingredients. Then the GPU puts the pizza on a tray, and goes to [kitchen 1] to bake another pizza. Then CPU refills [kitchen 0] with ingredients. If the delivery guy never arrives and swapchain_count = 3, up to 3 pizzas may pile up in the tray. If that happens then both the GPU and CPU must stop what they're doing until the delivery guy arrives and starts removing pizzas. If swapchain_count = 10, then up to 10 pizzas could accumulate and the GPU and CPU could still work with only 2 kitchens.
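The "kitchens" half of this explanation can be sketched as a single-threaded C++ model. It assumes a fake GPU timeline in place of real fences (no Vulkan calls), and FrameRing, begin_frame and gpu_finish are hypothetical names. The CPU may only reuse region frame % FRAME_LAG once the GPU has finished the frame that last used it; otherwise it has to wait.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model, not Godot code: FRAME_LAG ring-buffer regions
// ("kitchens") with a fake GPU timeline standing in for fences.
constexpr int FRAME_LAG = 2;

struct FrameRing {
    std::array<std::int64_t, FRAME_LAG> submitted; // last frame written to each region
    std::int64_t gpu_completed = -1;                // newest frame the fake GPU finished
    std::vector<std::int64_t> waits;                // frames on which the CPU had to wait

    FrameRing() { submitted.fill(-1); }

    // GPU side: everything up to `frame` is now finished.
    void gpu_finish(std::int64_t frame) {
        if (frame > gpu_completed) gpu_completed = frame;
    }

    // CPU side: before filling region frame % FRAME_LAG, make sure the GPU
    // is done with the frame that used it last; otherwise wait for it.
    void begin_frame(std::int64_t frame) {
        std::int64_t &previous = submitted[static_cast<std::size_t>(frame % FRAME_LAG)];
        if (previous > gpu_completed) {
            waits.push_back(frame);
            gpu_finish(previous); // simulated blocking wait on that frame's fence
        }
        previous = frame;
    }
};
```

If the GPU has not finished anything yet, frames 0 and 1 proceed freely but frame 2 must wait for frame 0, because only two regions exist. With FRAME_LAG = 3, that first wait would move to frame 3.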

Potential data races

Everyone's been focusing on the swapchain but the issue may be somewhere completely else.

  1. MSAA is seriously broken on Godot. Work is being done to resolve this. Make sure all testing is done with MSAA disabled on both 2D and 3D. If the issue goes away, then this is it.
    • The problem is fundamentally a race condition / barrier issue. The GPU starts sampling from an MSAA texture before the previous pass is done rendering to it.
  2. CPU-wise, Godot can call the render thread from anywhere at any time. I lost my mind when I found out about this. Because of how CommandQueueMT::flush_if_pending works, there are no guarantees that commands are replayed in the exact order they were submitted. Make sure it never enters if (unlikely(command_mem.size() > 0))_flush();.
    • flush_if_pending is fundamentally flawed. If the render thread is doing some work and suddenly calls flush_if_pending and it wasn't empty, then it will start doing something else entirely that was supposed to be done later.
  3. CPU/GPU-wise, I am waiting on the ARG (Acyclic Render Graph) to be ready to fix various GPU synchronization issues. A "simple" problem could be that we start rendering before the vkCmdCopyBuffer finishes (or a vkCmdCopyBuffer starts before the render pass that precedes it). Therefore we may render using data from a previous frame, or start rendering with data that is meant for the next pass or frame.
    • Toggling the validation layer and selecting "Synchronization" in vkconfig is one way of checking for this, but we are already aware there are legit errors emitted by this layer (which could indeed explain this problem).
  4. Unrelated to Godot: NVIDIA has two presentation engines for Vulkan: Legacy and DXGI in Windows. Try toggling this option.
  5. Corruption: Some users keep reporting that Godot stops working after a while, especially on NVIDIA, but it can happen on other systems. We haven't been able to repro this problem, but it strongly suggests a swapchain or renderpass is being deleted from the wrong thread.
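Point 2 above can be illustrated with a toy queue. This is a hypothetical stand-in, not Godot's CommandQueueMT: it only shows that if a command that is currently executing calls flush_if_pending(), commands queued after it run in the middle of it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy command queue (hypothetical, NOT Godot's CommandQueueMT).
struct ToyQueue {
    std::deque<std::function<void()>> pending;
    std::vector<std::string> log;

    void push(std::function<void()> cmd) { pending.push_back(std::move(cmd)); }

    // Runs queued commands in FIFO order. If a running command calls this
    // again, later commands execute before that command has finished.
    void flush_if_pending() {
        while (!pending.empty()) {
            std::function<void()> cmd = std::move(pending.front());
            pending.pop_front();
            cmd();
        }
    }
};
```

With command A (which flushes halfway through) queued before command B, the log reads "A begin", "B", "A end": B completes before A does, even though it was submitted later.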
Cyangmou commented 8 months ago

In the video "Frames are often skipped, shown out of order, or repeated":

We seem to move from frame 226 to 227 and back to 226: a hiccup that goes forward, backward and forward again. Then we move from 230 to 233, which omits 2 frames, but we had just used up 2 extra frames in the mistake before. So: 1 back, 3 forward. 235, 234, 235 (1 back), 236, 239 (3 forward), 242, 243, 242 (1 back), 245 (3 forward).

There are 4 frames in the chain.

Analyzing the skips in the video, we get a consistent repeating pattern of 1 back, 3 forward. To me at least, this looks like it's designed to be sorted that way and not an "accident". Maybe a frame is simply inserted before rather than after... I can't say in which loop. But the visual pattern supports this.

darksylinc commented 8 months ago

I just had a thought: Could you repeat the video experiments but adding a visible counter Label?

i.e. the label gets incremented every frame so that it reads 0, 1, 2, 3, 4, 5, ...

I want to see their value when it stutters. If the value changes normally while stuttering, it means it's definitely not a presentation problem. If the value remains the same (when it should've increased) or goes back in time when stuttering then it doesn't mean anything definitive but it supports the theory it's a presentation problem.

Gnumaru commented 8 months ago

But he already did that. The label in the video is indeed a counter variable, not the result of Engine.get_process_frames() or Engine.get_frames_drawn(). It just so happens that the video was not recorded from frame 0.

darksylinc commented 8 months ago

D'oh. Let me check again then.

Update: OK I completely missed some posts, sorry about that. I will check this during the weekend, I'll try to repro and if successful analyze what's wrong with it.

Update 2: I was able to repro the bug on Linux AMD RADV and AMDVLK. On Godot 4.1.3 it ends up crashing really badly. On Godot 4.2.x on my PR it doesn't crash, but it is stuttering. I will look deeply into this on Saturday. Thank you for the repro!

Update 3: I'm using a 60fps camera and while the counter is consistent (ie. it never goes backwards in time), I can see some visible stutter on the cube movements. It's definitely worth researching (though that problem may be related to #82222). Furthermore that crash in Godot 4.1.3 looks highly suspicious and related (I suspect the crash got fixed but not the underlying issue).

thygrrr commented 8 months ago

For me, I see this frame skip/shuffle on Windows 11 Pro with a GeForce 3080, with various driver versions, both Game Ready and Studio (I've seen this judder since I started with Godot in mid-September). I use a 120Hz display and a 60Hz display (2 monitors connected at the same time).

It happens on Godot 4.1.2 (haven't tested 4.1.3 yet) and any 4.2, including my own local builds here.

I used a 240/480 fps camera (my phone). There's obviously no frame shuffle in VSYNC_MAILBOX or VSYNC_DISABLED, but when there's this stutter, I do see the shuffle. I was able to see the shuffle right away in DaVinci Resolve by looking at individual "frames" in the video; for the one I uploaded here, I also time-stretched it 10x to make it easier to count the frames.

My display is fairly fast, initially I doubted I could see this shuffle/frameskip issue, and it's impossible to get with a direct screen capture. It's hard to capture at 120 fps - I'll give it one more shot today though; Godot's recorder still lives in 1998 and allows 60 fps max, and it also "messes" (it's ok for its purpose) with delta time and vsync, so it doesn't help at all.

I have no reasonable explanation how these skips can happen. (I have some theories as discussed at length in this thread, plus some ideas about maybe semaphore reuse being the actual problem)

thygrrr commented 8 months ago

> But he already did that. The label in the video is indeed a counter variable, not the result of Engine.get_process_frames() or Engine.get_frames_drawn(). It just so happens that the video was not recorded from frame 0.

The counter is a standalone variable, yes, but actually has the same value as Engine.get_process_frames() and Engine.get_frames_drawn() (I compared the values across all the stutters etc.)

It would also be a different category of bug if the game thought it was a new frame but the renderer didn't, or vice versa.

thygrrr commented 8 months ago

Here's a 120fps video, allegedly "lossless", taken with OBS. I just verified with DaVinci Resolve that the frames are indeed out of order.

https://github.com/godotengine/godot/assets/8904620/65f077a4-9298-4a05-9571-fda477fbc84a

Example frames: 80848-80852 has a repeat (49) and a skip (51).

Notes:

Cyangmou commented 8 months ago

I think we should rename the thread once more, because all the testing seems to confirm that it's not a "stutter" or "hiccup" but a frame ordering issue, and that it's not related to the refresh rate at all.

The video is not working, if you could edit it @thygrrr so it works that'd be lovely.

thygrrr commented 8 months ago

The video works for me on Chrome, you can maybe download it here

It's 120 fps video, that makes it difficult to view in lots of consumer applications. I'll try to make one that plays at 60 fps (same number of frames).

Cyangmou commented 8 months ago

Ah, correct, it doesn't play in Firefox. In Chrome it works. Maybe edit in a note to open it with Chrome.

It does indeed seem to follow the same pattern of repeat & skip we analyzed before. So, great video and visualization; it confirms what has been observed before.

thygrrr commented 8 months ago

It is even worse than was visible on my phone camera 😖 - there is frequent frame shuffling, skipping, and doubling. In full-speed motion it becomes more apparent on the bigger skips (and the monitor's response time blurs many of the smaller frame missteps or duplicated frames), but I was also seeing a "copy" of the cube (not display ghosting) on the monitor, sometimes leading and sometimes following it. That seems to come from phases of stable, repeating "shuffled" states. (It's weird, because with 3 swapchain images even cabcabcab would not appear shuffled, so it's maybe something like cbcacbcacb.)

Here is me going frame by frame through the first couple of frames in the video using DaVinci Resolve (manually advancing each frame of the recording with the cursor key; the order and the motion of the timeline at the bottom matter, not the precise "timing"):

https://github.com/godotengine/godot/assets/8904620/49a8af27-6859-4423-b404-df69ed453625

I am more and more certain this has something to do with vkAcquireNextImageKHR (fpAcquireNextImageKHR) returning VKImages / "framebuffers" out of order at the discretion of the Vulkan implementation (this is correct and expected behaviour) and one of the index sets or cached frame buffers just looking at the wrong associated buffers, fences, or semaphores. But I was unable to find any obvious cause in about 20 hours of testing theories and poking at code so far, so I unfortunately have to leave this to the renderer experts. :)
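To make this theory concrete, here is a hedged C++ sketch (a hypothetical helper, not engine code). It takes an assumed order of image indices as returned by vkAcquireNextImageKHR and counts how often a lookup keyed on the frame counter (frame % image_count) would select a different image than the one actually acquired. Later in the thread a driver bug is suspected instead, so treat this purely as an illustration of the hypothesis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: per-swapchain-image resources must be looked
// up with the index the driver returned, because images may be handed back
// in any order. Counts frames where a frame-counter-based guess is wrong.
std::size_t count_wrong_image_lookups(const std::vector<unsigned> &acquired,
                                      unsigned image_count) {
    std::size_t wrong = 0;
    for (std::size_t frame = 0; frame < acquired.size(); ++frame) {
        // The (buggy) assumption: images come back in round-robin order.
        unsigned assumed = static_cast<unsigned>(frame % image_count);
        if (assumed != acquired[frame]) {
            ++wrong;
        }
    }
    return wrong;
}
```

With in-order acquisition (0, 1, 2, 0, 1, 2) the count is 0; with an order such as 0, 2, 1, 0, 2, 1 it is 4, meaning four of six frames would use another image's resources.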

Reproducibility is about 95% (I would have said 100%, but with certain console output calls, and sometimes just by itself, the bug will no longer occur. I cannot reliably reproduce that either, though, so there's no "workaround" or "related setting" I could describe, other than wildly toggling monitors on and off or something. It's probably just how cranky the NVIDIA driver feels and how lucky the application is to get swapchain images in a particular order.)

thygrrr commented 8 months ago

Umm, speaking of reproducibility

(spoiler: This is a mitigation, not a workaround)

I just found a driver setting that seems 70-90% related.


(this might merely mean that the Vulkan implementation is less prone to reorder the images in one vs. the other setting, or it's just a memory layout thing - but it MIGHT help, especially since the "broken" state seems to have something to do with resources or values initialized when Godot starts)

Calinou commented 8 months ago

Check the value of the Vulkan/OpenGL present method 3D setting for the Godot executable: https://github.com/godotengine/godot-proposals/issues/5692#issuecomment-1405829216 The NVIDIA driver can promote Vulkan/OpenGL apps to DXGI even if they don't use it, but this isn't done by default unless the app has a profile for it. This is how Vulkan-based games on Windows can use HDR.

Cyangmou commented 8 months ago

Tried to reproduce and in my case (NVIDIA GeForce GTX 1070) I only got the setting of 8bpc, nothing else and still have the judder.

However, when I switched from the standard to the NVIDIA settings, restarted Godot and played, the judder did not appear for the first 10 seconds of testing; it ran smoothly. Then the same classic stutter/frameskip appeared.

This may mean the setting is not the problem; it felt like some kind of image cache was filling up.

thygrrr commented 8 months ago

I think the bpc setting only causes a different memory layout and slightly different blit speeds, it's also not 100%. So it may just be "something that makes the race condition less bad".

This 10bpc isn't 10bit HDR though AFAIK, it's pure color depth.

It's funny, I can keep one app that is jittering and one app that works smoothly on the same screen for a while. However, they will eventually approach a smooth state. (yes, that means a WORKAROUND to the judder issue is just running the game twice on my system)

(it will slowly "heal", with less and less stutters, almost to full recovery - it's still a bit cyclic, especially if the application loses focus)

This hints at some memory reorganization going on in the Driver.

Swap chain was Auto.

If I set it to prefer Layered on DXGI Swapchain, I get the full judder intensity.

If I set it to prefer Native, I get MUCH LESS stutter, but it still happens occasionally, and it seems it's still the frame ordering issue. That is a decent workaround for my development work, because VSYNC_DISABLED and VSYNC_MAILBOX really melt my GPU by rendering thousands of fps.

The remaining frame jumble is likely still an obnoxious bug but it isn't nearly as intense. (maybe the frame jumble is just too subtle, though - I'm starting to go blind from looking at pixel seams etc.)

thygrrr commented 8 months ago

https://github.com/godotengine/godot/assets/8904620/06e53ca6-eff4-466b-a049-68d9a16b1c24

With the settings discussed above, and also with the "OS Default Color Setting" (which is 8 bpc in the Windows display settings), frame order issues are very rare, but there are still a lot of stuck and skipped frames. It just doesn't show as much because I was used to the much harsher effect before.

The stuck/held and skipped frames also are somewhat regular now.

Why is this still bad?

In case you wonder about the frame skips (isn't that normal?)

It's not. With VSYNC_ENABLED; there should never be a skipped frame. (once the swapchain is full, the game will just wait.)

VSYNC_ADAPTIVE may decide to skip if a fresher frame is already there.

And because the test game is simple, there should also never be a held frame (unless something in the engine is waiting on the wrong fence / semaphore)

Needless to say, under no circumstances should a previous frame be shown after one of its successors has been shown.
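That expectation can be written down as an idealised FIFO model in C++. This is an assumption for illustration, not how any driver is implemented: the game queues frames in order and blocks when the queue is full, and the display takes exactly one frame per vblank. Under those rules, and with a game that keeps up, the presented sequence is always consecutive, so any skip, held frame or step back points at something outside the model.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Idealised FIFO (VSYNC_ENABLED) model: returns the frame shown on each vblank.
std::vector<int> simulate_fifo(std::size_t capacity, int vblanks) {
    std::vector<int> shown;
    if (capacity == 0) return shown; // a queue needs at least one slot
    std::deque<int> queue;
    int next_frame = 0;
    for (int v = 0; v < vblanks; ++v) {
        // A fast game fills the queue until it would have to wait.
        while (queue.size() < capacity) {
            queue.push_back(next_frame++);
        }
        // The display consumes exactly the oldest queued frame per vblank.
        shown.push_back(queue.front());
        queue.pop_front();
    }
    return shown;
}

// True if each shown frame is exactly one newer than the previous one.
bool is_strictly_sequential(const std::vector<int> &shown) {
    for (std::size_t i = 1; i < shown.size(); ++i) {
        if (shown[i] != shown[i - 1] + 1) return false;
    }
    return true;
}
```

The capacity only changes latency (how far ahead the game runs), never the order or the completeness of what is shown.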

more theory...

My money is on memory layout affecting the timing beneficially, but not fixing the underlying issue, which is Godot likely using the swapchain images and their resources incorrectly when the images are returned in a different order by Vulkan (which is 100% expected according to spec).

There may also be a case that some of us see these frame issues, while others just can't see or perceive them:

Some people's cFFF is fairly high: the average for humans is 35-40 Hz, I measured mine at around 80 Hz at university, and AFAIK it can go up into the 90s for some people (not to be confused with the fusion threshold, which has been measured into the hundreds of Hertz for some humans, but that's less relevant for displays and rendering).

That means if the average human sees abcababc, chances are they don't even notice the skipped frame c. I think the bug likely affects a lot of Godot users (anyone on Forward+).

Depending on what's the root cause here, this bug could also be related to observed TAA jitter some people complain about (I generally use MSAA; but not for the test program here, of course).

thygrrr commented 8 months ago

Working on my real game with a higher per-scene render load, I can definitely see judder and frame shuffling with Prefer Native for the swapchain, and with 8bpc color depth (so all the mitigations don't have enough of an impact anymore)

Judder is both in Godot editor and in the actual game.

https://github.com/godotengine/godot/assets/8904620/b46d988b-6ab8-4a13-9050-08056acc9cf9

Sorry for the very blurry image; the scene is high-contrast and dark. This is an 8x slow motion of a smooth camera move across a distant gas planet. It's visible that there is judder back and forth from the wrong frame ordering.

Confirmation of older Suspicion:

The judder is also present with VSYNC_MAILBOX; if I set a maximum allowed frame rate in the driver (e.g. 144 Hz) it can be seen rather clearly.

It's not visible with uncapped fps, it might be just the sheer number of frames thrown at that one mailbox slot and their relative temporal proximity to each other; whereas with a limited frame rate in the rough ballpark of the monitor framerate, shuffled frames are more apparent because the deltas per frame are higher.

Cyangmou commented 8 months ago

I updated the title of this bug, the opening description, added the most crucial video and our last findings in the first post of this bug report.

thygrrr commented 8 months ago

> I updated the title of this bug, the opening description, added the most crucial video and our last findings in the first post of this bug report.

Thanks! By the way, your example with the insects seemingly also shows frame shuffle (it's .webm, so I can only do this with the E hotkey in VLC, not in my NLE which doesn't support the format without re-encoding):

https://github.com/godotengine/godot/assets/8904620/1cf1e3cb-8681-4547-b482-6720fea640a9

I also noticed that if I set a maximum frame rate in the Nvidia Control Panel for Background Application, the GPU usage goes down. (expected!) When the game is in the foreground (with Mailbox), the GPU is pretty much saturated at 97% (also expected!)

But if I set a MAXIMUM frame rate in Godot, then the maximum frame rate setting from the driver is ignored, and the GPU is always at the same level of load (~27%, because of course it renders far fewer frames).

The judder feels a bit different but it's there for sure (giving some more credence to the assumption that the judder is also there with uncapped VSYNC_MAILBOX or VSYNC_DISABLED)

To rule out a multi-threading race condition (despite being single threaded), I verified that VulkanContext::swap_buffers is indeed always called by the same thread.

darksylinc commented 7 months ago

Unable to repro

I said in my previous comment that I was able to repro.

However I spoke too soon. After a thorough research, I traced the problem to my two Linux monitors having the options of TearFree on.

This was causing a consistent jitter every couple of seconds which is not caused by Godot, and is also present in other apps. After I disabled it, your sample was smooth.

Crash in Godot 4.1.x is unrelated

Your sample was getting heavily corrupted and then crashing after a few seconds. I traced this problem and found it was fixed in Fix dangling pointers in _clean_up_swap_chain.

The problem is that in my specific system, when switching VSYNC modes to Mailbox, fpGetSwapchainImagesKHR returns 5 while swapchainImageCount = 3. Hence this fails:

if (swapchainImageCount == 0) {
    // Assign here for the first time.
    swapchainImageCount = sp_image_count;
} else {
    ERR_FAIL_COND_V(swapchainImageCount != sp_image_count, ERR_BUG); // <-- fail condition is triggered
}

So the routine _update_swap_chain is unable to properly recreate the swapchain and ends up crashing. The fix in 4.2.x fixed it because now swapchainImageCount is correctly reset to 0.
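A simplified C++ sketch of the failure and the fix, assuming a hypothetical SwapchainState that only tracks the cached count (the actual Godot code around this check is not reproduced here). Without clearing the cached value, a recreate that returns 5 images instead of 3 hits the ERR_BUG path; clearing it to 0 first lets the new count be recorded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified state mirroring the check above; not Godot's code.
struct SwapchainState {
    std::uint32_t swapchainImageCount = 0;
};

// Mirrors the snippet: returns false where the original returns ERR_BUG.
bool record_image_count(SwapchainState &s, std::uint32_t sp_image_count) {
    if (s.swapchainImageCount == 0) {
        s.swapchainImageCount = sp_image_count; // assign for the first time
        return true;
    }
    return s.swapchainImageCount == sp_image_count;
}

// Recreating the swapchain (e.g. switching to Mailbox) may legitimately
// change the image count, so the cached value is reset before re-querying.
bool recreate_swapchain(SwapchainState &s, std::uint32_t new_image_count) {
    s.swapchainImageCount = 0;
    return record_image_count(s, new_image_count);
}
```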

Definitions

Stutter: Movement that is uneven and doesn't feel smooth. e.g. instead of taking 1 step forward every frame, sometimes it takes more or 0. instead of: 11111111 we end up with 1120113101. But it never goes backwards in time.

Shuffling: When a previous frame is presented, which makes it go backwards in time. For example, if frames should have been presented as 200, 201, 202, we end up with 200, 202, 201. Because shuffling feels "stuttery", we'll exclusively use the word shuffling when frames are presented out of order and will avoid using the word "stutter".
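These definitions can be applied mechanically to a recording. A minimal C++ sketch, assuming each captured camera frame has already been reduced to the counter value visible on screen; PresentationReport and classify are hypothetical names. Repeats and skipped frames indicate stutter, while any backward step indicates shuffling.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct PresentationReport {
    std::size_t repeats = 0;   // a frame number shown more than once
    std::size_t skipped = 0;   // numbers in the covered range never shown
    std::size_t backwards = 0; // steps where an older frame follows a newer one
};

PresentationReport classify(const std::vector<long> &shown) {
    PresentationReport r;
    if (shown.empty()) return r;
    std::set<long> seen;
    long lowest = shown[0], highest = shown[0];
    for (std::size_t i = 0; i < shown.size(); ++i) {
        if (!seen.insert(shown[i]).second) ++r.repeats;        // stutter
        if (i > 0 && shown[i] < shown[i - 1]) ++r.backwards;   // shuffling
        if (shown[i] < lowest) lowest = shown[i];
        if (shown[i] > highest) highest = shown[i];
    }
    // Every frame between the lowest and highest should have appeared once.
    r.skipped = static_cast<std::size_t>(highest - lowest + 1) - seen.size();
    return r;
}
```

For the shuffling example above, 200, 202, 201 yields one backward step and no skipped or repeated frames, while 200, 200, 202 yields one repeat and one skipped frame (pure stutter).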

Tests

For the sake of the tests I set smoothing to 0, so that the cube below would stutter as much as possible. I wanted to compare the "rawest" form possible.

Linux

My system specs are:

AMD Ryzen 9 5900X 32GB RAM
AMD Radeon RX 6800 XT 16GB
RADV driver

For framerate visualization I used MangoHud.

Here's a video of it running:

https://github.com/godotengine/godot/assets/3395130/af99d50c-5fe6-49c0-a46f-7e17dbd0e721

It doesn't look like there is anything to fix here. The stutter on the bottom box is to be expected given that I disabled smoothing (and the way it is calculated), and the framerate has a small bump at that exact moment.

Windows

My system specs are:

Windows 10 22H2 19045.3570
Intel i7 7700 32GB RAM
NVIDIA GeForce GTX 1060 3GB (plugged to a real monitor)
AMD Radeon RX 560 2GB (plugged to a real monitor)
Intel HD Graphics 630 (not plugged, but active)

For framerate visualization I used MSI Afterburner with RTSS server.

The following video was captured running on NVIDIA GeForce GTX 1060 3GB while observed from the monitor plugged to NVIDIA:

https://github.com/godotengine/godot/assets/3395130/9248611f-a5f5-4ca4-878e-4ac0adce9fee

How can you help diagnose this problem

Since I was unable to even reproduce this issue, I need you who are able to repro this problem to do the following:

Post your full OS version

For example, mine is Windows 10 22H2 19045.3570. I want to rule out the possibility that this is OS-version specific.

Upload vulkaninfo.log

  1. Download the Vulkan SDK and install it.
  2. In the installation directory (usually C:\VulkanSDK) you will find the app "vulkaninfo.exe".
  3. Run from a cmd prompt: vulkaninfo.exe > vulkaninfo.log.
  4. Compress vulkaninfo.log and upload it here.

Upload a GPUView capture

GPUView is an extremely powerful tool for debugging timing and presentation.

These are the instructions from Microsoft.

  1. Download and install the Windows ADK (Assessment and Deployment Kit)
  2. GPUView should be in C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview\log.cmd
  3. Close apps irrelevant to this test. I don't want them interfering with the measurement.
  4. Launch the Godot sample that has stutter and/or shuffling.
  5. Open a cmd prompt w/ admin privileges and navigate to C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview\
  6. Run Log.cmd
    • IMPORTANT: If your OS does not use the English locale, you will have to manually edit the Log.cmd script as I explain in my blog. Otherwise the script will fail with a "4000" error.
  7. Wait until the shuffling or heavy stutter manifests.
  8. As soon as that happens, run Log.cmd again. This will save the capture to disk.
    • The reason it needs to be as fast as possible is to make it much easier to identify the messed up commands by going to the end of the capture. If it's in the middle, it can take time until we find it (if at all).
  9. Compress it and upload it. Please tell me if it contains shuffling or just stutter.
  10. If you think and are worried about the file containing sensitive information you can send it to me to info@yosoygames.com.ar (if it's a big file you may have to use a sharing platform like Dropbox, MEGA, etc).
  11. You can of course analyze the capture yourself in GPUView and see if you can make sense of it; run multiple captures, etc.

Other things to try

Toggle HPET (High Precision Event Timer)

Time going backwards in time would explain shuffled presentation, which may itself be explained by broken motherboards (this happens a lot!!!).

If HPET was enabled, you should try disabling it. If HPET was disabled, you should try enabling it.

This Youtube video shows how to toggle HPET on Windows 10 (you must reboot after doing it):

REM Stock HPET
bcdedit /deletevalue useplatformclock
REM  Disable HPET
bcdedit /set useplatformclock false
REM Enable HPET
bcdedit /set useplatformclock true

IMPORTANT: HPET needs to be enabled in the BIOS. This is usually found under Chipset -> High Precision Timer.

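Independently of HPET, a process can check whether it ever observes time going backwards. A minimal C++ sketch (the helper name is hypothetical): std::chrono::steady_clock is required by the C++ standard to be monotonic, so a non-zero count would point at the platform timer rather than at Godot. std::chrono::system_clock, by contrast, may legitimately jump backwards when the wall clock is adjusted. This is a quick sanity check from a single thread, not a replacement for the GPUView capture.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Samples Clock::now() repeatedly and counts how often it goes backwards.
template <typename Clock>
std::size_t count_backward_steps(std::size_t samples) {
    std::size_t backward = 0;
    auto previous = Clock::now();
    for (std::size_t i = 0; i < samples; ++i) {
        auto current = Clock::now();
        if (current < previous) ++backward; // time went backwards
        previous = current;
    }
    return backward;
}
```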

Run LatencyMon

LatencyMon is a useful tool for figuring out whether something is seriously wrong with your system.

Running it may shed some light if it happens to find something.

darksylinc commented 7 months ago

YESSS!!! SUCCESSFULLY REPRODUCED

After a couple of suggestions from @reduz I managed to repro some stutter, but the repro steps are insane and it reeks of a driver bug; I will research further. I don't have any monitor above 60Hz, but I do have a monitor with HDR support (actually it's a lie because the panel is cheap, but that doesn't matter as long as the GPU believes it).

Stutter only

Set Vulkan Presentation to "Legacy" in NVIDIA control panel.

  1. Connect monitor 1 to NVIDIA, 60Hz, 8bpp. Make this one the main monitor (important)
  2. Connect monitor 2 to NVIDIA, 50Hz (not 60), 12bpp, tell Windows 10 to use HDR (important).
  3. Start demo.
  4. Move the window to Monitor 2, and maximize it.
  5. So far everything is smooth.
  6. Start moving the mouse cursor. Everything starts going stuttery (but it doesn't shuffle) because it misses quite a lot of frames.

The same repros if I tweak it a little:

  1. Connect monitor 1 to NVIDIA, 60Hz, 8bpp.
  2. Connect monitor 2 to NVIDIA, 50Hz (not 60), 12bpp, tell Windows 10 to use HDR (important). Make this one the main monitor (important).
  3. Start demo.
  4. Move the window to Monitor 1, and maximize it.
  5. It is always stuttery (but it doesn't shuffle) as it misses roughly 1 frame every 2 frames.
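Reading "misses roughly 1 frame every 2 frames" as half of all vblanks repeating the previous image, the arithmetic on the 60Hz monitor works out to about 30 distinct frames per second, with each image held for two refreshes (about 33 ms) instead of one (about 17 ms). The helper names below are made up for illustration.

```python
def distinct_frames_per_second(refresh_hz, miss_ratio):
    """Distinct images shown per second when `miss_ratio` of all vblanks
    repeat the previous image instead of showing a new one."""
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must be between 0 and 1")
    return refresh_hz * (1.0 - miss_ratio)


def hold_time_ms(refresh_hz, vblanks_held):
    """How long one image stays on screen if it is held for `vblanks_held` refreshes."""
    return 1000.0 * vblanks_held / refresh_hz


print(distinct_frames_per_second(60, 0.5))  # 30.0
print(round(hold_time_ms(60, 1), 1), round(hold_time_ms(60, 2), 1))  # 16.7 33.3
```

The uneven alternation between 16.7 ms and 33.3 ms holds is what reads as stutter, rather than the lower average rate on its own.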

So it seems the conditions to trigger are:

  1. Window must be running on the non-main monitor.
  2. Both monitors must be at different Hz.
  3. At least one monitor must be in HDR mode, the other in SDR (it is unknown whether it still repros with both monitors in HDR).

Shuffle too

Exact same as Stutter:

Set Vulkan Presentation to "DXGI" in NVIDIA control panel (Important!!).

  1. Connect monitor 1 to NVIDIA, 60Hz, 8bpp.
  2. Connect monitor 2 to NVIDIA, 50Hz (not 60), 12bpp, tell Windows 10 to use HDR (important). Make this one the main monitor (important).
  3. Start demo.
  4. Move the window to Monitor 1, and maximize it.
  5. It shuffles.

So it seems the conditions to trigger are:

  1. Window must be running on the non-main monitor.
  2. Both monitors must be at different Hz.
  3. At least one monitor must be in HDR mode, the other in SDR (it is unknown whether it still repros with both monitors in HDR).
  4. Presentation model must be DXGI.
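To keep track of which combinations have been tried, the trigger conditions listed above can be written down as a small predicate. This is a hypothetical test-matrix helper that encodes only what was observed on this one machine. Other testers in this thread report shuffle with a single monitor or with the window on the main monitor, so "not observed" here means "not reproduced in this setup", not "safe".

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    refresh_hz: int
    hdr: bool
    is_main: bool


def expected_symptom(window_monitor, other_monitor, present_method):
    """Return the symptom reported above for one two-monitor setup.

    present_method models the NVIDIA Control Panel Vulkan presentation
    setting as the string "legacy" or "dxgi" (an assumption of this sketch).
    """
    if window_monitor.is_main:
        return "not observed"  # condition 1: window must be on the non-main monitor
    if window_monitor.refresh_hz == other_monitor.refresh_hz:
        return "not observed"  # condition 2: refresh rates must differ
    if window_monitor.hdr and other_monitor.hdr:
        return "unknown"  # both monitors in HDR was not tested
    if not (window_monitor.hdr or other_monitor.hdr):
        return "not observed"  # condition 3: one HDR monitor, one SDR monitor
    return "shuffle" if present_method == "dxgi" else "stutter"
```

For the first stutter repro (window on the 50Hz HDR secondary monitor, Legacy presentation) it returns "stutter"; for the shuffle repro (window on the 60Hz SDR secondary monitor, DXGI presentation) it returns "shuffle".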
darksylinc commented 7 months ago

Quick Update

After fixing various issues in Godot, the bug remained. I tried a different Vulkan app (my own, from OgreNext) and it exhibited the same bug.

The chances of this happening to two (quite different) Vulkan apps and it being an app bug are almost 0. This looks like a driver bug.

I am preparing a repro for NVIDIA now (well, not right now, I am really tired; I've been at this for 19 hours during the weekend). I will send it during the week.

thygrrr commented 7 months ago

Regarding the HPET stuff:

Time itself is not running backwards:

I still believe it's the engine inadvertently reusing the wrong frame or command buffer.

"The chances of this happening to two (quite different) Vulkan apps and it being an app bug are almost 0. This looks like a driver bug."

You see shuffle in another app? Extremely interesting!

Can you provide the OgreNext-based test application so I can run a repro as well? I do not see the problem with other Vulkan renderers (e.g. Unity, and some games). I don't have the time to write a full test suite in other frameworks right now, very sorry. :(

I have done some extensive web searching and there's no indication that this issue affects non-godot developers. Otherwise, thousands of gamers would be talking about it. (this issue has existed for at least 2 months)

Amazing list of repro factors to twiddle around with, thank you for looking into this!

I think they may still vary from system to system, but it's good to know different "constellations" of settings. I certainly did not use HDR, just 10bpc color depth, to see the strongest shuffle behaviour. I still get an occasional shuffle frame even with what currently works best.

"Window must be running on the non-main monitor." <-- I can falsify this. :) I tested even changing the main monitor around.

"Both monitors must be at different Hz." <-- this is LIKELY a factor (the Windows compositor is single-refresh or something); however, I can see the shuffle with just 1 monitor attached, at 120 Hz. Your great research also prompted me to check the HPET setting (though I think my system is rather well set up, all things considered, and I just updated the BIOS yesterday) and what happens if Windows starts up with just 1 screen. (edit: 1 screen still shows jitter, sometimes in the others, sometimes the app jitters for a while; HDR on or off seems unrelated to the jittering)

All in all, of course I'd prefer this to be a driver bug, so I could hate on team green instead of my favourite game engine.

thygrrr commented 7 months ago

Here's a tweaked version of the repro that:

jitter-frame-repro-longer.zip

darksylinc commented 7 months ago

@thygrrr Here's the OgreNext demo that managed to trigger the same bug: OgreNextRepro.zip

Video: https://github.com/godotengine/godot/assets/3395130/ae4e20fd-f6bd-4b7f-809c-38f3651b457e

"I have done some extensive web searching and there's no indication that this issue affects non-godot developers. Otherwise, thousands of gamers would be talking about it. (this issue has existed for at least 2 months)"

Because:

  1. Vulkan games are not that common (many are D3D12 or D3D11)
  2. It happens to you easily, but it only happened to me after I hammered REALLY hard trying to reproduce it. Many gamers run the standard single-monitor setup (or both monitors with the exact same settings).
    • I've used heterogeneous monitors in the past and they were always a source of pain. Gamers may have experienced the bug before, noticed it goes away if they unplug one of their monitors, and attributed the problem to one of the many issues they're used to.
  3. Godot makes the repro really easy, but OgreNext not so much: it happens randomly. In the video I'm uploading I was lucky that it reproduced like crazy, but it took me several tries (resizing the window or moving it back and forth between monitors also helps). There are many times when my OgreNext demo runs butter smooth (or maybe it shuffles every once in a while, but rarely enough that you'd think it's an app bug).
A-Lamia commented 7 months ago

I'm having this issue on Forward+. It takes a couple of tries to get it to happen, but there are no problems on Compatibility.

Specs

OS: Windows 10
GPU: AMD 6750XT
Driver settings: Default
Monitor: 60Hz
Godot: v4.2-B6

Project Description

Fresh 4.2-B6 project; no settings that should impact this issue were changed. 2 assets: a tile map and the player. 2 scenes: the player and the level. 1 addon: godot-4-importality (imports Aseprite files).

test_stage.tscn: [screenshot]

player.tscn: [screenshot]

Outliner: [screenshot]

Bug Detail

The bug takes a couple of tries to reproduce. I had to record with my phone: when recording from the desktop, the issue does not appear, not just in the video but in real time as well. (Yes, the flashing is a part of the issue.)

Evidence

Forward + Bug:

https://github.com/godotengine/godot/assets/6450181/9249ecaf-6c51-452d-a6eb-02836e57cebc

Forward + No Bug:

https://github.com/godotengine/godot/assets/6450181/4122e1d4-cc68-4cb8-8d63-5650022a578a

darksylinc commented 7 months ago

The fact that it happens on an AMD GPU is deeply disturbing. I had been wondering about the possibility of this being a Windows bug. Do you have one monitor or multiple monitors plugged in, @A-lamia?

Is MSAA enabled on your project?

On the other hand, I did fix a few issues that could explain it (i.e. there's a chance it is both a Godot and an NV bug). The repro I sent to NV had that bug fixed, but I was planning to work more on it tomorrow so it can be submitted as a PR.

A-Lamia commented 7 months ago

@darksylinc No MSAA. 2 monitors exact same models.

darksylinc commented 7 months ago

Thanks. One more question: I posted my own app here https://github.com/godotengine/godot/issues/84137#issuecomment-1811544904

Does the problem with that app reproduce for you?

A-Lamia commented 7 months ago

@darksylinc No, I ran it a bunch of times and I don't have any issues.

Calinou commented 7 months ago

This issue may be related to https://github.com/godotengine/godot/issues/80941 (which is difficult to reproduce because the Steam Deck has its own Windows drivers).

A-Lamia commented 7 months ago

"This issue may be related to #80941 (which is difficult to reproduce because the Steam Deck has its own Windows drivers)."

Could be, though I don't have issues in the editor; it's about a 1 in 5 chance that the bug happens when I run a scene.

That's assuming the Steam Deck uses the same sort of drivers as desktops.
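With an intermittent repro like this, a few clean runs say little about whether a build fixes the bug. Assuming each run independently has the same chance p of showing the bug (1 in 5 here, a rough estimate rather than a measured rate), the chance of seeing it at least once in n runs is 1 - (1 - p)^n. The sketch below computes that and inverts it to get how many clean runs in a row are needed before a "no repro" result is meaningful; the function names are illustrative.

```python
import math


def prob_at_least_one(p_per_run, runs):
    """Chance of seeing the bug at least once in `runs` independent runs."""
    return 1.0 - (1.0 - p_per_run) ** runs


def runs_needed(p_per_run, confidence):
    """Smallest number of clean runs after which, if the bug were still
    present at rate p_per_run, it would have shown up with probability
    at least `confidence`."""
    if not 0.0 < p_per_run < 1.0:
        raise ValueError("p_per_run must be strictly between 0 and 1")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_run))


print(round(prob_at_least_one(0.2, 7), 3))  # 0.79
print(runs_needed(0.2, 0.95))  # 14
```

So at a 1-in-5 rate, about 14 clean runs in a row are needed before concluding at the 95% level that a test build no longer shows the bug.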

darksylinc commented 7 months ago

AMD results

After upgrading drivers to 23.11.1 (from 22.11.2) I was able to repro @A-Lamia's behavior and found something extremely interesting. Suddenly some of these reports (including those in other tickets) start to make sense.

The rig is: AMD Ryzen 5900X, 32GB RAM, AMD Radeon RX 6800 XT 16GB.

What I found when using Godot 4.1.3 is that the screen would sometimes flash black and shuffling would occur if I hovered the mouse cursor over the Minimize, Maximize and Close buttons until the tooltip appeared. Additionally, Godot would sometimes shuffle when switching VSync modes (shuffling when switching VSync would be reasonably acceptable, but I suspect it's another symptom of the same bug):

Shuffle when switching VSync: https://github.com/godotengine/godot/assets/3395130/430d20fe-9388-4843-ba44-0321dfaf1bb5

Shuffle when hovering over the min/max/close buttons plus tooltips: https://github.com/godotengine/godot/assets/3395130/5484fe32-b3da-49f6-a67c-29efd746fac5

The good news is that when I tried my custom build (which is based on #80566 plus a few more fixes I did as an attempt to fix it on NVIDIA) none of these problems manifested on AMD.

I haven't yet pinpointed why my build fixes the problem, as it could be because of #80566, the new fixes, or because of an older swapchain fix I submitted in #80571.

Why does this explain several reported tickets? Because this variant of the bug (the AMD one with black-screen flashing and shuffling, not the NV one) clearly appears mostly (but not only) when the tooltip overlay is drawn on top of Godot's window. On some users' computers, a 3rd party program or a specific driver alone could be causing the same (but invisible) events that trigger this bug. This easily explains why some users constantly have this problem while others are unable to repro.

Question for @A-Lamia : I have uploaded a custom version of my Godot build here. Could you tell me if the bug is still present if you try to run your project using that exe?

darksylinc commented 7 months ago

Update

After thorough testing, I believe #82768, #80941, #81795 and this bug are all the same bug.

My current theory given how recent these reports are is that Godot is triggering a bug in dwm.exe (Windows' compositor), probably introduced recently via Windows Update, and it probably has to do with how Godot handles the window proc in WNDCLASSEX::lpfnWndProc.

An easy way to trigger this bug on AMD RDNA2 with a single monitor is to launch the flicker test demo a lot of times (e.g. 10 instances if necessary, more if you have to) until a few of them or many start to flicker to black or shuffle like crazy. Moving the window or maximizing increases the chances of triggering the bug.
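Launching that many copies by hand gets tedious. The sketch below starts N instances concurrently with Python's `subprocess` module and does not wait for them; it is a hypothetical helper, and the executable and project paths in the usage note are placeholders, not files from this thread.

```python
import subprocess


def launch_instances(command, count):
    """Start `count` copies of `command` concurrently and return the Popen handles."""
    return [subprocess.Popen(command) for _ in range(count)]


def wait_all(processes, timeout=None):
    """Wait for every process and return their exit codes in launch order."""
    return [p.wait(timeout=timeout) for p in processes]
```

For example, `launch_instances([r"C:\path\to\godot.exe", "--path", r"C:\path\to\flicker_test"], 10)`, where `--path` is Godot's command-line option for selecting the project directory and both paths are placeholders for your own binary and the flicker test project.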

You can watch all the windows to see which one flickers using Super + Tab.

I suspect it's a dwm.exe bug because in GPUView it can be easily seen that Godot started late after VSync (because of dwm.exe) yet it delivered its work on time with lots of time to spare, but dwm.exe waited far too long for the next present and missed the vblank.
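The argument above boils down to comparing two timestamps against each frame's vblank deadline: when the app finished its GPU work, and when the compositor actually presented. The hypothetical helper below encodes that comparison. It assumes you read the times off the GPUView timeline by hand (in milliseconds from a common origin) and that vblanks fall on exact multiples of the refresh period, which real hardware only approximates.

```python
import math


def next_vblank_ms(t_ms, refresh_hz):
    """First vblank at or after t_ms, assuming vblanks at exact multiples of the period."""
    period_ms = 1000.0 / refresh_hz
    return math.ceil(t_ms / period_ms) * period_ms


def who_missed_vblank(app_done_ms, compositor_present_ms, vblank_deadline_ms):
    """Blame a missed vblank on the app or on the compositor."""
    if app_done_ms > vblank_deadline_ms:
        return "app late"
    if compositor_present_ms > vblank_deadline_ms:
        return "compositor late"
    return "on time"
```

For example, at 60Hz with the app done at 9 ms and dwm.exe presenting at 18 ms, the deadline is about 16.7 ms and the frame is classified as "compositor late", which matches the pattern described above.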

It is clearly visible in GPUView's timeline that dwm.exe's workload is off and does not look the way it's supposed to for a workload running at the monitor's frequency.

Also, the fact that this bug triggered on NVIDIA with OgreNext (though very rarely) indicates it's not a Godot-specific issue. I tried the same tricks with a Unity sample and ended up with a TDR. So something is seriously messed up, and it seems Godot was just unlucky enough to trigger this issue more frequently without doing much (there's still the chance Godot is doing things wrong, though), but it seems to affect every app.

I'm still trying to gut Godot's DisplayServerWindows::WndProc to try to pinpoint what is triggering this bug.

Oh, I forgot to mention this bug also happens with my PR (even with all the fixes). It is much harder to trigger, but the bug is still triggered if you try hard enough.

A-Lamia commented 7 months ago

@darksylinc Yep, I was still able to get the bug; it took 7 attempts.

HybridEidolon commented 7 months ago

I have a feeling this is connected to the symptoms I describe in #85547.

"The good news is that when I tried my custom build (which is based on #80566 plus a few more fixes I did as an attempt to fix it on NVIDIA) none of these problems manifested on AMD."

@darksylinc The flickering window shadows I describe in that issue are also not present in your build (compared to 4.2 stable), using the same GPU.

LimestaX commented 6 months ago

2 different users having this issue w/ VSYNC enabled and both of us use 5700XT graphics cards, forward+, vulkan (on mine, not sure on theirs) https://www.youtube.com/watch?v=06wlTIDRx3U&t=4s

A-Lamia commented 6 months ago

I have been playing a game called Halls of Torment, got this bug, and immediately went to Google, only to find it was made in Godot.

TestSubject06 commented 6 months ago

"2 different users having this issue w/ VSYNC enabled and both of us use 5600XT graphics cards, forward+, vulkan (on mine, not sure on theirs) https://www.youtube.com/watch?v=06wlTIDRx3U&t=4s"

Oh hey, I was wondering where that recording went. Yes I'm also forward+ vulkan on a 5700 XT with VSync enabled.

LimestaX commented 6 months ago

Sorry, 5700XT. And yeah, I was following up on this bug and figured we'd want it tagged.


HybridEidolon commented 4 months ago

The minimal reproduction does not appear to have the issue as of 3be3d50 (#87340).

vpellen commented 4 months ago

My Vulkan vsync stutters seem similarly fixed by the above, although it's worth noting I had to go into my NVIDIA Control Panel and set "Vulkan/OpenGL present method" to "Prefer layered on DXGI Swapchain", which probably makes sense to somebody smarter than me.

DarioSamo commented 4 months ago

"My Vulkan vsync stutters seem similarly fixed by the above, although it's worth noting I had to go into my NVIDIA Control Panel and set "Vulkan/OpenGL present method" to "Prefer layered on DXGI Swapchain", which probably makes sense to somebody smarter than me."

This is not that odd of a fix; I've been considering that we could get much more consistent behavior if we used a DXGI swap chain instead, even when using Vulkan. It'd get lower latency and a more consistent presentation. That control panel option does essentially do that for you at the low level AFAIK.

Calinou commented 4 months ago

"I've been considering that we could get much more consistent behavior if we used a DXGI swap chain instead, even when using Vulkan. It'd get lower latency and a more consistent presentation."

See https://github.com/godotengine/godot-proposals/issues/5692 where I originally proposed this. It would also allow for HDR output to be implemented in Vulkan-based rendering methods, as DXGI is the only way to achieve HDR output on Windows.

Using DXGI directly also allows NvTrueHDR to work on Vulkan apps without requiring changes in the NVIDIA Control Panel to force layered DXGI presentation.