Open Cyangmou opened 8 months ago
Thanks for carrying out this testing! I recommend repeating the same tests with #80566. There are known issues with how the swapchain is managed in master, and that PR fixes this issue.
@Calinou I tested it and the PR version does NOT fix the jitter issue. But it's a good starting point for me to look into.
I have tried ruling out the swapchain being the problem (and the issue being perhaps in RenderingServerDefault) but there is exactly 1 render target being blitted, from exactly 1 render thread.
Here's an update of the test project, now with the frame counter. It will also show a very obvious display if the engine frames would ever deviate from the ones counted by _process. (they never do, however)
Stab in the dark:
There's a super weird cache for render targets that I don't understand what it's for. (apparently it's pretty clutch...)
RID TextureStorage::RenderTarget::get_framebuffer() {
    // Note that if we're using an overridden color buffer, we're likely cycling through a texture chain.
    // this is where our framebuffer cache comes in clutch..
    if (msaa != RS::VIEWPORT_MSAA_DISABLED) {
        return FramebufferCacheRD::get_singleton()->get_cache_multiview(view_count, color_multisample, overridden.color.is_valid() ? overridden.color : color);
    } else {
        return FramebufferCacheRD::get_singleton()->get_cache_multiview(view_count, overridden.color.is_valid() ? overridden.color : color);
    }
}
...and there's also some resizing code that appears to overwrite the index/id of the target.
void TextureStorage::render_target_set_size(RID p_render_target, int p_width, int p_height, uint32_t p_view_count) {
    RenderTarget *rt = render_target_owner.get_or_null(p_render_target);
    ERR_FAIL_NULL(rt);
    if (rt->size.x != p_width || rt->size.y != p_height || rt->view_count != p_view_count) {
        rt->size.x = p_width;
        rt->size.y = p_height;
        rt->view_count = p_view_count;
        _update_render_target(rt);
    }
}
I'm checking if that hash function has a collision, and next will check if this weird resize code doesn't clobber some of our targets together.
Because I can actually see repeat hashes when I dump the resolved table entries for each draw to screen operation.
All that hashing seems excessive, but is fine; no collisions, no different behaviour for prime number offsets (was fearing that 0 was an issue).
There MAY be a relation somewhere (because the actual frame buffer index will often NOT be identical to the frame index), but the main theory still stands: 3 sets of indices to juggle semaphores, fences, frame buffers, command buffers, render targets, and swap chain images seems like a likely cause for the frame shuffle. I'm definitely at the end of my understanding of the engine, though. :)
My attention was drawn to this issue.
First I want to explain this:
- FIRST index wraps at 2 (FRAME_LAG)
- SECOND index wraps at 4 (frame_count = swapchain_images + 1)
- THIRD doesn't wrap, but is in [0..3[ (or minimum swapchain image count of Vulkan)
Yeah this is messed up and fixed by my PR. But some of that actually makes sense:
FRAME_LAG = 2 means that while the GPU is reading from [region 0], the CPU is writing to [region 1]. The CPU is then supposed to wait until the GPU is done reading from [region 0], so that the CPU can start writing to it while the GPU moves over to read from [region 1]. If FRAME_LAG = 3, the same cycle happens, but they keep looping through [region 0], [region 1] and [region 2].
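The region hand-off described above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption (the GPU always reads the region the CPU filled one frame earlier); `region_schedule` is a made-up helper, and a real engine enforces this with fences rather than arithmetic.

```python
FRAME_LAG = 2  # value quoted in the thread; pass 3 to see three regions cycle


def region_schedule(frame_count, frame_lag=FRAME_LAG):
    """Return (frame, cpu_region, gpu_region) tuples for a simple ring.

    Assumption: the GPU consumes the region the CPU filled one frame
    earlier. Real engines track this with fences, not arithmetic.
    """
    schedule = []
    for frame in range(frame_count):
        cpu_region = frame % frame_lag
        # On the very first frame the GPU has nothing to read yet.
        gpu_region = (frame - 1) % frame_lag if frame > 0 else None
        schedule.append((frame, cpu_region, gpu_region))
    return schedule


for frame, cpu, gpu in region_schedule(6):
    print(f"frame {frame}: CPU writes region {cpu}, GPU reads region {gpu}")
```

In this model the CPU and GPU never touch the same region in the same frame, which is the whole point of waiting on the previous region before reusing it.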
Everyone's been focusing on the swapchain, but the issue may be somewhere else entirely.
Given how CommandQueueMT::flush_if_pending works, there are no guarantees that commands are replayed in the exact order they were submitted. Make sure it never enters if (unlikely(command_mem.size() > 0)) _flush();.
In the video: "Frames are often skipped, shown out of order, or repeated."
We seem to move from frame 226 to 227 to 226: that's a hiccup that goes forwards, backwards, and forwards again. Then we move from 230 to 233, which omits 2 frames, but we just used up 2 extra frames in the mistake before. So: 1 back, 3 forward. 235, 234, 235 (1 back), 236, 239 (3 forward), 242, 243, 242 (1 back), 245 (3 forward).
There are 4 frames in the chain.
Analyzing the skips in the video, we get a consistently repeating pattern of 1 back, 3 forward. To me, this at least looks like the frames are being sorted that way by design and not by "accident". Maybe a frame is just being inserted before instead of after... I can't say in which loop. But the visual pattern supports this.
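For anyone who wants to repeat this kind of analysis on their own recordings, here is a small Python sketch that turns a list of observed counter values into signed steps. The helper names `step_pattern` and `describe` are made up for illustration; the sample values are the ones transcribed in the comment above.

```python
def step_pattern(observed):
    """Return the signed step between each pair of consecutive counter values."""
    return [b - a for a, b in zip(observed, observed[1:])]


def describe(step):
    """Label a single step the way the thread talks about it."""
    if step == 1:
        return "ok"
    if step == 0:
        return "repeat"
    if step < 0:
        return f"{-step} back"
    return f"{step} forward"


# Counter values transcribed from the comment above.
observed = [235, 234, 235, 236, 239, 242, 243, 242, 245]
steps = step_pattern(observed)
print(steps)
print([describe(s) for s in steps])
```

Reading the counter off the video frame by frame and feeding it through something like this makes a repeating pattern much easier to spot than eyeballing it.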
I just had a thought: Could you repeat the video experiments but adding a visible counter Label?
i.e. the label gets incremented every frame so that it reads 0, 1, 2, 3, 4, 5, ...
I want to see their value when it stutters. If the value changes normally while stuttering, it means it's definitely not a presentation problem. If the value remains the same (when it should've increased) or goes back in time when stuttering then it doesn't mean anything definitive but it supports the theory it's a presentation problem.
But he already did that. The label in the video is indeed a counter variable, not the result of Engine.get_process_frames() or Engine.get_frames_drawn(). It just so happens that the video was not recorded from frame 0.
D'oh. Let me check again then.
Update: OK I completely missed some posts, sorry about that. I will check this during the weekend, I'll try to repro and if successful analyze what's wrong with it.
Update 2: I was able to repro the bug on Linux AMD RADV and AMDVLK. On Godot 4.1.3 it ends up crashing really bad. On Godot 4.2.x on my PR it doesn't crash, but it is stuttering. I will look deeply into this on Saturday. Thank you for the repro!
Update 3: I'm using a 60fps camera and while the counter is consistent (ie. it never goes backwards in time), I can see some visible stutter on the cube movements. It's definitely worth researching (though that problem may be related to #82222). Furthermore that crash in Godot 4.1.3 looks highly suspicious and related (I suspect the crash got fixed but not the underlying issue).
For me, this frame skip/shuffle shows up on Windows 11 Pro with a GeForce 3080, with various driver versions, both Game Ready and Studio (I've seen this judder since I started with Godot in mid-September). I use a 120 Hz display or a 60 Hz display (2 monitors connected at the same time).
It happens on Godot 4.1.2 (haven't tested 4.1.3 yet) and any 4.2, including my own local builds here.
I used a 240/480 fps camera (my phone), there's obviously no frame shuffle in VSYNC_MAILBOX or VSYNC_DISABLED; but when there's this stutter, I do see the shuffle. I was able to see the shuffle right away in Davinci Resolve looking at individual "frames" in the video; but for the one I uploaded here, I also time-stretched it 10x to make it easier to count the frames.
My display is fairly fast, initially I doubted I could see this shuffle/frameskip issue, and it's impossible to get with a direct screen capture. It's hard to capture at 120 fps - I'll give it one more shot today though; Godot's recorder still lives in 1998 and allows 60 fps max, and it also "messes" (it's ok for its purpose) with delta time and vsync, so it doesn't help at all.
I have no reasonable explanation how these skips can happen. (I have some theories as discussed at length in this thread, plus some ideas about maybe semaphore reuse being the actual problem)
> But he already did that. The label in the video is indeed a counter variable, not the result of Engine.get_process_frames() or Engine.get_frames_drawn(). It just so happens that the video was not recorded from frame 0.
The counter is a standalone variable, yes, but actually has the same value as Engine.get_process_frames() and Engine.get_frames_drawn() (I compared the values across all the stutters etc.)
It would also be a different category of bug if the game thought it was a new frame, but the renderer didn't, or vice versa.
Here's a 120fps video, allegedly "lossless", taken with OBS. I just verified with Davinci that the frames are indeed out of order.
https://github.com/godotengine/godot/assets/8904620/65f077a4-9298-4a05-9571-fda477fbc84a
Example frames: 80848-80852 has a repeat (49) and a skip (51).
Notes:
I think we should rename the thread once more, because all the testing basically seems to confirm that it's not a "stutter" or "hiccup" but a frame ordering issue, and also not related to the refresh rate at all.
The video is not working, if you could edit it @thygrrr so it works that'd be lovely.
The video works for me on Chrome, you can maybe download it here
It's 120 fps video, that makes it difficult to view in lots of consumer applications. I'll try to make one that plays at 60 fps (same number of frames).
Ah correct, doesn't play in firefox. In chrome it works. Maybe edit in a note to open it with chrome.
Seems to follow indeed the same pattern of repeat & skip we analyzed before. So great video and visualization, confirmation of what has been observed before.
It is even worse than was visible on my phone camera 😖 - there is frequent frame shuffling, skipping, and doubling. In full speed motion it becomes more apparent on the bigger skips (and the monitor's response time blurs many of the smaller frame missteps or duplicated frames), but I was also seeing a "copy" (not display ghosting) sometimes leading, sometimes following the cube on the monitor. That seems to come from phases of stable repeating "shuffled" states. (It's weird because with 3 swapchain images, even cabcabcab would not appear shuffled, so it's maybe something like cbcacbcacb.)
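The distinction between "cabcabcab" and "cbcacbcacb" can be made concrete with a tiny check. This is a sketch that assumes images a, b, c are rendered in that order; `is_round_robin` is a hypothetical helper, not anything from the engine.

```python
def is_round_robin(presented, images="abc"):
    """True if `presented` cycles through `images` in a fixed rotation.

    Any starting offset is accepted, so "cabcab..." still counts as in order.
    """
    if not presented:
        return True
    if presented[0] not in images:
        return False
    start = images.index(presented[0])
    n = len(images)
    return all(ch == images[(start + i) % n] for i, ch in enumerate(presented))


print(is_round_robin("cabcabcab"))
print(is_round_robin("cbcacbcacb"))
```

A rotated cycle still presents frames in temporal order; a sequence that revisits one image twice per period does not.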
Here is me going frame by frame through the first couple of frames in the video using Davinci Resolve (manually advancing each frame of the recording with cursor key - order and the motion of the timeline at the bottom matters, not precise "timing")
https://github.com/godotengine/godot/assets/8904620/49a8af27-6859-4423-b404-df69ed453625
I am more and more certain this has something to do with vkAcquireNextImageKHR (fpAcquireNextImageKHR) returning VkImages / "framebuffers" out of order at the discretion of the Vulkan implementation (this is correct and expected behaviour) and one of the index sets or cached frame buffers just looking at the wrong associated buffers, fences, or semaphores. But I was unable to find any obvious cause in about 20 hours of testing theories and poking at code so far, so I unfortunately have to leave this to the renderer experts. :)
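The suspected failure mode can be illustrated with a toy simulation. This is only a sketch of the hypothesis, not Godot's code and not a confirmed cause: `present_sequence`, the acquire order, and the `frame % image_count` indexing are all made up for illustration.

```python
def present_sequence(acquire_order, image_count, index_by_acquired):
    """Simulate which frame number ends up on screen for each present.

    acquire_order: image index handed out by the (simulated) driver per frame.
    index_by_acquired: if True, render into the acquired image (correct);
    if False, render into frame % image_count (the hypothetical bug).
    """
    images = [None] * image_count  # frame number last rendered into each image
    shown = []
    for frame, acquired in enumerate(acquire_order):
        target = acquired if index_by_acquired else frame % image_count
        images[target] = frame
        shown.append(images[acquired])  # the acquired image is what gets presented
    return shown


# Hypothetical acquire order: round-robin at first, then the driver reorders.
order = [0, 1, 2, 0, 2, 1, 0, 2, 1]
print(present_sequence(order, 3, index_by_acquired=True))
print(present_sequence(order, 3, index_by_acquired=False))
```

As long as the driver happens to hand out images round-robin, both variants look identical, which would fit a bug that only shows up on some systems or after some swapchain events.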
Reproducibility is about 95%. (I would have said 100%, but with certain console output calls, and sometimes just by itself, the bug will no longer occur. I cannot reliably reproduce that either, though, so there's no "workaround" or "related setting" I could describe, other than wildly toggling monitors on and off or something. It's probably just how cranky the NVIDIA driver feels and how lucky the application is to get swapchain images in a particular order or something.)
Umm, speaking of reproducibility
(spoiler: This is a mitigation, not a workaround)
(this might merely mean that the Vulkan implementation is less prone to reorder the images in one vs. the other setting, or it's just a memory layout thing - but it MIGHT help, especially since the "broken" state seems to have something to do with resources or values initialized when Godot starts)
Check the value of the Vulkan/OpenGL present method 3D setting for the Godot executable: https://github.com/godotengine/godot-proposals/issues/5692#issuecomment-1405829216 The NVIDIA driver can promote Vulkan/OpenGL apps to DXGI even if they don't use it, but this isn't done by default unless the app has a profile for it. This is how Vulkan-based games on Windows can use HDR.
Tried to reproduce and in my case (NVIDIA GeForce GTX 1070) I only got the setting of 8bpc, nothing else and still have the judder.
However, when I switched from the standard to the NVIDIA settings, restarted Godot and played, the judder did not appear for the first 10 seconds of testing; it ran smoothly. Then the same classic stutter/frameskip appeared.
This means the setting may not be the problem, but it felt like some kind of image cache was filling up.
I think the bpc setting only causes a different memory layout and slightly different blit speeds, it's also not 100%. So it may just be "something that makes the race condition less bad".
This 10bpc isn't 10bit HDR though AFAIK, it's pure color depth.
It's funny, I can keep one app that is jittering and one app that works smoothly on the same screen for a while. However, they will eventually approach a smooth state. (yes, that means a WORKAROUND to the judder issue is just running the game twice on my system)
(it will slowly "heal", with less and less stutters, almost to full recovery - it's still a bit cyclic, especially if the application loses focus)
This hints at some memory reorganization going on in the Driver.
Swap chain was Auto.
If I set it to prefer Layered on DXGI Swapchain, I get the full judder intensity.
If I set it to prefer Native, I get MUCH LESS stutter, but it still happens occasionally, and it seems it's still the frame ordering issue. That is a decent workaround for my development work, because VSYNC_DISABLED and VSYNC_MAILBOX really melt my GPU because they render thousands of fps.
The remaining frame jumble is likely still an obnoxious bug but it isn't nearly as intense. (maybe the frame jumble is just too subtle, though - I'm starting to go blind from looking at pixel seams etc.)
https://github.com/godotengine/godot/assets/8904620/06e53ca6-eff4-466b-a049-68d9a16b1c24
With the settings discussed above, and also with the "OS Default Color Setting" (which is 8 bpc in the Windows display settings), frame order issues are very rare, but there are still a lot of stuck and skipped frames. It just doesn't show as much because I was used to the much harsher effect before.
The stuck/held and skipped frames also are somewhat regular now.
In case you wonder about the frame skips (isn't that normal?)
It's not. With VSYNC_ENABLED, there should never be a skipped frame. (Once the swapchain is full, the game will just wait.)
VSYNC_ADAPTIVE may decide to skip if a fresher frame is already there.
And because the test game is simple, there should also never be a held frame (unless something in the engine is waiting on the wrong fence / semaphore).
Needless to say, under no circumstances should a previous frame be shown after one of its successors has been shown.
My money is on memory layout affecting the timing beneficially, but not fixing the underlying issue, which is Godot likely using the swapchain images and their resources incorrectly when the images are returned in a different order by Vulkan (which is 100% expected according to spec).
There may also be a case that some of us see these frame issues, while others just can't see or perceive them:
Some people's cFFF is fairly high, average for humans is 35-40 Hz, in Uni I measured mine around 80 Hz, it can go up into the 90s for some people AFAIK. (not to be confused with the fusion threshold, that has been measured into the hundreds of Hertz for some humans, but that's less relevant for displays and rendering)
That means if the average human has abcababc, chances are they don't even notice the skipped frame c. I think the bug likely affects a lot of Godot users (anyone on Forward+)
Depending on what's the root cause here, this bug could also be related to observed TAA jitter some people complain about (I generally use MSAA; but not for the test program here, of course).
Working on my real game with a higher per-scene render load, I can definitely see judder and frame shuffling with Prefer Native for the swapchain, and with 8bpc color depth (so all the mitigations don't have enough of an impact anymore).
Judder is both in Godot editor and in the actual game.
https://github.com/godotengine/godot/assets/8904620/b46d988b-6ab8-4a13-9050-08056acc9cf9
Sorry for the very blurry image; the scene is high contrast and dark. This is an 8x slow motion of smooth camera motion across a distant gas planet. It's visible that there is judder back and forth from the wrong frame ordering.
The judder is also present with VSYNC_MAILBOX; if I set a maximum allowed frame rate in the driver (e.g. 144 Hz) it can be seen rather clearly.
It's not visible with uncapped fps, it might be just the sheer number of frames thrown at that one mailbox slot and their relative temporal proximity to each other; whereas with a limited frame rate in the rough ballpark of the monitor framerate, shuffled frames are more apparent because the deltas per frame are higher.
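To put rough numbers on the "deltas per frame" argument, here is a small worked calculation. The on-screen speed of 600 px/s is a purely hypothetical value, and 3000 fps stands in for "thousands of fps"; only the 144 figure comes from the comment above.

```python
def pixels_per_frame(speed_px_per_s, fps):
    """Distance an object moves between two consecutive frames."""
    return speed_px_per_s / fps


SPEED = 600.0  # hypothetical on-screen speed in pixels per second

for fps in (144, 3000):
    step = pixels_per_frame(SPEED, fps)
    print(f"{fps} fps: {step:.2f} px per frame")
```

Under these assumptions, a one-frame backstep at 144 fps displaces the object by several pixels, while at a few thousand fps the same mistake is a fraction of a pixel and would be much harder to see.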
I updated the title of this bug, the opening description, added the most crucial video and our last findings in the first post of this bug report.
Thanks! By the way, your example with the insects seemingly also shows frame shuffle (it's .webm, so I can only do this with the E hotkey in VLC, not in my NLE which doesn't support the format without re-encoding):
https://github.com/godotengine/godot/assets/8904620/1cf1e3cb-8681-4547-b482-6720fea640a9
I also noticed that if I set a maximum frame rate in the Nvidia Control Panel for Background Application, the GPU usage goes down. (expected!) When the game is in the foreground (with Mailbox), the GPU is pretty much saturated at 97% (also expected!)
But if I set a MAXIMUM frame rate in Godot, then that minimum frame rate setting from the driver is ignored, and the GPU is always at the same level of load (~27%, because of course it renders far fewer frames).
The judder feels a bit different but it's there for sure (giving some more credence to the assumption that the judder is also there with uncapped VSYNC_MAILBOX or VSYNC_DISABLED)
To rule out a multi-threading race condition (despite being single threaded), I verified that VulkanContext::swap_buffers is indeed always called by the same thread.
I said in my previous comment that I was able to repro.
However, I spoke too soon. After thorough research, I traced the problem to my two Linux monitors having the TearFree option on.
This was causing a consistent jitter every couple of seconds which is not caused by Godot, and is also present in other apps. After I disabled it, your sample was smooth.
Your sample was getting heavily corrupted and then crashing after a few seconds. I traced this problem and found it was fixed in Fix dangling pointers in _clean_up_swap_chain.
The problem is that in my specific system, when switching VSYNC modes to Mailbox, fpGetSwapchainImagesKHR returns 5 while swapchainImageCount = 3. Hence this fails:
if (swapchainImageCount == 0) {
    // Assign here for the first time.
    swapchainImageCount = sp_image_count;
} else {
    ERR_FAIL_COND_V(swapchainImageCount != sp_image_count, ERR_BUG); // <-- fail condition is triggered
}
So the routine _update_swap_chain is unable to properly recreate the swapchain and ends up crashing.
The fix in 4.2.x fixed it because now swapchainImageCount is correctly reset to 0.
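The effect of resetting the cached count can be shown with a toy model. This is a sketch only: `SwapchainState`, `recreate`, and the `reset_count` flag are hypothetical names that mirror the quoted check, not Godot's actual classes or functions.

```python
class SwapchainState:
    """Toy model of the cached image count; not Godot's actual class."""

    def __init__(self):
        self.image_count = 0  # 0 means "not assigned yet"

    def update(self, reported_count):
        """Mirror the quoted check: assign on first use, otherwise must match."""
        if self.image_count == 0:
            self.image_count = reported_count
            return True
        return self.image_count == reported_count

    def clean_up(self, reset_count):
        """Tear down before recreation; reset_count models the 4.2.x behaviour."""
        if reset_count:
            self.image_count = 0


def recreate(state, new_count, reset_count):
    """Clean up, then try to adopt the image count the driver now reports."""
    state.clean_up(reset_count)
    return state.update(new_count)


old = SwapchainState()
old.update(3)
print(recreate(old, 5, reset_count=False))  # stale count 3 vs reported 5

fixed = SwapchainState()
fixed.update(3)
print(recreate(fixed, 5, reset_count=True))  # count was reset, 5 is accepted
```

The model only captures the bookkeeping; in the real engine the failed check is what leaves the swapchain half-recreated.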
Stutter: Movement that is uneven and doesn't feel smooth, e.g. instead of taking 1 step forward every frame, sometimes it takes more or 0. Instead of 11111111 we end up with 1120113101. But it never goes backwards in time.
Shuffling: When a previous frame is presented, which makes it go backwards in time. For example, if frames should have been presented as 200, 201, 202, we end up with 200, 202, 201. Because shuffling feels "stuttery", we'll exclusively refer to shuffling when frames are presented out of order and we will avoid using the word "stutter".
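These two definitions translate directly into a small classifier. The sketch below is just one way to encode them; `classify` is a made-up helper, and the inputs are the examples from the definitions above.

```python
from itertools import accumulate


def classify(presented):
    """Classify a list of presented frame numbers using the definitions above."""
    steps = [b - a for a, b in zip(presented, presented[1:])]
    if any(s < 0 for s in steps):
        return "shuffling"  # a frame older than its predecessor was shown
    if any(s != 1 for s in steps):
        return "stutter"    # uneven steps, but never backwards
    return "smooth"


# The stutter example is given as per-frame steps, so turn it into positions.
stutter_steps = [int(c) for c in "1120113101"]
stutter_positions = list(accumulate(stutter_steps, initial=0))

print(classify([200, 201, 202]))
print(classify(stutter_positions))
print(classify([200, 202, 201]))
```

Shuffling is checked first on purpose: a shuffled sequence also has uneven steps, but the thread reserves "stutter" for sequences that never go backwards.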
For the sake of the tests I set smoothing to 0, so that the cube below would stutter as much as possible. I wanted to compare the "rawest" form possible.
My system specs are:
AMD Ryzen 9 5900X 32GB RAM
AMD Radeon RX 6800 XT 16GB
RADV driver
For framerate visualization I used MangoHud.
Here's a video of it running:
https://github.com/godotengine/godot/assets/3395130/af99d50c-5fe6-49c0-a46f-7e17dbd0e721
It doesn't look like there is anything to fix here. The stutter on the bottom box is to be expected given that I disabled smoothing (and the way it is calculated) and the framerate has a small bump at the exact moment.
My system specs are:
Windows 10 22H2 19045.3570
Intel i7 7700 32GB RAM
NVIDIA GeForce GTX 1060 3GB (plugged to a real monitor)
AMD Radeon RX 560 2GB (plugged to a real monitor)
Intel HD Graphics 630 (not plugged, but active)
For framerate visualization I used MSI Afterburner with RTSS server.
The following video was captured running on NVIDIA GeForce GTX 1060 3GB while observed from the monitor plugged to NVIDIA:
https://github.com/godotengine/godot/assets/3395130/9248611f-a5f5-4ca4-878e-4ac0adce9fee
Since I was unable to even reproduce this issue, I need those of you who are able to repro this problem to do the following:
For example mine is Windows 10 22H2 19045.3570. I want to rule out the chance this is OS version specific.
vulkaninfo.exe > vulkaninfo.log
GPUView is an extremely powerful tool for debugging timing and presentation.
These are the instructions from Microsoft.
C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview\log.cmd
C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview\
Log.cmd
Log.cmd again. This will save the capture to disk.
info@yosoygames.com.ar (if it's a big file you may have to use a sharing platform like Dropbox, MEGA, etc).
Time going backwards would explain shuffled presentation, which may itself be explained by broken motherboards (this happens a lot!!!).
If HPET was enabled, you should try disabling it. If HPET was disabled, you should try enabling it.
This Youtube video shows how to toggle HPET on Windows 10 (you must reboot after doing it):
REM Stock HPET
bcdedit /deletevalue useplatformclock
REM Disable HPET
bcdedit /set useplatformclock false
REM Enable HPET
bcdedit /set useplatformclock true
IMPORTANT: HPET needs to be enabled in the BIOS. This is usually found in Chipset -> High Precision Timer:
LatencyMon is a useful tool figuring out if something is seriously wrong with your system.
Running it may shed some light if it happens to find something.
After a couple suggestions from @reduz I managed to repro some stutter but the repro steps are insane and it reeks of driver bug, but I will research further. Since I don't have any monitor > 60hz but I have a monitor that has HDR support (actually it's a lie because the panel is cheap but as long as the GPU believes it):
Set Vulkan Presentation to "Legacy" in NVIDIA control panel.
The same repros if I tweak it a little:
So it seems the conditions to trigger are:
Exact same as Stutter:
Set Vulkan Presentation to "DXGI" in NVIDIA control panel (Important!!).
So it seems the conditions to trigger are:
After fixing various issues in Godot, the bug remained. I tried a different Vulkan app (my own, from OgreNext) and it exhibited the same bug.
The chances of this happening to two (quite different) Vulkan apps and being an app bug are almost 0. This looks like a driver bug.
I am preparing a repro for NVIDIA now (not now, I am really tired; I've been at this for 19hs during the weekend). I will send it over the week.
Time itself is not running backwards:
I still believe it's the engine inadvertently reusing the wrong frame or command buffer.
> The chances of this happening to two (quite different) Vulkan apps and being an app bug are almost 0. This looks like a driver bug.
Can you provide the ogre-based test application for me to run a repro as well? Because I do not see the problem with other Vulkan renderers (i.e. Unity, and some games). I don't have the time to write a full test suite in other frameworks right now, very sorry. :(
I have done some extensive web searching and there's no indication that this issue affects non-godot developers. Otherwise, thousands of gamers would be talking about it. (this issue has existed for at least 2 months)
I think they may still vary from system to system, but it's good to know different "constellations" of settings. I certainly did not use HDR, just 10bpc color depth, to see the strongest shuffle behaviour. I still get an occasional shuffle frame even with what currently works best.
"Window must be running on the non-main monitor." <-- I can falsify this. :) I tested even changing the main monitor around.
"Both monitors must be at different Hz." <-- this is LIKELY a factor (windows compositor is single-refresh or something), however, I can see the shuffle with just 1 monitor attached, at 120 Hz. However, your great research prompted me to check both for HPET setting (however, I think my system is rather well setup, all things considered, and I just updated bios yesterday); and what it does if Windows started up with just 1 screen. (edit: 1 screen still shows jitter, sometimes the others, sometimes the app jitters for a while; and HDR on or off seems unrelated to the jittering)
All in all, of course I'd prefer this to be a driver bug, so I could hate on team green instead of my favourite game engine.
Here's a tweaked version of the repro that:
@thygrrr Here's the OgreNext demo that managed to trigger the same bug: OgreNextRepro.zip
Video: https://github.com/godotengine/godot/assets/3395130/ae4e20fd-f6bd-4b7f-809c-38f3651b457e
> I have done some extensive web searching and there's no indication that this issue affects non-godot developers. Otherwise, thousands of gamers would be talking about it. (this issue has existed for at least 2 months)
Because:
I'm having this issue on Forward+. It takes a couple of tries to get it to happen, but there are no problems on Compatibility.
OS: Windows 10 GPU: AMD 6750XT. Driver settings: Default. Monitor: 60hz Godot: v4.2-B6
Fresh 4.2-B6 project, no settings that should impact this issue changed. 2 assets a tile map and the player. 2 scenes the player and the level. 1 addon, godot-4-importality. (Imports aseprite files)
test_stage.tscn:
player.tscn:
Image of outliner:
The bug takes a couple of tries to reproduce. I had to record with my phone; when recording from the desktop the issue is not present, not just in the video but in real time as well. (Yes, the flashing is a part of the issue.)
Forward + Bug:
https://github.com/godotengine/godot/assets/6450181/9249ecaf-6c51-452d-a6eb-02836e57cebc
Forward + No Bug:
https://github.com/godotengine/godot/assets/6450181/4122e1d4-cc68-4cb8-8d63-5650022a578a
The fact that it happens on an AMD GPU is deeply disturbing. I was wondering about the possibility of this being a Windows bug. Do you have one monitor or multiple monitors plugged in @A-lamia?
Is MSAA enabled on your project?
On the other hand, I did fix a few issues that could explain it (i.e. there's the chance it is both a Godot and NV bug). The repro I sent to NV had that bug fixed, but I was planning on working more on that tomorrow so it can be submitted as PR.
@darksylinc No MSAA. 2 monitors exact same models.
Thanks. One more question: I posted my own app here https://github.com/godotengine/godot/issues/84137#issuecomment-1811544904
Does the problem with that app reproduce for you?
@darksylinc no i ran it a bunch of times i don't have any issues.
This issue may be related to https://github.com/godotengine/godot/issues/80941 (which is difficult to reproduce because the Steam Deck has its own Windows drivers).
Could be, though i don't have issues in the editor it's like a 1 in 5 chance that the bug happens when i run a scene.
Assuming that the steam deck uses the same sort of drivers as desktops.
After upgrading drivers to 23.11.1 (from 22.11.2) I was able to repro @A-Lamia's behavior and found something extremely interesting. Suddenly some of these reports (including those in other tickets) start to make sense.
The rig is: AMD Ryzen 9 5900X 32GB, AMD Radeon RX 6800 XT 16GB
What I found when using Godot 4.1.3 is that the screen would sometimes flash black and shuffling would occur if I hover the mouse cursor over the Minimize, Maximize and Close buttons until the tooltip appears. Additionally, Godot would sometimes shuffle when switching VSync modes (though shuffling when switching VSync would be reasonably acceptable, but I suspect it's another symptom of the same bug):
Shuffle when switching VSync: https://github.com/godotengine/godot/assets/3395130/430d20fe-9388-4843-ba44-0321dfaf1bb5
Shuffle when hovering over the min/max/close buttons plus tooltips: https://github.com/godotengine/godot/assets/3395130/5484fe32-b3da-49f6-a67c-29efd746fac5
The good news is that when I tried my custom build (which is based on #80566 plus a few more fixes I did as an attempt to fix it on NVIDIA) none of these problems manifested on AMD.
I haven't yet pinpointed why my build fixes the problem, as it could be because of #80566, the new fixes, or because of an older swapchain fix I submitted in #80571.
Why does this explain several reported tickets? Because clearly this variant of the bug (the one in AMD where there is black screen flashing and shuffling, not the one in NV) mostly (but not only) appears when the tooltip overlay is drawn on top of Godot's window. In some users' computers, simply a 3rd party program or specific driver could be causing the same (but invisible) events that trigger this bug. This can easily explain why some users constantly have this problem while others are unable to repro.
Question for @A-Lamia : I have uploaded a custom version of my Godot build here. Could you tell me if the bug is still present if you try to run your project using that exe?
After thorough testing, I believe #82768, #80941, #81795 and this bug are all the same bug.
My current theory, given how recent these reports are, is that Godot is triggering a bug in dwm.exe (Windows' compositor), probably introduced recently via Windows Update, and it probably has to do with how Godot handles the window proc in WNDCLASSEX::lpfnWndProc.
An easy way to trigger this bug on AMD RDNA2 with a single monitor is to launch the flicker test demo a lot of times (e.g. 10 instances if necessary, more if you have to) until a few of them or many start to flicker to black or shuffle like crazy. Moving the window or maximizing increases the chances of triggering the bug.
You can monitor all windows to see which one flickers using Super + Tab
I suspect it's a dwm.exe bug because in GPUView it can be easily seen that Godot started late after VSync (because of dwm.exe) yet it delivered its work on time with lots of time to spare, but dwm.exe waited far too long for the next present and missed the vblank.
It is clearly visible in GPUView's timeline that dwm.exe's workload is off and does not look how it's supposed to look for a workload that is running at the monitor's frequency.
Also, the fact that this bug triggered on NVIDIA with ogre-next (although very rarely) indicates it's not a Godot-specific issue. I tried to do the same tricks with a Unity sample and ended up with a TDR. So, something is seriously messed up and it seems Godot was just unlucky to trigger this issue more frequently without doing much (there's still the chance Godot is doing things wrong though), but it seems to affect every app.
I'm still trying to gut Godot's DisplayServerWindows::WndProc to try to pinpoint what is triggering this bug.
Oh, I forgot to mention this bug also happens with my PR (even with all the fixes). It is much harder to trigger, but the bug is still triggered if you try hard enough.
@darksylinc yeap i was still able to get the bug took 7 attempts.
I have a feeling this is connected to the symptoms I describe in #85547.
> The good news is that when I tried my custom build (which is based on #80566 plus a few more fixes I did as an attempt to fix it on NVIDIA) none of these problems manifested on AMD.
@darksylinc The flickering window shadows I describe in that issue are also not present in your build (compared to 4.2 stable), using the same GPU.
2 different users having this issue w/ VSYNC enabled and both of us use 5700XT graphics cards, forward+, vulkan (on mine, not sure on theirs) https://www.youtube.com/watch?v=06wlTIDRx3U&t=4s
I have been playing a game called Halls of Torment, got this bug, and immediately went to Google to find that it was made in Godot.
2 different users having this issue w/ VSYNC enabled and both of us use 5600XT graphics cards, forward+, vulkan (on mine, not sure on theirs) https://www.youtube.com/watch?v=06wlTIDRx3U&t=4s
Oh hey, I was wondering where that recording went. Yes I'm also forward+ vulkan on a 5700 XT with VSync enabled.
Sorry 5700XT, and yeah I was following up on this bug and I figured we'd want it tagged.
The minimal reproduction does not appear to have the issue as of 3be3d50 (#87340).
My Vulkan vsync stutters seem similarly fixed by the above, although it's worth noting I had to go into my NVIDIA Control Panel and set "Vulkan/OpenGL present method" to "Prefer layered on DXGI Swapchain", which probably makes sense to somebody smarter than me.
This is not that odd of a fix. I've been considering that we could get much more consistent behavior if we used a DXGI swap chain even when using Vulkan. It'd get lower latency and more consistent presentation. That control panel option essentially does that for you at the low level, AFAIK.
See https://github.com/godotengine/godot-proposals/issues/5692 where I originally proposed this. It would also allow for HDR output to be implemented in Vulkan-based rendering methods, as DXGI is the only way to achieve HDR output on Windows.
Using DXGI directly also allows NvTrueHDR to work on Vulkan apps without requiring changes in the NVIDIA Control Panel to force layered DXGI presentation.
Godot version
4.1.3
System information
Windows 11, NVIDIA GeForce GTX 1070, i7-7700K CPU 4.20GHz, 1920x1080 60Hz IPS monitor & a 2560x1440 adaptive 100Hz monitor
Issue description
The forward plus renderer causes jitters in movement. It's especially visible when V-Sync is switched on. This happens in 2D and 3D. Those jitters are more or less pronounced depending on which screen you use; they might not be very visible on a 60 Hz screen but are still there.
When I talk about "jitter", I am talking about the effect happening in the video above, or clearly observable in this older video here at 5, 8, 11 and 13 seconds: Video
The problem seems to be that the renderer has a cache of 4 GPU frames and that the order of those frames gets jumbled. So, basically by design, we seem to have a recurring order of 0, 1, 3, 2, 3, ... 4, 5, 8, 7, 8... This means some frames don't get shown, others get shown twice, and the judder is caused by a step back.
Currently it's not clear whether it's a problem with how the frames are put together or a driver-related memory issue. It could be 2 bugs, as comments in the thread show that other bugs play into this.
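To illustrate what that order means, here is a small sketch that labels each transition in a displayed-frame sequence and lists which frames were shown twice or never. The two runs are the ones quoted above; the function names are made up for this example and are not part of Godot or the test project.

```python
from collections import Counter


def classify_frames(shown):
    """Label every transition between consecutively displayed frame indices."""
    events = []
    for prev, cur in zip(shown, shown[1:]):
        if cur == prev + 1:
            events.append("ok")          # expected next frame
        elif cur == prev:
            events.append("repeat")      # same frame displayed again
        elif cur < prev:
            events.append("step_back")   # an older frame displayed: visible judder
        else:
            events.append("skip")        # at least one frame was jumped over
    return events


def summarize(shown):
    """Return (frames shown more than once, frames never shown in the range)."""
    counts = Counter(shown)
    repeated = sorted(f for f, n in counts.items() if n > 1)
    missing = [f for f in range(min(shown), max(shown) + 1) if f not in counts]
    return repeated, missing


for run in ([0, 1, 3, 2, 3], [4, 5, 8, 7, 8]):
    print(run, classify_frames(run), summarize(run))
```

In the second run, frame 6 is never displayed and frame 8 is displayed twice, with a step back from 8 to 7 in between.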
Steps to reproduce
Can this be circumvented?
No. It's a very critical and very deep-seated bug. With the forward plus renderer there is no way to circumvent this and it will happen on any hardware. The higher the resolution, the less obvious the bug is; however, it's always there. The forward plus renderer is simply broken and can't be used in its current state.
You could use the gl_compatibility renderer, which doesn't have this problem, but this is not a good solution either.
Minimal reproduction project:
I made a simple project in which the character can only run left and right per key input. That's literally all the code we need to test the issue properly, the setup described above is key for making it visible.
Conclusion
Hypothesis
I think the core of the problem lies with the forward plus renderer (there might, however, be additional problems with V-Sync, Camera or Physics).
Minimal reproduction project
Project Download This includes a small platformer project. Project settings need to be adjusted according to my tests to reproduce the issues.