DadSchoorse / vkBasalt

a vulkan post processing layer for linux
zlib License

Higher input lag in Doom when the game presents from a compute queue #34

Open aufkrawall opened 4 years ago

aufkrawall commented 4 years ago

GCN can present frames from a hardware compute queue to improve performance. Doom uses this when the game's anti-aliasing is set to either disabled or 8xTSSAA. When combined with vkBasalt, these artifacts occur (tested with radv; screenshot: Screenshot_20191108_142236). The white stripes disappear when turning off async compute by switching to a different AA mode like FXAA.

It would be nice if these artifacts could be fixed without decreasing performance. Doom is not the only example; this should also apply to Rage 2, for instance, as it uses async compute as well.

Mesa overlay also has issues with it: https://gitlab.freedesktop.org/mesa/mesa/issues/946#note_246418

DadSchoorse commented 4 years ago

Sadly there is no way I can debug this. I have neither the game nor the hardware. Maybe the mesa guys find a fix that I can look at.

DadSchoorse commented 4 years ago

Maybe I found the problem: 6f541b1a9ec5d7acf1c29309caf52daa8d5d1952. Could you try this build? vkBasalt.tar.gz

aufkrawall commented 4 years ago

I've patched the commit into 0.1.0, but it unfortunately doesn't affect the issue.

Edit: With the source you've provided, Doom and SotTR (Linux native) crash at start. This also applies to the SMAA branch, which is probably what the linked source is? :)

DadSchoorse commented 4 years ago

Does the master branch work? There is a build here: https://github.com/DadSchoorse/vkBasalt/issues/30#issuecomment-552176075 Oh, and please make sure you don't use an old vkBasalt.conf.

aufkrawall commented 4 years ago

Master branch crashes too, even when ~/.local/share/vkBasalt/vkBasalt.conf is deleted. Unfortunately, there doesn't seem to be any interesting terminal output when it happens.

DadSchoorse commented 4 years ago

Oh, sorry, the ~/.local/share/vkBasalt/vkBasalt.conf should stay; that file is updated when a new build is installed. I was referring to the vkBasalt.conf in the game folder. And please give me the terminal output. You could also try enabling the validation layers with VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation if you have them installed.

aufkrawall commented 4 years ago

AFAIR, the AUR PKGBUILD doesn't create the user config at that path by default, but I've copied the updated example config there manually.

Also vkcube crashes: vkbasalt.log

aufkrawall commented 4 years ago

Reading the crash log and looking at the files in the package: only a full_screen_rect.vert.spv is included, not a full_screen_triangle.vert.spv. And for some reason, it looks for that file at the .local/share path, where only the config is located?

DadSchoorse commented 4 years ago

So you do not have a file called ~/.local/share/vkBasalt/shader/full_screen_triangle.vert.spv?

DadSchoorse commented 4 years ago

Yeah, don't use the PKGBUILD with newer versions...

aufkrawall commented 4 years ago

> So you do not have a file called ~/.local/share/vkBasalt/shader/full_screen_triangle.vert.spv?

Nope. :) I'll try compiling without PKGBUILD.

aufkrawall commented 4 years ago

@DadSchoorse It works with the source you provided in https://github.com/DadSchoorse/vkBasalt/issues/34#issuecomment-554237618 , fantastic. :)

Funny observation: it also "fixes" the Mesa overlay, and the fps are still as expected with async compute. It just doesn't "fix" the Steam overlay fps counter; that one still reduces performance in Doom.

Do you think something similar could be implemented for the Mesa overlay?

DadSchoorse commented 4 years ago

Well, vkBasalt waits until the game wants to present the rendered frame, does everything it needs to do at that point, and then presents the frame itself. I don't know how the Mesa overlay works, but that should be possible for any layer that manipulates the final frame.
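In layer terms, that roughly means intercepting vkQueuePresentKHR: submit the effect work so it waits on the semaphore the game intended for the present engine, then present with a semaphore the effect work signals. A non-runnable sketch of the idea, assuming a single wait semaphore (the names `effect_cmd`, `effect_done`, `graphics_queue` and `next_QueuePresentKHR` are illustrative globals, not vkBasalt's actual code):

```c
/* Sketch of a post-processing layer's vkQueuePresentKHR hook (illustrative). */
VkResult layer_QueuePresentKHR(VkQueue queue, const VkPresentInfoKHR *info) {
    /* 1. Run the effects after the game's rendering: wait on the semaphore
          the game meant for the present engine, signal our own when done. */
    VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
    VkSubmitInfo submit = {
        .sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount   = 1,
        .pWaitSemaphores      = info->pWaitSemaphores,
        .pWaitDstStageMask    = &wait_stage,
        .commandBufferCount   = 1,
        .pCommandBuffers      = &effect_cmd,   /* draws/dispatches the effects */
        .signalSemaphoreCount = 1,
        .pSignalSemaphores    = &effect_done,
    };
    vkQueueSubmit(graphics_queue, 1, &submit, VK_NULL_HANDLE);

    /* 2. Present only once the effects have finished. */
    VkPresentInfoKHR patched = *info;
    patched.waitSemaphoreCount = 1;
    patched.pWaitSemaphores    = &effect_done;
    return next_QueuePresentKHR(queue, &patched);
}
```

Note that if the game presents from a compute queue, the effect submit above still goes to a graphics queue, so it has to be scheduled onto the hardware's single graphics pipe behind the game's own graphics work, which is consistent with the extra latency discussed later in this thread.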

aufkrawall commented 4 years ago

@DadSchoorse Unfortunately, we now have another issue: with async compute, the input lag is increased by at least one frame.

DadSchoorse commented 4 years ago

How did you measure input lag?

aufkrawall commented 4 years ago

I don't measure it, but the difference in mouse input in the game is very obvious when switching between 8xTSSAA (async compute on) and FXAA (async compute off). This is not the case without vkBasalt.

DadSchoorse commented 4 years ago

Maybe I'm missing something really obvious in the code, but I don't think I have a solution for that.

mcoffin commented 4 years ago

@aufkrawall How does the async-compute-based presentation work? Can you link to any docs on it?

I'm looking for something to do and might tackle this. I have both GCN and Navi cards, and I can pick up the game if need be.

aufkrawall commented 4 years ago

@mcoffin There recently was an explanation by @Plagman in the Mesa ticket conversation: https://gitlab.freedesktop.org/mesa/mesa/issues/946#note_404377 Should be much more illuminating than what I could ever contribute on the matter.

Perhaps Navi isn't affected in the same way as GCN; at least on Windows, there are differences between Navi and GCN with regard to the RTSS overlay.

I don't have an AMD card installed anymore, so I unfortunately can't try out whatever comes up.

DadSchoorse commented 4 years ago

this issue is unsolvable unless we implement a rasterizer that only uses compute

aufkrawall commented 4 years ago

@DadSchoorse

> this issue is unsolvable unless we implement a rasterizer that only uses compute

Would this be within scope, or would it require a ton of effort?

For what it's worth: ReShade's Vulkan compatibility has improved nicely. Unlike vkBasalt, it doesn't have increased input latency when a game presents from the compute queue; instead, it reduces performance further.

On a GTX 1070, with just the LUT shader enabled: -7% in Doom Eternal (uses async compute and presents from the compute queue) vs. -2.3% in Strange Brigade (uses async compute, but doesn't present from the compute queue). I suppose it could cost more on GPUs with better async compute capabilities. May I ask if you're aware of this, @crosire?

Edit: As a side note, the Steam and RTSS (beta) overlays have recently received improved compatibility with async present; they can now draw onto the framebuffer without a noteworthy performance hit.

crosire commented 4 years ago

ReShade synchronizes its rendering queue (which is always the first graphics queue) with the queue being presented on. If the game submitted additional work to the same graphics queue in the meantime, then that will make performance drop (since more work is executed before the actual present). This is probably what is happening in DOOM. I suppose this could be fixed by creating a dedicated graphics queue just for ReShade and synchronizing that with the present queue. But then things get a lot more complicated with ensuring resources are synchronized between these queues (e.g. when accessing the depth buffer, which belongs to the game, in a ReShade shader). So for ReShade it is much simpler the way it works right now, which assures correct behavior, but at the cost of some performance.

Plagman commented 4 years ago

A dedicated queue is unlikely to completely fix the problem, as the hardware only has one graphics pipe. Your additional queue would potentially let the OS schedule you into an earlier submission gap, but the game submissions tend to be chunky, so I think you'd see more or less the same thing.

DadSchoorse commented 4 years ago

> Unlike vkbasalt, it doesn't have increased input latency when a game presents from compute queue, but instead it reduces performance further.

I could probably achieve that too; the question is whether input latency or a performance loss is worse.

> A dedicated queue is unlikely to completely fix the problem, as the hardware only has one graphics pipe.

And only NVIDIA exposes more than one graphics queue anyway.

> this issue is unsolvable unless we implement a rasterizer that only uses compute

> Would this be within scope or would it require a ton of efforts?

The built-in effects always draw a full-screen triangle, so making those work in compute is pretty easy (the first release already drew with a compute shader). The problem is that ReShade shaders can draw arbitrary triangles (and other primitives), so that is pretty much impossible to do in a generic, high-performance way.
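A full-screen effect maps naturally onto compute: one shader invocation per pixel, so the layer only needs to dispatch enough workgroups to cover the swapchain image. A minimal sketch of that group-count arithmetic (the 8x8 local size and the `group_count` helper are illustrative, not vkBasalt's actual code):

```c
#include <stdint.h>

/* Ceil-division: number of workgroups of size `local` needed to cover
   `extent` pixels. */
static uint32_t group_count(uint32_t extent, uint32_t local) {
    return (extent + local - 1) / local;
}

/* For a hypothetical 1920x1080 swapchain and an 8x8 compute local size,
   the layer would record something like:

       vkCmdDispatch(cmd,
                     group_count(1920, 8),   // 240 groups in x
                     group_count(1080, 8),   // 135 groups in y
                     1);

   with the effect shader reading the game's color image and writing the
   post-processed pixel via imageStore(). */
```

Because such a dispatch can run on the compute queue the game is already presenting from, no graphics-pipe submission is needed, which is why the built-in effects are the easy case here.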

> Edit: As a side note: Steam and RTSS (beta) overlays recently have received improved compatibility with async present, they now can draw onto the frame buffer without noteworthy performance hit.

I suspect that these are drawing with compute shaders now.

aufkrawall commented 4 years ago

Thanks for your responses!

RTSS developer Unwinder shared some information about the general concept. I can't judge whether it contains any hints beyond what is already general knowledge:

> Added alternate asynchronous On-Screen Display renderer for Vulkan applications presenting frames from compute queue (id Tech 6 and newer engine games like Doom 2016 and Doom Eternal). The implementation is using original AMD's high performance concept of asynchronous offscreen overlay rendering and the principle of asynchronously combining it with the framebuffer directly from the compute pipeline with a compute shader, without stalling the compute/graphics pipelines. ...

https://forums.guru3d.com/threads/rtss-6-7-0-beta-1.412822/page-118#post-5776692

Ahmed-E-86 commented 4 years ago

OMG!! Good to know that this issue is related to vkBasalt. It is unbearable to play the game like this.

aufkrawall commented 4 years ago

Yeah, I think it would be better to lose a bit of performance instead.

aufkrawall commented 4 years ago

@DadSchoorse Would it be possible to consider this in your current rewrite approach?