godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Unexpectedly low performance of temporal antialiasing due to high GPU cost #61905

Open. Calinou opened this issue 2 years ago

Calinou commented 2 years ago

Godot version

4.0.alpha (b9375ea7fc135f5468dcfb8c80b51a945ac14155)

System information

Fedora 36, GeForce GTX 1080 (NVIDIA 510.68.02)

Issue description

In Godot, enabling TAA on a GTX 1080 with a 2560×1440 viewport takes more than 1.2 ms of GPU time, even in an empty scene with just a Camera3D. The performance impact seems fairly constant regardless of the scene's contents:

Empty scene

[Screenshots: TAA disabled | TAA enabled]

1 BoxMesh

[Screenshots: TAA disabled | TAA enabled]

1 BoxMesh + 1 DirectionalLight3D with shadows + Default Environment

[Screenshots: TAA disabled | TAA enabled]

According to the View Frame Time panel, nearly all of the rendering cost is on the GPU. CPU frame time barely changes when TAA is enabled, at least in simple scenes. Therefore, motion vector generation isn't the bottleneck. The actual TAA shader is more likely to be a bottleneck.

Replacing the contents of the main() function in the taa_resolve.glsl shader with just return; results in a black image, but still increases GPU time by 0.7 ms per frame compared to TAA disabled.

RenderDoc confirms that TAA is indeed taking around 1.2 ms of GPU time. (I don't have access to actual Vulkan profiling tools on this GPU, as it's too old for Vulkan profiling in Nsight.)

At lower resolutions, the performance impact of TAA is much less noticeable:

575×310

[Screenshots: TAA disabled | TAA enabled]

905×550

[Screenshots: TAA disabled | TAA enabled]
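One rough way to read these results: if the TAA resolve behaves like a full-screen pass whose cost scales linearly with pixel count (an assumption that ignores fixed per-dispatch overhead and cache effects), then 2560×1440 has about 20.7 times as many pixels as 575×310 and about 7.4 times as many as 905×550. The following Python sketch scales the 1.2 ms figure measured above by pixel count; the function names are illustrative, and the scaled values are a first-order model, not measurements.

```python
# Pixel counts for the viewport sizes tested in this issue.
RESOLUTIONS = {
    "2560x1440": (2560, 1440),
    "905x550": (905, 550),
    "575x310": (575, 310),
}


def pixel_count(width: int, height: int) -> int:
    """Number of pixels a full-screen pass has to touch."""
    return width * height


def scaled_cost_ms(cost_ms: float, from_res: tuple[int, int], to_res: tuple[int, int]) -> float:
    """Scale a per-pass GPU cost linearly by pixel count.

    Assumes cost is proportional to pixel count with no fixed overhead,
    which is only a rough model of a full-screen pass.
    """
    return cost_ms * pixel_count(*to_res) / pixel_count(*from_res)


if __name__ == "__main__":
    base = RESOLUTIONS["2560x1440"]
    for name, res in RESOLUTIONS.items():
        estimate = scaled_cost_ms(1.2, base, res)
        print(f"{name}: {pixel_count(*res):>9,} px, ~{estimate:.2f} ms if 1.2 ms at 2560x1440")
```

Treat the scaled numbers as an illustration of why the impact shrinks at lower resolutions, not as a prediction of actual timings.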

This doesn't compare favorably to the TAA implementation in other open source rendering engines. For example, in Tesseract, in a semi-complex scene with many objects and lights, the frame time difference between TAA disabled and enabled is only ~0.2 ms ((1.0/333 - 1.0/357) * 1000):
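Frame rates are not additive, so comparisons like this are best made in milliseconds per frame, which is what the formula above does. A minimal Python helper for the same conversion (the function names are illustrative):

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def frame_time_delta_ms(fps_fast: float, fps_slow: float) -> float:
    """Extra milliseconds per frame when dropping from fps_fast to fps_slow."""
    return frame_time_ms(fps_slow) - frame_time_ms(fps_fast)


# The Tesseract values used in the formula above: 357 FPS vs 333 FPS.
print(f"{frame_time_delta_ms(357, 333):.3f} ms")  # prints 0.202 ms
```

The same helper converts any before/after FPS pair into a per-frame cost; note that an identical FPS drop costs more milliseconds at lower frame rates.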

[Screenshots: TAA disabled in Tesseract | TAA enabled in Tesseract]

There are many technical differences between Godot 4 and Tesseract's rendering engines:

Still, I feel TAA should not be this expensive on the GPU in Godot.

Steps to reproduce

Minimal reproduction project

test_taa_performance.zip

Zireael07 commented 2 years ago

> Tesseract uses a seemingly simpler (yet effective) form of TAA called TQAA (temporal quincunx antialiasing).

Interesting. Sounds like something Godot should try, especially for GLES3 backend imho

Calinou commented 2 years ago

> Interesting. Sounds like something Godot should try, especially for GLES3 backend imho

The OpenGL backend will probably never get TAA, as it's intended for old/low-end hardware where TAA is too expensive. A TAA implementation also adds a lot of complexity to a renderer, and I think the OpenGL renderer is best kept simple so we can focus on making it stable.

I think Godot's use of a forward renderer puts it at a disadvantage for TAA compared to a deferred renderer, but there are probably some optimizations we can figure out.

mrjustaguy commented 2 years ago

To my knowledge, forward vs. deferred rendering should have no impact on the cost of TAA.

Calinou commented 2 years ago

JFonS said that the high TAA cost is expected due to how it works currently. It's a separate pass that requires a full-screen copy, which is expensive in itself. I suppose avoiding this copy could halve the GPU cost of TAA, if not more.

To make TAA cheaper, it should avoid performing this copy. For instance, this can be done by moving TAA to the tonemapping shader, but doing so will break FXAA and glow (so they won't be usable at the same time as TAA). Another solution needs to be found – let us know if you can think of one 🙂
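To put a rough number on the cost of one full-screen copy at 2560×1440, here is a back-of-the-envelope lower bound in Python under stated assumptions: an RGBA16F color buffer (8 bytes per pixel; the format Godot actually uses for this buffer is an assumption here), one read plus one write per pixel, and the GTX 1080's theoretical peak memory bandwidth of 320 GB/s. Real copies rarely reach peak bandwidth, and a TAA resolve typically also reads history and motion vectors, so the real cost of the pass is higher than this bound.

```python
# Back-of-the-envelope lower bound for one full-screen copy.
# Assumptions (not taken from Godot's source):
#   - the color buffer is RGBA16F, i.e. 8 bytes per pixel
#   - a copy reads every pixel once and writes every pixel once
#   - the GPU reaches its theoretical peak bandwidth (GTX 1080: 320 GB/s)

BYTES_PER_PIXEL = 8
GTX_1080_PEAK_BANDWIDTH = 320e9  # bytes per second


def copy_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Bytes moved by one full-screen copy (one read + one write)."""
    return 2 * width * height * bytes_per_pixel


def transfer_time_ms(num_bytes: float, bandwidth: float = GTX_1080_PEAK_BANDWIDTH) -> float:
    """Time to move num_bytes at the given bandwidth, in milliseconds."""
    return num_bytes / bandwidth * 1000.0


if __name__ == "__main__":
    moved = copy_bytes(2560, 1440)
    print(f"{moved / 1e6:.1f} MB per copy, >= {transfer_time_ms(moved):.3f} ms at peak bandwidth")
```

Under these assumptions the copy alone moves about 59 MB per frame and costs at least ~0.18 ms; the bound scales linearly with pixel count and bytes per pixel.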

Calinou commented 1 year ago

We discussed this issue in today's rendering meeting and agreed on some possible optimizations:

mrjustaguy commented 1 year ago

I've been trying to wrap my head around the TAA-to-tonemapper migration for a while, and I don't get it. Why would that break FXAA and glow? The TAA step could be done before the two, and you'd apply those to the TAA result... right?

Based on the post above, I'm guessing TAA currently works something like scenario 1:

1) Creates a full-screen copy
2) Does TAA
3) Passes the result to the tonemapper
4) Tonemapper applies glow/FXAA if enabled

Or, less likely, scenario 2:

1) Tonemapper applies glow/FXAA if enabled
2) TAA creates a full-screen copy based on the tonemapper output
3) Does TAA

Whereas TAA in the tonemapper would work something like this:

1) Does TAA
2) Modifies the output with glow/FXAA after the TAA step is done
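The three orderings above can be written down as plain pass lists to make two questions explicit: how many full-screen copies each one needs, and whether glow/FXAA runs before or after TAA. This Python sketch is a hypothetical model of the guessed scenarios, not Godot's actual render graph; the pass names are made up for illustration.

```python
# Hypothetical pass orderings from the scenarios above (not Godot's real render graph).
SCENARIO_1 = ["copy_full_screen", "taa", "tonemap", "glow_fxaa"]
SCENARIO_2 = ["tonemap", "glow_fxaa", "copy_full_screen", "taa"]
TAA_IN_TONEMAPPER = ["tonemap_with_taa", "glow_fxaa"]


def full_screen_copies(passes: list[str]) -> int:
    """Count the explicit full-screen copy passes in a pipeline."""
    return sum(1 for p in passes if p == "copy_full_screen")


def post_runs_after_taa(passes: list[str]) -> bool:
    """True if glow/FXAA is applied to the TAA result rather than before TAA."""
    taa_index = next(i for i, p in enumerate(passes) if "taa" in p)
    return passes.index("glow_fxaa") > taa_index


if __name__ == "__main__":
    for name, passes in [("scenario 1", SCENARIO_1),
                         ("scenario 2", SCENARIO_2),
                         ("TAA in tonemapper", TAA_IN_TONEMAPPER)]:
        print(f"{name}: copies={full_screen_copies(passes)}, "
              f"glow/FXAA after TAA={post_runs_after_taa(passes)}")
```

In this model only the tonemapper variant drops the copy, and compared with scenario 2 it also moves glow/FXAA to after TAA, which is where the visual differences come from.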

The main difference is that FXAA/glow might change from being applied before TAA to being applied after it. For glow, that wouldn't change much, but for FXAA it could result in visible differences: FXAA applied after TAA would look like a filter on top of the TAA result, whereas running TAA on an FXAA image smears it a little.

jclounge commented 4 months ago

Not sure if this is the same exact issue, but on my iMac with an AMD 560X GPU and Godot 4.2.1 mono, enabling TAA slows everything down severely. With a completely empty 3D scene, the editor becomes sluggish even without any antialiasing being visible in the viewport. Adding only a camera to the scene and then running the game makes it render at around 40 FPS with TAA enabled, compared to around 400 FPS with TAA disabled. Other AA modes seem to run fine. EDIT: This is without Retina mode enabled, so the screen resolution is only 2048×1152. If I make the game window fullscreen, the frame rate plummets to around 10 FPS.