godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Writing / reading `frag_color` multiple times in a shader seems to cause a performance problem #84526

Open lawnjelly opened 1 year ago

lawnjelly commented 1 year ago

Godot version

3.6 beta 3 (but likely existed for years)

System information

Linux Mint 21.1, Intel Core i7-13700T, Intel a780 GPU

Issue description

After spending some time investigating a performance problem while working on core shaders in the 3D platformer demo, I traced a large drop in fill rate performance to modifying frag_color multiple times in the fragment shader.

I'm half expecting this to be an error on my part but I'd rather flag it and feel silly :blush: than let it pass and potentially miss a good speedup.

GLES2

Adding the second line here:

gl_FragColor = vec4(ambient_light + diffuse_light + specular_light, alpha);
gl_FragColor *= 1.0001;

dropped the frame rate from 127fps to 62fps in GLES2.

Whereas using this instead:

vec4 temp_frag_color = vec4(ambient_light + diffuse_light + specular_light, alpha);
temp_frag_color *= 1.0001;
gl_FragColor = temp_frag_color;

kept the frame rate at 127fps.

This was very surprising to me, and I was expecting it to be an artifact, but it seems very repeatable on my hardware. It may come down to the drivers, but if it happens for me it likely happens on other setups.

This pattern of writing to frag_color multiple times is used in GLES2 and GLES3 (for at least fog and emission), and may be dropping performance unnecessarily. It may also be a problem in 4.x (I haven't examined it, but I've mentioned this in Rocket.Chat).

GLES3

I also tried the same in GLES3.

Again, adding the second line here:

    frag_color = vec4(ambient_light + diffuse_light + specular_light, alpha);
    frag_color.rgb *= 1.0001;

dropped performance from 88fps to 30fps. That's almost a 3-fold drop in performance.

GLES3 seems less susceptible currently, as it only modifies frag_color when adding emission:

#ifdef USE_FORWARD_LIGHTING //ubershader-runtime
    frag_color.rgb += emission;
#endif //ubershader-runtime
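For illustration, here is a minimal standalone sketch of how the emission step could avoid the read-modify-write on the output. It is not the actual scene.glsl code: the `lit_color`, `alpha` and `emission` declarations and the local name `color` are hypothetical, and whether this avoids the slowdown on a given driver would need measuring.

```glsl
#version 300 es
// Sketch only: the inputs below are hypothetical stand-ins,
// not the real declarations from Godot's scene.glsl.
precision mediump float;

in vec3 lit_color;      // hypothetical: ambient + diffuse + specular
in float alpha;         // hypothetical
uniform vec3 emission;  // hypothetical

layout(location = 0) out vec4 frag_color;

void main() {
	// Build the color in a local temporary instead of the output variable.
	vec4 color = vec4(lit_color, alpha);

#ifdef USE_FORWARD_LIGHTING
	// Emission is added to the temporary, so the output is never read back.
	color.rgb += emission;
#endif

	// Single write to the output, at the very end.
	frag_color = color;
}
```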

Steps to reproduce

See above.

Minimal reproduction project

3D Platformer demo, modify scene.glsl shaders as above, and recompile engine.

Discussion

I'm not a shader guru by any means. I had a vague memory of this as a possible thing to watch for, but I'm not up to date. I haven't previously touched this part of the shaders; I just noticed this while running experiments with blob shadows.

It could be that modifying the existing frag_color is causing some kind of round trip, or preventing this being optimized away into a local register, whereas using a temporary variable prevents side effects so doesn't have the performance drop. Alternatively it could be a problem introduced by our own shader "translator" code (I haven't examined the final glsl).

I also considered whether the problem was due to modification of alpha, but using a line like gl_FragColor.rgb *= 0.999; also creates the problem.

I'm still not absolutely sure it isn't some artifact I've created somehow - it would be nice to see it independently verified. It may not occur on all hardware / drivers.

It is possible this could also occur in user shaders.

If confirmed perhaps we can take some (hopefully) simple steps to eliminate this problem, by standardizing our shaders to use temporaries, and only write to the final GL builtin once, at the end.
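As a rough sketch of what that standardization might look like in GLES2, assuming hypothetical uniform and varying names (`lit_color`, `alpha`, `emission`, `fog_color`, `fog_amount`) rather than the real ones in scene.glsl:

```glsl
// Sketch only: all uniform / varying names here are hypothetical,
// not the real declarations from Godot's scene.glsl.
precision mediump float;

uniform vec3 emission;     // hypothetical
uniform vec3 fog_color;    // hypothetical
uniform float fog_amount;  // hypothetical, 0.0 (no fog) to 1.0 (full fog)

varying vec3 lit_color;    // hypothetical: ambient + diffuse + specular
varying float alpha;       // hypothetical

void main() {
	// Accumulate everything in a local temporary.
	vec4 color = vec4(lit_color, alpha);

	// Later stages modify the temporary, never gl_FragColor itself.
	color.rgb += emission;
	color.rgb = mix(color.rgb, fog_color, fog_amount);

	// Write the GL builtin exactly once, at the end.
	gl_FragColor = color;
}
```

The idea is simply that gl_FragColor is only ever written, never read or modified in place, which is the case that showed no slowdown in the measurements above.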

Update

My small PR for GLES2 would seem to confirm this is a valid issue: it gives a 2-3x increase in fps in scenes which tax fill rate and use emission / fog. :+1: The 3D platformer is likely taxing on fill rate, perhaps because of reflection probes. Will try this in the TPS demo too.

Calinou commented 1 year ago

Intel a780 GPU

That GPU was never released, are you mistaking it for the Arc A770 or A380?

lawnjelly commented 1 year ago

Intel a780 GPU

That GPU was never released, are you mistaking it for the Arc A770 or A380?

That's what I read too, but my system info says "Intel Corporation Device a780" :rofl:

Anyway the CPU is 13th Gen Intel© Core™ i7-13700T × 16, so it is whatever integrated GPU comes with that. :+1:

EDIT: Intel® UHD Graphics 770 apparently, according to Google. https://www.intel.com/content/www/us/en/products/sku/230492/intel-core-i713700t-processor-30m-cache-up-to-4-90-ghz/specifications.html

0xA780 is the Device ID it looks like.

jknightdoeswork commented 1 year ago

Does this come into play when doing this in user created shaders:

COLOR = vec4(1.0, 0.0, 0.0, 1.0);
COLOR = vec4(0.0, 1.0, 0.0, 1.0);

Does this get transpiled to multiple assignments to gl_FragColor?

Any idea of the WebGL implications?

Is this a super easy fix? Just change a few .glsl files?

lawnjelly commented 1 year ago

Does this come into play when doing this in user created shaders:

Possibly, I'm not super familiar with the translation yet. But on the plus side it should be relatively easy to fix (write to temporary, then write gl_FragColor once at the end).

I've not tested outside my dev machine yet, but in theory if it occurs on one machine, it's likely it could occur on multiple similar setups. (Only further testing will reveal this.)

lawnjelly commented 1 year ago

Further clue this morning - while testing #84529 I discovered that in GLES3 adding this second line:

frag_color_final = frag_color;
frag_color_final *= 0.999;

Does NOT result in the drop in performance. This suggests that there is some kind of interaction with the previous code, like it is breaking a fast path, but only in some circumstances depending on the previous code.

Truly a very strange bug. :grin:

lawnjelly commented 1 year ago

This is not perfect, but if you run this project in 3.x it should hopefully show whether your GPU has this slowdown (in GLES2). This may not be reliable on a super fast GPU; it shows approx 100fps on my integrated GPU.

If the FPS label is approx the same with fog on and off, there is no slowdown. If it is approx halved with fog on, then your GPU has the slowdown.

The reason it exposes the problem is that fog modifies an existing gl_FragColor. With the PR fixed version, there is no slowdown, but on vanilla Godot, the slowdown should be exposed.

glFragColor_test_gpu.zip

lawnjelly commented 1 year ago

Will post results here of testing as I get it. Using the above project.

Linux Mint 21.1, Intel Core i7-13700T, Intel UHD Graphics 770 GPU

Fog on: 46fps. Fog off: 104fps (approx same figures when running Godot under Wine). Very susceptible.

Linux Mint 21.2, Intel Core i3 2377M, 2nd Gen Core integrated graphics

Fog on: 16fps. Fog off: 60fps. Very susceptible.

Android Galaxy Tab S6 Lite

(Modified the material to turn on ambient occlusion and reflections to bring the frame rate under 60fps.) Fog on: 53fps. Fog off: 54fps. Not susceptible.

lawnjelly commented 1 month ago

This is fixed in 3.x with #84529, but the problem may still occur in 4.x (where my PR #83697 is now out of date). Changing milestone to 4.x to represent this, and to let anyone else pick this up if they get the problem in 4.x.