godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.82k stars 21.14k forks source link

Barriers after return in some compute shaders #94820

Open clayjohn opened 3 months ago

clayjohn commented 3 months ago

Tested versions

All released version of 4.x including the soon to be released 4.3

System information

All systems

Issue description

In many of our compute shaders we utilize memory barriers to synchronize threads before proceeding. Barriers must be uniform in terms of control flow, which means that they cannot be used in situations where one thread in a group may not reach them.

A common way where one thread may not reach a barrier in a group is if the barrier is behind an if statement and the condition is non-uniform. Another common way is if the shader returns before the barrier conditionally (non-uniform).

The latter is a common pattern in Godot. We do an early return for threads which won't contribute to the output of the shader.

The consequence of returning non-uniformly is that it is possible for the shader to crash, fail to run, or hang indefinitely. We first encountered this problem while working on HDDAGI and testing it on Intel devices.

This problem is likely responsible for a lot of the "Device Lost" errors that we get and should be remedied as soon as possible.

The solution is to move the non-uniform control flow statements to after all the barriers have been executed. For example, for the jumpflood stage of SDFGI, we could take the return here:

https://github.com/godotengine/godot/blob/607b230ffe120b2757c56bd3d52a7a0d4e502cfe/servers/rendering/renderer_rd/shaders/environment/sdfgi_preprocess.glsl#L377-L384

And simply move it to be after the barrier.

Steps to reproduce

Run Godot with one of the following effects:

  1. Auto Exposure
  2. SDFGI
  3. TAA
  4. Glow
  5. VoxelGI
  6. SSAO, SSR, SSIL, DoF
  7. GPUParticles

Minimal reproduction project (MRP)

The problem is internal to Godot

clayjohn commented 3 months ago

The places with potential issues right now are:

luminance_reduce.glsl

Seems okay, control flow appears to be uniform

sdfgi_preprocess

Will be fixed by https://github.com/godotengine/godot/pull/86267, but might be worth fixing now anyway

MODE_JUMPFLOOD_OPTIMIZED https://github.com/godotengine/godot/blob/607b230ffe120b2757c56bd3d52a7a0d4e502cfe/servers/rendering/renderer_rd/shaders/environment/sdfgi_preprocess.glsl#L378-L384 Need to move the branch below the barrier

MODE_OCCLUSION https://github.com/godotengine/godot/blob/607b230ffe120b2757c56bd3d52a7a0d4e502cfe/servers/rendering/renderer_rd/shaders/environment/sdfgi_preprocess.glsl#L499-L503

Need to move branch below the barriers. Last barrier here: https://github.com/godotengine/godot/blob/607b230ffe120b2757c56bd3d52a7a0d4e502cfe/servers/rendering/renderer_rd/shaders/environment/sdfgi_preprocess.glsl#L795-L798

taa_resolve.glsl

Seems okay, control flow is after all barriers

copy.glsl

Seems okay, the only non-uniform control flow happens after the barriers (i.e. inside the MODE_GAUSSIAN_BLUR branch)

ss_effects_downsample.glsl

Seems okay, Looks like there is no control flow. This should be checked to ensure there is no out of bounds memory access.

sort.glsl

Control flow appears to be uniform