DaemonEngine / Daemon

The Dæmon game engine. With some bits of ioq3 and XreaL.
https://unvanquished.net
BSD 3-Clause "New" or "Revised" License

Cull dynamic lights on CPU, switch to clustered rendering #1042

Open VReaperV opened 8 months ago

VReaperV commented 8 months ago

Currently, dynamic light tiles are computed on the GPU, and the shader branches on expressions that are not dynamically uniform (while this distinction is made for GLSL versions >= 4.0, it is likely to be close to what modern drivers can branch on efficiently), so the fragment shader invocations likely execute all branches. The tile pass also renders into a different FBO, which is an expensive state change. See: https://github.com/DaemonEngine/Daemon/blob/master/src/engine/renderer/glsl_source/lighttile_fp.glsl and https://github.com/DaemonEngine/Daemon/blob/c5f8539fec2b29e7477c924e30c171d3e9fd2f65/src/engine/renderer/tr_backend.cpp#L2803

Performance would likely improve if lights were culled and assigned to tiles on the CPU.
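As a rough illustration of what CPU-side tile assignment could look like, here is a minimal sketch. It assumes each light has already been projected to a conservative screen-space bounding circle. All names (`ScreenLight`, `TileGrid`, `TileLightLists`, `BinLightsToTiles`) are hypothetical and not taken from the Daemon source, and the offsets/indices layout is only one possible format for the buffer the shader would read.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; none of these exist in Daemon.
struct ScreenLight {
    float x, y;    // light centre in window coordinates (pixels)
    float radius;  // conservative projected radius (pixels)
};

struct TileGrid {
    int width, height;  // viewport size in pixels
    int tileSize;       // tile edge length in pixels

    int tilesX() const { return (width + tileSize - 1) / tileSize; }
    int tilesY() const { return (height + tileSize - 1) / tileSize; }
};

// Flattened per-tile lists: tile t owns indices[offsets[t] .. offsets[t + 1]).
struct TileLightLists {
    std::vector<std::uint32_t> offsets;
    std::vector<std::uint32_t> indices;
};

TileLightLists BinLightsToTiles(const TileGrid& grid,
                                const std::vector<ScreenLight>& lights)
{
    assert(grid.width > 0 && grid.height > 0 && grid.tileSize > 0);
    const int tx = grid.tilesX();
    const int ty = grid.tilesY();
    const float ts = static_cast<float>(grid.tileSize);
    std::vector<std::vector<std::uint32_t>> perTile(static_cast<std::size_t>(tx) * ty);

    for (std::uint32_t i = 0; i < lights.size(); ++i) {
        const ScreenLight& l = lights[i];
        // Cull lights whose bounding square lies entirely outside the viewport.
        if (l.x + l.radius < 0.0f || l.y + l.radius < 0.0f ||
            l.x - l.radius >= grid.width || l.y - l.radius >= grid.height) {
            continue;
        }
        // Tile range covered by the bounding square, clamped to the grid.
        const int x0 = static_cast<int>(std::clamp(std::floor((l.x - l.radius) / ts), 0.0f, float(tx - 1)));
        const int x1 = static_cast<int>(std::clamp(std::floor((l.x + l.radius) / ts), 0.0f, float(tx - 1)));
        const int y0 = static_cast<int>(std::clamp(std::floor((l.y - l.radius) / ts), 0.0f, float(ty - 1)));
        const int y1 = static_cast<int>(std::clamp(std::floor((l.y + l.radius) / ts), 0.0f, float(ty - 1)));
        for (int y = y0; y <= y1; ++y) {
            for (int x = x0; x <= x1; ++x) {
                perTile[static_cast<std::size_t>(y) * tx + x].push_back(i);
            }
        }
    }

    // Flatten into one contiguous index array plus per-tile offsets.
    TileLightLists out;
    out.offsets.reserve(perTile.size() + 1);
    for (const auto& list : perTile) {
        out.offsets.push_back(static_cast<std::uint32_t>(out.indices.size()));
        out.indices.insert(out.indices.end(), list.begin(), list.end());
    }
    out.offsets.push_back(static_cast<std::uint32_t>(out.indices.size()));
    return out;
}
```

The bounding-square test is deliberately conservative: a light may land in a corner tile its circle does not actually touch, which costs some shading work but never drops a light. The flattened arrays could then be uploaded once per frame, with no extra FBO pass involved.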

Additionally, switching from tiled to clustered rendering would likely improve performance further.

Overview: Tiled and clustered rendering are techniques that divide the view frustum into tiles (on the XY plane) or frustum-shaped clusters (tiles with added depth slices), respectively, and store lights (and potentially things like decals or probes) only in the tiles/clusters they should affect. The lighting code in the fragment shader then fetches the list of lights from a buffer object or a texture using the fragment's position and only computes those lights. Compared to tiled rendering, clustered rendering reduces the number of light computations for a small CPU cost.
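To make the lookup step concrete, here is a minimal sketch of how a fragment's position could be mapped to a cluster index. It is written in C++ for readability, though the same arithmetic would run in the fragment shader. It assumes exponentially spaced depth slices, a common choice in clustered shading but not something this issue prescribes, and all names (`ClusterGrid`, `DepthSlice`, `ClusterIndex`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical description of a cluster grid; not taken from Daemon.
struct ClusterGrid {
    int width, height;  // viewport size in pixels
    int tileSize;       // tile edge length in pixels
    int slices;         // number of depth slices
    float zNear, zFar;  // positive view-space distances, zNear < zFar

    int tilesX() const { return (width + tileSize - 1) / tileSize; }
    int tilesY() const { return (height + tileSize - 1) / tileSize; }
    int clusterCount() const { return tilesX() * tilesY() * slices; }
};

// Assumed exponential slicing: slice = floor(slices * log(z / zNear) / log(zFar / zNear)).
int DepthSlice(const ClusterGrid& g, float viewDepth)
{
    assert(g.zNear > 0.0f && g.zFar > g.zNear && g.slices > 0);
    // Clamp first so the logarithm never sees a non-positive argument.
    const float z = std::clamp(viewDepth, g.zNear, g.zFar);
    const float t = std::log(z / g.zNear) / std::log(g.zFar / g.zNear);
    const int slice = static_cast<int>(std::floor(t * static_cast<float>(g.slices)));
    return std::clamp(slice, 0, g.slices - 1);
}

// Maps a fragment (window x/y in pixels, positive view-space depth) to the
// index of the cluster whose light list it should read.
int ClusterIndex(const ClusterGrid& g, float px, float py, float viewDepth)
{
    assert(g.tileSize > 0);
    const float ts = static_cast<float>(g.tileSize);
    const int tx = static_cast<int>(std::clamp(std::floor(px / ts), 0.0f, float(g.tilesX() - 1)));
    const int ty = static_cast<int>(std::clamp(std::floor(py / ts), 0.0f, float(g.tilesY() - 1)));
    const int slice = DepthSlice(g, viewDepth);
    return (slice * g.tilesY() + ty) * g.tilesX() + tx;
}
```

Dropping the slice term, or setting `slices` to 1, gives the plain tiled lookup. The depth slices are what keep a fragment close to the camera from evaluating lights that only overlap its tile far in the distance.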

For a more in-depth explanation and implementation examples, see https://www.humus.name/Articles/PracticalClusteredShading.pdf and https://advances.realtimerendering.com/s2016/Siggraph2016_idTech6.pdf (pp. 5-9).

illwieckz commented 8 months ago

> Currently, dynamic lights tiles are computed on the GPU, and uses branching on non-dynamically uniform expressions (while this distinction is made for GLSL versions >= 4.0, it is likely to be close to what modern drivers will branch well on), so the fragment shader invocations likely execute all branches. It also uses a different FBO, which is an expensive state change.

Anything that can help the game to run on more lower-end machines is welcome! 🤓️