godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Occlusion culling causes CPU-related frametime spikes when OccluderInstance3D nodes' visibility is toggled #70373

Open Calinou opened 1 year ago

Calinou commented 1 year ago

Related to https://github.com/godotengine/godot-proposals/issues/5967.

Godot version

4.0.beta9

System information

Fedora 36, Vulkan Forward Plus, AMD Radeon RX 6900 XT

Issue description

Occlusion culling causes CPU frametime spikes when OccluderInstance3D nodes with BoxOccluder shapes are hidden and shown. This occurs both with V-Sync enabled and disabled, and is noticeable in both cases. A fully optimized engine binary (with LTO enabled) was used to reproduce this issue. I also tried setting the BVH Build Quality advanced project setting to Low, to no avail.

The MRP toggles the visibility of static OccluderInstance3D nodes: it disables them when the 4 doors start opening and re-enables them once the doors have finished closing. This avoids moving OccluderInstance3D nodes every frame, which would trigger unnecessary BVH rebuilds (on top of overocclusion).

All doors open/close at the same time in the MRP, which makes the issue more noticeable. However, it still occurs with a single door present in the scene (with the other 3 doors removed entirely, not just hidden). As a workaround, making sure multiple occluders are never toggled on the same frame can help.
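The workaround of never toggling multiple occluders on the same frame can be sketched as a small queue that applies at most one pending visibility change per frame. This is an engine-agnostic model in Python, not Godot API: `StaggeredToggleQueue`, `request`, `process_frame` and the dictionary-based occluders are all hypothetical names used for illustration only.

```python
from collections import deque


class StaggeredToggleQueue:
    """Applies at most `max_per_frame` queued visibility changes per frame."""

    def __init__(self, max_per_frame=1):
        self.max_per_frame = max_per_frame
        self._pending = deque()

    def request(self, occluder, visible):
        # Queue the change instead of applying it immediately.
        self._pending.append((occluder, visible))

    def process_frame(self):
        # Call once per frame; returns the names of occluders changed this frame.
        applied = []
        for _ in range(min(self.max_per_frame, len(self._pending))):
            occluder, visible = self._pending.popleft()
            occluder["visible"] = visible
            applied.append(occluder["name"])
        return applied


# Four doors start opening on the same frame, as in the MRP.
occluders = [{"name": f"door_{i}", "visible": True} for i in range(4)]
queue = StaggeredToggleQueue(max_per_frame=1)
for occ in occluders:
    queue.request(occ, False)

# Simulate five frames: one occluder is hidden per frame, the fifth frame is idle.
frames = [queue.process_frame() for _ in range(5)]
```

Staggering only spreads the cost over several frames. Since the spike also occurs with a single door, it would not remove the spike itself, and an occluder that stays enabled for a few extra frames while its door is already opening may cause brief overocclusion.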

According to the visual profiler, Cull Scene (highlighted in white) is the most expensive operation during those spikes, not Update Occlusion Buffer:

[Screenshot: visual profiler capture, 2022-12-20_22 52 01]

Steps to reproduce

Minimal reproduction project

occlusion_culling_mesh_lod.zip (same as https://github.com/godotengine/godot-demo-projects/pull/807)

clayjohn commented 1 year ago

Before implementing the Embree-based occlusion culling, we discussed using Intel's Masked Software Occlusion Culling, and now I can't remember why we decided not to use it. It certainly seems like it would improve performance on our target hardware.

Calinou commented 1 year ago

This part from the issue description makes me wonder if we could queue updates within a frame and ensure that only one BVH rebuild occurs per frame:

> All doors open/close at the same time in the MRP, which makes the issue more noticeable. However, it still occurs with a single door present in the scene (with the other 3 doors removed entirely, not just hidden). As a workaround, making sure multiple occluders are never toggled on the same frame can help.

It's worth digging out a profiler and checking if the rebuild function is called more times than needed.
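The idea of queuing updates so that only one rebuild happens per frame can be sketched with a dirty flag, along with a counter that plays the role of the profiler check suggested above. This is a toy Python model, not the engine's code: `OcclusionScene`, `set_occluder_visible`, `end_of_frame` and `rebuild_count` are hypothetical names, and whether the engine already coalesces updates this way is not established in this thread.

```python
class OcclusionScene:
    """Toy model: many occluder changes per frame, at most one rebuild."""

    def __init__(self):
        self.visible = {}
        self.dirty = False
        self.rebuild_count = 0  # stands in for a profiler's call counter

    def set_occluder_visible(self, name, visible):
        # Only mark the scene dirty when the state actually changes.
        if self.visible.get(name) != visible:
            self.visible[name] = visible
            self.dirty = True

    def end_of_frame(self):
        # Perform a single rebuild, no matter how many occluders changed.
        if self.dirty:
            self._rebuild()
            self.dirty = False

    def _rebuild(self):
        # Placeholder for the expensive BVH rebuild.
        self.rebuild_count += 1


scene = OcclusionScene()

# Frame 1: all four occluders are registered as visible.
for i in range(4):
    scene.set_occluder_visible(f"door_{i}", True)
scene.end_of_frame()

# Frame 2: all four occluders are hidden on the same frame.
for i in range(4):
    scene.set_occluder_visible(f"door_{i}", False)
scene.end_of_frame()

# Frame 3: nothing changed, so no rebuild should happen.
scene.end_of_frame()
```

In this model, eight visibility changes over three frames result in two rebuilds. If a profiler showed the real rebuild function being called once per toggled occluder instead, that would point to redundant rebuilds.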

mrjustaguy commented 1 year ago

I'm running 4.2 dev3 with my CPU frequency locked to 3.7 GHz (i3-10105F) using power plans in Windows 11, and I'm seeing a totally different result that is totally fine IMHO.

While there are spikes, the delta between highest and lowest points on the graph is about 2x, not like 10x, and it looks like a flat brick with an occasional spike that is only sometimes created when toggling the doors.

I've tested with 512, 4096 and 16384 rays, and the behavior is fairly consistent: 1 ms avg with 2 ms spikes, 2.5 ms avg with 5 ms spikes, and 6 ms avg with 13 ms spikes, respectively.
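As a quick check of the "about 2x" figure, the spike-to-average ratios can be computed from the numbers quoted above. The snippet assumes the three avg/spike pairs are listed in the same order as the three ray counts; the variable names are illustrative.

```python
# (average ms, spike ms) per ray count, taken from the figures quoted above.
# Assumption: the pairs are listed in the same order as the ray counts.
measurements = {
    512: (1.0, 2.0),
    4096: (2.5, 5.0),
    16384: (6.0, 13.0),
}

# Ratio of spike frametime to average frametime for each ray count.
ratios = {rays: spike / avg for rays, (avg, spike) in measurements.items()}

for rays, ratio in ratios.items():
    print(f"{rays:>5} rays: spike is {ratio:.2f}x the average")
```

All three ratios come out at or slightly above 2x, which matches the description of the graph.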

I'm only getting a graph resembling the one in the issue description when allowing for dynamic clock speeds. Even then, the spikes consistently remain under 16 ms per frame across all 3 ray counts, with similar spike durations in each case.