bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

substantial performance degradation in bevy 0.10 #7982

Closed ruabmbua closed 1 year ago

ruabmbua commented 1 year ago

Bevy version

bevy 0.10

[Optional] Relevant system information

CPU:
  Info: 12-core model: AMD Ryzen 9 5900X bits: 64 type: MT MCP cache:
    L2: 6 MiB
  Speed (MHz): avg: 2479 min/max: 2200/4950 cores: 1: 3700 2: 2200 3: 2200
    4: 2200 5: 3009 6: 2200 7: 2200 8: 2200 9: 2200 10: 2200 11: 2200 12: 2200
    13: 3599 14: 2200 15: 2200 16: 2200 17: 3700 18: 3700 19: 2200 20: 2200
    21: 2200 22: 2200 23: 2200 24: 2200
Graphics:
  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
    driver: amdgpu v: kernel
  Display: wayland server: X.org v: 1.21.1.7 with: Xwayland v: 22.1.8
    compositor: sway v: 1.8.1 driver: X: loaded: modesetting dri: radeonsi
    gpu: amdgpu resolution: 1: 2560x1440~144Hz 2: 1920x1080~60Hz
  API: OpenGL v: 4.6 Mesa 22.3.6 renderer: AMD Radeon RX 5700 XT (navi10
    LLVM 15.0.7 DRM 3.49 6.2.1-arch1-1)
AdapterInfo { name: "AMD Radeon RX 5700 XT (RADV NAVI10)", vendor: 4098, device: 29471, device_type: DiscreteGpu, driver: "radv", driver_info: "Mesa 22.3.6", backend: Vulkan }

What you did

I upgraded to bevy 0.10

What went wrong

After upgrading my project to bevy 0.10, performance suffered a lot. Release builds slowed down in some situations from 140+ fps to below 100. In debug builds with dependency crates still set to opt-level 3 (for development speed), performance degraded from 140+ fps to below 20 in some situations, and never exceeded 60.

I know that changing engine-related code can result in slower performance in some situations, but I did not expect it to slow down quite this much, especially since before the upgrade even an iGPU could easily run my project at 60 fps.

Additional information

I changed my project a bit to make the problem easier to reproduce (disabled dynamic map generation), and captured two profiles for a before/after comparison.

Here is the file: https://github.com/ruabmbua/bevy-traces/blob/main/traces.tar.xz

A first look at the traces tells me that CommandEncoder::run_render_pass is the problem. It takes 20 ms instead of 900 µs.

alice-i-cecile commented 1 year ago

@superdump @james7132 you two will be interested in this.

Thanks for the traces!

ruabmbua commented 1 year ago

Note that this is my correct GPU:

AdapterInfo { name: "AMD Radeon RX 5700 XT (RADV NAVI10)", vendor: 4098, device: 29471, device_type: DiscreteGpu, driver: "radv", driver_info: "Mesa 22.3.6", backend: Vulkan }

GH issues replaced my paste with something else??

ruabmbua commented 1 year ago

Update: it turns out DirectionalLight is the problem. When I disable the only one in the scene (for simulating the sun), it seems to be fixed. There are no issues with other types of lights.

james7132 commented 1 year ago

The first thing I'm noticing is the amount of time spent encoding shadow passes, as you've mentioned in the issue description. This seems like it's an immediate result of the cascaded shadow map changes.

image

Elabajaba commented 1 year ago

Update: it turns out DirectionalLight is the problem. When I disable the only one in the scene (for simulating the sun), it seems to be fixed. There are no issues with other types of lights.

Do the other lights cast shadows?

ruabmbua commented 1 year ago

Nope, but I just enabled them and there seems to be no perf impact.

ruabmbua commented 1 year ago

Here is a gpu trace, can be viewed with the RadeonGPUProfiler:

bevy-voxel-experiment_2023.03.08_22.26.55.zip

ruabmbua commented 1 year ago

image

It seems to me that all the shadow cascade passes have barely any GPU utilization, but I am too much of a noob in RGP to figure out why it's stalled.

ruabmbua commented 1 year ago

Looking at the passes, it seems only the vertex shader units are utilized, but the actual hardware utilization is about 0%-1% for memory, cache, ALU, and load-store. Digging into the instruction timing, there are some s_waitcnt instructions, and most of the time is spent waiting in them.

superdump commented 1 year ago

I was going to ask about your scene and materials. I see from the Tracy trace that you’re using voxels. Can you please provide a lot more information about how exactly your voxels are rendered, the scene being rendered (number of entities with meshes), and what material(s) you’re using?

ruabmbua commented 1 year ago

Sorry for the late update:

And here are some stats of the test scene:

total meshes: 5506
total primitives: 1963452
avg primitives / mesh: 356

And here is a picture of the camera view I used for comparing the engine versions, and where I captured the traces:

image

ruabmbua commented 1 year ago

I just changed some of the settings for my chunk renderer. I reduced the number of chunks by a factor of 4, and compensated by making the chunks larger. The goal was to reduce the total number of meshes.

It seems that this partially fixes the performance problem, while still rendering roughly the same geometry.

Was my previous approach with so many unique meshes unreasonable? I do not have much experience with this stuff.

superdump commented 1 year ago

Currently, bevy draws meshes with one draw command per mesh. I’ve been experimenting with and looking into what are called ‘batching’ and ‘instancing’ (#89), which use far fewer draw commands to draw the same things and can bring very large performance benefits. Within the constraints of current bevy rendering, with one draw per mesh, merging meshes to reduce the draw count will improve performance. I’m speaking loosely here, as there are many things that can improve rendering performance, and voxels in particular have many different possible rendering techniques. How many voxels are there in your scene?
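The mesh merging mentioned above can be sketched in plain Rust. This is only an illustration under stated assumptions: `SimpleMesh` and `merge_meshes` are hypothetical names, not Bevy's `Mesh` API, and the sketch handles only positions and triangle indices (no normals, UVs, or materials).

```rust
/// A minimal indexed triangle mesh: vertex positions plus a triangle index list.
/// Hypothetical type for illustration only; this is not Bevy's `Mesh`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SimpleMesh {
    pub positions: Vec<[f32; 3]>,
    pub indices: Vec<u32>,
}

/// Merge several meshes into one so they can be drawn with a single draw command.
/// Each part comes with a world-space offset (for example, its chunk origin).
pub fn merge_meshes(parts: &[(SimpleMesh, [f32; 3])]) -> SimpleMesh {
    let mut merged = SimpleMesh::default();
    for (mesh, offset) in parts {
        // Indices of this part must be shifted by the number of vertices
        // that are already in the merged vertex buffer.
        let base = merged.positions.len() as u32;
        merged.positions.extend(
            mesh.positions
                .iter()
                .map(|p| [p[0] + offset[0], p[1] + offset[1], p[2] + offset[2]]),
        );
        merged.indices.extend(mesh.indices.iter().map(|i| i + base));
    }
    merged
}
```

Merging two single-triangle meshes this way yields one mesh with six vertices and the index list 0 through 5, so what used to be two draws becomes one. The trade-off is coarser culling, since a merged mesh is either drawn entirely or not at all.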

ruabmbua commented 1 year ago

The "volume" of loaded voxels in the scene consists of 67108864 voxels. Of course, much of that is empty and does not actually generate any meshes.

I know about instancing, but I doubt it would be helpful in this "minecraft" kind of voxel rendering. I think it's not possible to have different geometry per instance?

Merging the meshes however seems to make sense, since it already improved by a lot by adjusting my parameters.
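As a rough sketch of how chunk size drives the mesh count, the arithmetic can be written out as below. The world and chunk dimensions used in the usage note are assumptions chosen only so that the volume matches the 67108864 voxels mentioned above; the actual dimensions are not stated in this thread, and `chunk_count` is a hypothetical helper.

```rust
/// Number of chunks needed to cover a world of `world` voxels per axis
/// with chunks of `chunk` voxels per axis (rounding up on each axis).
/// Empty chunks are included, so this is an upper bound on the mesh count.
pub fn chunk_count(world: [u64; 3], chunk: [u64; 3]) -> u64 {
    (0..3)
        .map(|axis| (world[axis] + chunk[axis] - 1) / chunk[axis])
        .product()
}
```

For an assumed 512 x 256 x 512 world (67108864 voxels), `chunk_count([512, 256, 512], [16, 16, 16])` gives 16384 chunks, while doubling the chunk size on two axes, `chunk_count([512, 256, 512], [32, 16, 32])`, gives 4096, which is 4 times fewer potential meshes for the same geometry.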

ruabmbua commented 1 year ago

I think the GPU might just struggle to use all of its resources for vertex shading when there are too many draw calls with too few primitives in each of them. I compared the utilization in the GPU profiler, and it is now 2-4 times better.

What is strange, however, is that old bevy was still a lot better, even with the smaller chunks and more total meshes. I guess the extra render passes, which all invoke the vertex shaders again, just make the cost go up a lot.
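A back-of-the-envelope estimate of that effect, using the scene stats posted earlier (5506 meshes, 1963452 primitives). The sketch assumes one draw command per mesh per pass and that every mesh is drawn in every shadow cascade, so it is an upper bound; the cascade counts in the usage note (one shadow pass before, four cascades after) are assumptions, not numbers taken from this thread.

```rust
/// Upper-bound estimate of draw calls per frame: each mesh is drawn once in
/// the main pass and once in each shadow pass of a single directional light.
pub fn estimated_draw_calls(meshes: u64, shadow_passes: u64) -> u64 {
    meshes * (1 + shadow_passes)
}

/// Average number of primitives submitted per draw call (integer division).
pub fn avg_primitives_per_draw(total_primitives: u64, meshes: u64) -> u64 {
    total_primitives / meshes
}
```

With these assumptions, `estimated_draw_calls(5506, 1)` is 11012 and `estimated_draw_calls(5506, 4)` is 27530, while `avg_primitives_per_draw(1963452, 5506)` stays at 356. So the number of small draws could grow by about 2.5 times while each draw remains just as small, which would fit the low utilization seen in the shadow passes.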

Maybe it's possible to run the render passes of the cascaded shadow map at the same time? I'm not sure what exactly the algorithm looks like in bevy, but I imagine they could run in parallel, which would increase GPU utilization again and fix the performance?

ruabmbua commented 1 year ago

@superdump I think this can be closed. It's still possible to get good performance in bevy 0.10, it just needs some tweaking in the game code.

And I prefer the new cascaded shadow maps a lot over the old directional lights. Now it actually affects the whole scene I have, not just a large part of it.