Closed ruabmbua closed 1 year ago
@superdump @james7132 you two will be interested in this.
Thanks for the traces!
Note that this is my correct GPU:
AdapterInfo { name: "AMD Radeon RX 5700 XT (RADV NAVI10)", vendor: 4098, device: 29471, device_type: DiscreteGpu, driver: "radv", driver_info: "Mesa 22.3.6", backend: Vulkan }
GH issues replaced my paste with something else??
Update: it turns out DirectionalLight is the problem. When I disable the only one in the scene (for simulating the sun), it seems to be fixed. There are no issues with other types of lights.
The first thing I'm noticing is the amount of time spent encoding shadow passes, as you've mentioned in the issue description. This seems like it's an immediate result of the cascaded shadow map changes.
> Update: it turns out DirectionalLight is the problem. When I disable the only one in the scene (for simulating the sun), it seems to be fixed. There are no issues with other types of lights.
Do the other lights cast shadows?
Nope, but I just enabled them and there seems to be no perf impact.
Here is a GPU trace; it can be viewed with Radeon GPU Profiler (RGP):
It seems to me that all the shadow cascade passes have barely any GPU utilization, but I am too much of a noob in RGP to figure out why it's stalled.
Looking at the passes, it seems only the vertex shader units are utilized, and the actual hardware utilization is only about 0%-1% for memory, cache, ALU, and load-store. Digging into instruction timing, there are some s_waitcnt instructions, and most of the time is spent waiting in them.
I was going to ask about your scene and materials. I see from the Tracy trace that you’re using voxels. Can you please provide a lot more information about how exactly your voxels are rendered, the scene being rendered (number of entities with meshes), and what material(s) you’re using?
Sry for the late update:
And here are some stats of the test scene:
total meshes: 5506
total primitives: 1963452
avg primitives / mesh: 356
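As a quick sanity check on the stats above, the average can be recomputed from the two totals. This is only a sketch: the function name is made up for illustration, and it assumes the reported average was truncated to a whole number.

```rust
/// Average primitives per mesh, truncated toward zero.
/// Returns `None` when there are no meshes, to avoid dividing by zero.
/// (Hypothetical helper, not part of bevy.)
fn avg_primitives_per_mesh(total_primitives: u64, total_meshes: u64) -> Option<u64> {
    total_primitives.checked_div(total_meshes)
}
```

Calling `avg_primitives_per_mesh(1_963_452, 5_506)` with the totals from the test scene gives `Some(356)`, which matches the reported average.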
And here is a picture of the camera view used for comparing the engine versions, which is also where I captured the traces:
I just changed some of the settings for my chunk renderer. I reduced the number of chunks to a quarter and compensated by making the chunks larger. The goal was to reduce the total number of meshes.
It seems that this partially fixes the performance problem, while still rendering roughly the same geometry.
Was my previous approach with so many unique meshes unreasonable? I do not have much experience with this stuff.
Currently, bevy issues one draw command per mesh. I’ve been experimenting with and looking into ‘batching’ and ‘instancing’ (#89), which use far fewer draw commands to draw the same things and can bring very large performance benefits. Within the constraints of current bevy rendering, with one draw per mesh, merging meshes to reduce the draw count will improve performance. I’m speaking loosely here, as there are many things that can improve rendering performance, and voxels in particular have many different possible rendering techniques. How many voxels are there in your scene?
The "volume" of loaded voxels in the scene consists of 67108864 voxels. Of course, much of that is empty and does not actually generate any meshes.
I know about instancing, but I doubt it would be helpful in this "minecraft" kind of voxel rendering. I think it's not possible to have different geometry per instance?
Merging the meshes, however, seems to make sense, since performance already improved a lot just by adjusting my parameters.
I think the GPU might just struggle to actually use all of its resources for vertex shading when there are too many draw calls with not enough primitives in each of them. I compared the utilization in the GPU profiler, and it got 2-4 times better now.
What is strange, however, is that old bevy was still a lot better, even with the smaller chunks and more total meshes. I guess the extra render passes, which all invoke the vertex shaders again, make the cost go up a lot.
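To make the guess about extra render passes concrete, here is a rough upper-bound estimate of mesh draws per frame. It is a sketch under stated assumptions, not how bevy actually counts draws: every mesh is assumed to be drawn once in the main pass and once in every shadow pass (no culling per cascade), the old directional light is assumed to use one shadow pass, and the new one is assumed to use 4 cascades. The function name is hypothetical.

```rust
/// Upper bound on mesh draws per frame with one draw per mesh per pass.
/// Assumes each mesh is drawn in the main pass and in every shadow pass;
/// real culling would lower the number. (Hypothetical helper.)
fn draws_per_frame(meshes: u64, shadow_passes: u64) -> u64 {
    // One main pass plus `shadow_passes` depth-only passes.
    meshes * (1 + shadow_passes)
}
```

With the 5506 meshes from the test scene, `draws_per_frame(5506, 1)` is 11012 and `draws_per_frame(5506, 4)` is 27530, so under these assumptions the cascades alone would account for roughly 2.5 times as many draws, each with few primitives.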
Maybe it's possible to run the render passes of the cascaded shadow map at the same time? I'm not sure what exactly the algorithm looks like in bevy, but I imagine they could run in parallel and increase the GPU utilization again to fix the performance?
@superdump I think this can be closed. It's still possible to get good performance in bevy 0.10; it just needs some tweaking in the game code.
And I prefer the new cascaded shadow maps a lot over the old directional lights. Now it actually affects the whole scene I have, not just a large part of it.
Bevy version
bevy 0.10
[Optional] Relevant system information
What you did
I upgraded to bevy 0.10
What went wrong
After upgrading my project to bevy 0.10, the performance suffered a lot. Release mode builds slowed down in some situations from 140+ fps to below 100. In debug builds with dependency crates still set to opt level 3 (for development speed), the performance degraded from 140+ fps to below 20 in some situations, and never exceeded 60.
I know that changing engine-related code can result in slower performance in some situations, but I did not expect it to slow down quite this much, especially since even an iGPU could previously run my project easily at 60 fps.
Additional information
I changed my project a bit to make the situation more reproducible (disabled dynamic map generation), and made two profiles for a before / after comparison.
Here is the file: https://github.com/ruabmbua/bevy-traces/blob/main/traces.tar.xz
A first look at the traces tells me that CommandEncoder::run_render_pass is the problem. It takes 20ms instead of 900µs.
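For scale, the frame rates and span timings above can be converted into frame times and a slowdown factor. This is a small sketch using only the numbers quoted in this report; the helper names are made up for illustration.

```rust
use std::time::Duration;

/// Frame time in milliseconds for a given frame rate.
/// (Hypothetical helper for illustration.)
fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// How many times longer `after` is than `before`.
fn slowdown(after: Duration, before: Duration) -> f64 {
    after.as_secs_f64() / before.as_secs_f64()
}
```

140 fps corresponds to about 7.1 ms per frame, 100 fps to 10 ms, and 20 fps to 50 ms. `slowdown(Duration::from_millis(20), Duration::from_micros(900))` is about 22, so the 20ms span on its own exceeds the whole frame budget at 140 fps.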