bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Poor performance on basic rectangle benchmark #8100

Open SUPERCILEX opened 1 year ago

SUPERCILEX commented 1 year ago

Bevy version

0.10

[Optional] Relevant system information

AdapterInfo { name: "Intel(R) UHD Graphics (CML GT2)", vendor: 32902, device: 39876, device_type: IntegratedGpu, driver: "Intel open-source Mesa driver", driver_info: "Mesa 22.3.5", backend: Vulkan }
SystemInfo { os: "Linux 22.04 Pop!_OS", kernel: "6.2.0-76060200-generic", cpu: "Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz", core_count: "8", memory: "62.5 GiB" }

What you did

https://github.com/SUPERCILEX/bevy-vs-pixi

Web is broken right now, so just use native cargo r --release (and probably remove lto so the build is faster).

What went wrong

Performance is unacceptably bad (I can't even hit 30fps with 2000 rectangles even though Pixi can handle 8000 rectangles at ~50 fps) and I'm not sure why unfortunately. It seems like a lot of time in a trace is spent during extraction? https://github.com/bevyengine/bevy/blob/b6b549e3ff1ace18712ca771ee6233976074800b/crates/bevy_render/src/extract_component.rs#L123

I'd really appreciate it if someone could investigate this. Otherwise, any pointers on what could be the root cause would be great.

rparrett commented 1 year ago

Bevy is a lot faster at drawing rectangles with SpriteBundle because it can batch draw calls. That's not yet implemented (see #89) for Mesh2d, which bevy_prototype_lyon uses under the hood.

But a single SpriteBundle wouldn't quite be apples-to-apples because the other benchmark is drawing rectangles with borders.

If I were trying to game this benchmark, I would try a version with two SpriteBundles for each object. One for the border and another just above for the fill.
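A minimal sketch of the "two sprites per object" layout, using plain Rust with no Bevy types. `SpriteSpec` and `bordered_rect` are hypothetical names made up for illustration, and the `z_gap` parameter is an assumption about how the fill is kept "just above" the border; in a real app each `SpriteSpec` would be turned into a SpriteBundle.

```rust
/// Hypothetical stand-in for a sprite: centre position, z layer, and size.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpriteSpec {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub width: f32,
    pub height: f32,
}

/// Split one bordered rectangle into (border sprite, fill sprite).
/// The border sprite covers the full rectangle; the fill sprite is inset by
/// `border` on every side and raised by `z_gap` so it draws on top.
pub fn bordered_rect(
    x: f32,
    y: f32,
    z: f32,
    width: f32,
    height: f32,
    border: f32,
    z_gap: f32,
) -> (SpriteSpec, SpriteSpec) {
    let border_sprite = SpriteSpec { x, y, z, width, height };
    let fill_sprite = SpriteSpec {
        x,
        y,
        z: z + z_gap,
        // Clamp at zero so a very thick border never yields a negative size.
        width: (width - 2.0 * border).max(0.0),
        height: (height - 2.0 * border).max(0.0),
    };
    (border_sprite, fill_sprite)
}
```

For example, `bordered_rect(0.0, 0.0, 1.0, 10.0, 6.0, 1.0, 0.5)` yields a 10x6 border sprite at z = 1.0 and an 8x4 fill sprite at z = 1.5.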

SUPERCILEX commented 1 year ago

Like this? https://github.com/SUPERCILEX/bevy-vs-pixi/commit/e94e9a6ebc8eaa46c7e365af5f48236e7ab1a603

Sadly no dice. Now the trace item that seems to be causing stuttering is bevy_core_pipeline::core_2d::main_pass_2d_node.

rparrett commented 1 year ago

Like this? https://github.com/SUPERCILEX/bevy-vs-pixi/commit/e94e9a6ebc8eaa46c7e365af5f48236e7ab1a603

Precisely. That's a ~2x speedup on my machine, so "some dice," perhaps.

I think we're also seeing poor batching due to rectangles being given random z values over the entire range of f32 though.

edit: or perhaps as a result of the "throwing another sprite of a different color on top" thing, actually.

It seems like we're seeing poor batching for... some reason though.

rparrett commented 1 year ago

Haha,

Bevy is batching poorly because one of the sprites is Color::WHITE, which is treated differently (perhaps because it's the default for textured sprites and so doesn't need vertex colors?).

Make your rectangles GRAY for another ~2x speedup.
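A toy model of why one white sprite per rectangle could hurt this much. It assumes, following the guess above, that colored and non-colored sprites cannot share a batch, and that sprites are drawn in z order. `ToySprite`, `count_batches`, and `layered_rects` are hypothetical names for illustration only; this is not Bevy's renderer code.

```rust
/// Hypothetical sprite: only the fields that matter for this toy model.
#[derive(Debug, Clone, Copy)]
pub struct ToySprite {
    pub z: f32,
    /// Assumption: `false` stands for Color::WHITE (no vertex colors needed).
    pub colored: bool,
}

/// Sort by z, then count runs of consecutive sprites with the same `colored`
/// flag. Each run is one batch in this model.
pub fn count_batches(sprites: &[ToySprite]) -> usize {
    let mut sorted = sprites.to_vec();
    sorted.sort_by(|a, b| a.z.total_cmp(&b.z));

    let mut batches = 0;
    let mut previous: Option<bool> = None;
    for sprite in &sorted {
        if previous != Some(sprite.colored) {
            batches += 1;
            previous = Some(sprite.colored);
        }
    }
    batches
}

/// Build `n` layered rectangles: a colored border at z = i and a fill just
/// above it at z = i + 0.5. `fill_colored` picks GRAY (true) or WHITE (false).
pub fn layered_rects(n: usize, fill_colored: bool) -> Vec<ToySprite> {
    (0..n)
        .flat_map(|i| {
            let z = i as f32;
            [
                ToySprite { z, colored: true },
                ToySprite { z: z + 0.5, colored: fill_colored },
            ]
        })
        .collect()
}
```

Under these assumptions, `count_batches(&layered_rects(1000, false))` gives 2000 batches because the flag flips on every sprite, while `count_batches(&layered_rects(1000, true))` gives 1.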

rparrett commented 1 year ago

With that change, this is now (on my machine) 2x faster than pixi (native, no lto), but 4x slower than pixi (web)

But we are at a disadvantage due to drawing two things per rectangle.

SUPERCILEX commented 1 year ago

over the entire range of f32 though.

gen is [0, 1) for floats. But this is actually it!

Make your rectangles GRAY and they should go brr.

Wat. Lol, this also fixes it.

But we are at a disadvantage due to drawing two things per rectangle.

Yeah, once instancing is a thing I'll go back to lyon.


Tweak comparisons:

All of the "Layered sprites, 0 Z" categories are about the same +- some noise, and same goes for Zs with GRAY.

Fixing the WHITE bug will bring uniformity to the Zs, but I do find the two tests I highlighted pretty odd. Why would offsetting the base sprite lead to such a significant drop in performance? Is there a better way to glue these sprites together?

rparrett commented 1 year ago

gen is [0, 1) for floats. But this is actually it!

Yeah, I think that forcing fewer layers just masked the "white color" problem.

SUPERCILEX commented 1 year ago

Does that explain the (less severe) perf drop with gray though?

rparrett commented 1 year ago

I think there may be an additional interesting thing happening with regard to the "colored" vs "non-colored" sprites breaking up batches and that not being factored into the pre-batch sorting.

I know that sorting itself also usually comes up when profiling this, and that may have very different characteristics when the set of z values is mostly random vs. mostly the same.

rparrett commented 10 months ago

Checking in on this in light of recent rendering changes. Situation doesn't seem great.

M1 Max (native), LTO disabled, 64k rects

| version | fps |
| --- | --- |
| 0.11.2 | 52 |
| main e8b3892 (pre-#9236) | 52 |
| main 4f1d9a6 (#9236) | 30 |
| main 87f7d01 (latest) | 38 |
| batching #9685 | 33 |

The good news is that the "white minus epsilon hack" is no longer needed. (same fps with white)

superdump commented 10 months ago

I think I’m going to have to have a look at what this example does because bevymark ended up faster than main pre-#9236 from my previous testing.

rparrett commented 10 months ago

I think that the difference can be attributed to sorting, although I don't understand how sorting behavior would have actually changed in #9236.

But if I modify bevymark to use a random z value instead of an incremental one, I see the same dive in performance after #9236.

superdump commented 10 months ago

You mean you see in traces that it is due to sorting?

superdump commented 10 months ago

Ok. #9236 did two things:

rparrett commented 10 months ago

Checking in again, bevy 0.12-dev is looking pretty solid!

64k rects, mac m1 max, chrome 118, no lto, no strip, no opt-s.

note: bevy is actually drawing 128k sprites here.

| engine | fps |
| --- | --- |
| pixi | 20.8 |
| bevy 0.11.2 webgl2 | 15.43 |
| bevy 0.11.2 webgpu | 14.30 |
| bevy 0.11.2 native | 46.31 |
| bevy main webgl2 | 27.83 |
| bevy main webgpu | 29.15 |
| bevy main native | 59.07 |
| bevy main native (single mesh2d) | 11.08 |
| bevy main webgpu (single mesh2d) | 6.46* |
| bevy main webgl2 (single mesh2d) | 3.19 |

*crashed

Unfortunately, it doesn't seem like automatic batching has helped out much with mesh2d for this particular benchmark.

For reference in my mesh2d tests I'm just building meshes on demand with this function and spawning a MaterialMesh2dBundle.

rparrett commented 10 months ago

For reference in my mesh2d tests I'm just building meshes on demand with this function and spawning a MaterialMesh2dBundle.

It seems that this strategy was never going to work -- every entity having a unique mesh is a dealbreaker for batching. Maybe a custom material with a simple vertex shader would help (or something more involved to do instancing?), but that's not really "stock bevy" anymore, so perhaps two sprites per box is as good as it gets.
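A rough sketch of the "unique mesh per entity" problem, assuming only that entities can be batched together when they share the same mesh. All names here (`ToyMeshEntity`, `distinct_meshes`, `unique_mesh_per_rect`, `shared_unit_quad`) are hypothetical, and the model says nothing about what a particular Bevy version actually batches.

```rust
use std::collections::HashSet;

/// Hypothetical entity: a mesh identifier plus a per-entity scale.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ToyMeshEntity {
    pub mesh_id: u32,
    pub scale: (f32, f32),
}

/// Number of distinct meshes in use. In this model it is a lower bound on the
/// number of batches, since entities with different meshes never merge.
pub fn distinct_meshes(entities: &[ToyMeshEntity]) -> usize {
    entities
        .iter()
        .map(|entity| entity.mesh_id)
        .collect::<HashSet<_>>()
        .len()
}

/// Strategy from the thread: build a fresh mesh for every rectangle, so each
/// entity gets its own mesh id and the size is baked into the mesh.
pub fn unique_mesh_per_rect(sizes: &[(f32, f32)]) -> Vec<ToyMeshEntity> {
    sizes
        .iter()
        .enumerate()
        .map(|(i, _)| ToyMeshEntity { mesh_id: i as u32, scale: (1.0, 1.0) })
        .collect()
}

/// Alternative for comparison: one shared unit quad (mesh id 0), with the
/// rectangle size carried by the per-entity scale instead.
pub fn shared_unit_quad(sizes: &[(f32, f32)]) -> Vec<ToyMeshEntity> {
    sizes
        .iter()
        .map(|&(w, h)| ToyMeshEntity { mesh_id: 0, scale: (w, h) })
        .collect()
}
```

With three rectangles, `distinct_meshes` reports 3 for `unique_mesh_per_rect` and 1 for `shared_unit_quad`. A scaled shared quad would also stretch any border baked into the mesh, which is one reason the custom-shader or two-sprite routes may still be needed.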

SUPERCILEX commented 8 months ago

I think this is fixed?