Open SUPERCILEX opened 1 year ago
Bevy is a lot faster at drawing rectangles with `SpriteBundle` because it can batch draw calls. That's not yet implemented (see #89) for `Mesh2d`, which `bevy_prototype_lyon` uses under the hood.
But a single `SpriteBundle` wouldn't quite be apples-to-apples because the other benchmark is drawing rectangles with borders.
If I were trying to game this benchmark, I would try a version with two `SpriteBundle`s for each object: one for the border and another just above for the fill.
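To make the two-sprite idea concrete, here is a minimal sketch in plain Rust (no Bevy types) of how one bordered rectangle could be split into a border quad and a fill quad. The `Quad` struct, `bordered_rect`, and `FILL_Z_OFFSET` are hypothetical names for illustration only; in a real app each quad would presumably become its own sprite entity.

```rust
/// A flat, axis-aligned quad described by its center, size, and draw depth.
/// Hypothetical type for illustration; not a Bevy type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Quad {
    center: (f32, f32),
    size: (f32, f32),
    z: f32,
}

/// Assumed depth gap that keeps the fill just above its border.
const FILL_Z_OFFSET: f32 = 0.5;

/// Split one bordered rectangle into a full-size border quad and a smaller
/// fill quad on top of it, leaving `border` units visible on every side.
fn bordered_rect(center: (f32, f32), size: (f32, f32), border: f32, z: f32) -> [Quad; 2] {
    let border_quad = Quad { center, size, z };
    let fill_quad = Quad {
        center,
        size: (size.0 - 2.0 * border, size.1 - 2.0 * border),
        z: z + FILL_Z_OFFSET,
    };
    [border_quad, fill_quad]
}

fn main() {
    let [border, fill] = bordered_rect((0.0, 0.0), (10.0, 6.0), 1.0, 3.0);
    println!("border: {border:?}");
    println!("fill:   {fill:?}");
}
```

Both quads share a center, so only the size and the z value differ. This also means every rectangle costs two sprites instead of one.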
Like this? https://github.com/SUPERCILEX/bevy-vs-pixi/commit/e94e9a6ebc8eaa46c7e365af5f48236e7ab1a603
Sadly no dice. Now the trace item that seems to be causing stuttering is `bevy_core_pipeline::core_2d::main_pass_2d_node`.
> Like this? https://github.com/SUPERCILEX/bevy-vs-pixi/commit/e94e9a6ebc8eaa46c7e365af5f48236e7ab1a603

Precisely. That's a ~2x speedup on my machine, so "some dice," perhaps.
I think we're also seeing poor batching due to rectangles being given random z values over the entire range of `f32` though.
edit: or perhaps as a result of the "throwing another sprite of a different color on top" thing, actually.
It seems like we're seeing poor batching for... some reason though.
Haha, Bevy is batching poorly because one of the sprites is `Color::WHITE`, which is treated differently (because it's the default for textured sprites and so it doesn't need vertex colors?).
Make your rectangles `GRAY` for another ~2x speedup.
With that change, this is now (on my machine) 2x faster than pixi (native, no lto), but 4x slower than pixi (web)
But we are at a disadvantage due to drawing two things per rectangle.
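A toy model of why the white fill would hurt so much, assuming (as guessed above) that white and non-white sprites cannot share a batch. This is not Bevy's renderer code. `Sprite`, `count_batches`, and `layered_rects` are made-up names, and the "one run of equal flags = one draw call" rule is a simplification.

```rust
/// One sprite as seen by a toy batcher: a depth, and whether it carries a
/// non-default (non-white) color. Hypothetical type, not Bevy's.
#[derive(Clone, Copy)]
struct Sprite {
    z: f32,
    colored: bool,
}

/// Sort back-to-front by z, then count runs of sprites that share the same
/// `colored` flag. Each run stands in for one draw call.
fn count_batches(sprites: &mut [Sprite]) -> usize {
    sprites.sort_by(|a, b| a.z.total_cmp(&b.z));
    let mut batches = 0;
    let mut previous: Option<bool> = None;
    for sprite in sprites.iter() {
        if previous != Some(sprite.colored) {
            batches += 1;
            previous = Some(sprite.colored);
        }
    }
    batches
}

/// Build `n` border/fill pairs; each fill sits just above its own border.
fn layered_rects(n: usize, fill_is_white: bool) -> Vec<Sprite> {
    (0..n)
        .flat_map(|i| {
            let z = i as f32;
            [
                // Border: always a non-white color.
                Sprite { z, colored: true },
                // Fill: white counts as "not colored" in this model.
                Sprite { z: z + 0.5, colored: !fill_is_white },
            ]
        })
        .collect()
}

fn main() {
    let white = count_batches(&mut layered_rects(1000, true));
    let gray = count_batches(&mut layered_rects(1000, false));
    println!("white fill: {white} batches, gray fill: {gray} batches");
}
```

In this model, 1000 rectangles with white fills alternate colored/uncolored at every step and end up as 2000 batches, while gray fills collapse into a single batch.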
> over the entire range of f32 though.

`gen` is `[0, 1)` for floats. But this is actually it!

> Make your rectangles GRAY and they should go brr.

Wat. Lol, this also fixes it.

> But we are at a disadvantage due to drawing two things per rectangle.

Yeah, once instancing is a thing I'll go back to lyon.
Tweak comparisons:
All of the "Layered sprites, 0 Z" categories are about the same +- some noise, and same goes for Zs with GRAY.
Fixing the WHITE bug will bring uniformity to the Zs, but I do find the two tests I highlighted pretty odd. Why would offsetting the base sprite lead to such a significant drop in performance? Is there a better way to glue these sprites together?
> gen is [0, 1) for floats. But this is actually it!

Yeah, I think that forcing fewer layers just masked the "white color" problem.
Does that explain the (less severe) perf drop with gray though?
I think there may be an additional interesting thing happening with regard to the "colored" vs "non-colored" sprites breaking up batches and that not being factored into the pre-batch sorting.
I know that sorting itself also usually comes up when profiling this, and that may have very different characteristics when the set of z values is mostly random vs. mostly the same.
Checking in on this in light of recent rendering changes. Situation doesn't seem great.
M1 Max (native), LTO disabled, 64k rects
version | fps |
---|---|
0.11.2 | 52 |
main e8b3892 (pre-#9236) | 52 |
main 4f1d9a6 (#9236) | 30 |
main 87f7d01 (latest) | 38 |
batching #9685 | 33 |
The good news is that the "white minus epsilon hack" is no longer needed. (same fps with white)
I think I’m going to have to have a look at what this example does because bevymark ended up faster than main pre-#9236 from my previous testing.
I think that the difference can be attributed to sorting, although I don't understand how sorting behavior would have actually changed in 9236.
But if I modify bevymark to use a random z value instead of an incremental one, I see the same dive in performance after 9236.
You mean you see in traces that it is due to sorting?
Ok. #9236 did two things:
Checking in again, bevy 0.12-dev is looking pretty solid!
64k rects, mac m1 max, chrome 118, no lto, no strip, no opt-s.
note: bevy is actually drawing 128k sprites here.
engine | fps |
---|---|
pixi | 20.8 |
bevy 11.2 webgl2 | 15.43 |
bevy 11.2 webgpu | 14.30 |
bevy 11.2 native | 46.31 |
bevy main webgl2 | 27.83 |
bevy main webgpu | 29.15 |
bevy main native | 59.07 |
bevy main native (single mesh2d) | 11.08 |
bevy main webgpu (single mesh2d) | 6.46* |
bevy main webgl2 (single mesh2d) | 3.19 |
*crashed
Unfortunately, it doesn't seem like automatic batching has helped out much with mesh2d for this particular benchmark.
For reference in my mesh2d tests I'm just building meshes on demand with this function and spawning a `MaterialMesh2dBundle`.
> For reference in my mesh2d tests I'm just building meshes on demand with this function and spawning a MaterialMesh2dBundle.

It seems that this strategy was never going to work -- every entity having a unique mesh is a dealbreaker for batching. Maybe a custom material with a simple vertex shader would help (or something more involved to do instancing?), but that's not really "stock bevy" anymore, so perhaps two sprites per box is as good as it gets.
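A rough illustration of the unique-mesh problem, again as a toy model in plain Rust rather than anything from Bevy. It assumes draw items can only be merged when consecutive items use the same mesh. `DrawItem`, `unique_meshes`, and `shared_unit_quad` are hypothetical, and the shared-quad variant is just a thought experiment; nothing in this thread confirms it would batch for this benchmark.

```rust
/// A draw item in a toy renderer: which mesh it uses and how it is scaled.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
struct DrawItem {
    mesh_id: u32,
    scale: (f32, f32),
}

/// Count runs of consecutive items that share a mesh. Each run stands in
/// for one batched draw call.
fn count_batches(items: &[DrawItem]) -> usize {
    if items.is_empty() {
        return 0;
    }
    1 + items
        .windows(2)
        .filter(|pair| pair[0].mesh_id != pair[1].mesh_id)
        .count()
}

/// Strategy described above: bake each rectangle's size into its own mesh,
/// so every item gets a distinct mesh id.
fn unique_meshes(sizes: &[(f32, f32)]) -> Vec<DrawItem> {
    sizes
        .iter()
        .enumerate()
        .map(|(i, _)| DrawItem { mesh_id: i as u32, scale: (1.0, 1.0) })
        .collect()
}

/// Thought experiment: one shared unit quad (mesh 0), with the rectangle
/// size moved into a per-item scale instead of the mesh.
fn shared_unit_quad(sizes: &[(f32, f32)]) -> Vec<DrawItem> {
    sizes
        .iter()
        .map(|&size| DrawItem { mesh_id: 0, scale: size })
        .collect()
}

fn main() {
    let sizes = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)];
    let unique = unique_meshes(&sizes);
    let shared = shared_unit_quad(&sizes);
    println!("unique meshes: {} batches", count_batches(&unique));
    println!("shared quad:   {} batches, first item {:?}", count_batches(&shared), shared[0]);
}
```

With one mesh per entity the batch count equals the entity count, which matches the lack of improvement in the mesh2d rows of the table above.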
I think this is fixed?
Bevy version
0.10
[Optional] Relevant system information
What you did
https://github.com/SUPERCILEX/bevy-vs-pixi
Web is broken right now, so just use native: `cargo r --release` (and probably remove lto so the build is faster).
What went wrong
Performance is unacceptably bad (I can't even hit 30fps with 2000 rectangles even though Pixi can handle 8000 rectangles at ~50 fps) and I'm not sure why unfortunately. It seems like a lot of time in a trace is spent during extraction? https://github.com/bevyengine/bevy/blob/b6b549e3ff1ace18712ca771ee6233976074800b/crates/bevy_render/src/extract_component.rs#L123
I'd really appreciate if someone could investigate this. Otherwise, any pointers on what could be the root cause would be great.