bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0
36.05k stars 3.56k forks source link

Performance regression in many_buttons since "render-set reorder" #9971

Closed rparrett closed 2 weeks ago

rparrett commented 1 year ago

Bevy version

since #9236

Relevant system information

AdapterInfo { name: "Apple M1 Max", vendor: 0, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Metal }
SystemInfo { os: "MacOS 13.5.1 ", kernel: "22.6.0", cpu: "Apple M1 Max", core_count: "10", memory: "64.0 GiB" }

What you did

Benchmarking in release mode for 500 frames and a tweaked FrameTimeDiagnosticsPlugin that keeps every sample for averaging.

Benchmarked before/after some PRs that seemed like they could affect this. Clearly missed some.

What went wrong

tag fps vs 0.11.3
0d23d71 0.11.3 54.98
e8b3892 pre-9236 55.68 🟒 +1.3%
4f1d9a6 render-set reorder 37.76 πŸ”΄ -31.3%
40c6b3b pre-9497 40.18 πŸ”΄ -26.9%
02b520b4e 9497 split computedvisibility 40.27 πŸ”΄ -26.8%
c8f61c3 pre-9712 41.80 πŸ”΄ -24.0%
73447b6 9712 many_buttons enhancements 43.54 πŸ”΄ -20.8%
e60249e pre-9685 43.48 πŸ”΄ -20.9%
5c884c5 9685 autobatching 42.47 πŸ”΄ -22.8%
35d3213 pre-9903 42.15 πŸ”΄ -23.3%
b6ead2b 9903 entityhashmap 37.15 πŸ”΄ -32.4%
2b5dac4 (unmerged) 9610 ui batching fix 44.47 πŸ”΄ -19.1%
superdump commented 1 year ago

I was going to try to make use of an instance rate vertex buffer for UI in the same way as for sprites but it was a bit too complicated to figure out. @ickshonpe has a bunch of performance PRs open for UI though as I recall.

superdump commented 1 year ago

Moving from the current vertex buffer approach to using an instance-rate vertex buffer gave a 40% improvement, which if it would carry over to UI, and I think it should, would get us back up to 50fps and there'd be no overall regression.

Then longer-term I intend to work on getting rid of as many lookups as possible such that the performance losses from the render set reorder architectural change 'requiring' to do random access lookups in the prepare set, and using EntityHashMap for those lookups, will go away. My goal is to get to iterating over items in Vecs.

superdump commented 1 year ago

So, basically, implement https://github.com/bevyengine/bevy/pull/9597 for UI.

ickshonpe commented 1 year ago

Thinking about this, maybe it's a non-problem and implementing new features should be our focus for now.

I've not seen a single user complaining about Bevy UI's performance. Their requests are always for something missing: border-radius, shadows, improved text rendering (fixing the font texture atlas problems would be huge), text support for relative units, gradients, multi-window support, world space UI, nine-patching, clipping of rotated elements, overflow scroll support, etc.

superdump commented 1 year ago

Well, it depends. Maybe developing a lot of features without paying attention to performance would lead to rewriting a bunch of things later to try to get more performance. Whereas laying a foundation that performs well and building features within those constraints may be long term simpler

ickshonpe commented 12 months ago

Well, it depends. Maybe developing a lot of features without paying attention to performance would lead to rewriting a bunch of things later to try to get more performance. Whereas laying a foundation that performs well and building features within those constraints may be long term simpler

I sort of agree and that's why I've been grinding away producing a lot of UI PRs that fix obscure bugs or marginally improve performance for the last nine months or so. For 0.13 though I think it's been long enough, Bevy UI has existed for years and yet all we support rendering-wise is axis-aligned rectangles and text.

I'm not very knowledgeable about rendering but my naive view of how things work is that we have four main areas that matter for UI performance:

  1. The UI layout and text systems that compute all the geometry and draw order.

  2. Extraction, which should be as simple as possible. Produces a list of ExtractedUiNodes (or whatever) for the UI renderer to draw.

  3. The Bevy rendering systems, queue and prepare and the render pass draw functions. They take all the extracted geometry, convert it into a list of vertices, and package the vertices together with the required rendering assets into batches that are sent to the GPU.

  4. The shader programs that run on the GPU.

And (3) should mostly (ideally?) be completely decoupled from what we do in (1), (2), and (4). In (2) we might be able to better order and group items to reduce the amount of work needed in (3), and in (4) there might be changes to the vertex shaders if we use instancing, but they are relatively trivial differences.

superdump commented 12 months ago

Yup. I agree with that. My goal is to enable the UI rendering to be fast as well as the interface to be flexible for what UI functionality needs to be built. I aim to mostly be concerned about performance and rendering architecture and API usage.

I don’t know much about how UI rendering works really other than from poking at browsers and how they render web pages. I had a chat with @StarArawn on Discord in the ui-dev channel the other week and we seemed aligned about UI being able to render layers and composite them, doing alpha/β€˜opacity’ blending so that groups of things can have the same opacity and be rendered efficiently. I think we realised that UI might need some abstractions for handling common patterns where the camera/view APIs aren’t the most ergonomic design for rendering a group of things to a texture and then compositing those layers.

In any case, perhaps it’s a good idea to make some longer term design. I think @cart wants to do this anyway. And then I can try to support with figuring out how to do the rendering bits to align with how I see the rest of the renderer developing. :)

rparrett commented 2 weeks ago

Don't think these benchmarks are particularly relevant anymore.