BVE-Reborn / rend3

MAINTENANCE MODE. Easy to use, customizable, efficient 3D renderer library built on wgpu.
https://rend3.rs
Apache License 2.0
1.07k stars, 59 forks

Performance outliers #579

[Open] John-Nagle opened 8 months ago

John-Nagle commented 8 months ago

I'm running Tracy on my real Sharpview program now to see what performance looks like. Here's an overview:

[Image: redrawtrace2linear (Tracy view of Redraw event durations)]

This shows how long the "Redraw" event from rend3-framework took. The mean time is 15ms, which is great, and most frame times are close to that. But there are many outliers: the slowest frame took 110ms, and many frames took 50-70ms to render.

Basic rendering is probably fast enough. Concurrent content loading is in progress, and the outliers are probably related to that.
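A mean close to 15ms can coexist with a long tail of slow frames. To make that concrete, here is a minimal Rust sketch that summarizes a list of frame times into mean, maximum, and a count of frames above a threshold. It assumes the per-frame Redraw durations have been exported from Tracy as milliseconds; `summarize`, `FrameStats`, the threshold, and the sample values are hypothetical and do not come from the trace.

```rust
/// Summary of a series of per-frame times, in milliseconds.
#[derive(Debug)]
struct FrameStats {
    mean_ms: f64,
    max_ms: f64,
    /// Number of frames slower than the chosen threshold.
    outliers: usize,
}

/// Returns `None` for an empty series.
fn summarize(frame_times_ms: &[f64], outlier_threshold_ms: f64) -> Option<FrameStats> {
    if frame_times_ms.is_empty() {
        return None;
    }
    let mean_ms = frame_times_ms.iter().sum::<f64>() / frame_times_ms.len() as f64;
    // Largest single frame time.
    let max_ms = frame_times_ms.iter().copied().fold(f64::MIN, f64::max);
    // Count frames strictly above the threshold.
    let outliers = frame_times_ms
        .iter()
        .filter(|&&t| t > outlier_threshold_ms)
        .count();
    Some(FrameStats { mean_ms, max_ms, outliers })
}

fn main() {
    // Hypothetical sample, NOT data from the trace: mostly ~15ms frames
    // with a few long ones.
    let times = [15.0, 14.0, 16.0, 15.0, 15.0, 55.0, 15.0, 16.0, 70.0, 15.0];
    // Flag anything slower than two 60Hz frame budgets (about 33ms).
    let stats = summarize(&times, 2.0 * 1000.0 / 60.0).unwrap();
    println!("{:?}", stats);
}
```

Reporting the outlier count (or the maximum) alongside the mean keeps the slow frames visible instead of averaging them away.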

I have the whole trace file (25MB) saved if you want it; you can then examine it yourself with Tracy.

John-Nagle commented 8 months ago

Looking at frame variation.

Frame 5274 took 78ms. No one thing dominates.

Frame 5340 took 17ms. Essentially the same content, but no content loading in progress.

Frame 5261 took 16ms, the best frame. Tonemapping took only 4µs.

Looking at device_create_bind_group in wgpu-core: it is usually very fast, but sometimes it is slow. Its work doesn't depend on data size, but it does lock the device trackers with .device.trackers.lock(). Could lock contention with the update threads there be slowing the main thread?
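The suspected mechanism, a render thread blocking on a lock that loader threads also take, can be reproduced in miniature. The sketch below is a toy model only: it assumes a single shared `Mutex` standing in for the device trackers, and `Trackers` and `worst_render_wait` are hypothetical names, not wgpu-core code. It measures the longest time the render thread waits for the lock, with and without loader threads holding it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Toy stand-in for a device-wide resource tracker behind one lock.
/// Hypothetical type: this is NOT wgpu-core's real tracker.
struct Trackers {
    resources: Vec<u64>,
}

/// Spawns `loaders` loader threads, each creating `per_loader` resources and
/// holding the shared lock for `hold` per resource. Meanwhile the caller (the
/// "render thread") takes the same lock once per frame for `frames` frames.
/// Returns the longest single lock wait seen by the render thread and the
/// number of resources the loaders registered.
fn worst_render_wait(
    loaders: usize,
    per_loader: usize,
    hold: Duration,
    frames: usize,
) -> (Duration, usize) {
    let trackers = Arc::new(Mutex::new(Trackers { resources: Vec::new() }));

    let handles: Vec<_> = (0..loaders)
        .map(|id| {
            let t = Arc::clone(&trackers);
            thread::spawn(move || {
                for i in 0..per_loader {
                    let mut guard = t.lock().unwrap();
                    // Slow work done while the lock is held.
                    thread::sleep(hold);
                    guard.resources.push((id * per_loader + i) as u64);
                    // `guard` is dropped at the end of each iteration,
                    // releasing the lock.
                }
            })
        })
        .collect();

    let mut worst = Duration::ZERO;
    for _ in 0..frames {
        let start = Instant::now();
        // Models a per-frame call (e.g. a bind group creation) taking the lock.
        let guard = trackers.lock().unwrap();
        worst = worst.max(start.elapsed());
        drop(guard);
        // Rest of the frame's work, done without the lock.
        thread::sleep(Duration::from_millis(1));
    }

    for h in handles {
        h.join().unwrap();
    }
    let created = trackers.lock().unwrap().resources.len();
    (worst, created)
}

fn main() {
    let (quiet, _) = worst_render_wait(0, 0, Duration::from_millis(5), 20);
    let (busy, n) = worst_render_wait(4, 10, Duration::from_millis(5), 20);
    println!("no loaders: worst wait {:?}", quiet);
    println!("4 loaders ({} resources): worst wait {:?}", n, busy);
}
```

Because std's `Mutex` is not documented as fair, a loader that releases and immediately relocks may keep the render thread waiting for more than one hold period, which is one way short per-resource locks could add up to multi-millisecond frame stalls.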

Here's the fastest frame, 16ms:

[Image: fastestframe (Tracy view of the fastest frame)]

Here's the slowest frame, 77ms:

[Image: slowframe2 (Tracy view of the slowest frame)]

John-Nagle commented 8 months ago

Tracy performance trace files can be downloaded from https://animats.com/sl/misc/ (the one above is vallone2.trace.tracy; note that it is 25MB). They can be read with Tracy Profiler 0.10, which gives detailed access to the performance data.

John-Nagle commented 8 months ago

Places that look slow in the slow example above:

Most of the stalls appear to be associated with Device::create_bind_group.

John-Nagle commented 6 months ago

Possible performance bottlenecks, with GPU culling removed:

Texture create view: it seems that the loader threads can stall the render thread for a while during texture creation.

[Image: createviewstallcombined]

Mesh creation: a conflict on create buffer?

[Image: matlmgradd]

None of these alone kills performance, but each seems to add 5-10ms to a frame.

John-Nagle commented 6 months ago

Looking at

https://github.com/gfx-rs/wgpu/blob/d4f30638b749f269feeb654d7880a7d845a2fff1/wgpu-core/src/device/global.rs#L785

Notes