Open John-Nagle opened 8 months ago
Looking at frame variation.
Frame 5274 took 78ms. No one thing dominates.
Frame 5340 took 17ms. Essentially the same content, but no content loading in progress.
Frame 5261 took 16ms. Best frame. Tonemapping took only 4us.
Looking at device_create_bind_group in wgpu core, it's usually very fast, but sometimes it is slow. It doesn't do anything with data-dependent size, but it does lock trackers, with .device.trackers.lock(). Could lock contention with the update threads there be slowing the main thread?
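To illustrate the kind of stall being asked about, here is a minimal sketch using only the Rust standard library. It is not wgpu code. The names `measure_lock_wait`, `trackers`, and `loader` are hypothetical stand-ins: a "loader" thread holds a mutex for a while, and the "render" thread times how long its own lock() call blocks.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical model of the suspected stall: a loader thread holds a shared
/// lock for `hold`, and the calling ("render") thread measures how long it
/// waits to acquire the same lock. Returns the wait time and the final counter.
fn measure_lock_wait(hold: Duration) -> (Duration, u64) {
    // Stand-in for shared tracker state; the real structure is not modeled here.
    let trackers = Arc::new(Mutex::new(0u64));
    let (locked_tx, locked_rx) = mpsc::channel::<()>();

    let loader_trackers = Arc::clone(&trackers);
    let loader = thread::spawn(move || {
        let mut guard = loader_trackers.lock().unwrap();
        // Tell the render thread that the lock is now held.
        locked_tx.send(()).unwrap();
        // Simulate work done while holding the lock.
        thread::sleep(hold);
        *guard += 1;
    });

    // Wait until the loader definitely holds the lock, then time our own acquire.
    locked_rx.recv().unwrap();
    let start = Instant::now();
    let mut guard = trackers.lock().unwrap();
    let waited = start.elapsed();

    *guard += 1;
    let updates = *guard;
    drop(guard);
    loader.join().unwrap();
    (waited, updates)
}
```

Calling `measure_lock_wait(Duration::from_millis(10))` should report a wait close to the hold time, even though the render thread's own critical section is trivial. If something similar happens around the trackers lock, it would fit a call that is usually very fast but sometimes slow.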
Here's the fastest frame, 16ms:
Here's the slowest frame, 77ms:
Tracy performance trace files can be downloaded from https://animats.com/sl/misc/ (the one above is vallone2.trace.tracy; note, 25MB). They can be read with Tracy Profiler 0.10, which gives detailed access to the performance data.
Places that look slow in the slow example above:
Most of the stalls appear to be associated with Device::create_bind_group.
Possible performance bottlenecks, with GPU culling removed:
Texture create view - seems that the loader threads can stall the render thread for a while at a texture create.
Mesh create conflict on create buffer?
None of these alone kills performance, but each seems to add 5-10ms to a frame.
Looking at
Notes
I'm running Tracy on my real Sharpview program now, to see what performance looks like. Here's an overview:
This shows how long the "Redraw" event from Rend3-framework took. Mean time 15ms, which is great. Most of the frame times are close to that. But there are many outliers. Slowest time 110ms, and lots of frames took 50ms-70ms to render.
Basic rendering is probably fast enough. There's concurrent content loading in progress, and the outliers are probably related to that.
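For anyone who wants to quantify the outliers from exported frame times rather than eyeballing the histogram, a rough sketch follows. The `FrameStats` type, the `summarize` function, and the "more than N times the median" outlier rule are all assumptions made for illustration; none of this comes from Tracy or Rend3.

```rust
/// Summary of a set of frame times, all in milliseconds.
#[derive(Debug, PartialEq)]
struct FrameStats {
    mean_ms: f64,
    median_ms: f64,
    max_ms: f64,
    /// Number of frames slower than `outlier_factor` times the median.
    outliers: usize,
}

/// Summarize frame times. Returns None for an empty slice.
/// The median is used as the baseline because a few very slow frames
/// pull the mean upward.
fn summarize(frame_ms: &[f64], outlier_factor: f64) -> Option<FrameStats> {
    if frame_ms.is_empty() {
        return None;
    }
    let mut sorted = frame_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    let mean_ms = sorted.iter().sum::<f64>() / sorted.len() as f64;
    // Upper-middle element; good enough for a rough baseline.
    let median_ms = sorted[sorted.len() / 2];
    let max_ms = sorted[sorted.len() - 1];
    let threshold = outlier_factor * median_ms;
    let outliers = sorted.iter().filter(|&&t| t > threshold).count();

    Some(FrameStats { mean_ms, median_ms, max_ms, outliers })
}
```

As a usage example with made-up values, `summarize(&[16.0, 17.0, 16.0, 78.0, 15.0, 16.0], 2.0)` counts one outlier (the 78ms frame) against a 16ms median.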
I have the whole trace file (25MB) saved, if you want that. Then you can look at it yourself with Tracy.