floooh / sokol

minimal cross-platform standalone C headers
https://floooh.github.io/sokol-html5
zlib License

sg_begin_default_pass on metal sometimes costs 10+ms #906

Closed cai502 closed 10 months ago

cai502 commented 10 months ago

Steps to reproduce:

  1. Build and run.
  2. Do nothing after the app has launched; within a few minutes you will see sg_begin_default_pass take 10+ ms.
  3. If you switch between apps, sg_begin_default_pass behaves normally again (takes less than 1 ms).

There is no such problem on OpenGL.

I don't know why Metal behaves this way. Is this a bug or something else?

The sample I ran is cube-sapp; here is the code I profiled:

    uint64_t start_time = stm_now();
    sg_begin_default_pass(&pass_action, (int)w, (int)h);
    uint64_t anim_eval_time = stm_since(start_time);
    double passed = stm_ms(anim_eval_time);
    if (passed > 1) {
        printf("sg_begin_default_pass cost %f\n", passed);
    }

floooh commented 10 months ago

IIRC this topic came up in another ticket as well. The sokol_gfx.h Metal backend does the frame synchronization in the first begin-pass of a frame here:

https://github.com/floooh/sokol/blob/b803c9a0214c6ab6dcb9cc6dd9d30d7ace4eda1e/sokol_gfx.h#L11550

In other words, at some point sokol_gfx.h needs to wait for 'inflight resources' to become available again, and in the Metal backend there are two points where this might happen: either in that 'dispatch_semaphore_wait' call, or in sg_commit() where a new swapchain drawable is requested:

https://github.com/floooh/sokol/blob/b803c9a0214c6ab6dcb9cc6dd9d30d7ace4eda1e/sokol_gfx.h#L11715-L11719

(I guess that when the begin-pass flips to being 'fast', then sg_commit() flips to being 'slow').

TL;DR: at some point in the frame sokol-gfx needs to synchronize with vsync, and this is what you are seeing. If you add more render workload (so that the actual rendering takes longer), then the waiting period you're seeing should also decrease.
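
To make the relationship between render workload and waiting time concrete, here is a rough back-of-the-envelope model. It is only a sketch: it assumes a fixed refresh rate (60 Hz in the table) and that all slack in a frame shows up as waiting time in a single place. `expected_wait_ms` and `print_wait_table` are made-up names, not sokol functions.

```c
#include <stdio.h>

/* Simplified model (an assumption, not sokol_gfx.h code): with vsync on,
   a frame is presented at most once per refresh interval, so CPU time
   that is not spent on work is spent waiting somewhere in the frame. */
static double expected_wait_ms(double refresh_hz, double work_ms) {
    const double budget_ms = 1000.0 / refresh_hz;
    const double wait_ms = budget_ms - work_ms;
    /* if the work overruns the budget there is nothing left to wait for */
    return (wait_ms > 0.0) ? wait_ms : 0.0;
}

/* Prints the modeled wait for a few per-frame workloads at 60 Hz. */
static void print_wait_table(void) {
    const double work_ms[] = { 1.0, 5.0, 10.0, 20.0 };
    for (int i = 0; i < 4; i++) {
        printf("work %5.1f ms -> wait about %5.1f ms\n",
               work_ms[i], expected_wait_ms(60.0, work_ms[i]));
    }
}
```

In this model a nearly idle frame waits for most of the refresh interval, and the wait shrinks as the per-frame work grows.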

floooh commented 10 months ago

Here's that other ticket (that was GL on Linux, but all backends need to wait for vsync somewhere; in GL it's just much less predictable where exactly that wait happens):

https://github.com/floooh/sokol/issues/867

cai502 commented 10 months ago

Oh, I understand now, thanks for the detailed explanation! Next question: if I want to measure how much time my code takes in every frame, should I exclude sg_commit and sg_begin_default_pass?

I also want to compare sokol's Metal and OpenGL performance on iPhone and Mac. Is there a benchmark sample for that? If not, could you please give me some suggestions?

floooh commented 10 months ago

On GL, measuring performance by putting start/stop timer code around GL function calls is generally tricky, because it's unpredictable where GL might decide to 'flush the pipeline'.
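
For the Metal case described earlier in the thread, where the wait is expected in the first begin-pass or in sg_commit(), one possible way to keep it out of a measurement is to time only the spans between those calls. The following is a minimal sketch under that assumption; `frame_stopwatch_t` and the `fsw_*` functions are hypothetical helpers, not part of sokol, and `update_scene()` in the usage comment stands in for application code. Timestamps are passed in so that any clock can be used, for instance stm_now() from sokol_time.h.

```c
#include <stdint.h>

/* Hypothetical helper, not part of sokol: accumulates only the time
   spans between a resume and the following pause. */
typedef struct {
    uint64_t span_start;  /* timestamp of the last resume */
    uint64_t total;       /* accumulated ticks while running */
    int running;          /* nonzero between resume and pause */
} frame_stopwatch_t;

static void fsw_reset(frame_stopwatch_t* sw) {
    sw->span_start = 0;
    sw->total = 0;
    sw->running = 0;
}

static void fsw_resume(frame_stopwatch_t* sw, uint64_t now) {
    if (!sw->running) {
        sw->span_start = now;
        sw->running = 1;
    }
}

static void fsw_pause(frame_stopwatch_t* sw, uint64_t now) {
    if (sw->running) {
        sw->total += now - sw->span_start;
        sw->running = 0;
    }
}

/* Intended use inside a frame callback (sketch only):
 *
 *   fsw_reset(&sw);
 *   fsw_resume(&sw, stm_now());
 *   update_scene();                              // hypothetical app code
 *   fsw_pause(&sw, stm_now());
 *   sg_begin_default_pass(&pass_action, w, h);   // may block, not timed
 *   fsw_resume(&sw, stm_now());
 *   // ... draw calls ...
 *   sg_end_pass();
 *   fsw_pause(&sw, stm_now());
 *   sg_commit();                                 // may block, not timed
 *   double own_ms = stm_ms(sw.total);
 */
```

As noted above, this does not carry over to GL, where the driver may stall inside any call, so the numbers there are at best a rough guide.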

I wrote a drawcall-overhead testing tool recently in the wip webgpu branch (not yet in master):

https://floooh.github.io/sokol-html5/drawcallperf-sapp.html

...idea is that you can roughly see at what point (== number of draw calls) the render loop is no longer able to hit the target frame rate. This "no longer able to hit target frame rate" gives a good idea of the CPU overhead in different backend APIs for specific rendering code.
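
The "at what point does the frame rate drop" idea can be written down as a small search over measurements. This is only an illustrative sketch, not code from the drawcallperf sample: `perf_sample_t` and `first_over_budget` are hypothetical names, the samples are assumed to be sorted by ascending draw-call count, and the budget should sit slightly above the vsync interval because measured frame times jitter around it.

```c
#include <stddef.h>

/* One measurement: average frame time at a given number of draw calls. */
typedef struct {
    int num_draws;
    double frame_ms;
} perf_sample_t;

/* Returns the draw-call count of the first sample whose frame time
   exceeds budget_ms, or -1 if every sample stays within the budget.
   Assumes samples are sorted by ascending num_draws. */
static int first_over_budget(const perf_sample_t* samples, size_t count,
                             double budget_ms) {
    for (size_t i = 0; i < count; i++) {
        if (samples[i].frame_ms > budget_ms) {
            return samples[i].num_draws;
        }
    }
    return -1;
}
```

Running the same search on measurements from two backends gives a rough comparison of their CPU-side overhead for that specific rendering code.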

Source code for this is here: https://github.com/floooh/sokol-samples/blob/sgfx-wgpu/sapp/drawcallperf-sapp.c

If I'm looking for specific performance hotspots, I use CPU profilers like Instruments on macOS.

I think the Metal debugger in Xcode can also provide some performance numbers.

cai502 commented 10 months ago

I noticed the drawcallperf sample. I ran it and found that Metal performs much better than OpenGL on Mac.

Does this result mean that Metal really is that much faster than GL?

Thank you for your advice, I'll try to use these tools.

floooh commented 10 months ago

Yeah, on Mac you're definitely better off with the Metal backend. Apple's OpenGL implementation is only what's minimally needed for backward compatibility with older applications.

I'll close this ticket btw :)

floooh commented 10 months ago

PS: also, when testing performance, make sure the Metal validation layer is disabled (for instance, it is enabled when starting in debug mode within Xcode). The validation layer easily cuts performance by 10x.