floooh / sokol

minimal cross-platform standalone C headers
https://floooh.github.io/sokol-html5
zlib License

Slow default pass begin/end calls #867

Closed ArtemkaKun closed 1 year ago

ArtemkaKun commented 1 year ago

Hi, I'm creating a game with sokol in V (there is an "official" wrapper for sokol called gg). I need a stable 144 FPS on mobile, and currently my game is struggling with that.

When I profiled the game, it seemed that sokol's begin/end functions take a lot of time to complete, even if I'm not rendering anything.

I'm not sure if I profiled correctly, but why do the gg begin/end functions take so much time to execute?

Function: gg__Context_end, Self Time: 6897599ns, Calls: 4235
Function: sokol__gfx__begin_default_pass, Self Time: 6895704ns, Calls: 4235

Please ask me questions if you have any; the problem area is so wide that I don't know where to start.

Right now I'm using sokol app and probably the sokol_gl "API" instead of raw gfx calls. On mobile I'm using GLES 3 (the V version of the sokol headers is 7 months out of date, so it still has GLES 2 support).

ArtemkaKun commented 1 year ago

Ok, the problem exists because of vsync. I disabled it with vblank_mode=0 env var. Is there a way to disable vsync on Android?

floooh commented 1 year ago

Not hitting the display refresh rate with an empty render pass sounds strange, it shouldn't be related to vsync.

For typical rendering work it's not unusual in GL that expensive actions are delayed until later in the frame though, measuring individual GL calls is pretty much useless unfortunately because it's entirely unpredictable what happens inside (e.g. a command buffer with queued up work might need to be flushed which then causes an otherwise cheap function to look very expensive).

Self Time: 6897599ns, Calls: 4235

Does this mean 6897599ns for one call, or for 4235 calls? If it is for all calls, then I wouldn't call that surprising, especially on a mobile device: it would mean about 1600 nanoseconds per call, which is 1.6 microseconds or 0.0016 milliseconds, for up to around 10 GL calls that might happen inside sg_begin_pass().
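The per-call arithmetic above can be checked with a small C sketch. It assumes the profiler's "Self Time" is the total accumulated over all calls, which the output in this thread does not confirm; the helper names `avg_ns_per_call` and `ns_to_ms` are made up for illustration.

```c
#include <stdint.h>

/* Average self time per call in nanoseconds.
   ASSUMPTION: total_self_ns is the sum over all calls, not the
   duration of a single call (the profiler output does not say). */
static double avg_ns_per_call(uint64_t total_self_ns, uint64_t calls) {
    if (calls == 0) {
        return 0.0; /* avoid division by zero for functions never called */
    }
    return (double)total_self_ns / (double)calls;
}

/* Convert nanoseconds to milliseconds. */
static double ns_to_ms(double ns) {
    return ns / 1.0e6;
}
```

With the numbers from the report, `avg_ns_per_call(6897599, 4235)` comes out a little above 1600 ns, in line with the rough estimate in the comment above.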

If I were analyzing the problem, I would start by searching for Android graphics debuggers/profilers which could help get a clearer picture of where the time is actually spent on the GPU side (sometimes mobile GPU vendors offer such tools).

PS: not sure how vsync can be disabled programmatically on Android, but I would be careful with that because if not throttled otherwise that's a sure way to heat up the device and empty the battery very quickly.

ArtemkaKun commented 1 year ago

Thanks for the response 👍

Facts so far:

floooh commented 1 year ago

If the 6.9 million nanoseconds (or 6.9 milliseconds) is the time for one call to sg_begin_default_pass() with vsync enabled, then this looks like the underlying GL implementation is most likely waiting for a swapchain surface to become available for clearing (e.g. the GPU has run ahead so much that no free swapchain surfaces are available because they are all currently waiting to be presented, and as soon as a flip happened, the oldest swapchain surface becomes available again and the CPU-side render loop can continue).

This 6.9 milliseconds is almost exactly the frame duration for a 144Hz refresh rate (1000 / 144 is about 6.94 ms), so that would totally make sense, and it also explains why the time goes down when disabling vsync (because then the swapchain will just run unthrottled and flip through the surfaces as fast as possible, which will essentially discard most rendered frames without being shown).
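To make the comparison concrete, here is a minimal C sketch that computes the frame duration for a given refresh rate and expresses a measured time as a fraction of it. The function names are hypothetical, and the reading of 6897599ns as the time of a single call is the assumption under discussion, not an established fact.

```c
/* Duration of one display refresh interval in milliseconds. */
static double frame_budget_ms(double refresh_hz) {
    return 1000.0 / refresh_hz;
}

/* Fraction of one refresh interval taken up by a measured time.
   ASSUMPTION: measured_ns is the duration of a single call. */
static double fraction_of_frame(double measured_ns, double refresh_hz) {
    double measured_ms = measured_ns / 1.0e6;
    return measured_ms / frame_budget_ms(refresh_hz);
}
```

At 144 Hz the budget is about 6.94 ms, so a 6897599ns call would cover roughly 99% of one refresh interval, which is what a blocking wait for the next flip would look like.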

This 6.9ms wait doesn't indicate a problem though, at some point the CPU-side render loop needs to be throttled to the vsync frequency even if there's nothing to render (and as I said above, where this waiting happens is pretty much unpredictable in GL). As soon as you start to add some noticeable rendering payload, the time spent waiting in sg_begin_default_pass() should go down as long as the "rendering payload" fits into 6.9ms, above that and you'd start to miss frames.
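The relationship between rendering payload and wait time can be sketched with a deliberately simplified model in C. It assumes a single vsync-throttled loop in which all idle time shows up as one wait; real GL drivers may spread the wait across several calls and buffer multiple frames, so the names and the model itself are illustrative only.

```c
#include <stdbool.h>

/* Length of one vsync interval in milliseconds. */
static double vsync_interval_ms(double refresh_hz) {
    return 1000.0 / refresh_hz;
}

/* Simplified model: idle time left in the interval after the frame's
   own work. Returns 0 when the payload no longer fits. */
static double expected_wait_ms(double payload_ms, double refresh_hz) {
    double budget = vsync_interval_ms(refresh_hz);
    return (payload_ms < budget) ? (budget - payload_ms) : 0.0;
}

/* True when the payload exceeds the interval, so a frame is missed. */
static bool misses_frame(double payload_ms, double refresh_hz) {
    return payload_ms > vsync_interval_ms(refresh_hz);
}
```

Under this model an empty frame at 144 Hz waits for nearly the whole 6.9 ms interval, a 4 ms payload leaves about 2.9 ms of waiting, and an 8 ms payload leaves none and misses the frame.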

ArtemkaKun commented 1 year ago

Thanks for the extended explanation. So I assume these profile results don't indicate any problem with rendering, and this issue can be closed.