floooh / sokol

minimal cross-platform standalone C headers
https://floooh.github.io/sokol-html5
zlib License
6.54k stars 469 forks source link

Use retained command buffer in sokol_gfx #981

Closed danielchasehooper closed 4 months ago

danielchasehooper commented 5 months ago

We just ran into a tricky async bug that came down to the fact that sokol_gfx uses metal's commandBufferWithUnretainedReferences.

From looking at sokol_gfx code, I don't see why commandBufferWithUnretainedReferences couldn't be changed to commandBuffer. Making that change would dramatically simplify sokol's memory management and make it easier for users of the library to use the underlying commandBuffer directly. the change would allow the removal of _sg_mtl_release_resource and all related code that extends lifetimes until a certain frame number.

Am I missing something that prevents a switch to commandBuffer from being better all around?

related: #762

floooh commented 5 months ago

There's a pretty big performance overhead with memory-managed command buffers unfortunately.

Best we could do is to add a Metal-specific config flag to sg_setup() which then just replaces the commandBufferWithUnretainedReferences with commandBuffer.

(e.g. next to mtl_force_managed_storage_mode a mtl_use_retained_command_buffers)

I would keep the resource life-time-extension code in place though even when retained command buffers are used to not have the two code paths diverge too much.

PS: that second retained command buffer I added in https://github.com/floooh/sokol/issues/762 didn't actually solve the problem in some cases btw. It might be that swapchain resources are already released by the command buffer before the swapchain code in MTKView is done with them (see https://github.com/floooh/sokol/issues/872)

danielchasehooper commented 5 months ago

What is the overhead due to? Is it more than an increment and decrement of the refcount for each resource?

floooh commented 5 months ago

Here's an old blog post (2016), in a stress-test the overhead (measured in the Instruments CPU profiler) for refcounting (time spent in objc_retain and objc_release calls relative to all other executed code) dropped from 20% (!) to 7% just by switching to commandBufferWithUnretainedReferences (the remaining 7% overhead could then be completely eliminated by only updating buffer offsets instead of "rebinding" buffers for each uniform update - those remaining 7% were mainly a side effect of using ARC though):

https://floooh.github.io/2016/01/14/metal-arc.html

If you just do a couple hundred draw calls, the refcounting overhead shouldn't matter much though (relative to the frame time at least).

(those findings were also the main reason why I changed the sokol headers to not require ARC anymore, because ARC needs so much hand holding to keep performance under control that manual refcounting is actually less hassle).

floooh commented 4 months ago

Note: this will be fixed as 'drive-by-fix' in https://github.com/floooh/sokol/pull/985 via a new runtime config item sg_desc.mtl_use_command_buffer_with_retained_references.