floooh / sokol

minimal cross-platform standalone C headers
https://floooh.github.io/sokol-html5
zlib License
6.82k stars 475 forks source link

Resizing on metal-macos causes metal validation errors when high_dpi is enabled in sokol_app #872

Closed sconybeare closed 1 year ago

sconybeare commented 1 year ago

Hi, thanks for the excellent graphics library!

I noticed that when using the xcode metal debugging tools, my program was crashing on resize. Here's the most minimal reproduction I could find (basically the triangle example with extra stuff removed and high_dpi=true in sokol_main):

#define SOKOL_IMPL
#define SOKOL_METAL
#include "sokol_app.h"
#include "sokol_gfx.h"
#include "sokol_glue.h"
#include "triangle.glsl.h"

static struct {
  sg_pipeline pip;
  sg_bindings bind;
  sg_pass_action pass_action;
} state;

static void init(void) {
  sg_setup(&(sg_desc) {
    .context = sapp_sgcontext(),
  });

  float vertices[] = {
    0.0f, 0.5f, 0.5f, 1.0f, 0.0f, 0.0f, 1.0f,
    0.5f, -0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 1.0f,
    -0.5f, -0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 1.0f,
  };
  state.bind.vertex_buffers[0] = sg_make_buffer(&(sg_buffer_desc) {
    .data = SG_RANGE(vertices),
    .label = "triangle-vertices",
  });
  sg_shader shd = sg_make_shader(triangle_shader_desc(sg_query_backend()));
  state.pip = sg_make_pipeline(&(sg_pipeline_desc) {
    .shader = shd,
    .layout = {
      .attrs = {
        [ATTR_vs_position].format = SG_VERTEXFORMAT_FLOAT3,
        [ATTR_vs_color0].format = SG_VERTEXFORMAT_FLOAT4,
      },
    },
    .label = "triangle-pipeline",
  });

  state.pass_action = (sg_pass_action) {
    .colors[0] = { .load_action=SG_LOADACTION_CLEAR, .clear_value={ .a = 1.0f }},
    .depth = { .load_action=SG_LOADACTION_CLEAR, },
  };
}

static void frame(void) {
  sg_begin_default_pass(&state.pass_action, sapp_width(), sapp_height());
  sg_apply_pipeline(state.pip);
  sg_apply_bindings(&state.bind);
  sg_draw(0, 3, 1);
  sg_end_pass();
  sg_commit();
}

void cleanup(void) {
  sg_shutdown();
}

sapp_desc sokol_main(int argc, char **argv) {
  return (sapp_desc) {
    .width = 640,
    .height = 480,
    .init_cb = init,
    .frame_cb = frame,
    .cleanup_cb = cleanup,
    .high_dpi = true,
  };
}

with this shader code:

@vs vs
in vec4 position;
in vec4 color0;

out vec4 color;

void main() {
    gl_Position = position;
    color = color0;
}
@end

@fs fs
in vec4 color;
out vec4 frag_color;

void main() {
    frag_color = color;
}
@end
@program triangle vs fs

Running this with MTL_DEBUG_LAYER=1 set and then resizing the window results in a crash with the following text printed out:

2023-08-25 00:39:28.618 triangle[82276:3267875] Metal API Validation Enabled
-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:2885: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x14f02ce00 (label: <no label set>):
<MTLToolsObject: 0x14d750440> -> <AGXG14XFamilyTexture: 0x14d750960>
    label = MTKView Depth Stencil 
    textureType = MTLTextureType2D 
    pixelFormat = MTLPixelFormatDepth32Float_Stencil8 
    width = 1280 
    height = 960 
    depth = 1 
    arrayLength = 1 
    mipmapLevelCount = 1 
    sampleCount = 1 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModePrivate 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModePrivate MTLResourceHazardTrackingModeTracked  
    usage = MTLTextureUsageRenderTarget 
    shareable = 0 
    framebufferOnly = 0 
    purgeableState = MTLPurgeableStateNonVolatile 
    swizzle = [MTLTextureSwizzleRed, MTLTextureSwizzleGreen, MTLTextureSwizzleBlue, MTLTextureSwizzleAlpha] 
    isCompressed = 1 
    parentTexture = <null> 
    parentRelativeLevel = 0 
    parentRelativeSlice = 0 
    buffer = <null> 
    bufferOffset = 0 
    bufferBytesPerRow = 0 
    iosurface = 0x0 
    iosurfacePlane = 0 
    allowGPUOptimizedContents = YES'
zsh: abort      MTL_DEBUG_LAYER=1 ./triangle

sokol headers used were from sokol commit 47d92ff86298fc96b3b84d93d0ee8c8533d3a2d2, which is the tip of master at the time of posting

floooh commented 1 year ago

Hmm, I've seen this exact same problem after updating to macOS 13.x around the end of 2022:

https://github.com/floooh/sokol/issues/762

...but this issue was fixed by this MR:

https://github.com/floooh/sokol/pull/763

I'll try to reproduce the problem in the evening (it's strange because my Mac is my main development platform, and I should have noticed such an obvious issue).

In the meantime, can you check if the problem still happens when you change the commandBufferWithUnretainedReferences call here:

https://github.com/floooh/sokol/blob/47d92ff86298fc96b3b84d93d0ee8c8533d3a2d2/sokol_gfx.h#L11495C49-L11495C86

...to:

_sg.mtl.cmd_buffer = [_sg.mtl.cmd_queue commandBuffer];

(so that it looks like in the line below where the separate present_cmd_buffer is created.

floooh commented 1 year ago

...cannot reproduce on my Mac with some furious resizing btw (so far I just enabled .high_dpi in the triangle sample in sokol-samples), neither when running from the Xcode debugger (this enables the validation layer by default), or running from the command line with MTL_DEBUG_LAYER enabled.

I'm on an M1 14" MBP with macOS Ventura 13.4.1 (c) (22F770820d), Xcode version is 14.2

PS: do you compile with ARC enabled or disabled? (it shouldn't matter but since it is some sort of lifetime issue it could be related)

floooh commented 1 year ago

...let me update to the latest xcode version...

floooh commented 1 year ago

Ok, I finally got the error too... took a long while of wildly resizing though, and I can't seem to reproduce it reliably.

I also updated to the latest Xcode version, but it's probably not related.

I'll try to investigate later today. I was hoping that just doing the present call in a separate command buffer with retained references would be enough to fix the issue, but looks like that's not enough.

It's also complicated by the fact the resource in question is created and managed by MTKView, which I don't have access to. Worst case would be to do all rendering in a command buffer with retained references, but maybe there's another solution (e.g. maybe I can extract the depth-stencil-surface from the MTLRenderPassDescriptor I'm getting from MTKView, and add a manual retain/release call somewhere in sokol-gfx - but that doesn't sound like a robust solution, I would expect that MTKView takes care of the lifetimes of the objects is manages).

-[MTLDebugDevice notifyExternalReferencesNonZeroOnDealloc:]:2885: failed assertion `The following Metal object is being destroyed while still required to be alive by the command buffer 0x101059800 (label: <no label set>):
<MTLToolsObject: 0x100e49500> -> <AGXG13XFamilyTexture: 0x100e4eab0>
    label = MTKView Depth Stencil 
    textureType = MTLTextureType2D 
    pixelFormat = MTLPixelFormatDepth32Float_Stencil8 
    width = 2892 
    height = 44 
    depth = 1 
    arrayLength = 1 
    mipmapLevelCount = 1 
    sampleCount = 1 
    cpuCacheMode = MTLCPUCacheModeDefaultCache 
    storageMode = MTLStorageModePrivate 
    hazardTrackingMode = MTLHazardTrackingModeTracked 
    resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModePrivate MTLResourceHazardTrackingModeTracked  
    usage = MTLTextureUsageRenderTarget 
    shareable = 0 
    framebufferOnly = 0 
    purgeableState = MTLPurgeableStateNonVolatile 
    swizzle = [MTLTextureSwizzleRed, MTLTextureSwizzleGreen, MTLTextureSwizzleBlue, MTLTextureSwizzleAlpha] 
    isCompressed = 1 
    parentTexture = <null> 
    parentRelativeLevel = 0 
    parentRelativeSlice = 0 
    buffer = <null> 
    bufferOffset = 0 
    bufferBytesPerRow = 0 
    iosurface = 0x0 
    iosurfacePlane = 0 
    allowGPUOptimizedContents = YES'
floooh commented 1 year ago

Can you give the attempted fix in the branch issue872-metal-validation-error a try? This reverts the previous solution to use a separate command buffer with retained references only for the present call, and instead pins the swapchain resources into memory via the regular 'deferred release queue' which I'm already using for all other Metal resource objects.

See this change: https://github.com/floooh/sokol/compare/master...issue872-metal-validation-error

This means there's no refcounting overhead (because the command buffer is stil created with unretained-references), but slightly increases memory usage during window resize (but that's expected to happen, because swapchain resources which had been released by MTKView during the window resize need to stick around a few frames before it's safe to delete them).

sconybeare commented 1 year ago

That seems to fix it for me, too. Thank you!

floooh commented 1 year ago

Ok, I think I'll merge the fix into master some time over the weekend (or latest on Monday). Will close this issue when the fix is merged.

floooh commented 1 year ago

Closing this because PR https://github.com/floooh/sokol/pull/873 has been merged. Thanks for the bug report!