Open AlexanderDevaikinEnscape opened 3 months ago
Had yet another try to find out what is going on and here is an interesting point: spvDescriptorSetBuffer1, which holds the textures, has type array<texture2d<float>, 1> in the shader, while spvDescriptorSetBuffer1 is much larger in reality. Changing the type in the shader debugger to e.g. array<texture2d<float>, 30> also fixes the sampling from textures with index > 0, as expected.
Shader: (screenshot not preserved)
Memory: (screenshot not preserved)
The restriction to an array length of 1 is generally because the descriptor array is defined as a runtime length array. Since the array length is not truly known at pipeline (and shader) compilation time, we define it to be an array of length 1. Metal happily indexes beyond that length at runtime.
I am not able to replicate this using the Vulkan Samples descriptor_indexing sample, which has a similar large array of textures, the length of which is set at runtime:
[mvk-debug] Created VkDescriptorSetLayout 0x6000034aca80 with 1 bindings:
0: VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE with up to 1000000 elements at binding 0
[mvk-debug] Created VkDescriptorPool 0x122051a00 with 2 descriptor sets, and pooled descriptors:
VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE: 2112 (2112 remaining)
----------------
%10 = OpTypeImage %6 2D 0 0 0 1 Unknown
%11 = OpTypeRuntimeArray %10
%12 = OpTypePointer UniformConstant %11
%13 = OpVariable %12 UniformConstant
----------------
struct spvDescriptorSetBuffer0
{
    array<texture2d<float>, 1> Textures [[id(0)]];
};
struct spvDescriptorSetBuffer1
{
    sampler ImmutableSampler [[id(0)]];
};
struct main0_out
{
    float4 out_frag_color [[color(0)]];
};
struct main0_in
{
    float2 in_uv [[user(locn0)]];
    int in_texture_index [[user(locn1)]];
};
fragment main0_out main0(main0_in in [[stage_in]], constant spvDescriptorSetBuffer0& spvDescriptorSet0 [[buffer(0)]], constant spvDescriptorSetBuffer1& spvDescriptorSet1 [[buffer(1)]])
{
    main0_out out = {};
    out.out_frag_color = spvDescriptorSet0.Textures[in.in_texture_index].sample(spvDescriptorSet1.ImmutableSampler, in.in_uv);
    return out;
}
It would help to see your descriptor definitions and shader conversions. Can you run your app with the following environment variables set, and post the sections of the log showing the descriptor definitions (as above), and the SPIR-V, MSL, and estimated GLSL for the shader in question?
MVK_CONFIG_DEBUG=1
MVK_CONFIG_LOG_LEVEL=4
Since you are not able to generate a repro case, can you try to modify the Vulkan Samples descriptor_indexing sample to trigger the same problem, and post the modifications here so we can replicate it?
Thank you for the reply. I have tried to reproduce it with descriptor_indexing sample without luck :/ Happens only in our shadowmap pass. Array size of 1 is indeed OK, it is also the same in other shaders that do work correctly.
Info collected using debug levels. The problematic shader:
[mvk-debug] Created VkDescriptorSetLayout 0x600003d39b00 with 1 bindings:
0: VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE with up to 65536 elements at binding 1
[mvk-debug] Created VkDescriptorSetLayout 0x600003d39a40 with 6 bindings:
0: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 0
1: VK_DESCRIPTOR_TYPE_SAMPLER with 1 elements at binding 1
2: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 2
3: VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER with 1 elements at binding 3
4: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 4
5: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 5
[mvk-debug] Created VkPipelineLayout 0x129f26fb0 with 2 descriptor set layouts:
0: 0x600003d39a40
1: 0x600003d39b00
---------------------
%31 = OpTypeImage %13 2D 0 0 0 1 Unknown
%32 = OpTypeRuntimeArray %31
%33 = OpTypePointer UniformConstant %32
%34 = OpVariable %33 UniformConstant
%37 = OpTypePointer UniformConstant %31
%40 = OpTypeSampler
%41 = OpTypePointer UniformConstant %40
%42 = OpVariable %41 UniformConstant
%44 = OpTypeSampledImage %31
--------------------
struct spvDescriptorSetBuffer0
{
    constant void* _m0_pad [[id(0)]];
    constant void* _m1_pad [[id(1)]];
    sampler SceneTextureSampler [[id(2)]];
};
struct spvDescriptorSetBuffer1
{
    array<texture2d<float>, 1> SceneTextures [[id(0)]];
};
static inline __attribute__((always_inline))
float4 getTextureLodValue(thread const uint& samplerId, thread const float2& uv, thread const float& lod, constant array<texture2d<float>, 1>& SceneTextures, sampler SceneTextureSampler)
{
    uint _36 = samplerId;
    return SceneTextures[_36].sample(SceneTextureSampler, uv, level(lod));
}
static inline __attribute__((always_inline))
void opaqueMaskedDoubleSided(constant array<texture2d<float>, 1>& SceneTextures, sampler SceneTextureSampler, thread VertexData& inoutData)
{
    uint param = inoutData.maskSamplerId;
    bool _61 = isValidSamplerId(param);
    bool _79;
    if (_61)
    {
        uint param_1 = inoutData.maskSamplerId;
        float2 param_2 = inoutData.uv;
        float param_3 = 0.0;
        _79 = getTextureLodValue(param_1, param_2, param_3, SceneTextures, SceneTextureSampler).x <= 0.5;
    }
    else
    {
        _79 = _61;
    }
    if (_79)
    {
        discard_fragment();
    }
}
fragment void main0(main0_in in [[stage_in]], constant spvDescriptorSetBuffer0& spvDescriptorSet0 [[buffer(0)]], constant spvDescriptorSetBuffer1& spvDescriptorSet1 [[buffer(1)]])
{
    VertexData inoutData = {};
    inoutData.uv = in.inoutData_uv;
    inoutData.maskSamplerId = in.inoutData_maskSamplerId;
    inoutData.vViewPos = in.inoutData_vViewPos;
    opaqueMaskedDoubleSided(spvDescriptorSet1.SceneTextures, spvDescriptorSet0.SceneTextureSampler, inoutData);
}
And the working shader that uses the same texture array bound using the same descriptor set layout:
[mvk-debug] Created VkDescriptorSetLayout 0x600003d39b00 with 1 bindings:
0: VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE with up to 65536 elements at binding 1
[mvk-debug] Created VkDescriptorSetLayout 0x600003d46640 with 7 bindings:
0: VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER with 1 elements at binding 0
1: VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER with 1 elements at binding 2
2: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 3
3: VK_DESCRIPTOR_TYPE_SAMPLER with 1 elements at binding 4
4: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 5
5: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 6
6: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER with 1 elements at binding 7
[mvk-debug] Created VkPipelineLayout 0x109f77380 with 2 descriptor set layouts:
0: 0x600003d46640
1: 0x600003d39b00
------------------
%853 = OpTypeImage %6 2D 0 0 0 1 Unknown
%854 = OpTypeRuntimeArray %853
%855 = OpTypePointer UniformConstant %854
%856 = OpVariable %855 UniformConstant
%859 = OpTypePointer UniformConstant %853
%862 = OpTypeSampler
%863 = OpTypePointer UniformConstant %862
%864 = OpVariable %863 UniformConstant
%866 = OpTypeSampledImage %853
------------------
struct spvDescriptorSetBuffer0
{
    constant void* _m0_pad [[id(0)]];
    constant CameraDataUbo* m_1254 [[id(1)]];
    constant GBufferParams* m_883 [[id(2)]];
    const device TextureTransformsBuffer* m_716 [[id(3)]];
    sampler SceneTextureSampler [[id(4)]];
};
struct spvDescriptorSetBuffer1
{
    array<texture2d<float>, 1> SceneTextures [[id(0)]];
};
static inline __attribute__((always_inline))
float4 getTextureLodValue(thread const uint& samplerId, thread const float2& uv, thread const float& lod, constant array<texture2d<float>, 1>& SceneTextures, sampler SceneTextureSampler)
{
    uint _858 = samplerId;
    return SceneTextures[_858].sample(SceneTextureSampler, uv, level(lod));
}
static inline __attribute__((always_inline))
float getMaskTextureValue(thread const float2& baseUvs, thread const uint& maskSamplerId, thread const MaterialIndexTransformFlag& matIndexTransformFlag, const device TextureTransformsBuffer& _716, constant array<texture2d<float>, 1>& SceneTextures, sampler SceneTextureSampler)
{
    ...
    return getTextureLodValue(param_3, param_4, param_5, SceneTextures, SceneTextureSampler).x;
}
static inline __attribute__((always_inline))
bool isMaskedOut(thread const float2& baseUvs, thread const uint& maskId, thread const MaterialIndexTransformFlag& matIndexTransformFlag, thread const float& visibility, thread const float& maskThreshold, thread const float& NoV, const device TextureTransformsBuffer& _716, constant array<texture2d<float>, 1>& SceneTextures, sampler SceneTextureSampler)
{
    float maskVisibility = 1.0;
    uint param = maskId;
    if (isValidSamplerId(param))
    {
        float2 param_1 = baseUvs;
        uint param_2 = maskId;
        MaterialIndexTransformFlag param_3 = matIndexTransformFlag;
        float param_4 = getMaskTextureValue(param_1, param_2, param_3, _716, SceneTextures, SceneTextureSampler);
        float param_5 = NoV;
        maskVisibility = geometricOpacity(param_4, param_5);
    }
    return (visibility * maskVisibility) < (maskThreshold * 0.800000011920928955078125);
}
fragment main0_out main0(main0_in in [[stage_in]], constant spvDescriptorSetBuffer0& spvDescriptorSet0 [[buffer(0)]], constant spvDescriptorSetBuffer1& spvDescriptorSet1 [[buffer(1)]], bool gl_FrontFacing [[front_facing]], float4 gl_FragCoord [[position]])
{
    ...
    bool discardFragment = isMaskedOut(param_8, param_9, param_10, param_11, param_12, param_13, (*spvDescriptorSet0.m_716), spvDescriptorSet1.SceneTextures, spvDescriptorSet0.SceneTextureSampler) || geometricFragmentDiscard;
    ...
    opaqueShading(...);
    ...
    if (discardFragment)
    {
        discard_fragment();
    }
    return out;
}
I don't see any significant difference between the working and non-working shaders. I also dumped the shaders using version 1.2.6, which we currently use in production and where the issue is not present. The main difference compared to 1.2.10 is that with 1.2.6 the texture array is defined with a size of 65536:
%40 = OpTypeImage %6 2D 0 0 0 1 Unknown
%41 = OpTypeRuntimeArray %40
%42 = OpTypePointer UniformConstant %41
%43 = OpVariable %42 UniformConstant
%46 = OpTypePointer UniformConstant %40
%49 = OpTypeSampler
%50 = OpTypePointer UniformConstant %49
%51 = OpVariable %50 UniformConstant
%53 = OpTypeSampledImage %40
-------------------
struct spvDescriptorSetBuffer0
{
    const device TextureTransformsBuffer* m_196 [[id(0)]];
    sampler SceneTextureSampler [[id(1)]];
    const device TextureHandles* m_201 [[id(2)]];
    constant ShadowMapParamsBuffer* m_105 [[id(3)]];
    const device InstanceDataBuffer* m_207 [[id(4)]];
    const device MaterialsBuffer* m_213 [[id(5)]];
};
struct spvDescriptorSetBuffer1
{
    array<texture2d<float>, 65536> SceneTextures [[id(0)]];
};
static inline __attribute__((always_inline))
float4 getTextureLodValue(thread const uint& samplerId, thread const float2& uv, thread const float& lod, constant array<texture2d<float>, 65536>& SceneTextures, sampler SceneTextureSampler)
{
    uint _45 = samplerId;
    return SceneTextures[_45].sample(SceneTextureSampler, uv, level(lod));
}
Not that it likely matters for the posted issue, but when running 1.2.6, what do you define the length of the runtime array to be? Metal validation typically complains when a large array is indicated by the shader, but a smaller argument buffer is passed to it.
It is probable that MoltenVK 1.2.6 was passing an argument buffer large enough to hold 65536 textures, which would avoid the Metal validation error. This has since been changed, because it means a descriptor set & descriptor pool really can't be dynamically sized.
In the end the array has the size equal to the number of actually used textures, which is much less than 65536. No Metal validation errors - neither with 1.2.6, nor with 1.2.10.
Fun fact: when I enable Shader Validation in Xcode, shadow map works in 1.2.10 = mask textures are sampled correctly.
Although it might not matter, I'm not sure I understand what the fragment shader is doing. There is no output, and no writes that I can see. What is it doing?
Can you provide the full source code for the problematic shader under 1.2.10, including SPIR-V, MSL & GLSL?
You can do this by setting the following two environment variables when you run your app:
MVK_CONFIG_DEBUG=1
MVK_CONFIG_LOG_LEVEL=4
then copy the text for the SPIR-V, MSL & GLSL source code logged for the problematic shader.
If you can do that for the 1.2.4 MoltenVK as well, we can better compare what's working and what's not.
Yes I've cut some parts out to not have a wall of text. Here are the complete SPIR-V, MSL and guessed GLSL sources for both 1.2.10 and 1.2.6.
This is a shadow map shader. Based on the material, if cut-out mask is present it samples the mask and discards fragment where mask values are <= 0.5. If not discarded, depth value is written into the attachment. With 1.2.10 sampling mask texture returns 0.0 for textures at index > 0.
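The mask test described above can be modeled on the CPU to show why a zero sample is so damaging here. This is a minimal C++ sketch, not the shader itself: shouldDiscard is an illustrative name, and hasMask stands in for isValidSamplerId(), whose definition is not shown in the thread.

```cpp
#include <cassert>

// CPU-side model of the mask test in the shadow map fragment shader.
// hasMask stands in for isValidSamplerId() (definition not shown in the thread);
// maskValue is the .x component returned by sampling the mask texture.
inline bool shouldDiscard(bool hasMask, float maskValue)
{
    // Same comparison as the generated MSL: discard where mask <= 0.5.
    return hasMask && maskValue <= 0.5f;
}

inline void demoShouldDiscard()
{
    assert(shouldDiscard(true, 0.0f));    // a sample of 0.0 always discards
    assert(!shouldDiscard(true, 0.75f));  // an opaque mask texel keeps the fragment
    assert(!shouldDiscard(false, 0.0f));  // no mask: nothing is sampled or discarded
}
```

Under this model, if sampling returns 0.0 for every texture at index > 0, every fragment of a masked material is discarded and no depth is written for it.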
Unfortunately, going back to declaring the full size of the array in MSL will result in Metal validation errors like:
struct spvDescriptorSetBuffer0
{
    array<texture2d<float>, 1000000> Textures [[id(0)]];
};
-[MTLDebugRenderCommandEncoder validateCommonDrawErrors:]:5775: failed assertion `Draw Errors Validation
Fragment Function(main0): argument spvDescriptorSet0[0] from buffer(0) with offset(16384) and length(16928) has space for 544 bytes, but argument has a length(8000000).'
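The numbers in that assertion are consistent with simple arithmetic. A minimal C++ sketch, assuming every texture slot in the argument buffer has the same size (the 8-byte figure is only inferred from the message, and the constant and function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the Metal validation message above.
constexpr std::uint64_t kDeclaredElements = 1000000;  // array length declared in the MSL
constexpr std::uint64_t kArgumentLength = 8000000;    // "argument has a length(8000000)"
constexpr std::uint64_t kBufferLength = 16928;        // "length(16928)"
constexpr std::uint64_t kBufferOffset = 16384;        // "offset(16384)"

// Assumption: all slots are the same size, so the size is length / element count.
constexpr std::uint64_t bytesPerSlot() { return kArgumentLength / kDeclaredElements; }

// Space Metal reports as actually available behind the bound offset.
constexpr std::uint64_t availableBytes() { return kBufferLength - kBufferOffset; }

// How many texture slots that space could hold under the assumption above.
constexpr std::uint64_t slotsThatFit() { return availableBytes() / bytesPerSlot(); }

inline void checkAgainstLog()
{
    assert(availableBytes() == 544);  // matches "has space for 544 bytes"
    assert(bytesPerSlot() == 8);
    assert(slotsThatFit() == 68);
}
```

Read this way, the shader declares room for 1000000 slots while the bound region only has room for 68, which is the mismatch the validation layer rejects.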
~What mechanism are you using to update descriptors? vkUpdateDescriptorSets? vkUpdateDescriptorSetWithTemplate? vkCmdPushDescriptorSetKHR? vkCmdPushDescriptorSetWithTemplateKHR?~
[Edit:] NM. Upon further checking, this won't make a difference.
It may be yet another weird Metal behaviour that we sometimes observe and have to work around, like the inability to query texture LOD on some GPUs, for example. I don't see any reason why the same descriptor set should work in one render pass and not in another, especially considering that it works in both with shader validation enabled.
Some updates after more debugging.
For reference, all problematic shader versions:
Thank you for your help. I'll try to reproduce it with descriptor_indexing sample.
Thanks for the more detailed info.
Fun fact: when I enable Shader Validation in Xcode, shadow map works in 1.2.10 = mask textures are sampled correctly.
workaround - do not do discard
Hmmm...I think we might be dealing with some kind of bizarreness around the discard_fragment() path. Possibly a Metal outlier bug.
What happens if you modify the shader to return a color value (maybe transparent black (0, 0, 0, 0)), in an effort to get it to compile like a more traditional shader?
I'll try to reproduce it with descriptor_indexing sample.
If this is tied to discard_fragment() behaviour, you might not be able to replicate it there.
discard + writing to RGBA_8 attachment still has the problem.
We have encountered another case of this issue - one compute shader samples mask textures and also gets zeroes for any texture index > 0. But that new one is a compute shader, so no attachments at all :/
In the compute shader case it looks even more like undefined behaviour. Sometimes it works but samples zeros, other times we crash with device lost as soon as a mask texture has to be sampled. I'm experimenting with other texture formats and shader code shuffling, but without much luck yet. And similar to the graphics pipeline case, the problem is gone when I enable Metal shader validation.
Hey @billhollings I have created a repro case with descriptor_indexing sample: https://github.com/KhronosGroup/Vulkan-Samples/compare/main...AlexanderDevaikinEnscape:Vulkan-Samples:Reprocase/MetalDeviceLost
Key points are: have a discard and do not use the sampled data in the output. I don't have a compute shader repro case, but as I mentioned above we observe a similar issue in compute, plus there is no output in a compute shader so we cannot work around the problem.
P.S. The issue is present on M1 and M2 chips + iPhones. M3 is not affected.
Hello. Thank you so much for the recent fixes to argument buffers and especially argument buffers for iOS support! Unfortunately I still have some issues with bindless textures which prevents me from upgrading from version 1.2.4 which was the last one I could use.
Short version: In some render passes, sampling bindless textures returns 0s with 1.2.10, where it worked just fine with MoltenVK 1.2.4. macOS 14.5, MacBook M1 Max.
Long version: I have an array of textures (the one with update-after-bind flag), and an array of materials where texture index is an index in that array of textures for every material. So, in some cases sampling a texture returns only 0s - e.g. in shadow map pass I get 0s, while in GBuffer pass I get valid values for the same mask texture.
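The material-to-texture indirection described above can be modeled on the CPU. A minimal C++ sketch, for illustration only: TextureHandle, Material, kNoTexture, and resolveMask are hypothetical names, not taken from the reporter's code, and the sentinel value is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a texture descriptor; a real renderer would hold a GPU handle here.
using TextureHandle = std::uint64_t;

// Hypothetical sentinel meaning "this material has no mask texture".
constexpr std::uint32_t kNoTexture = 0xFFFFFFFFu;

struct Material
{
    std::uint32_t maskTextureIndex;  // index into the shared texture array
};

// Returns the texture a material refers to, or nullptr when the material has
// no mask or the index falls outside the textures actually bound.
inline const TextureHandle* resolveMask(const std::vector<TextureHandle>& textures,
                                        const Material& material)
{
    if (material.maskTextureIndex == kNoTexture) return nullptr;
    if (material.maskTextureIndex >= textures.size()) return nullptr;
    return &textures[material.maskTextureIndex];
}
```

The point of the model is that every material resolves through the same shared array, so a pass that reads index 0 correctly but gets zeros for any other index is failing in the lookup, not in the material data.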
What is different in 1.2.10 compared to 1.2.4 is that in the Xcode shader debugger, the array of textures is shown to have only one item, while with version 1.2.4 it shows its complete size with valid textures at all the used indices. It's spvDescriptorSet1 on the screenshot.
Textures addressed by indices > 0 in this shader return only 0s when sampled. If I hardcode index 0, sampling returns the expected values for the texture with index 0. But this happens only in some render passes, although they share the same resource-binding code and the same binding declarations in shaders as the other render passes that work. The shader debugger shows only one element in the array of textures in working passes too, though, and it also shows 0s returned by sampling, but in the end the values are correct. So I don't trust the Xcode shader debugger any more.
Interestingly, the Xcode capture memory viewer displays the array with textures as expected: they are all present there in 1.2.10. But not in 1.2.4: although all render passes work, the memory viewer shows only a couple of nulls in the array.
Summary:
Help. I don't have repro case code I can publish, unfortunately.