Open lunarpapillo opened 3 weeks ago
For reference, original chat is: https://chat.google.com/room/AAAAOXVAYGg/FL0Vh98x-gM/FL0Vh98x-gM?cls=10
"tests crash on all devices in the same place in VkArmBestPracticesLayerTest.ComputeShaderBadSpatialLocalityTest"
This is 99% because VkArm is alphabetically first; it will crash in any test.
I was working on a minimal repro case and got it down to this. Note that I'm not even creating a Vulkan instance:
TEST_F(PositiveTooling, Issue8439) {
    std::vector<uint32_t> spv = {
        0x07230203, 0x00010000, 0x0008000b, 0x00000019, 0x00000000, 0x00020011, 0x00000001, 0x0006000b,
        0x00000001, 0x4c534c47, 0x6474732e, 0x3035342e, 0x00000000, 0x0003000e, 0x00000000, 0x00000001,
        0x0005000f, 0x00000005, 0x00000004, 0x6e69616d, 0x00000000, 0x00060010, 0x00000004, 0x00000011,
        0x00000008, 0x00000008, 0x00000001, 0x00030003, 0x00000002, 0x000001c2, 0x00040005, 0x00000004,
        0x6e69616d, 0x00000000, 0x00040005, 0x00000009, 0x756c6176, 0x00000065, 0x00050005, 0x0000000d,
        0x6d615375, 0x72656c70, 0x00000000, 0x00040047, 0x0000000d, 0x00000022, 0x00000000, 0x00040047,
        0x0000000d, 0x00000021, 0x00000000, 0x00040047, 0x00000018, 0x0000000b, 0x00000019, 0x00020013,
        0x00000002, 0x00030021, 0x00000003, 0x00000002, 0x00030016, 0x00000006, 0x00000020, 0x00040017,
        0x00000007, 0x00000006, 0x00000004, 0x00040020, 0x00000008, 0x00000007, 0x00000007, 0x00090019,
        0x0000000a, 0x00000006, 0x00000001, 0x00000000, 0x00000000, 0x00000000, 0x00000001, 0x00000000,
        0x0003001b, 0x0000000b, 0x0000000a, 0x00040020, 0x0000000c, 0x00000000, 0x0000000b, 0x0004003b,
        0x0000000c, 0x0000000d, 0x00000000, 0x00040017, 0x0000000f, 0x00000006, 0x00000002, 0x0004002b,
        0x00000006, 0x00000010, 0x3f000000, 0x0005002c, 0x0000000f, 0x00000011, 0x00000010, 0x00000010,
        0x0004002b, 0x00000006, 0x00000012, 0x00000000, 0x00040015, 0x00000014, 0x00000020, 0x00000000,
        0x00040017, 0x00000015, 0x00000014, 0x00000003, 0x0004002b, 0x00000014, 0x00000016, 0x00000008,
        0x0004002b, 0x00000014, 0x00000017, 0x00000001, 0x0006002c, 0x00000015, 0x00000018, 0x00000016,
        0x00000016, 0x00000017, 0x00050036, 0x00000002, 0x00000004, 0x00000000, 0x00000003, 0x000200f8,
        0x00000005, 0x0004003b, 0x00000008, 0x00000009, 0x00000007, 0x0004003d, 0x0000000b, 0x0000000e,
        0x0000000d, 0x00070058, 0x00000007, 0x00000013, 0x0000000e, 0x00000011, 0x00000002, 0x00000012,
        0x0003003e, 0x00000009, 0x00000013, 0x000100fd, 0x00010038,
    };
    spv_target_env spirv_environment = SPV_ENV_VULKAN_1_0;
    spv_context ctx = spvContextCreate(spirv_environment);
    spvtools::ValidatorOptions spirv_val_options;
    spv_const_binary_t binary{spv.data(), spv.size()};
    spv_diagnostic diag = nullptr;
    const spv_result_t spv_valid = spvValidateWithOptions(ctx, spirv_val_options, &binary, &diag);
    ASSERT_TRUE(spv_valid == SPV_SUCCESS);
    spvDiagnosticDestroy(diag);
    spvContextDestroy(ctx);
}
Weird thing is that if I add the same test to the SPIRV-Tools unit tests, it works fine! Same SPIRV-Tools commit, same CMake flags, same NDK.
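As a quick sanity check on the blob itself, independent of SPIRV-Tools: the first five words of any SPIR-V module form a fixed header (magic number, version, generator, ID bound, schema). The sketch below decodes those words from the repro using only the standard library. `SpirvHeader` and `DecodeSpirvHeader` are hypothetical helper names, not SPIRV-Tools APIs, and a well-formed header says nothing about whether the rest of the module validates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for illustration; not part of SPIRV-Tools.
struct SpirvHeader {
    uint32_t magic;
    uint32_t version_major;
    uint32_t version_minor;
    uint32_t generator_id;       // upper 16 bits of the generator word
    uint32_t generator_version;  // lower 16 bits of the generator word
    uint32_t id_bound;
    uint32_t schema;
};

constexpr uint32_t kSpirvMagic = 0x07230203;
constexpr std::size_t kHeaderWords = 5;

// Splits the five header words into their fields.
SpirvHeader DecodeSpirvHeader(const std::vector<uint32_t>& words) {
    assert(words.size() >= kHeaderWords);
    SpirvHeader h{};
    h.magic = words[0];
    h.version_major = (words[1] >> 16) & 0xff;
    h.version_minor = (words[1] >> 8) & 0xff;
    h.generator_id = words[2] >> 16;
    h.generator_version = words[2] & 0xffff;
    h.id_bound = words[3];
    h.schema = words[4];
    return h;
}

// The first five words of the module in the repro.
const std::vector<uint32_t> kReproHeader = {0x07230203, 0x00010000, 0x0008000b, 0x00000019, 0x00000000};
```

For the repro's words this gives the expected magic number, SPIR-V version 1.0 (consistent with SPV_ENV_VULKAN_1_0), and an ID bound of 0x19, so the header at least does not point to a malformed input.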
Do the SPIRV-Tools unit tests also run on Android?
By default, SPIRV-Tools tests do not run on Android. I was able to run them by commenting out these lines: https://github.com/KhronosGroup/SPIRV-Tools/blob/main/CMakeLists.txt#L315-L317 and then manually pushing and running the test executable using the adb shell.
Weird...
I presume the crash occurs in spvValidateWithOptions(), as it seems to with the VVL tests, and the stack trace is otherwise similar; I presume you were also running the test in isolation via --gtest_filter, yes?
Since it works in SPIRV-Tools unit tests, do you have a hypothesis as to why it fails deterministically in VVL? I've got nothing...
Yes and yes. And just like your initial writeup, this only affects the Debug build. Release builds make it past the spvValidateWithOptions() call and pass the assert.
No real hypothesis yet. The fact that the test code works in one build (SPIRV-Tools) and not the other (VVL) makes me suspect something about how we build/package libSPIRV-Tools.
Similar issue: https://github.com/KhronosGroup/glslang/issues/3534
That reporter traced it back to a specific constructor for std::vector and patched around it by constructing the vector using a different method.
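The kind of workaround described there can be sketched roughly as follows. Which std::vector constructor was involved is not stated in this thread, so using the initializer-list constructor as the "before" case is purely an assumption for illustration, and `MakeWithInitializerList` and `MakeWithReserveAndInsert` are hypothetical names. Both functions produce the same contents through different construction paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Construction path A: the initializer-list constructor
// (assumed here only for illustration).
std::vector<uint32_t> MakeWithInitializerList() {
    return std::vector<uint32_t>{0x07230203, 0x00010000, 0x0008000b};
}

// Construction path B: default-construct, reserve, then insert from a
// plain array, so the initializer-list constructor is never called.
std::vector<uint32_t> MakeWithReserveAndInsert() {
    static const uint32_t kWords[] = {0x07230203, 0x00010000, 0x0008000b};
    const std::size_t count = sizeof(kWords) / sizeof(kWords[0]);
    std::vector<uint32_t> v;
    v.reserve(count);
    v.insert(v.end(), std::begin(kWords), std::end(kWords));
    assert(v.size() == count);
    return v;
}
```

Swapping one path for the other in a repro is a cheap way to check whether a crash follows a particular constructor, though it would only hide a toolchain or packaging problem rather than explain it.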
Environment:
Describe the Issue
When building and testing a Debug build using Android NDK 26.3, tests crash on all devices in the same place in VkArmBestPracticesLayerTest.ComputeShaderBadSpatialLocalityTest, inside an allocator within SPIRV-Tools:
The full ndk-stack output is available: 008-ndk-stack-info.txt
The crash appears when using a Debug build with Android NDK 26.3. It does not appear when using a Release build with NDK 26.3, nor (using either a Release or a Debug build) with either NDK 25.2 or NDK 27.0.
Given that the code appears to run correctly in a Release build, that the crash is device-independent, and that the crash occurs during memory allocation, it's fairly likely that the compiler isn't the issue, and that something in validation or SPIRV is causing memory corruption that happens to cause a validation crash when memory is laid out "just right". If Address Sanitizer is supported on Android, it might be helpful in uncovering such a corruption.
It's possible, though IMHO unlikely, that this is an unknown compiler bug that appeared in NDK 26 and disappeared in NDK 27, as symptoms like this are not listed as known issues: https://github.com/android/ndk/releases
To reproduce the problem, run a manual-Vulkan-ValidationLayers build (http://tcubuser.lunarg.localdomain:8080/view/Manual/job/manual-Vulkan-ValidationLayers/build) with:
BUILD_MODE: Debug
ANDROID_ARGS: --android-ndk 26.3
NODE: tcubuand1