IGCIT / Intel-GPU-Community-Issue-Tracker-IGCIT

IGCIT is a Community-driven issue tracker for Intel GPUs.
GNU General Public License v3.0

SPIR-V compiler crash when processing geometry shader #551

Closed. liamwhite closed this issue 10 months ago.

liamwhite commented 1 year ago

Checklist [README]

Application [Required]

yuzu

Processor / Processor Number [Required]

Ryzen 9 5950X

Graphic Card [Required]

Arc A770 16GB

GPU Driver Version [Required]

Rendering API [Required]

Windows Build Number [Required]

Other Windows build number

No response

Intel System Support Utility report

ssu-info.txt

Description and steps to reproduce [Required]

The yuzu Nintendo Switch emulator crashes when running Xenoblade Chronicles with the Vulkan Arc driver due to a null pointer dereference in the SPIR-V shader compiler, related to the presence of geometry shaders. All shader modules were validated with spirv-val from SPIRV-Tools.

Mesa drivers do not crash. Other driver families (AMD, Nvidia) do not crash.

Here is a link to the issue report: https://github.com/yuzu-emu/yuzu/issues/11341

Here is a link to the latest version of yuzu: https://github.com/yuzu-emu/yuzu-mainline/releases


I have provided the source code for a minimal application which reproduces the crash here: geom_crash.zip

Steps for the minimal application:

  1. mkdir build
  2. cd build
  3. cmake ..
  4. Open the generated project in Visual Studio and build main.cpp
  5. Run and it will immediately crash

Screenshot_2023-10-04_213106

Device / Platform

No response

Crash dumps [Required, if applicable]

No response

Application / Windows logs

No response

Arturo-Intel commented 1 year ago

@liamwhite thanks for the report! Will repro this in the lab; I will share any news through this thread.

-- r2

Arturo-Intel commented 1 year ago

@liamwhite can you share the uncompiled shader files?

liamwhite commented 1 year ago

Shaders in GLSL format: shaders.zip

It's worth noting that recompiling the geometry shader from GLSL with glslangValidator does not result in a driver crash (execution completes successfully), so there is something specific to the binary format of the original SPIR-V that causes the issue.

Arturo-Intel commented 1 year ago

@liamwhite when trying with RTX3060 I got this message:


failed to find a suitable GPU!

Is this expected?

liamwhite commented 1 year ago

Whoops. In the block below, comment out the line containing VK_INTEL_performance_query:

const std::vector<const char*> deviceExtensions = {
    "VK_KHR_shader_draw_parameters",
    "VK_EXT_transform_feedback",
    "VK_INTEL_performance_query", // to force selection of intel
};

I added this while testing, as my system with the Arc has multiple GPUs in it.

Arturo-Intel commented 1 year ago

Lots of validation errors now:

validation layer: Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)

validation layer: Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)

validation layer: Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)

validation layer: windows_get_device_registry_files: GUID for 3 is not SoftwareComponent skipping

validation layer: Searching for ICD drivers named .\nvoglv64.dll

validation layer: Loading layer library C:\VulkanSDK\1.3.250.0\Bin\.\VkLayer_khronos_validation.dll

validation layer: Loading layer library C:\Program Files (x86)\RivaTuner Statistics Server\Vulkan\.\RTSSVkLayer64.dll

validation layer: Loading layer library C:\ProgramData\obs-studio-hook\.\graphics-hook64.dll

validation layer: Loading layer library C:\WINDOWS\System32\DriverStore\FileRepository\nv_dispig.inf_amd64_8c8de08a85de4474\.\nvoglv64.dll

Using device name NVIDIA GeForce RTX 3060

validation layer: Validation Error: [ VUID-VkShaderModuleCreateInfo-pCode-08740 ] Object 0: handle = 0x1ec9b64fe20, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6e224e9 | vkCreateShaderModule(): The SPIR-V Capability (DenormFlushToZero) was declared, but none of the requirements were met to use it. The Vulkan spec states: If pname:codeType is ename:VK_SHADER_CODE_TYPE_SPIRV_EXT, and pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied (https://vulkan.lunarg.com/doc/view/1.3.250.0/windows/1.3-extensions/vkspec.html#VUID-VkShaderModuleCreateInfo-pCode-08740)

validation layer: Validation Error: [ VUID-RuntimeSpirv-shaderDenormFlushToZeroFloat32-06300 ] Object 0: handle = 0xf56c9b0000000004, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0xee960e96 | Shader requires DenormFlushToZero for bit width 32 but it is not enabled on the device The Vulkan spec states: If shaderDenormFlushToZeroFloat32 is VK_FALSE, then DenormFlushToZero for 32-bit floating-point type must not be used (https://vulkan.lunarg.com/doc/view/1.3.250.0/windows/1.3-extensions/vkspec.html#VUID-RuntimeSpirv-shaderDenormFlushToZeroFloat32-06300)

validation layer: Validation Error: [ VUID-VkShaderModuleCreateInfo-pCode-08740 ] Object 0: handle = 0x1ec9b64fe20, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6e224e9 | vkCreateShaderModule(): The SPIR-V Capability (DenormFlushToZero) was declared, but none of the requirements were met to use it. The Vulkan spec states: If pname:codeType is ename:VK_SHADER_CODE_TYPE_SPIRV_EXT, and pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied (https://vulkan.lunarg.com/doc/view/1.3.250.0/windows/1.3-extensions/vkspec.html#VUID-VkShaderModuleCreateInfo-pCode-08740)

validation layer: Validation Error: [ VUID-RuntimeSpirv-shaderDenormFlushToZeroFloat32-06300 ] Object 0: handle = 0xf443490000000006, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0xee960e96 | Shader requires DenormFlushToZero for bit width 32 but it is not enabled on the device The Vulkan spec states: If shaderDenormFlushToZeroFloat32 is VK_FALSE, then DenormFlushToZero for 32-bit floating-point type must not be used (https://vulkan.lunarg.com/doc/view/1.3.250.0/windows/1.3-extensions/vkspec.html#VUID-RuntimeSpirv-shaderDenormFlushToZeroFloat32-06300)

validation layer: Validation Error: [ VUID-vkDestroyDevice-device-00378 ] Object 0: handle = 0x1ec9b64fe20, type = VK_OBJECT_TYPE_DEVICE; Object 1: handle = 0xfa21a40000000003, type = VK_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT; | MessageID = 0x71500fba | OBJ ERROR : For VkDevice 0x1ec9b64fe20[], VkDescriptorSetLayout 0xfa21a40000000003[] has not been destroyed. The Vulkan spec states: All child objects created on device must have been destroyed prior to destroying device (https://vulkan.lunarg.com/doc/view/1.3.250.0/windows/1.3-extensions/vkspec.html#VUID-vkDestroyDevice-device-00378)

validation layer: Unloading layer library C:\WINDOWS\System32\DriverStore\FileRepository\nv_dispig.inf_amd64_8c8de08a85de4474\.\nvoglv64.dll

validation layer: Unloading layer library C:\ProgramData\obs-studio-hook\.\graphics-hook64.dll

validation layer: Unloading layer library C:\Program Files (x86)\RivaTuner Statistics Server\Vulkan\.\RTSSVkLayer64.dll

validation layer: Unloading layer library C:\VulkanSDK\1.3.250.0\Bin\.\VkLayer_khronos_validation.dll

Is this expected?

liamwhite commented 1 year ago

On Nvidia, errors related to shaderDenormFlushToZeroFloat32 are expected, as they do not support the device property. The final validation error about not destroying the object is simply because I did not include any cleanup code. So yes, both are expected.

Arturo-Intel commented 1 year ago

Alright then! Thanks for the clarification, @liamwhite. Will work on the case.

thanks --r2

Arturo-Intel commented 1 year ago

@liamwhite Do you know what parameters were used to compile the original .spv files?

liamwhite commented 1 year ago

The original shader files were programmatically generated by yuzu's SPIR-V shader compiler with sirit, a C++ generator for the format. They were not generated with glslangValidator.

If you need me to, I can provide more specifics about the flags used in spirv_emit_context.cpp, but I don't know how productive that would be.

Arturo-Intel commented 1 year ago

No, I don't think that will be necessary. I am just gathering information. Thanks again, @liamwhite.

kunit1 commented 1 year ago

@Arturo-Intel wondering if you have any updates or timeline on a fix for this issue. Thank you.

Arturo-Intel commented 1 year ago

@kunit1 Hey, our devs have already found the root cause of this, so I expect to have a build soon and will start verifying the issue on my side. --r2

Arturo-Intel commented 11 months ago

@liamwhite can you try with .5074? The fix for this issue is included there. Please confirm :)

goldenx86 commented 11 months ago

Won't be able to confirm the game per se for a few days, but I can confirm my Iris Xe still crashes with the provided testcase and driver 5074.

liamwhite commented 11 months ago

Hi @Arturo-Intel,

I gave yuzu a try with the 5074 driver and was surprised to see that it still crashed in igc64.dll. I then retried the test case posted in this issue and found that it also crashed in the same location. I did double check that the driver was installed correctly: the Arc control panel shows 5074 and I also see it listed as the version in yuzu's log output.

I asked another tester with a Xe integrated GPU to try the 5074 driver, and he also confirmed that the zip test case is still crashing.

Could you check if there is an extra step I need to take to see this working?

Arturo-Intel commented 11 months ago

TL;DR: The fix is not in this driver (.5074), but I know the fix is on its way

Hmm, yes, I see that too. I got confused earlier because the program was stopping before it created the pipeline (it failed to open a file).


When I copy the files where the program can find them, it crashes on the pipeline creation.

Since no message appeared on the console and Visual Studio did not complain at all, I rushed to tell you it was fixed. Apologies, my bad.

--r2

Arturo-Intel commented 11 months ago

@goldenx86 is this issue also present on Iris Xe? Can you share the SSU info of the system where you tested it?

goldenx86 commented 11 months ago

ssu.txt

Is it this?

goldenx86 commented 11 months ago

Driver 5122 reports as 5081 for me; the issue persists.

Arturo-Intel commented 10 months ago

@liamwhite Hey, can you please update your driver to .5186? I am NOT getting the crash on pipeline creation, and I am also getting the same output as expected on comp

Can you verify in your end?

validation layer: windows_get_device_registry_files: GUID for 5 is not SoftwareComponent skipping

validation layer: windows_get_device_registry_files: GUID for 6 is not SoftwareComponent skipping

validation layer: windows_get_device_registry_files: GUID for 7 is not SoftwareComponent skipping

validation layer: Layer name GalaxyOverlayVkLayer does not conform to naming standard (Policy #LLP_LAYER_3)

validation layer: Layer name GalaxyOverlayVkLayer_VERBOSE does not conform to naming standard (Policy #LLP_LAYER_3)

validation layer: Layer name GalaxyOverlayVkLayer_DEBUG does not conform to naming standard (Policy #LLP_LAYER_3)

validation layer: windows_get_device_registry_files: GUID for 5 is not SoftwareComponent skipping

validation layer: windows_get_device_registry_files: GUID for 6 is not SoftwareComponent skipping

validation layer: windows_get_device_registry_files: GUID for 7 is not SoftwareComponent skipping

validation layer: Searching for ICD drivers named .\igvk64.dll

validation layer: Searching for ICD drivers named .\igvk64.dll

validation layer: Layer VK_LAYER_OW_OVERLAY uses API version 1.2 which is older than the application specified API version of 1.3. May cause issues.

validation layer: Layer VK_LAYER_OW_OBS_HOOK uses API version 1.2 which is older than the application specified API version of 1.3. May cause issues.

validation layer: Loading layer library C:\VulkanSDK\1.3.250.0\Bin\.\VkLayer_khronos_validation.dll

validation layer: Loading layer library C:\ProgramData\obs-studio-hook\.\graphics-hook64.dll

validation layer: Loading layer library C:\Program Files (x86)\Overwolf\0.241.0.10\.\ow-graphics-vulkan.dll

validation layer: Loading layer library C:\Program Files (x86)\Overwolf\0.241.0.10\.\owclient.dll

validation layer: Loading layer library C:\Program Files (x86)\RivaTuner Statistics Server\Vulkan\.\RTSSVkLayer64.dll

Using device name Intel(R) Arc(TM) A750 Graphics

validation layer: Validation Error: [ VUID-vkDestroyDevice-device-00378 ] Object 0: handle = 0x1dc88a6b848, type = VK_OBJECT_TYPE_DEVICE; Object 1: handle = 0xfa21a40000000003, type = VK_OBJECT_TYPE_DESCRIPTOR_SET_LAYOUT; | MessageID = 0x71500fba | OBJ ERROR : For VkDevice 0x1dc88a6b848[], VkDescriptorSetLayout 0xfa21a40000000003[] has not been destroyed. The Vulkan spec states: All child objects created on device must have been destroyed prior to destroying device (https://vulkan.lunarg.com/doc/view/1.3.250.0/windows/1.3-extensions/vkspec.html#VUID-vkDestroyDevice-device-00378)

validation layer: Unloading layer library C:\Program Files (x86)\RivaTuner Statistics Server\Vulkan\.\RTSSVkLayer64.dll

validation layer: Unloading layer library C:\Program Files (x86)\Overwolf\0.241.0.10\.\owclient.dll

validation layer: Unloading layer library C:\Program Files (x86)\Overwolf\0.241.0.10\.\ow-graphics-vulkan.dll

validation layer: Unloading layer library C:\ProgramData\obs-studio-hook\.\graphics-hook64.dll

validation layer: Unloading layer library C:\VulkanSDK\1.3.250.0\Bin\.\VkLayer_khronos_validation.dll

C:\_bugs\yuzu\geom_crash\build\Debug\main.exe (process 21216) exited with code 0.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .

liamwhite commented 10 months ago

Hi @Arturo-Intel - it is fixed now. Thanks!