IGCIT / Intel-GPU-Community-Issue-Tracker-IGCIT

IGCIT is a Community-driven issue tracker for Intel GPUs.

BoolkaEngine crash when using unaligned vertex struct in mesh shader #759

Open Devaniti opened 2 months ago

Devaniti commented 2 months ago


Application [Required]

BoolkaEngine

Processor / Processor Number [Required]

AMD Ryzen 5 3600 6-Core Processor

Graphic Card [Required]

Intel(R) Arc(TM) A770 Graphics

GPU Driver Version [Required]

31.0.101.5382

Other GPU Driver version

No response

Rendering API [Required]

Windows Build Number [Required]

Other Windows build number

No response

Intel System Support Utility report

igcit_ssu.txt

Description and steps to reproduce [Required]

  1. Download the latest release of BoolkaEngine - https://github.com/Devaniti/BoolkaEngine/releases/tag/v0.2
  2. Extract all files
  3. Run start.bat

The application will start and immediately crash.

BoolkaEngine is my D3D12 engine pet project. It works just fine on Nvidia/AMD GPUs.

The issue seems to be related to the execution of mesh shaders. There is a commit with a workaround for this crash - https://github.com/Devaniti/BoolkaEngine/commit/02855c222dd194e34ab4dd4d28bc4d720527072b

That workaround changes the layout of the Vertex struct used with mesh shaders. Since there are no relevant limitations on the layout of the vertex struct, it is highly likely that this is not UB inside BoolkaEngine, but rather mishandling of the vertex struct layout in mesh shaders inside the Intel driver.

Device / Platform

No response

Crash dumps [Required, if applicable]

No response

Application / Windows logs

No response

Karen-Intel commented 2 months ago

@Devaniti hiii and welcome! I provide support for Game/App developers and I will be assisting you in this case. Let me confirm this crash and I'll be back with my findings. If I have questions I'll ping you right back :)

Karen

Karen-Intel commented 2 months ago

Heey @Devaniti quick update! I could verify the correct execution of the scene using the build on my NVIDIA RTX 3050, but unfortunately it crashes on my ARC with driver v.5382. I have also performed a small regression like you suggested and the behavior is the same, so I'll be creating an internal report for this. A couple of questions for my report:

  1. Is there an official dx12 doc that you followed to find that the matrix should be returned the way you originally did it? If so, please share
  2. Can you share how many users (give or take) might be impacted?

Thanks, looking forward to hearing from you :)

Karen

Devaniti commented 2 months ago
  1. The most relevant document is this one - https://microsoft.github.io/DirectX-Specs/d3d/MeshShader.html#vertex-attributes There is no limit on the structure of the Vertex Attributes. The only requirements are to specify a semantic for each field of the structure and to have a 4-component element with the SV_Position semantic. Both are fulfilled in the versions before and after the workaround.
  2. This is more of a code sample than an application that people would actively use. So the number of users impacted by the crash in the app itself is about 0, but people may use this code, which crashes on Intel ARC, as a reference in other projects. Either way, I'd expect quite a small number of people to be affected by this bug.

As for the workaround having the same behavior: on my end the workaround does fix the crash on Intel ARC. To ensure that we are running the same code in each case, you can build both versions from scratch:

  1. Clone the https://github.com/Devaniti/BoolkaEngine repo
  2. Run HelperScripts/QuickStart.bat on main branch to observe the crash
  3. Run HelperScripts/QuickStart.bat on the IntelArcWorkaround branch to observe it working with the workaround

That script will build the project, download and prepare the scene, and run the app.

Karen-Intel commented 2 months ago

Ty @Devaniti I have been able to run both branches, but I'd rather focus on the one without the WA and see what we can do on the driver side. Edit: doing some research. Will update soon.

Karen

Arturo-Intel commented 2 months ago

@Devaniti Just a quick comment. Looking at the document you shared:

[image]

This function requires a 4-component vector, and looking at the WA code, you did just that: changed int to int4, and float2 and float3 to float4, right?

So it seems to me that the WA is the correct way to use this function. I assume that the Nvidia/AMD drivers are converting those non-4-component vectors into valid 4-component vectors.
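To make the change under discussion concrete, here is a minimal CPU-side sketch in C++ of the two layouts as described in this comment (int to int4, float2 and float3 to float4). The field names, their order, and the helper `isVec4Multiple` are hypothetical, since the thread does not quote the actual struct. The real vertex struct lives in HLSL, and how a driver packs mesh shader outputs is not something this sketch models. It only shows the component counts and what padding every field to four components does to the sizes.

```cpp
#include <cstddef>

// Assumption: 4-byte float and int, as on common desktop platforms.
static_assert(sizeof(float) == 4 && sizeof(int) == 4,
              "sketch assumes 4-byte float and int");

// Hypothetical mirror of the layout before the workaround.
struct VertexBefore {
    float position[4];  // float4, SV_Position
    int   materialID;   // int
    float texcoord[2];  // float2
    float normal[3];    // float3
};

// Hypothetical mirror of the layout after the workaround:
// every attribute widened to four components.
struct VertexAfter {
    float position[4];    // float4, SV_Position
    int   materialID[4];  // int4
    float texcoord[4];    // float4
    float normal[4];      // float4
};

// Size of one 4-component vector of 32-bit values.
constexpr std::size_t kVec4Bytes = 4 * sizeof(float);

// True when a byte count is a whole number of 4-component vectors.
constexpr bool isVec4Multiple(std::size_t bytes) {
    return bytes % kVec4Bytes == 0;
}
```

Under these assumptions `sizeof(VertexBefore)` is 40 bytes and `sizeof(VertexAfter)` is 64 bytes, so only the second is a whole number of 16-byte vectors. That is consistent with the workaround changing the layout, but it says nothing about whether the original layout is invalid.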

Do you agree?

Devaniti commented 2 months ago

In Vertex Attributes, you are required to have one 4-component attribute with the SV_Position semantic. If the relevant attribute is not a 4-component vector, shader compilation fails with the error "SV_Position must be float4". As you can see, both before and after the workaround there is a "float4 position : SV_Position;" field, which satisfies that requirement, and the shaders build successfully in both cases. And since the highlighted requirement does not limit other attributes, it is valid for other attributes to have other sizes.

This official mesh shader code sample uses non-4-component attributes as well:

https://github.com/microsoft/DirectX-Graphics-Samples/blob/master/Samples/Desktop/D3D12MeshShaders/src/MeshletCull/MeshletCommon.hlsli
https://github.com/microsoft/DirectX-Graphics-Samples/blob/master/Samples/Desktop/D3D12MeshShaders/src/MeshletCull/MeshletMS.hlsl
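The argument above can be written down as a small check. The following is a sketch in C++17, not a real validator: `Attribute`, `satisfiesQuotedRules`, and the attribute lists are hypothetical names, and the check encodes only the two rules quoted in this thread (every field has a semantic, and SV_Position is a 4-component element). It ignores everything else in the mesh shader spec and compares semantic names exactly.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical description of one field of a mesh shader vertex output struct.
struct Attribute {
    std::string_view semantic;  // empty string means the field has no semantic
    int components;             // number of vector components, 1 to 4
};

// Checks only the two rules quoted in the thread:
//   1. every field specifies a semantic;
//   2. there is an SV_Position field and it has exactly 4 components.
// Other attributes are deliberately not restricted in size.
template <std::size_t N>
constexpr bool satisfiesQuotedRules(const Attribute (&attrs)[N]) {
    bool hasPosition = false;
    for (const Attribute& a : attrs) {
        if (a.semantic.empty()) {
            return false;  // rule 1 violated
        }
        if (a.semantic == "SV_Position") {
            if (a.components != 4) {
                return false;  // rule 2 violated
            }
            hasPosition = true;
        }
    }
    return hasPosition;
}

// Hypothetical attribute lists; the semantic names other than SV_Position
// are placeholders, not taken from BoolkaEngine.
constexpr Attribute kBefore[] = {
    {"SV_Position", 4}, {"MATERIAL", 1}, {"TEXCOORD", 2}, {"NORMAL", 3}};
constexpr Attribute kAfter[] = {
    {"SV_Position", 4}, {"MATERIAL", 4}, {"TEXCOORD", 4}, {"NORMAL", 4}};
constexpr Attribute kBadPosition[] = {{"SV_Position", 3}, {"NORMAL", 3}};
```

With these inputs, `satisfiesQuotedRules(kBefore)` and `satisfiesQuotedRules(kAfter)` are both true, while `satisfiesQuotedRules(kBadPosition)` is false, which matches the claim that both versions of the struct meet the quoted requirement.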

Arturo-Intel commented 2 months ago

[image]

I was able to run that sample (MeshletCull) on the A770... I will run this sample on an NV GPU to see if there is any difference.
