IGCIT / Intel-GPU-Community-Issue-Tracker-IGCIT

IGCIT is a Community-driven issue tracker for Intel GPUs.
GNU General Public License v3.0

Minimal repro app for a Direct3D 12 driver crash (debugger) and device removal (no debugger) during PSO creation #817

Open mightycow opened 1 month ago

mightycow commented 1 month ago

Checklist [README]

Application [Required]

None, this is a minimal repro app for the driver team: https://myt.playmorepromode.com/intel/

Processor / Processor Number [Required]

Core i5-13600K

Graphic Card [Required]

UHD Graphics 770

GPU Driver Version [Required]

31.0.101.5594

Other GPU Driver version

The problem happens with 5594, 5590 and 5444

Rendering API [Required]

Windows Build Number [Required]

Other Windows build number

No response

Intel System Support Utility report

igcit_ssu.txt

Description and steps to reproduce [Required]

  1. Download the source code from https://myt.playmorepromode.com/intel/ or from the "Crash dumps" section below
  2. Run the app under the VC++ debugger to trigger a crash in igc1464.dll
  3. Run the app without a debugger to trigger a device removal (0x887A0005 returned by CreateComputePipelineState)

Device / Platform

No response

Crash dumps [Required, if applicable]

Here's the minimal repro app with full source code and VC++ 2022 project files: intel_driver_crash_d3d12_no_exe_no_dll.zip

The app makes sure to only create a device on an Intel adapter. All it does after that is create a PSO from the supplied DXIL bytecode.

Agility SDK headers are included, but the DLLs are missing. The Agility SDK version used is 608. Archive versions with binaries are hosted here: https://myt.playmorepromode.com/intel/

Application / Windows logs

No response

Karen-Intel commented 1 month ago

Hey @mightycow thank you for your submission. I'll be assisting you on this one. We just got a new driver yesterday and one of the newest machines. I'll do some testing and get back to you for confirmation/questions. Stay tuned!

Karen

Karen-Intel commented 1 month ago

Hi @mightycow, update! I was able to reproduce the crash in igc1464.dll. However, when I try to run the same project on a similar system with another GPU (for debugging and issue isolation), I get a syntax error on my NVIDIA 3050: expecting a type specification near "UINT". Is this expected?

NVIDIA 3050 Driver Crash Project Test

Thanks for your help

Karen

mightycow commented 1 month ago

Hello @Karen-Intel,

Thanks for taking a look at my report.

Here are my thoughts and some extra information:

mightycow commented 1 month ago

Instead of spending time fixing the build issue on the second machine, you can copy over the compiled binaries from the first machine.

App output when it fails to create the PSO:

Error code 0x887A0005 (The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.)
device->CreateComputePipelineState(&desc, IID_PPV_ARGS(&pso)) @ App::CreatePSO:211
Press any key to continue . . .

App output when it succeeds:

Compiled without a crash! Hurray!

There are 2 lines to change to select another vendor's adapter: 104 and 119.

    104: if(SUCCEEDED(adapter->GetDesc1(&desc)) && desc.VendorId == VENDORID_INTEL)
    119: assert(desc.VendorId == VENDORID_INTEL);
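As an alternative to editing those two lines by hand for each test machine, the vendor could be chosen at run time. The following is only a sketch under assumptions: the constant and function names are hypothetical and do not appear in the repro app, which uses `VENDORID_INTEL`. The numeric values are the standard PCI vendor IDs that DXGI reports in `DXGI_ADAPTER_DESC1::VendorId`.

```cpp
#include <cstdint>
#include <cstring>

// Standard PCI vendor IDs (constant names are hypothetical).
constexpr std::uint32_t kVendorIdIntel  = 0x8086;
constexpr std::uint32_t kVendorIdNvidia = 0x10DE;
constexpr std::uint32_t kVendorIdAmd    = 0x1002;

// Hypothetical helper: maps a lowercase vendor name (for example taken from
// the command line) to its PCI vendor ID. Returns 0 for unrecognized names.
std::uint32_t VendorIdFromName(const char* name)
{
    if (name == nullptr)                  return 0;
    if (std::strcmp(name, "intel") == 0)  return kVendorIdIntel;
    if (std::strcmp(name, "nvidia") == 0) return kVendorIdNvidia;
    if (std::strcmp(name, "amd") == 0)    return kVendorIdAmd;
    return 0;
}
```

With something like this, both the check on line 104 and the assert on line 119 could compare `desc.VendorId` against a single variable holding the selected ID instead of the hard-coded constant.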

Hope that helps.

Karen-Intel commented 1 month ago

Hey @mightycow thank you for the details!

After exporting the executable from my Intel machine (where it fails) and changing the VendorID, I got the "Compiled without a crash! Hurray!" message on my NVIDIA 3050.

I will proceed to create an internal report. Will be back tomorrow with the number for your reference.

Karen

mightycow commented 1 month ago

Thank you @Karen-Intel

Karen-Intel commented 1 month ago

Hi @mightycow!

Sharing the internal ID of the report I have submitted: 14022859473

The general cadence for driver fixes is 3 to 6 months on average, though some fixes take less time. I'll post any updates in this thread.

Thank you so much for your help!

Karen

mightycow commented 1 month ago

Thanks, @Karen-Intel