IGCIT / Intel-GPU-Community-Issue-Tracker-IGCIT

IGCIT is a Community-driven issue tracker for Intel GPUs.
GNU General Public License v3.0

When running inference on an NN model, after some time I get the following: Display driver igfx stopped responding and has successfully recovered. #831

Open MaksimDanilov opened 2 months ago

MaksimDanilov commented 2 months ago

Checklist [README]

Application [Required]

PyTorch with IPEX

Processor / Processor Number [Required]

12th Gen Intel(R) Core(TM) i7-1255U , GenuineIntel

Graphic Card [Required]

Intel(R) Iris(R) Xe Graphics

GPU Driver Version [Required]

32.0.101.5768

Other GPU Driver version

No response

Rendering API [Required]

Windows Build [Required]

Windows 11 23H2

Other Windows build

No response

Intel System Support Utility report

log.txt

Description and steps to reproduce [Required]

The reproduction steps are here. The earlier posts in that thread contain all the logs and the problem description.

Device / Platform

No response

Crash dumps [Required, if applicable]

No response

Application / Windows logs

No response

Karen-Intel commented 2 months ago

Hi @MaksimDanilov, thank you for your report. I will be assisting you on this case.

Adding to debug queue :) Stay tuned

Karen

Arturo-Intel commented 2 months ago

@MaksimDanilov Hey I was able to reproduce the issue on my Intel Core Ultra 9 185H, I still have some warnings to deal with (related to torchvision dependencies), but I am working on it.

image

Question: Do you see the screen flashing when the ERROR message pops?

~~Either way I will open the case to the driver's engineering team. I will share the ID once I have it.~~

Thanks for your patience, we are working on this,

EDIT: It looks like that error and the screen flashing happened because Windows rolled back the driver at that precise moment (the 10-minute mark), which caused the error to pop up. I am not able to reproduce it; I am still working on it.

Will update later, --r2

MaksimDanilov commented 2 months ago

> ~@MaksimDanilov Hey I was able to reproduce the issue on my Intel Core Ultra 9 185H, I still have some warnings to deal with (related to torchvision dependencies), but I am working on it.~
>
> image
>
> ~Question: Do you see the screen flashing when the ERROR message pops?~
>
> ~Either way I will open the case to the driver's engineering team. I will share the ID once I have it.~
>
> Thanks for your patience, we are working on this,
>
> EDIT: It looks like that error and the screen flashing happened because Windows rolled back the driver at that precise moment (the 10-minute mark), which caused the error to pop up. I am not able to reproduce it; I am still working on it.
>
> Will update later, --r2

Thanks for debugging this. Today I received an update from the IPEX team: they managed to replicate this bug on an Iris device. It seems that you have the same problem but a different error :-)

Arturo-Intel commented 2 months ago

@MaksimDanilov we were able to reproduce this on Iris Xe and Intel Arc A-series as well. This is the internal report number for this issue: 16022819108

Any update will be posted in this thread -- r2