IGCIT / Intel-GPU-Community-Issue-Tracker-IGCIT

IGCIT is a Community-driven issue tracker for Intel GPUs.
GNU General Public License v3.0

SPARKLE Intel® Arc™ A750 ORC OC Edition stuck on 600/2000 MHz #821

Open Toetje585 opened 3 months ago

Toetje585 commented 3 months ago

Checklist [README]

Application [Required]

Windows / Linux

Processor / Processor Number [Required]

AMD Ryzen™ 7 7800X3D / AMD Ryzen™ 9 7945HX

Graphic Card [Required]

SPARKLE Intel® Arc™ A750 ORC OC Edition

GPU Driver Version [Required]

32.0.101.5762

Other GPU Driver version

No response

Rendering API [Required]

Windows Build [Required]

Windows 11 24H2

Other Windows build

Windows 11 23H2

Intel System Support Utility report

intel.txt

B660DS3HAXDDR4.txt

Description and steps to reproduce [Required]

Problem:

The SPARKLE Intel® Arc™ A750 ORC OC Edition is always stuck at 600/2000 MHz at idle, even without any displays connected, both in Windows and on Linux. This causes the fan to ramp up and down constantly.

Reproduce:

Install the SPARKLE Intel® Arc™ A750 ORC OC Edition, and use the latest drivers.

Conclusion: even without any display connected, the memory clock does not go down.


Tested on three systems (two AMD, one Intel) and also under Linux, so my conclusion is that there is a firmware problem. This issue makes my fan ramp up and down all the time.

Best Regards,

Device / Platform

BD790I and ROG STRIX B650E-E GAMING WIFI

Crash dumps [Required, if applicable]

No response

Application / Windows logs

No response

EstebanIntel commented 3 months ago

Hi @Toetje585,

Can you clarify: when you say "Disconnect all cables from the Intel® Arc™ A750 ORC OC Edition," do you include disconnecting the power cables?

Toetje585 commented 3 months ago

@EstebanIntel Disconnect all screens from the card.

Edit: Updated bug report!

Gabriela-Intel commented 2 months ago

Hi @Toetje585. I tested the issue, and what I observed is that the card stayed at 600/2000 for about 2-3 minutes after booting, then dropped to 0, remained there for 7-8 minutes, and then went back up for a few minutes before dropping to 0 again. This cycle then repeats.

Is this similar to the behavior you're seeing or is it ALWAYS stuck at 600/2000?

Toetje585 commented 2 months ago

Hi @Gabriela-Intel

Thanks for testing this out.

I assume your test was also done in a hybrid setup, because you can see the GPU turning off (0). I see the same behavior here: it is either stuck at the highest state, 600/2000, or off. However, because the GPU sets the memory to 2000 and stays there for no obvious reason, the temperature quickly reaches 50 °C, so the fan starts to ramp up and down. Can we agree this is strange behavior, especially since no screens are connected?

Does the card even have a lower memory state at all?
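To make the question about available memory states concrete, here is a minimal Python sketch that summarizes idle clock samples. It assumes you already have (core MHz, memory MHz) readings from some monitoring tool; the sample list is hypothetical and only mirrors the 600/2000 and 0 states described in this thread. `summarize_states` and `has_intermediate_memory_state` are illustrative helper names, not part of any Intel tool.

```python
from collections import Counter

# Hypothetical (core_mhz, mem_mhz) samples recorded at idle with no displays
# attached. How they are captured is an assumption; the values only mirror
# the two states reported in this thread (600/2000 and off).
samples = [
    (600, 2000), (600, 2000), (0, 0), (0, 0), (0, 0), (600, 2000),
]


def summarize_states(samples):
    """Return each distinct (core, mem) state with its share of all samples."""
    counts = Counter(samples)
    total = len(samples)
    return {state: count / total for state, count in counts.items()}


def has_intermediate_memory_state(samples, off_mhz=0, max_mhz=2000):
    """True if any sample shows memory clocked between 'off' and the maximum."""
    return any(off_mhz < mem < max_mhz for _, mem in samples)


if __name__ == "__main__":
    for (core, mem), share in sorted(summarize_states(samples).items()):
        print(f"{core}/{mem} MHz: {share:.0%} of samples")
    print("intermediate memory state seen:", has_intermediate_memory_state(samples))
```

Run against real logs, a persistent absence of intermediate memory values would be consistent with the observation that the card only switches between 600/2000 and off; on its own it would not prove a firmware cause.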

Gabriela-Intel commented 2 months ago

Yes, definitely appears to be strange behavior. I filed a bug sighting so I can get more insight on this and perhaps get it fixed. Bug id is 14023014019 for your reference. Please stick around as it could take some time until we get more information. I'll post here again when I have an update!

Gabriela-Intel commented 2 months ago

Hey there @Toetje585. We have word back from the graphics debug team. Basically, this is expected behavior from the OS. What we're seeing is a momentary wake request being sent, which could be caused by Windows Update checking for a driver update for Arc. Every adapter has different wake timings, which is why the behavior might look different from, for example, an NVIDIA dGPU.

You'll notice that there isn't a constant ramping up or down if you remove internet connection.

Toetje585 commented 2 months ago

@Gabriela-Intel I doubt it; in my deployments, Windows Update does not include driver updates. And even if a wake request does happen, the GPU memory goes directly to the highest memory state; this card is not capable of clocking the memory down, so it heats up fast. The behavior is the same on Linux. There is no need for the memory to run at the highest power state if no displays are connected. The card ramps up and down regardless of internet connection because it sits in the highest state regardless of driver or operating system. This is 100% a firmware problem...

Toetje585 commented 2 months ago

@Gabriela-Intel How are we going forward from here?

Gabriela-Intel commented 2 months ago

Can you please share the firmware version currently installed? You can find it in Arc Control under Settings > System Info > GPU Info > IFWI.

I did share your comment with debug, and they plan on following up on this to continue further investigation.

Toetje585 commented 2 months ago

@Gabriela-Intel Here is a detailed firmware overview:

Device: FW Version: DG02_1.3257
OPROM DATA Version: 14 00 24 04 00 00 00 00
OPROM CODE Version: 14 00 31 04 00 00 00 00
Device: Fw Data Version: Major Version: 101, OEM Manufacturing Data Version: 1, Major VCN: 1

Tool used: https://github.com/intel/igsc/

Toetje585 commented 2 months ago

@Gabriela-Intel I noticed something else: the card does not always boot cleanly after a shutdown. Sometimes the card is not detected in Windows/Linux when I start the computer again the next morning. However, another power cycle brings it back. I tested this on two systems.

This is quite odd. Also, on many Arc cards I see that OPROM DATA and OPROM CODE are on the same version, which does not seem to be the case for this card.
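To make the OPROM DATA versus OPROM CODE comparison explicit, here is a minimal Python sketch that parses the firmware overview posted in this thread and reports which version bytes differ. It assumes the "OPROM <KIND> Version:" label followed by eight hex bytes, as in the text above; other igsc releases may format their output differently. `parse_oprom_versions` and `differing_bytes` are hypothetical helpers, not part of igsc.

```python
import re

# Firmware overview as posted in this thread (gathered with intel/igsc).
# The regex assumes "OPROM <KIND> Version:" followed by eight hex bytes.
REPORT = (
    "Device: FW Version: DG02_1.3257 "
    "OPROM DATA Version: 14 00 24 04 00 00 00 00 "
    "OPROM CODE Version: 14 00 31 04 00 00 00 00 "
    "Device: Fw Data Version: Major Version: 101, "
    "OEM Manufacturing Data Version: 1, Major VCN: 1"
)


def parse_oprom_versions(text):
    """Return {'DATA': [...], 'CODE': [...]} as lists of hex byte strings."""
    versions = {}
    for kind in ("DATA", "CODE"):
        # {{2}} and {{8}} become {2} and {8} inside the f-string.
        pattern = rf"OPROM {kind} Version:((?:\s+[0-9A-Fa-f]{{2}}){{8}})"
        match = re.search(pattern, text)
        if match:
            versions[kind] = match.group(1).split()
    return versions


def differing_bytes(a, b):
    """Indices where two equal-length version byte lists differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    v = parse_oprom_versions(REPORT)
    print("OPROM DATA:", " ".join(v["DATA"]))
    print("OPROM CODE:", " ".join(v["CODE"]))
    print("differing byte positions:", differing_bytes(v["DATA"], v["CODE"]))
```

For the values posted here, only one byte differs (index 2: `24` versus `31`). The thread does not establish whether matching versions are required; the sketch only makes the mismatch easy to check across cards.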

Toetje585 commented 1 month ago

@Gabriela-Intel Any update on this matter, as this renders the card quite unusable?

Gabriela-Intel commented 1 month ago

Sorry for the delay. I've been reaching out to the team and haven't heard any updates on your issue. I'll follow up with them again and will post here once I have something to share.