T-Troll / alienfx-tools

Alienware systems lights, fans, and power control tools and apps
MIT License
492 stars, 45 forks

Overheating Coming Out of Sleep #201

Closed: TheSQLGuru closed this issue 1 year ago

TheSQLGuru commented 2 years ago

Normally I shut down my computer each night. I left it on last night, and it went to sleep on its own. Alienfx Fan Control GUI was running, in my Manual mode.

I turned the laptop on this morning and began working. Some time later (unknown, but probably a few tens of minutes) I had to plug in my headphones for a call. When I touched the laptop case, it was hot enough to be instantly painful to the skin. I keep ThrottleStop running and the laptop well restricted in power usage during normal use. I show CPU temp in the Taskbar, and it was 76C when it should have been 46-49C. I immediately opened AlienFan GUI and went to Level 6 to blast the fans. I was in such a rush that I did not stop to look at the fan speeds. But I do know they were not running as they normally would, because, thinking back, I could not hear them.

I have had this exact same issue once before, a couple of months ago. I was not as involved with AlienFX then as I am now, so I don't know whether that one was caused by the app, although it was installed. But now I feel it is VERY likely that the application is having some negative effect when the laptop comes out of sleep. I will do some testing when I get a chance, perhaps later tonight, and report whether the event is repeatable.

AlienFx v7.2.2 running on an Alienware X17 R1, 11980HK/3080 with a fully patched Win 10 x64 Pro 21H2 OS. My BIOS is 1.4 - which I note is well behind the current version. Let me know if there is any additional information you need or troubleshooting you want me to do.

T-Troll commented 2 years ago

Funnily enough, I had the same issue about 30 minutes ago on my m15. I don't shut it down, but use sleep (overnight) or hibernate (when moving) instead. This time I was moving, so the notebook was hibernated.

I hadn't run into it before, even after the 7.x release, so I suppose it's a "hello" from the latest Windows update (I got it recently).

Anyway, I'll investigate the situation for both of us.

TheSQLGuru commented 2 years ago

Well, I had it happen again this morning after starting up from the laptop being off overnight. Same thing - temps ramped up to crazy hot before I noticed, then I realized no fan noise at all. Quickly switched to level 6, temps came down quickly, back to my Manual curve - worked fine all day while the laptop was running.

For Windows Updates, I do NOT have KB5017308 installed (202209 CU for Win 10 21H2 x64). I do have the 202208 CU for that build installed, as of 20220903.

Hit me up if you need any testing or additional info. I hope this isn't one of those "intermittent failures" that takes a while to put a finger on.

TheSQLGuru commented 2 years ago

I just started my laptop from a shutdown state. See the attached for what it thinks the fans are doing. I am certain that fan 4 is not spinning at 11600 RPM. :-) I am also virtually certain that my #1 fan is not spinning at 2800 - the sound isn't noticeable enough. And fan 2 should probably be more than 700 based on current CPU temps and my fan curve. I can't confirm that explicitly though.

[Screenshot: AlienFan Control reporting 11600 fan RPM]

TheSQLGuru commented 2 years ago

A) On the graph, what does the text "Fan curve (scale: 156, boost: 5, 13%)" mean? I am clearly not on boost 5, at least not that I know of.

B) I just rebooted, and the speeds are now 2300, 2100, 2800, 500

BTW, I do have the app set to Start with Windows, which I just recently started doing with the current build IIRC.

T-Troll commented 2 years ago

Ohh... 11500 is the key! This is a well-known bug in some Dell G (and now Alienware) BIOSes: the fans get stuck in some state where they neither run correctly up to full RPM nor report the correct RPM. Roll back the BIOS. As a quick override, change the power mode or toggle G-mode. There's not much I can do about it...
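For anyone logging fan readings while chasing this, one symptom of the stuck state described above (a fan "reporting" 11500-11600 RPM) can be flagged with a simple plausibility check. This is a minimal sketch under stated assumptions: `MAX_PLAUSIBLE_RPM` and `suspicious_fans` are hypothetical names, the 7000 RPM ceiling is an assumed value rather than anything read from the BIOS, and this is not how alienfx-tools itself handles the bug.

```python
# Hypothetical ceiling: an assumed upper bound for a real laptop fan.
# It is NOT a value read from the BIOS or used by alienfx-tools.
MAX_PLAUSIBLE_RPM = 7000


def suspicious_fans(readings: dict[int, int], ceiling: int = MAX_PLAUSIBLE_RPM) -> list[int]:
    """Return the numbers of fans whose reported RPM is above the ceiling.

    readings maps fan number -> reported RPM.
    """
    return [fan for fan, rpm in readings.items() if rpm > ceiling]


# Readings from this thread (fan 3 omitted where its value wasn't given).
print(suspicious_fans({1: 2800, 2: 700, 4: 11600}))  # [4]
print(suspicious_fans({1: 2300, 2: 2100, 3: 2800, 4: 500}))  # []
```

The check only catches the "impossibly high" reading; a fan that is stuck but reports a normal-looking value (like fan 1 at 2800 RPM with no audible noise) passes it, so it can surface the BIOS bug but not rule it out.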

TheSQLGuru commented 2 years ago

1) I have not updated the BIOS since I received the laptop. It came with 1.4 (which is well out of date), and I just re-verified via CPU-Z that it is still on 1.4.

NEW INFO:

When AlienFan autostarted with windows just a bit ago, I found the following:

A) Fan speeds were NOT going up to the levels they had in the past based on current temps - except for fan 1, which was at 3100RPM. This led to the same behavior of the entire system temps (both measured as well as the case of the laptop) ramping up a degree or so every 1-2 mins. See images.


B) Raising the manual fan curve for GPU had the dot follow the curve, but the GPU-related fans remained at the same level as before.


C) This one is VERY weird: I played around with the CPU Boost AC and Battery settings. No matter what starting point I was at, and whether I changed the AC or the Battery setting, if I was in anything other than High Power (as picked via the normal power control panel task bar icon), ANY change in CPU Boost switched my laptop power level to High Power. This was repeatable across all permutations of my 4 laptop power settings (batt/bal/high/ultra).

D) I stopped and restarted AlienFan GUI, and observed that the red dot for GPU was all the way to the left on the X (temperature) axis, when it was actually 42 deg and should have been at the shelf you can see in the image.


Something is VERY messed up with my installation and its functionality since I updated to the most recent (7.2.2?) build. Please check if you can replicate any of what I posted above, and get other reliable testers or users to do some evals.

Thankfully I have access to all past builds. I am going to UNINSTALL my current build (after extracting all the settings information I can find) and then try each build from 7.1.0 to 7.2.1, and see how each plays out with my current settings reapplied after each install. If they all misbehave, that will be something of a smoking gun for you to check out. But I will also repeat the installs with just the default settings, rebuild my fan curve, and see how that functions.

It may be a few days before I can report back.

TheSQLGuru commented 2 years ago

Still on 7.2.2: Level 2 had zero RPM for all fans, and the levels are out of order (this is a new finding).


TheSQLGuru commented 2 years ago

Prior to my complete uninstall of 7.2.2, I searched and exported all Registry entries that seem to be applicable to AlienFan tool. I am curious about what some of them are used for functionally, but am also hopeful they may be of benefit in your debugging efforts. AlienFXRegistry.zip

TheSQLGuru commented 2 years ago

For reference, the following registry keys were left after a full uninstall (as captured by Revo Uninstaller). I don't think you can do much about these, though.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Folders 2 keys

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\FeatureUsage\AppSwitched 2 keys

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\FeatureUsage\ShowJumpView 1 key

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\UFH\SHC 11 keys

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\UFH\SHC 1 key

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers 3 keys

HKEY_CURRENT_USER\SOFTWARE\Classes\Local Settings\Software\Microsoft\Windows\Shell\MuiCache 2 keys

T-Troll commented 2 years ago

Something is VERY messed up with my installation

I just use the data the BIOS reports, so something is messed up in the BIOS functions...

the levels are out of order

Let me check how that happened. It's not actually a problem (I use other IDs for the modes), but it's interesting: it looks like your power modes are jumbled in the BIOS report.

I don't think you can do much about these though.

Some of them are Windows-specific, and uninstallers can't do anything about them. In fact, I've tried 4 different install/uninstall systems (I wanted to solve some other issues as well, e.g. registry configuration control), but they're all crap. The current one is small, at least!

TheSQLGuru commented 2 years ago

Always trade-offs when it comes to using stuff built by others - install/uninstall utilities in this case.

I have been intending on updating my BIOS to the latest version, but I will hold off for now in case you need some information out of it to help with the debugging.

In the meantime, now that I have a clean registry, I am about to do a fresh install of 7.1.0 and see how that works for a few days (or until it presents issues).

TheSQLGuru commented 2 years ago

v7.1.0 News:

1) Somehow my Manual Curves still existed, even though I thought I had removed all registry entries related to AlienFan Control. Where is that stored in the registry (or elsewhere) so I can remove it for testing?


2) My GPU temp is pegged at 2, meaning both GPU fans are at 0. HOWEVER, when I opened HWinfo to check temps, the GPU temp popped up to the expected value for a few seconds, then dropped back to 0. As long as HWinfo was open it kept popping up and back down, with no pattern I could see.

3) On level 5, all fans were at 0 and stayed there, GPU temp 2. Level 4: fans 5500, 5500, 3000, 3000, GPU temp 2.


4) Power modes still out of order.

5) I still occasionally get the issue where the entire system hangs for maybe 8 seconds when changing power levels.

QUESTION: I may have asked this before, but what is the state of the system if AlienFan Control is shut down? Does it matter what Power Level the app is on when it is turned off?

T-Troll commented 2 years ago
  1. HKCU/Software/AlienFan. If you use profiles, also HKCU/Software/Aliefxgui/Profiles. NB: HKCU means the settings are per-user, so you may have other copies if you switch users.
  2. You should have more sensors (try installing LibreHardwareMonitor; my tools can pull data from it). And yes, the GPU sensor is definitely buggy in the BIOS.
  3. ANY LEVEL EXCEPT MANUAL DISABLES FAN CONTROL, so the fans are controlled only by the BIOS. It seems level 5 is "Quiet", so the fans start at higher temps.
  4. The only thing that could cause this is a BIOS update (i.e., the power mode order changed in the BIOS). Are you sure Windows didn't update it silently?
  5. Same. It's BIOS-related; some models can hang for up to 15 seconds.

ANSWER: Every fan control app except the CLI saves the power level and some other settings at startup and restores them on exit. If the start mode is Manual, it also sets the boosts to 0.
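The save-at-start / restore-at-quit behavior described in that answer maps naturally onto a context manager. The sketch below is a minimal illustration under loud assumptions: `FakeBios`, `app_controls_fans`, and `fan_control_session` are hypothetical names, not the alienfx-tools API; treating "Manual" as just another power-mode string is a simplification; and restoring the boosts on exit assumes they are among the "other settings" the app saves.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class FakeBios:
    """Hypothetical stand-in for the BIOS fan/power interface (not a real API)."""
    power_mode: str = "Balanced"
    boosts: dict[int, int] = field(default_factory=lambda: {0: 20, 1: 35})


def app_controls_fans(mode: str) -> bool:
    # Per the thread: any level except Manual leaves the fans to the BIOS.
    return mode == "Manual"


@contextmanager
def fan_control_session(bios: FakeBios, start_mode: str):
    # Save the settings that are in effect when the app starts.
    saved_mode = bios.power_mode
    saved_boosts = dict(bios.boosts)
    bios.power_mode = start_mode
    if app_controls_fans(start_mode):
        # Manual start mode: reset every fan boost to 0.
        bios.boosts = {fan: 0 for fan in bios.boosts}
    try:
        yield bios
    finally:
        # Restore the saved settings on quit, even if the session raised.
        bios.power_mode = saved_mode
        bios.boosts = saved_boosts


bios = FakeBios()
with fan_control_session(bios, "Manual"):
    print(bios.power_mode, bios.boosts)  # Manual {0: 0, 1: 0}
print(bios.power_mode, bios.boosts)  # Balanced {0: 20, 1: 35}
```

Under this model, the power level the app happens to be on when it is closed does not matter, because the `finally` clause puts back whatever was saved at startup. The catch is that a crash or forced kill that skips the exit path would leave the last applied settings in place.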

TheSQLGuru commented 2 years ago

1.1. I have not changed users, so I don't need to worry about that aspect

2.1. I did have one additional temp sensor show up in AFC on one of the more recent builds, but never before - including 7.1.0 it seems. I use HWInfo, which I am pretty sure doesn't miss anything. Which sensors do you think I have that should be showing up in AFC? I will look for them in HWI.

As for the "bugginess" of the native GPU temps flaking out, I never had a problem with that on older builds. By older, I am pretty sure that includes all v6.x and prior.

3.1. Good to know.

4.1. BIOS is reported to be v1.4.0 dated 202109089, by both CPU-Z and HWI. This is the same version the laptop shipped with.

I will fully uninstall 7.1.0 soon, wipe registry settings, and try the next build up the list. Actually I will first drop back to a v6 build to see if the GPU temp is flaky or not.