T-Troll / alienfx-tools

Alienware systems lights, fans, and power control tools and apps
MIT License
490 stars, 45 forks

Wrong dGPU temperature on Dell G15 5510 #151

Closed: londresarthur closed this issue 2 years ago

londresarthur commented 2 years ago

Describe the bug: The software reports the GPU temperature incorrectly. It is probably reading an ambient temperature sensor, just as AWCC does.

To Reproduce: Steps to reproduce the behavior:

  1. Just use it on a Dell G15 5510

Expected behavior: It would be useful if the software read the actual dGPU temperature, as other software (such as GPU-Z and MSI Afterburner) does, so that a temperature curve could be defined for the dGPU too.

Screenshots: alienfan 1, alen 2


Additional context: The same error happens with AWCC. I'm not sure whether this is an ambient temperature sensor, but it appears to be, since the temperature slowly increases during use.

T-Troll commented 2 years ago

It's a good question which temperature is the right one... I get the temps from BIOS readings. Let me check your BIOS (I have a dump); maybe they add some correction.

Meanwhile, try running alienfx-mon and enabling the ESIF sensors. Does one of them match the GPU-Z readings?

as the temperature slowly increases during use.

That's expected. You have a cooling system shared between the CPU and GPU, so the CPU and motherboard heat up the GPU over time, even when it isn't in use.

londresarthur commented 2 years ago

I'm 100% sure that the correct sensor for the dGPU is "Temp 2". I did several tests, and the temperature reported by GPU-Z is exactly the same as the one reported by the Temp 2 sensor.

photo 01 photo 02

T-Troll commented 2 years ago

Yes, it's a bug (in fact, two bugs) in your system BIOS.

The temps are shifted down (the CPU too). For the CPU the difference should be 0-3C; for the GPU (JFYI, they use \_SB.PC00.LPCB.ECDV.TVGA._TMP()) the difference can be up to 27C.

Let me think. I can't use ESIF data for fan control directly (it comes from WMI, so it has a heavy impact on both size and performance), but I'll try to figure out how to detect incorrect data.

Meanwhile, I recommend starting the fan boost early (about 20C); this should compensate for it.
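To illustrate why starting the boost early compensates for a sensor that reads low, here is a minimal sketch of a piecewise-linear fan-boost curve evaluated on a corrected reading. The curve points, the `boost_for` name and the idea of a fixed `offset_c` correction are all hypothetical illustrations, not the actual alienfx-tools implementation.

```python
from bisect import bisect_right

# Hypothetical curve: (temperature in C, boost in percent), sorted by temperature.
CURVE = [(20, 0), (50, 30), (70, 70), (85, 100)]


def boost_for(reported_c, offset_c=0.0, curve=CURVE):
    """Return the fan boost (%) for a sensor reading after adding a correction offset.

    Adding offset_c to an under-reporting sensor is equivalent to shifting the
    whole curve down by offset_c, i.e. starting the boost earlier.
    """
    t = reported_c + offset_c
    temps = [point[0] for point in curve]
    if t <= temps[0]:
        return curve[0][1]
    if t >= temps[-1]:
        return curve[-1][1]
    i = bisect_right(temps, t)  # curve[i - 1] temp <= t < curve[i] temp
    (t0, b0), (t1, b1) = curve[i - 1], curve[i]
    # Linear interpolation between the two surrounding curve points.
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
```

With this curve, a raw reading of 40C gives `boost_for(40) == 20.0`, while the same reading corrected by 20C gives `boost_for(40, offset_c=20) == 50.0`, so the fans spin up as if the GPU were 20C hotter.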

londresarthur commented 2 years ago

I believe this sensor, reported as GPU Internal Thermistor, simply has nothing to do with the actual GPU temperature. When I run a load test on the GPU, the temperature reported by this sensor does not change; it only changes over hours of use. So I believe the sensor reports the temperature of some part of the chassis, which has nothing to do with the GPU.

londresarthur commented 2 years ago

The CPU temperature appears to be correct; it reports the same temperature as ThrottleStop, which reads directly from the CPU.

londresarthur commented 2 years ago

BTW, sensors Temp 5, 6 and 7 are all CPU temperatures too; they report the same temperature as ThrottleStop. As you can see in the last screenshot, these sensors report temperatures very similar to the GPU sensor (Temp 3).

T-Troll commented 2 years ago

I believe this sensor reported as GPU Internal Thermistor simply has nothing to do with the actual GPU temperature.

Even more interesting. Looking at the reading function, they have a barrier based on another ACPI flag, so it doesn't report real sensor data in some cases. And, even worse, the BIOS controls the GPU fan based on this data!

In fact, I wonder why the ESIF data isn't exposed in thermal zones (it's a G5 "feature": all of them are like this, while all Alienware do expose it). It seems they use different ACPI blocks for it. You can try to locate it in your BIOS, and I'll add support for reading and control.

T-Troll commented 2 years ago

Update: Uff... It's in \_SB.PC00.LPCB.ECDV. But there are some issues. All temps can be read through the method KDRT with the sensor number as a parameter. But the names... they are in different blocks with non-numeric device names. BTW, you also have CPU/GPU VR temp, GPU memory temp and battery temp.

T-Troll commented 2 years ago

OK, let's test it! Here is a test version of alienfan-cli: alienfan-cli.zip

Can you please:

  1. Unpack it into the Alienfx-tools folder (it also needs kdl.dll and hwacc.sys)
  2. Open an administrator CMD
  3. Run the command alienfan-cli test=X, where X goes from 0 to N (at least 6, maybe more).

I'm interested in the output log (to figure out how to count the number of sensors). Meanwhile, check the data from the sensors; it should be the same as ESIF.

WARNING! Be careful: an incorrect input value can cause a BSOD, so close all other apps. You won't break anything, anyway (no data is modified).

londresarthur commented 2 years ago

photo 03

Was it supposed to happen like this?

T-Troll commented 2 years ago

No, this means the method call failed. It should be 2 strings... Let me check...

londresarthur commented 2 years ago

I tested up to n=30 and got nothing.

londresarthur commented 2 years ago

Just in case, here is my ACPI dump:

T-Troll commented 2 years ago

Thanks. I removed it; it has some sensitive data inside!

londresarthur commented 2 years ago

it have some sensitive data inside!

What kind of data?

T-Troll commented 2 years ago

Yes! I found the issue: this BIOS has a different way of returning values!

Please try this CLI: alienfan-cli.zip

The task is the same (in case "Test result" is 1, not 0!).

T-Troll commented 2 years ago

what kind of data?

Your full system data (service tags, manufacturing info) and Windows security keys. Better not to share it publicly.

londresarthur commented 2 years ago

test=5 gave me the actual GPU temperature

photo 04

londresarthur commented 2 years ago

what kind of data?

Your full system data (tags, manufacturing info) and Windows security keys. Better not share it for public.

Thank you!

londresarthur commented 2 years ago

Is it possible to read the temperature of the VRM through this method?

londresarthur commented 2 years ago

from test=0 to test=15:

test.txt

T-Troll commented 2 years ago

Yes, it's correct now! BTW, AWCC (and their BIOS functions) read sensor 4, but the right one is sensor 5 (even by name).

Is it possible to read the temperature of the VRM through this method?

You can check what all of this means:

Can you please check MORE sensors? You have 6, but I'm interested in what happens if you ask for the 10th or so.

Oh, I see! 255 (-1). Niiiiice!
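The 255 (-1) result suggests one way to count the sensors: probe indices in order and stop at the first one that returns 0xFF, which is -1 when read as a signed 8-bit value. This is a minimal sketch under that assumption; `read_raw` is a hypothetical stand-in for the BIOS method call (it is not the alienfan-cli API), and the readings below are fake values for illustration only.

```python
import struct
from typing import Callable


def as_int8(raw: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a signed 8-bit value (-128..127)."""
    return struct.unpack("b", bytes([raw & 0xFF]))[0]


def count_sensors(read_raw: Callable[[int], int], max_index: int = 32) -> int:
    """Probe indices 0, 1, 2, ... and stop at the first one that returns -1 (0xFF).

    Assumes out-of-range indices return 0xFF; max_index caps the probe so a
    BIOS that never returns 0xFF cannot make this loop forever.
    """
    for index in range(max_index):
        if as_int8(read_raw(index)) == -1:
            return index
    return max_index


# Fake, hypothetical readings: six sensors, then 0xFF for every higher index.
fake_readings = [45, 50, 48, 52, 38, 61]


def fake_read(index: int) -> int:
    return fake_readings[index] if index < len(fake_readings) else 0xFF
```

Here `count_sensors(fake_read)` returns 6. The `max_index` cap is a safety choice: given the BSOD warning above, a real probe should stay within a small, known range.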

T-Troll commented 2 years ago

OK, let's test. Here are the AlienFan tools: AlienFan.zip

First, test the CLI with alienfan-cli temp. It should expose 7 sensors (but some of the names are weird; I haven't figured out how to read them correctly yet).

If this works, start the GUI...

londresarthur commented 2 years ago

photo 05

IT WORKED!!! Thank you very much!

T-Troll commented 2 years ago

You are welcome!

Let me do some polishing and maybe figure out how to get the names, so wait for the new official release. If everything works well in it, I'll close this task.

PS: Looking at your curve, I recommend spinning the fans up earlier; they need some time to spin up, especially to overboost.

londresarthur commented 2 years ago
  • Method(_TMP

OK, so:

  • 01 - CPU Package Sensor
  • 02 - CPU VR Sensor
  • 03 - dGPU VR Sensor
  • 04 - dGPU VRAM Sensor
  • 05 - AWCC (?)
  • 06 - dGPU Sensor

I just didn't understand the sensor called "AWCC".
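The labels above are numbered from 01, while the test=X parameter counts from 0: test=5 returned the real GPU temperature, which is label 06, and the index AWCC reads (4) is label 05. A small sketch of that off-by-one mapping, assuming the labels from this thread for the G15 5510; the names `SENSOR_LABELS` and `label_for_test_index` are hypothetical.

```python
# Labels as reported in this thread for the Dell G15 5510 (1-based numbering).
SENSOR_LABELS = {
    1: "CPU Package Sensor",
    2: "CPU VR Sensor",
    3: "dGPU VR Sensor",
    4: "dGPU VRAM Sensor",
    5: "AWCC (?)",
    6: "dGPU Sensor",
}


def label_for_test_index(x: int) -> str:
    """Map the 0-based index passed as test=X to its 1-based label."""
    return SENSOR_LABELS.get(x + 1, f"unknown sensor (test={x})")
```

For example, `label_for_test_index(5)` gives "dGPU Sensor" and `label_for_test_index(4)` gives "AWCC (?)".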

londresarthur commented 2 years ago

especially to overboost.

Overboost doesn't work on my laptop; it never goes above 5000 rpm.

T-Troll commented 2 years ago

I just didn't understand the sensor called "AWCC".

It's what AWCC used as a GPU sensor ^_^

Overboost doesn't work on my laptop, never goes above 5000 rpm.

Overboost doesn't work in G-Mode. I think I needed to disable it before testing; I just forgot to do so.

By the way, here is a new version. It adds ECDV sensors for different BIOS variations. Can you check that everything is still correct for your gear? (It also has some fixes for overboost, see #150.)

londresarthur commented 2 years ago

Overboost don't work in G-Mode. I think i need to disable it before testing, just forget to do so.

Overboost still doesn't work. Even with G-mode off, the fans don't go beyond 5000 rpm at all:

photo 06

Can you check all still correct for your gear?

Yes, everything is still working correctly:

photo 07

T-Troll commented 2 years ago

Interesting... For the G-series, overboost is usually quite high: about 150+. But it seems your BIOS is nicely tuned. By the way, you can experiment, for example with alienfan-cli setover=0,150. But I'm not sure your fans can run above 5000...

Anyway, thank you for testing!

londresarthur commented 2 years ago

It's what AWCC used as a GPU sensor ^_^

Yes, but I don't understand where this value comes from. It doesn't seem random, and it doesn't seem to be influenced by the other sensors. Very curious.

T-Troll commented 2 years ago

You'd need to study the hardware design to answer your question. From the software side, it's just one of the temperature sensors connected to the EC bus. I have no idea where it's actually connected...

... But I'd guess it could be an ambient sensor (I have one on my gear) or the SSD (I have that too).

T-Troll commented 2 years ago

Please check release 6.2.0. If it doesn't break anything for you, you can close this issue.