flightlessmango / MangoHud


cpu_temp using Chipset temperature #1323

Closed. Kagukara closed this issue 2 weeks ago.

Kagukara commented 2 months ago

Describe the bug: The cpu_temp metric in the MangoHud overlay shows my chipset temperature instead of the CPU Tctl temperature.

List relevant hardware/software information

To Reproduce: Steps to reproduce the behavior:

  1. In the terminal run mangohud glxgears
  2. Open a new terminal and run sensors
  3. Check if the CPU temperature is correct

Expected behavior: The cpu_temp metric in the MangoHud overlay shows the CPU temperature (Tctl), not the chipset temperature.

Screenshots: capture_2024-05-10_23-09-51, capture_2024-05-10_23-16-09, capture_2024-05-10_23-16-17

flightlessmango commented 2 months ago

can you get a tree of the hwmon folder for asusec?

Kagukara commented 2 months ago
$ tree /sys/class/hwmon/hwmon4/
/sys/class/hwmon/hwmon4/
├── curr1_input
├── curr1_label
├── device -> ../../../asus-ec-sensors
├── fan1_input
├── fan1_label
├── fan2_input
├── fan2_label
├── in0_input
├── in0_label
├── name
├── power
│   ├── autosuspend_delay_ms
│   ├── control
│   ├── runtime_active_time
│   ├── runtime_status
│   └── runtime_suspended_time
├── subsystem -> ../../../../../class/hwmon
├── temp1_input
├── temp1_label
├── temp2_input
├── temp2_label
├── temp3_input
├── temp3_label
└── uevent

Let me know if this is correct.

flightlessmango commented 2 months ago

Yes that's correct, thank you. can you also get the output of awk '{print FILENAME ": " $0}' *_label in the same folder?

Kagukara commented 2 months ago

Here you go:

$ awk '{print FILENAME ": " $0}' *_label
curr1_label: CPU
fan1_label: VRM HS
fan2_label: Chipset
in0_label: CPU Core
temp1_label: Chipset
temp2_label: T_Sensor
temp3_label: VRM
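
For comparing every hwmon device at once rather than one folder at a time, the awk one-liner above could be generalised along these lines. This is only a sketch: `list_temp_labels` and `read_first_line` are hypothetical helper names (not part of MangoHud), it assumes a C++17 compiler with `std::filesystem`, and it only looks at `temp*_label` files.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Read the first line of a small sysfs-style text file (empty string on failure).
static std::string read_first_line(const fs::path& file) {
    std::ifstream in(file);
    std::string line;
    std::getline(in, line);
    return line;
}

// Hypothetical helper: collect "<hwmonN> (<name>) tempX_label: <label>" lines
// for every device directory under `root`.
std::vector<std::string> list_temp_labels(const fs::path& root = "/sys/class/hwmon") {
    std::vector<std::string> out;
    std::error_code ec;  // errors are ignored; unreadable entries are skipped
    const std::string suffix = "_label";
    for (const fs::directory_entry& dev : fs::directory_iterator(root, ec)) {
        const std::string name = read_first_line(dev.path() / "name");
        for (const fs::directory_entry& f : fs::directory_iterator(dev.path(), ec)) {
            const std::string file = f.path().filename().string();
            const bool is_temp = file.rfind("temp", 0) == 0;
            const bool is_label = file.size() > suffix.size() &&
                file.compare(file.size() - suffix.size(), suffix.size(), suffix) == 0;
            if (is_temp && is_label) {
                out.push_back(dev.path().filename().string() + " (" + name + ") " +
                              file + ": " + read_first_line(f.path()));
            }
        }
    }
    std::sort(out.begin(), out.end());  // directory iteration order is unspecified
    return out;
}
```

Printing each returned string gives one line per temperature label, which makes it easier to see which device exposes Tctl and which exposes Chipset.
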
Kagukara commented 1 month ago

The cpu_temp metric is still using the chipset temperature instead of the CPU temperature.

My CPU temperature can be found:

$ tree /sys/class/hwmon/hwmon3/
/sys/class/hwmon/hwmon3/
├── curr1_input
├── curr1_label
├── curr2_input
├── curr2_label
├── debug_data
├── device -> ../../../0000:00:18.3
├── in1_input
├── in1_label
├── in2_input
├── in2_label
├── name
├── power
│   ├── autosuspend_delay_ms
│   ├── control
│   ├── runtime_active_time
│   ├── runtime_status
│   └── runtime_suspended_time
├── power1_input
├── power1_label
├── power2_input
├── power2_label
├── subsystem -> ../../../../../class/hwmon
├── temp1_input
├── temp1_label
├── temp1_max
├── temp2_input
├── temp2_label
├── temp3_input
├── temp3_label
├── temp4_input
├── temp4_label
└── uevent
$ awk '{print FILENAME ": " $0}' *_label
curr1_label: SVI2_C_Core
curr2_label: SVI2_C_SoC
in1_label: SVI2_Core
in2_label: SVI2_SoC
power1_label: SVI2_P_Core
power2_label: SVI2_P_SoC
temp1_label: Tdie
temp2_label: Tctl
temp3_label: Tccd1
temp4_label: Tccd2

This may be because I use zenpower3-dkms for my CPU sensors. Is that something you might need to account for?

Kagukara commented 3 weeks ago

I've uninstalled zenpower3-dkms and it still shows the chipset temperature for CPU temperature.


$ tree /sys/class/hwmon/hwmon3/:

```
/sys/class/hwmon/hwmon3/
├── device -> ../../../0000:00:18.3
├── name
├── power
│   ├── autosuspend_delay_ms
│   ├── control
│   ├── runtime_active_time
│   ├── runtime_status
│   └── runtime_suspended_time
├── subsystem -> ../../../../../class/hwmon
├── temp1_input
├── temp1_label
├── temp3_input
├── temp3_label
├── temp4_input
├── temp4_label
└── uevent

4 directories, 13 files
```

$ awk '{print FILENAME ": " $0}' *_label:

```
temp1_label: Tctl
temp3_label: Tccd1
temp4_label: Tccd2
```

$ tree /sys/class/hwmon/hwmon4/:

```
/sys/class/hwmon/hwmon4/
├── curr1_input
├── curr1_label
├── device -> ../../../asus-ec-sensors
├── fan1_input
├── fan1_label
├── fan2_input
├── fan2_label
├── in0_input
├── in0_label
├── name
├── power
│   ├── autosuspend_delay_ms
│   ├── control
│   ├── runtime_active_time
│   ├── runtime_status
│   └── runtime_suspended_time
├── subsystem -> ../../../../../class/hwmon
├── temp1_input
├── temp1_label
├── temp2_input
├── temp2_label
├── temp3_input
├── temp3_label
└── uevent
```

$ awk '{print FILENAME ": " $0}' *_label:

```
curr1_label: CPU
fan1_label: VRM HS
fan2_label: Chipset
in0_label: CPU Core
temp1_label: Chipset
temp2_label: T_Sensor
temp3_label: VRM
```

Not sure why it's not working: it should see the name inside hwmon3 as k10temp, and since there is no Tdie label it should use Tctl.
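
The behaviour I expect (prefer Tdie, fall back to Tctl when Tdie is absent) can be sketched like this. It is only an illustration of the expectation; `pick_cpu_temp_label` is a hypothetical helper and not a function from MangoHud's cpu.cpp.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: given the temperature labels one hwmon device exposes,
// return the label to read for the CPU temperature, or nothing if none fits.
// Assumed preference order, taken from the comment above: Tdie first, then Tctl.
std::optional<std::string> pick_cpu_temp_label(const std::vector<std::string>& labels) {
    static const std::vector<std::string> preferred = {"Tdie", "Tctl"};
    for (const std::string& wanted : preferred) {
        if (std::find(labels.begin(), labels.end(), wanted) != labels.end())
            return wanted;
    }
    return std::nullopt;  // e.g. an asusec device with only Chipset/T_Sensor/VRM
}
```

With the k10temp labels listed above (Tctl, Tccd1, Tccd2) this picks Tctl; with the asusec labels it picks nothing, so that device should not be used for cpu_temp.
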

Screenshot showing mangohud and sensors, with mangohud matching the chipset sensor for asusec-isa-0000 and not Tctl for k10temp-pci-00c3: ![capture_2024-06-23_16-59-52](https://github.com/flightlessmango/MangoHud/assets/43391109/f7861114-c5d4-4732-b82f-106abca606aa)
flightlessmango commented 3 weeks ago

Can you get the mangohud logs with MANGOHUD_LOG_LEVEL=debug mangohud vkcube?

Kagukara commented 3 weeks ago

I've copied the output to a txt file, as it's too long to paste into GitHub.

Here is the cpu.cpp section though for quick viewing:

[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:507] hwmon: sensor name: iwlwifi_1
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:507] hwmon: sensor name: asusec
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:490] fallback cpu temp input: /sys/class/hwmon/hwmon4/temp1_input
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:539] hwmon: using input: /sys/class/hwmon/hwmon4/temp1_input
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: iwlwifi_1
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: asusec
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: nvme
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: amdgpu
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: nct6798
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: asus
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: zenpower
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:586] hwmon: using input: /sys/class/hwmon/hwmon3/power1_input
[2024-06-24 23:59:14.511] [MANGOHUD] [debug] [cpu.cpp:587] hwmon: using input: /sys/class/hwmon/hwmon3/power2_input

mangohud_log_level_debug.txt

Kagukara commented 2 weeks ago

@flightlessmango So I removed lines 528 to 531 (the "asusec" branch) from cpu.cpp, then built and installed MangoHud using the build.sh script. The CPU temperature now matches Tdie/Tctl in sensors.

https://github.com/flightlessmango/MangoHud/blob/2d0c0a1b3cd0a9949ac821204da61475a10218cc/src/cpu.cpp#L528-L531

Screenshot showing mangohud and sensors, with mangohud matching the Tdie/Tctl in sensors: ![capture_2024-06-27_07-57-20](https://github.com/flightlessmango/MangoHud/assets/43391109/9cd5f712-dc11-4592-8516-bd119991d60c)

Here is the cpu.cpp section for MANGOHUD_LOG_LEVEL=debug mangohud vkcube:

[2024-06-27 07:49:50.913] [MANGOHUD] [debug] [cpu.cpp:507] hwmon: sensor name: iwlwifi_1
[2024-06-27 07:49:50.913] [MANGOHUD] [debug] [cpu.cpp:507] hwmon: sensor name: asusec
[2024-06-27 07:49:50.913] [MANGOHUD] [debug] [cpu.cpp:507] hwmon: sensor name: nvme
[2024-06-27 07:49:50.913] [MANGOHUD] [debug] [cpu.cpp:507] hwmon: sensor name: amdgpu
[2024-06-27 07:49:50.913] [MANGOHUD] [debug] [cpu.cpp:507] hwmon: sensor name: nct6798
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:539] hwmon: using input: /sys/class/hwmon/hwmon7/temp13_input
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: iwlwifi_1
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: asusec
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: nvme
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: amdgpu
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: nct6798
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: asus
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:632] hwmon: sensor name: zenpower
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:586] hwmon: using input: /sys/class/hwmon/hwmon3/power1_input
[2024-06-27 07:49:50.976] [MANGOHUD] [debug] [cpu.cpp:587] hwmon: using input: /sys/class/hwmon/hwmon3/power2_input

Not sure why this temporarily fixes the problem though.

flightlessmango commented 2 weeks ago
index 51c2570..5a360c0 100644
--- a/src/cpu.cpp
+++ b/src/cpu.cpp
@@ -526,8 +526,8 @@ bool CPUStats::GetCpuFile() {
                 break;

         } else if (name == "asusec") {
-            find_input(path, "temp", input, "CPU");
-            break;
+            if (find_input(path, "temp", input, "CPU"))
+                break;
         } else {
             path.clear();
         }

Can you try this patch?
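
For context, here is a simplified model of why the unconditional `break` matters. It is a sketch and not the real `CPUStats::GetCpuFile()`: `HwmonDevice`, `find_temp_input` and `select_cpu_temp_input` are hypothetical names, and the `temp1_input` fallback is an assumption based on the "fallback cpu temp input" line in the debug log above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for one hwmon directory (hypothetical type).
struct HwmonDevice {
    std::string path;                                // e.g. /sys/class/hwmon/hwmon4
    std::string name;                                // contents of the "name" file
    std::vector<std::pair<int, std::string>> temps;  // {N, label} for tempN_label
};

// Sets `input` and returns true when a temp label equals `label`.
bool find_temp_input(const HwmonDevice& dev, const std::string& label, std::string& input) {
    for (const auto& t : dev.temps) {
        if (t.second == label) {
            input = dev.path + "/temp" + std::to_string(t.first) + "_input";
            return true;
        }
    }
    return false;
}

// stop_only_on_match == false models the old code (always break on asusec);
// stop_only_on_match == true models the patch (break only if "CPU" was found).
std::string select_cpu_temp_input(const std::vector<HwmonDevice>& devices,
                                  bool stop_only_on_match) {
    std::string path, input;
    for (const HwmonDevice& dev : devices) {
        path = dev.path;
        if (dev.name == "k10temp" || dev.name == "zenpower") {
            if (find_temp_input(dev, "Tdie", input) || find_temp_input(dev, "Tctl", input))
                break;
        } else if (dev.name == "asusec") {
            const bool found = find_temp_input(dev, "CPU", input);
            if (found || !stop_only_on_match)
                break;  // old code stops here even though nothing matched
        } else {
            path.clear();
        }
    }
    if (input.empty() && !path.empty())
        input = path + "/temp1_input";  // assumed fallback, as hinted by the log
    return input;
}
```

In this model, when asusec is enumerated before k10temp and has no temp label "CPU", the old behaviour stops the search with the asusec path still set, so the fallback lands on hwmon4/temp1_input (Chipset). With the patched condition the loop continues and reaches the k10temp device.
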

Kagukara commented 2 weeks ago

That worked, thank you.

flightlessmango commented 2 weeks ago

Fixed in commit 8a31b967669576268d09e8efc604108c28ab3d87