GnomedDev / T2FanRD

GNU General Public License v3.0

Does not appear to consider the temperature of the AMD GPU #3

Open CaptainMorgan12 opened 2 months ago

CaptainMorgan12 commented 2 months ago

Environment: MacPro 7,1 , 2022.140.5.0.0 (iBridge: 21.16.6074.0.0,0), Linux 6.10.9-1-t2-noble, wayland, Ubuntu 24.04.1 LTS, AMD Radeon™ Pro 580X, Mesa LTS 24.2.2~kisak1~n, Reported Sep 9th 2024.

Situation: While playing Rebel Galaxy Outlaw on Steam, the GPU's "edge" temperature (amdgpu-pci-0700) climbs to 100degC, yet the fans do not kick in and stay at roughly 500rpm. The CPU temperature stays at around 40degC during gameplay because the CPU is barely used. When I play for a long time the computer freezes and restarts, probably because the GPU reaches its critical temperature, without the fans ever spinning up.

Workaround: set low_temp to ~40degC. The problem is that this is unstable: during light use (just navigating the desktop and using Firefox, no heavy video work) the fans jump to high speed, turn off, then turn back on, rather than ramping linearly.


Fan Config, t2fanrd running:

[Fan1]
low_temp=46
high_temp=75
speed_curve=linear
always_full_speed=false

[Fan2]
low_temp=46
high_temp=75
speed_curve=linear
always_full_speed=false

[Fan3]
low_temp=46
high_temp=75
speed_curve=linear
always_full_speed=false

[Fan4]
low_temp=46
high_temp=75
speed_curve=linear
always_full_speed=false

Two options to make this work better:

Option A: allow the config to set a minimum RPM for each fan, e.g. by adding an option such as "always_on_min_speed=true".

This way you can manually set each fan to a speed where the noise is manageable and have it run constantly at, say, 800RPM, which would probably solve most use cases. This is similar to what mbpfan does; mbpfan is what I used on my 2013 Trashcan MacPro and it worked fine. I haven't tried it on the 2019 MacPro.
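
To make Option A concrete, here is a minimal Python sketch of a linear fan curve with an always-on floor. It is not T2FanRD's actual code: the function name `fan_rpm` and the `floor_rpm` parameter are hypothetical, and it assumes the fan sits at its hardware minimum below `low_temp`. The 500 and 1522 RPM limits are fan1's min/max from the `sensors` output in this report.

```python
def fan_rpm(temp_c, low_temp, high_temp, min_rpm, max_rpm, floor_rpm=None):
    """Linear fan curve with an optional always-on floor.

    floor_rpm is a hypothetical stand-in for the proposed
    "always_on_min_speed" option; it is not an existing T2FanRD setting.
    """
    if temp_c <= low_temp:
        rpm = min_rpm  # assumption: hardware minimum below low_temp
    elif temp_c >= high_temp:
        rpm = max_rpm
    else:
        # Interpolate linearly between min_rpm and max_rpm.
        frac = (temp_c - low_temp) / (high_temp - low_temp)
        rpm = min_rpm + frac * (max_rpm - min_rpm)
    if floor_rpm is not None:
        # Never drop below the user's floor, and never exceed the fan's maximum.
        rpm = max(rpm, min(floor_rpm, max_rpm))
    return round(rpm)


# Fan1 limits from the report: min = 500 RPM, max = 1522 RPM.
print(fan_rpm(40, 46, 75, 500, 1522))                 # 500
print(fan_rpm(40, 46, 75, 500, 1522, floor_rpm=800))  # 800
```

With a floor like this the fan never idles below the chosen speed, but it still would not react to GPU load unless the GPU sensor is also part of the input temperature.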

Option B: use the highest reading across all temperature sensors, including the GPU sensors, as the fan input. In the output below that would be pch_lewisburg-virtual-0 at 57degC, or one that is usually higher, like the enp3s0-pci-0300 MAC Temperature at 53.6degC.

amdgpu-pci-0700
Adapter: PCI adapter
vddgfx:      850.00 mV 
edge:         +54.0°C  (crit = +108.0°C, hyst = -273.1°C)
PPT:          14.23 W  (cap = 123.00 W)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +38.0°C  (high = +87.0°C, crit = +97.0°C)
Core 2:        +37.0°C  (high = +87.0°C, crit = +97.0°C)
Core 4:        +36.0°C  (high = +87.0°C, crit = +97.0°C)
Core 5:        +36.0°C  (high = +87.0°C, crit = +97.0°C)
Core 8:        +37.0°C  (high = +87.0°C, crit = +97.0°C)
Core 9:        +38.0°C  (high = +87.0°C, crit = +97.0°C)
Core 10:       +36.0°C  (high = +87.0°C, crit = +97.0°C)
Core 12:       +35.0°C  (high = +87.0°C, crit = +97.0°C)
Core 13:       +36.0°C  (high = +87.0°C, crit = +97.0°C)

enp4s0-pci-0400
Adapter: PCI adapter
PHY Temperature:  +51.8°C  
MAC Temperature:  +51.9°C  

nvme-pci-0c00
Adapter: PCI adapter
Composite:    +41.9°C  (low  =  -5.2°C, high = +89.8°C)
                       (crit = +93.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +26.8°C  

applesmc-acpi-0
Adapter: ACPI interface
fan1:         502 RPM  (min =  500 RPM, max = 1522 RPM)
fan2:         488 RPM  (min =  500 RPM, max = 2852 RPM)
fan3:         506 RPM  (min =  500 RPM, max = 2852 RPM)
fan4:         488 RPM  (min =  500 RPM, max = 2852 RPM)

pch_lewisburg-virtual-0
Adapter: Virtual device
temp1:        +57.0°C  

enp3s0-pci-0300
Adapter: PCI adapter
PHY Temperature:  +50.9°C  
MAC Temperature:  +53.6°C  

nvme-pci-0100
Adapter: PCI adapter
Composite:    +29.9°C 
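
As a rough illustration of Option B, the Python sketch below picks the hottest temperature reading out of `sensors`-style text like the output above. The function name `hottest_sensor` is hypothetical and this is not how T2FanRD reads temperatures; a daemon would more likely read the kernel's hwmon files directly than parse text. Only the first temperature on each line is used, so the `crit`/`high` limits are ignored.

```python
import re

# Matches "label:   +54.0°C ..." and captures the label and the first temperature.
TEMP_LINE = re.compile(r"\s*([^:]+):\s+([+-]?\d+(?:\.\d+)?)°C")


def hottest_sensor(sensors_text):
    """Return (chip, label, temp_c) for the highest reading, or None if there is none."""
    chip = None
    best = None
    for line in sensors_text.splitlines():
        if not line.strip():
            continue
        # A chip header is an unindented line without a colon, e.g. "amdgpu-pci-0700".
        if ":" not in line and not line[0].isspace():
            chip = line.strip()
            continue
        match = TEMP_LINE.match(line)
        if match:
            label = match.group(1).strip()
            value = float(match.group(2))
            if best is None or value > best[2]:
                best = (chip, label, value)
    return best


# Excerpt of the readings posted above.
SAMPLE = """\
amdgpu-pci-0700
Adapter: PCI adapter
vddgfx:      850.00 mV
edge:         +54.0°C  (crit = +108.0°C, hyst = -273.1°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +38.0°C  (high = +87.0°C, crit = +97.0°C)

nvme-pci-0c00
Adapter: PCI adapter
Composite:    +41.9°C  (low  =  -5.2°C, high = +89.8°C)
                       (crit = +93.8°C)

pch_lewisburg-virtual-0
Adapter: Virtual device
temp1:        +57.0°C
"""

print(hottest_sensor(SAMPLE))  # ('pch_lewisburg-virtual-0', 'temp1', 57.0)
```

Note that a plain maximum lets a sensor that always runs warm, such as the PCH here, dominate at idle, so the low/high thresholds would have to be tuned with that in mind.
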

GnomedDev commented 2 months ago

I do not have a machine to test this on, so I cannot implement it myself. Happy to accept a PR, though.

CaptainMorgan12 commented 2 months ago

~~One other thing I noticed that may or may not be related: changing the Fan1 config values to

[Fan1]
low_temp=1
high_temp=20
speed_curve=linear
always_full_speed=true

or to any values other than those shown has no effect; fan 1 basically just runs at 500rpm, while fans 2, 3 and 4 do change based on the measured temperature. I believe fan 1 is the one that cools the GPU in the first slot, so it is the most important one, and I can't get it to spin up past 500 :-/. Also, fan 2 responds only sluggishly to changes in the conf; the only fans that seem to follow what the user puts in are fans 3 and 4.

I will test any new files you can generate if you could:

  1. use the same temperature-measurement configuration for fans 1 and 2 as is used for fans 3 and 4, and update the program; I'll test it
  2. raise the minimum output of all fans to, say, 700; I can test that as well without changing the temperature sensor configuration~~

OK, that was user error, I think. I had installed lm-sensors, fancontrol, macfand, mbpfan and a bunch of other programs to try to control the fans, and the daemons might have been stepping on each other, since most of them controlled fan 1 and/or fan 2 but never fans 3 and 4. After purging everything, F1-F4 can all be controlled through T2fanrd.

So the issue boils down to the temperature sensor T2fanrd uses, which, after some testing, appears to be CPU based. If I run a Steam game where the CPU stays cool and only the GPU is loaded, none of the fans react. One thing one could do is use the average of the CPU and GPU temperatures for Fan1 and Fan2, and the CPU, or average mac temp, for Fan3 and Fan4. By tweaking the min/max temperatures in the T2fanrd configuration file I can get Fan1 to run at, say, 900-1100rpm, which seems to be just enough to keep the GPU temperature from going critical: it hovers around 106degC at that rpm. The downside, of course, is that under a CPU-heavy workload Fan1 is much too loud and is cooling a GPU that may not be doing any work.
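
The per-fan idea could look roughly like the Python sketch below. Everything in it is illustrative rather than existing T2fanrd behaviour: the `FAN_SOURCES` mapping, the sensor keys `cpu` and `gpu`, and the `control_temp` function are hypothetical names, and the 40degC/100degC readings are the gaming values from the original report.

```python
from statistics import mean

# Hypothetical per-fan sensor assignment, following the suggestion above:
# Fan1/Fan2 watch both CPU and GPU, Fan3/Fan4 watch the CPU only.
FAN_SOURCES = {
    "Fan1": ["cpu", "gpu"],
    "Fan2": ["cpu", "gpu"],
    "Fan3": ["cpu"],
    "Fan4": ["cpu"],
}


def control_temp(fan, readings, combine=mean):
    """Combine the readings assigned to `fan` into one control temperature."""
    values = [readings[name] for name in FAN_SOURCES[fan]]
    return combine(values)


# Gaming scenario from the report: cool CPU, hot GPU.
readings = {"cpu": 40.0, "gpu": 100.0}

print(control_temp("Fan1", readings))               # 70.0
print(control_temp("Fan1", readings, combine=max))  # 100.0
print(control_temp("Fan3", readings))               # 40.0
```

In this scenario the average comes out at 70degC, which is still below the high_temp=75 in the posted config, so Fan1 would not reach full speed even with the GPU at 100degC. Passing `max` as the combiner is the more conservative choice for the GPU fans.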