Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

Fan PWM on RX 5600 XT #74

Closed: csecht closed this issue 4 years ago

csecht commented 4 years ago

EDIT: After a system restart, the issue described below corrected itself. I'll leave this open while I try to reproduce the conditions that cause the problem, if there is one.

I found a new issue running PAC with a Navi 10 RX 5600 XT. Upon Save, Fan PWM is set to whatever value is currently shown in the entry field, regardless of whether any change was made to the fan setting. So if the card had been running in auto mode, and PAC is then run to change some other parameter while the fans happen to be off because the card is resting, then upon Save the fans are (re)set to 0%. Below is example terminal output where I used PAC to first 'reset' Fan PWM (auto mode), then immediately Saved again with no changes entered. It echoed '0' to pwm1 because the fans were not running at the time:

$ ./amdgpu-pac --execute
Detected GPUs: INTEL: 1, AMD: 1
AMD: amdgpu version: 20.10-1048554
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 1 rw, 0 r-only, 0 w-only

# Write Delta mode.
Batch file completed: /home/craig/amdgpu-utils-master/pac_writer_4f74907aa15646c8a291ac095e1423a2.sh
Writing 1 changes to GPU /sys/class/drm/card1/device
+ sudo sh -c echo '0' >  /sys/class/drm/card1/device/hwmon/hwmon3/pwm1_enable
+ sudo sh -c echo '2' >  /sys/class/drm/card1/device/hwmon/hwmon3/pwm1_enable
PAC execution complete.
# Write Delta mode.
Batch file completed: /home/craig/amdgpu-utils-master/pac_writer_c62fb651f47549bdbd90e909e4018fd7.sh
Writing 1 changes to GPU /sys/class/drm/card1/device
+ sudo sh -c echo '1' >  /sys/class/drm/card1/device/hwmon/hwmon3/pwm1_enable
+ sudo sh -c echo '0' >  /sys/class/drm/card1/device/hwmon/hwmon3/pwm1
PAC execution complete.
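
The two pac_writer runs above follow the Linux hwmon sysfs convention for pwm1_enable (0 = no fan speed control, i.e. full speed; 1 = manual; 2 = automatic). As a rough illustration only, here is a hypothetical helper (fan_commands is not part of gpu-utils) that turns a fan setting into shell lines with the same sequence as the log: 'reset' writes mode 0 then mode 2, while a manual value writes mode 1 and then the raw pwm1 duty value, assumed to be in the 0-255 range.

```python
from typing import List, Union

# Path copied from the log above; a real tool would discover it per card.
HWMON = "/sys/class/drm/card1/device/hwmon/hwmon3"


def fan_commands(setting: Union[str, int], hwmon: str = HWMON) -> List[str]:
    """Hypothetical helper: shell lines for 'reset' or a raw pwm1 value (0-255)."""
    if setting == "reset":
        # Same sequence as the first run: mode 0, then mode 2 (automatic).
        modes = ["0", "2"]
        return [f"sudo sh -c \"echo '{m}' > {hwmon}/pwm1_enable\"" for m in modes]
    if isinstance(setting, bool) or not isinstance(setting, int) or not 0 <= setting <= 255:
        raise ValueError(f"invalid pwm1 value: {setting!r}")
    # Same sequence as the second run: mode 1 (manual), then the duty value.
    return [
        f"sudo sh -c \"echo '1' > {hwmon}/pwm1_enable\"",
        f"sudo sh -c \"echo '{setting}' > {hwmon}/pwm1\"",
    ]


for line in fan_commands("reset") + fan_commands(0):
    print(line)
```

Nothing in this sketch decides whether a fan line should be emitted at all; that decision is where the unwanted write of '0' originates.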

My current workaround is to enter "reset" for Fan PWM every time before I hit Save, and even that sometimes requires a separate "reset" and Save to get the fans going.

Unrelated, but good news: this card can accept changes to p-state masks on the fly, unlike the Ellesmere and Polaris RX 4xx & 5xx series, which required that the card not be under load. In fact, I think SCLK mask changes on Navi 10 work only when the card is under load; every change I tried to make to SCLK masks when the card was not under load did not work. MCLK masks work in all situations.

Ricks-Lab commented 4 years ago

I found an issue where the current fan speed could be interpreted as None, and the default of zero could then be seen as an intended change and saved to the card. I put several mitigations in place. Please run the latest version on master with --debug and share the relevant pac_writer and debug entries.
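
A minimal sketch of the failure mode described above, under stated assumptions: the GUI pre-fills the fan entry field from the current reading, and a Save writes pwm1 only when the entry differs from that reading. Both function names are hypothetical, and this is not the actual gpu-utils code. The point is that a missing reading (None) should leave the field blank rather than defaulting to 0, so an untouched field produces no change.

```python
from typing import Optional


def initial_entry(reading: Optional[int], mitigated: bool) -> str:
    """Hypothetical: text placed in the fan entry field when the GUI opens."""
    if reading is None:
        # Buggy pattern: fall back to a default of 0.
        # Mitigated pattern: leave the field blank, meaning "unknown".
        return "" if mitigated else "0"
    return str(reading)


def change_to_write(reading: Optional[int], entry: str) -> Optional[int]:
    """Hypothetical: pwm1 value to write on Save, or None if there is no real change."""
    entry = entry.strip()
    if not entry:
        return None  # nothing entered: no change
    value = int(entry)
    if reading is not None and value == reading:
        return None  # same as the current reading: no change
    return value


# Fans stopped and the reading comes back as None; user saves without edits.
print(change_to_write(None, initial_entry(None, mitigated=False)))  # -> 0
print(change_to_write(None, initial_entry(None, mitigated=True)))   # -> None
```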

csecht commented 4 years ago

Okay, I downloaded the current master, suspended the GPU load so the fans would shut off, then ran amdgpu-pac and Saved without any changes. The fan PWM remained in its previous 'auto' state and revved up when the GPU load resumed. Nice. The debug file is attached. Here is the terminal output:

$ ./amdgpu-pac --execute --debug
Ubuntu: Validated
Detected GPUs: INTEL: 1, AMD: 1
AMD: amdgpu version: 20.10-1048554
AMD: Wattman features enabled: 0xfffd7fff
2 total GPUs, 1 rw, 0 r-only, 0 w-only

# Write Delta mode.
Batch file completed: /home/craig/amdgpu-utils-master/pac_writer_6568606099a648c4a47d3968b4ad0e0e.sh
Writing 0 changes to GPU /sys/class/drm/card1/device
PAC execution complete.

debug_gpu-utils_20200528-113020.log

Ricks-Lab commented 4 years ago

Not sure if it will cause problems, but the latest version prevents the user from setting a fan speed below 20%, just to be safe.
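
One way such a floor could be enforced is sketched below, assuming the manual fan speed is entered as a percentage and written as a raw pwm1 value. The 0-255 defaults are an assumption; on amdgpu the actual bounds can be read from the pwm1_min and pwm1_max hwmon files. The function name is hypothetical, and this version clamps low requests up to the floor, though rejecting them outright would be an equally valid choice.

```python
MIN_FAN_PCT = 20  # manual floor mentioned in this thread


def pct_to_pwm(pct: float, pwm_min: int = 0, pwm_max: int = 255,
               floor_pct: float = MIN_FAN_PCT) -> int:
    """Hypothetical: convert a fan percentage to a pwm1 value, enforcing a floor."""
    if not 0 <= pct <= 100:
        raise ValueError(f"fan speed must be 0-100%, got {pct}")
    pct = max(pct, floor_pct)  # raise anything below the floor up to it
    return round(pwm_min + (pwm_max - pwm_min) * pct / 100)


print(pct_to_pwm(10))   # -> 51 (clamped to 20%)
print(pct_to_pwm(100))  # -> 255
```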

csecht commented 4 years ago

It looks like the lowest fan speed under the default (boot) auto setting is 18%, so a 20% minimum manual speed is a good call; keeping the two minimums different might help users in certain situations.