Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

<amdgpu-pac --execute_pac> error when setting fan speeds #7

Closed csecht closed 5 years ago

csecht commented 5 years ago

I tried the auto pac execution option to set fan speeds of both my cards at once, using Save All in the pac Gtk window. Below is the terminal output. (Card1 is the RX 460, Card0 is the RX 570.) This may be the same problem as the issue I just reported, where manual execution of the pac shell scripts does not change fan speeds, but this report may provide more information about the nature of the errors.

$ ./amdgpu-pac --execute_pac
AMD Wattman features enabled: 0xffff7fff
amdgpu version: 18.50-708488
2 AMD GPUs detected 2 are compatible

WARNING: Under Development WARNING: Works but not fully tested. Please report any bugs found. Thanks!

Batch file completed: /home/craig/Desktop/amdgpu-utils-2.1.0-Features/pac_writer_8ef342bdb0fa4cca9bfe1de361976364.sh
Writing changes to GPU /sys/class/drm/card1/device/

Batch file completed: /home/craig/Desktop/amdgpu-utils-2.1.0-Features/pac_writer_60c71cbf8a0444ce95eead142863da98.sh
Writing changes to GPU /sys/class/drm/card0/device/

^C
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gi/overrides/Gtk.py", line 1588, in main_quit
    @override(Gtk.main_quit)
KeyboardInterrupt

At this point the terminal was non-responsive, so I closed it with ctrl-shift-w. Fan speeds remained unchanged.

csecht commented 5 years ago

hmmm, not sure why my last sentence is in bold and larger font. I used a # to set it off from the terminal output.

Ricks-Lab commented 5 years ago

Looks like I was missing a quote in setting pwm mode. Just made the change. Also had to make same fix for reset function.

Ricks-Lab commented 5 years ago

Hold off testing this. Just realized that I forgot to escape the quotes.
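For readers following the quoting problem, here is a minimal sketch of how a generated script line can be quoted safely. It is not the code used by amdgpu-pac: the function name `pwm_mode_command`, the `hwmon1` directory, and the `sudo sh -c` wrapper are all assumptions made for illustration. It relies on `shlex.quote` from the Python standard library so that quotes are escaped automatically instead of by hand.

```python
import shlex


def pwm_mode_command(hwmon_dir: str, mode: int) -> str:
    """Build one shell line that writes a PWM mode to pwm1_enable.

    Hypothetical helper: the real pac writer script may be structured
    differently. In the hwmon interface, 1 means manual fan control and
    2 means automatic.
    """
    target = f"{hwmon_dir}/pwm1_enable"
    # Quote the inner pieces first, then quote the whole inner command
    # once more so it survives being passed to `sh -c`.
    inner = f"echo {shlex.quote(str(mode))} > {shlex.quote(target)}"
    return f"sudo sh -c {shlex.quote(inner)}"


# The hwmon1 index is an assumption; it varies between systems.
line = pwm_mode_command("/sys/class/drm/card1/device/hwmon/hwmon1", 1)
print(line)
```

Splitting the result back with `shlex.split` is a cheap way to check that the quoting round-trips before the line is written to a batch file.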

Ricks-Lab commented 5 years ago

@csecht Ok, now it should work. Much more clear after some sleep.

csecht commented 5 years ago

Yes! Fan speeds can now be set. Once executed however, the new current % value is not often the same as the input value. I assume that it is following some amdgpu table of preset values? The displayed value also has 13 decimals. When trying to reset a fan back to its default setting, I realized that there is no option for doing that for just one parameter. The Reset button changes all settings back to default, but it would be helpful to be able to also reset each individually.

Ricks-Lab commented 5 years ago

Could be that current value is based on pwm sensor signal where set value is the pwm setting. I suspect that there can be slight variation. Are they at least close? I think I have fixed it so the display of pwm is an integer. It is always 0 on my system so please help to verify for me. I also included the ability to reset fan and power cap by entering a negative value.
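One possible contributor to the small mismatch, sketched below under stated assumptions: if the requested percentage is converted to an integer PWM value before it is written, reading that value back as a percentage will rarely reproduce the input exactly, and the unrounded result carries a long decimal tail. The function names and the `pwm_max` of 255 are assumptions for illustration, not taken from the amdgpu-utils source.

```python
def percent_to_pwm(percent: float, pwm_max: int = 255) -> int:
    """Convert a requested fan percentage to an integer PWM value (truncating)."""
    return int(pwm_max * percent / 100)


def pwm_to_percent(pwm: int, pwm_max: int = 255) -> float:
    """Convert a PWM value read back from the card to a percentage."""
    return 100 * pwm / pwm_max


for requested in (25, 33, 50):
    pwm = percent_to_pwm(requested)
    readback = pwm_to_percent(pwm)
    # The raw readback has many decimals; rounding gives a clean display value.
    print(requested, pwm, readback, round(readback))
```

In this sketch the readback stays within one percentage point of the request, so rounding for display hides the difference without changing what is written to the card.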

I am now working on a new feature to set pstate mask. This should have a significant impact on power management, as currently, max voltage is always used in the highest p-state. Limiting to the second highest pstate should result in a significant power savings.
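As a rough illustration of the idea, the sketch below builds a mask string that leaves out the highest p-state. The helper name `pstate_mask` and the count of 8 states are assumptions; the amdgpu driver accepts a space-separated list of allowed levels written to `pp_dpm_sclk` when `power_dpm_force_performance_level` is set to `manual`, but how the feature ends up being implemented in amdgpu-pac may differ.

```python
def pstate_mask(num_states: int, drop_highest: int = 1) -> str:
    """Return a space-separated list of allowed p-states.

    Hypothetical helper: keeps states 0 .. num_states - drop_highest - 1,
    so the top `drop_highest` states are masked out.
    """
    if not 0 <= drop_highest < num_states:
        raise ValueError("must keep at least one p-state")
    return " ".join(str(level) for level in range(num_states - drop_highest))


# Assuming a card that reports 8 sclk states (0 through 7).
mask = pstate_mask(8)
print(mask)  # 0 1 2 3 4 5 6
```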

csecht commented 5 years ago

Yes! using a fan speed of -1 works very nicely to set fan control to automatic. That’s a nice feature. But something broke with that update because now current fan % speed is always displayed as 0 in the pac and monitor windows. I can enter and save fan values and I can tell they take by fan noise and temperature readouts, but actual speed %’s are not shown.

I saw the pstate masking feature, but have not yet given it a shot.

I now use the --execute_pac option all the time because it provides fast response for changing settings (like when I mucked something up with fan speed and saw temps shooting up to 90+C). Having the monitor window open in a separate terminal is essential when poking around in pac. It’s good that amdgpu-pac only allows setting fans with %; ROCm-smi allows using either % or state values with the --setfan option, and I once made the mistake of entering a too-low state value because I forgot the % sign, which resulted in a system shutdown from an overheated card.

Ricks-Lab commented 5 years ago

I fixed the fan speed read problem. To fix the 13 decimal display issue, I am converting to int, but the last version converted to int before multiplying by 100, which would explain why the value always displayed as 0. I am preparing for release and I wanted to include you in the credits for testing (--about). Do you agree? How should I refer to you (GitHub handle, E@H handle, name, other)?
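The order-of-operations problem described above can be reproduced in a few lines. This is a sketch, not the amdgpu-utils code: the function names and the `pwm_max` default of 255 are assumptions for illustration.

```python
def fan_percent_buggy(pwm: int, pwm_max: int = 255) -> int:
    """Truncate first, then scale: any ratio below 1.0 becomes 0."""
    return int(pwm / pwm_max) * 100


def fan_percent_fixed(pwm: int, pwm_max: int = 255) -> int:
    """Scale to a percentage first, then truncate to an integer."""
    return int(100 * pwm / pwm_max)


print(fan_percent_buggy(102))  # 0
print(fan_percent_fixed(102))  # 40
```

With the truncation applied first, every reading below full speed collapses to 0, which matches the symptom reported in the pac and monitor windows.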

csecht commented 5 years ago

Yes, it looks good now. I added screenshots of my latest pac and monitor windows to the fan issues comments in the Features branch.

When I first downloaded and launched the latest versions of -monitor and -pac, the fan speed came up correctly for the rx570 (card0), but read zero for the rx460. Once I saved a value to change the rx460 fan, it then read correctly thereafter. As far as I can tell, amdgpu-utils now works great!

Wow, yes, thanks for the credit. Cool. Please use my E@H handle: cecht. You can also include my full name, Craig Echt - I don’t know what the usual GitHub conventions are.

Cheers, Craig

Ricks-Lab commented 5 years ago

Development and testing now complete. Merged into master.