Ricks-Lab / gpu-utils

A set of utilities for monitoring and customizing GPU performance
GNU General Public License v3.0

AMDGPU 19.5 breaks sclk masking #52

Closed csecht closed 4 years ago

csecht commented 4 years ago

Today I upgraded Ubuntu 18.04.3 LTS to the latest distro, which upgraded the Linux kernel from 5.0.0-37 to 5.3.0-26. That also upgraded the AMDGPU drivers from 19.3-934563 to 19.5-967956. As a result, amdgpu-pac can no longer set sclk masking. There is a post (https://linuxreviews.org/Mesa_20_Will_Have_SDMA_Disabled_On_AMD_RX-Series_GPUs) about SDMA being disabled in the recent AMDGPU drivers, not only for RX-series and older AMD cards but also for Navi (RX 5700-series) cards. Uninstalling and reinstalling the AMDGPU package downloaded from AMD does not fix it.

csecht commented 4 years ago

Update: P-state masking appears to work for setting an upper limit, but doesn't block lower states from running. For example, when I set a mask of 0,6, Einstein@Home gravity wave tasks run at all states up to 6; there is generally low and fluctuating GPU utilization with these particular tasks. When running E@H binary pulsar tasks, however, the p-state stays at 6; with these tasks, GPU utilization is nearly always 100%.
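For anyone who wants to check this behavior independently of amdgpu-pac, here is a minimal Python sketch that parses text in the format the amdgpu driver exposes through `pp_dpm_sclk` (one `index: clockMhz` line per state, with `*` marking the active one) and reports which observed states fall outside a mask such as `0,6`. The sample clock values, the function names, and the assumption that a mask lists the only permitted states are illustrative, not taken from amdgpu-pac.

```python
import re

# Sample text in the pp_dpm_sclk format; the clock values are made up.
SAMPLE = (
    "0: 300Mhz\n"
    "1: 588Mhz\n"
    "2: 952Mhz\n"
    "3: 1041Mhz *\n"
    "4: 1106Mhz\n"
    "5: 1168Mhz\n"
    "6: 1209Mhz\n"
    "7: 1244Mhz\n"
)

_LINE = re.compile(r"^\s*(\d+):\s*(\d+)\s*Mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_pp_dpm(text):
    """Return a list of (state_index, clock_mhz, is_active) tuples."""
    states = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            states.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return states


def active_state(text):
    """Return the index of the state marked with '*', or None if there is none."""
    for index, _mhz, is_active in parse_pp_dpm(text):
        if is_active:
            return index
    return None


def outside_mask(observed_states, mask):
    """Return the observed states not listed in a comma-separated mask."""
    allowed = {int(item) for item in mask.split(",") if item.strip()}
    return sorted(set(observed_states) - allowed)
```

To use it on a real system, read the file for the card in question (for example `/sys/class/drm/card0/device/pp_dpm_sclk`; the card index is an assumption and will vary), call `active_state()` on the contents at intervals, and pass the collected indices to `outside_mask()` together with the mask that was set.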

Ricks-Lab commented 4 years ago

I have always found that p-state masking did not prevent the card that is driving the display from going to lower p-states. Are you seeing this for all cards?

I have 2 systems on which I cannot install amdgpu or rocm without getting errors after the update to 5.3. Maybe a new driver version will come out soon.

csecht commented 4 years ago

I only updated to 5.3 on the host with the RX 570 cards. The RX 460 cards remain on the 5.0 kernel, so I don’t know if they would be affected. (Afraid to try!) It was only after the update that both 570 cards began spending a lot of time at lower p-states when running tasks that average about 60% GPU utilization. Previously, they (usually) stayed pegged at the highest set p-state.

Yes, hoping for a new driver version soon. In the meantime, I am running a different flavor of E@H tasks that keeps GPU utilization near 100%, which maintains the maximum set p-state.



csecht commented 4 years ago

Have updated both hosts to kernel 5.3.0-40, and p-state masking is working fine with amdgpu-utils v3.0.0 RC1. I have also changed the feature mask in grub from amdgpu.ppfeaturemask=0xffff7fff to amdgpu.ppfeaturemask=0xfffd7fff and can now use amdgpu-pac to underclock, overclock, and undervolt the RX 460 and RX 570 cards, but only if I also re-apply the sclk mask that I was using previously.
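To see exactly what differs between the two feature mask values, a short Python sketch can compare them bit by bit. It only does the arithmetic on the two values quoted above; the function names are illustrative, and what each bit controls in the driver is not covered here.

```python
OLD_MASK = 0xFFFF7FFF  # previous amdgpu.ppfeaturemask value from this thread
NEW_MASK = 0xFFFD7FFF  # value now in use


def changed_bits(old, new, width=32):
    """Return the bit positions whose value differs between two masks."""
    diff = old ^ new
    return [bit for bit in range(width) if (diff >> bit) & 1]


def cleared_bits(mask, width=32):
    """Return the bit positions that are 0 in a mask of the given width."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


if __name__ == "__main__":
    print("changed bits:", changed_bits(OLD_MASK, NEW_MASK))
    print("cleared in old mask:", cleared_bits(OLD_MASK))
    print("cleared in new mask:", cleared_bits(NEW_MASK))
```

The two values differ in a single bit (bit 17, i.e. 0x20000): the old mask has only bit 15 cleared, while the new one has bits 15 and 17 cleared.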