BoukeHaarsma23 / WattmanGTK

A Wattman-like GTK3+ GUI
GNU General Public License v2.0

amdgpu.ppfeaturemask=0xffffffff causes artifacts and glitching #6

Closed GottaSlay closed 5 years ago

GottaSlay commented 5 years ago

Another user and I have experienced this so far. Any easy solutions?

GloriousEggroll commented 5 years ago

I don't have issues with this on a Vega 64 with the 4.19 kernel.

vosian commented 5 years ago

This wouldn't be a bug in this project. I think the proper place to report it is https://bugs.freedesktop.org/

BoukeHaarsma23 commented 5 years ago

Correct, I will close this

sjnewbury commented 5 years ago

FWIW, I see glitching on mclk transitions with more than one display connected. Without Overdrive, if you use more than one display, memory reclocking is disabled and the memory clock is locked to maximum, even in manual mode. With the bit enabled, memory will still reclock, but since it can't do so during vblank on multiple outputs, it glitches. I think this is by design: it lets you lock the memory at a lower speed/voltage and change the maximum without crashing the system.

I solved this by using "manual" mode and setting the memory clock explicitly using the gamemode daemon. This is actually better than the behaviour without Overdrive, since I can use low memory clocks to save power and reduce heat. There is still an issue: the value written into pp_dpm_mclk does not stay set when modesetting occurs, and the driver switches back to auto-switching, but it can be worked around.

I wrote a tiny setuid C helper to allow the user to switch memory clocks.
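
A minimal shell sketch of the manual-mode workaround described above. It is not the setuid C helper or the gamemode configuration mentioned in this comment. The function name lock_mclk and the DEV variable are hypothetical, the card0 path is taken from later comments in this thread, and power_dpm_force_performance_level is assumed to be the amdgpu sysfs file that selects "manual" mode. Writing to these files requires root.

```shell
#!/bin/sh
# Assumption: sysfs directory of the GPU; card0 matches the paths used in this thread.
DEV="${DEV:-/sys/class/drm/card0/device}"

# lock_mclk STATE [DEVICE_DIR]   (hypothetical helper)
# Switches the performance level to "manual", then selects one mclk state.
lock_mclk() {
    state="$1"
    dev="${2:-$DEV}"
    # Accept only a plain non-negative integer as the state index.
    case "$state" in
        ''|*[!0-9]*)
            echo "usage: lock_mclk STATE [DEVICE_DIR]" >&2
            return 2
            ;;
    esac
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$state" > "$dev/pp_dpm_mclk" || return 1
}
```

Run as root, `lock_mclk 0` would request the lowest listed state. As noted above, the setting may not survive a modeset, so it may need to be reapplied.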

GM-Script-Writer-62850 commented 4 years ago

I am able to reproduce this when using more than one display. Without amdgpu.ppfeaturemask, the card's mclk is locked to its max speed; with amdgpu.ppfeaturemask in use, it is not locked by default. Even if you change the memory clock table so every state is the same, the issue still occurs, though to a much more tolerable extent. I am able to fix this by running this command as root:

echo 2 > /sys/class/drm/card0/device/pp_dpm_mclk

2 is the highest state in the pp_dpm_mclk file:

~$ cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 300Mhz 
1: 1000Mhz 
2: 2000Mhz *

This is not a WattmanGTK bug; I'd say it is a kernel-level issue with the amdgpu.ppfeaturemask overclocking feature.
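
Rather than hard-coding 2, the state index can be read from the file itself. The sketch below assumes the "N: <freq>Mhz" layout shown above, with "*" marking the active state. The function names are hypothetical. It picks the state with the highest frequency instead of the last line, because an edited clock table may not be in ascending order.

```shell
#!/bin/sh
# Assumption: default path as used in this thread.
MCLK_FILE="${MCLK_FILE:-/sys/class/drm/card0/device/pp_dpm_mclk}"

# highest_mclk_state [FILE]: print the index of the state with the highest clock.
# On a tie, the first matching state is printed.
highest_mclk_state() {
    awk -F: '
        BEGIN { max = -1 }
        /^[0-9]+:/ {
            freq = $2 + 0          # " 2000Mhz *" converts to 2000
            if (freq > max) { max = freq; idx = $1 }
        }
        END { if (max >= 0) print idx }
    ' "${1:-$MCLK_FILE}"
}

# active_mclk_state [FILE]: print the index of the state marked with "*".
active_mclk_state() {
    awk -F: '/\*/ { print $1 }' "${1:-$MCLK_FILE}"
}
```

As root, the command above could then be written as `echo "$(highest_mclk_state)" > /sys/class/drm/card0/device/pp_dpm_mclk`.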

sjnewbury commented 4 years ago

@GM-Script-Writer-62850, it is a kernel bug. Specifically, the problem is mclk gets set to auto on any configuration change.

GM-Script-Writer-62850 commented 4 years ago

Can I get a link to the bug? I also noticed I was able to reproduce it even with only one display active. Oddly, it only glitches when the state changes; once the change is done, it is fine regardless of the speed. Even 1920x2160 at 300Mhz memory works fine as long as it does not change states. Even if all the states are identical, just the process of switching between them causes it.

JoneKone commented 4 years ago

What you are describing seems related to the instability issues on my AMD GPU. Memory clocks seem to be a bigger issue than I initially thought on Linux with AMD. I had issues with the states switching and fixed my instability by simply locking the memory speed to performance mode. I thought of enabling the feature mask, but after reading this I think it would become more unstable.

cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 167Mhz
1: 500Mhz
2: 700Mhz
3: 800Mhz *

I have a Vega 56

GM-Script-Writer-62850 commented 4 years ago

Locking the mclk state does fix the issue (many things can unlock it, like opening SuperTuxKart in full screen). The current kernel on Xubuntu 20.04 (5.4.0-37-generic) does have the mclk state locked. RX 580 8GB w/ dual monitors @ 120hz.

If you manually set the mclk speeds to all be equal, it still has the issue. It only flickers when the mclk state is changed; if it is locked and you alter the current state, it flickers. Also, when I say all equal I mean

I think I found the bug report: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-amdgpu/+bug/1813701

This is the current behavior on my system (RX 580 8GB / 2x 1080p 120hz monitors):

echo 0 > /sys/class/drm/card0/device/pp_dpm_mclk && cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 300Mhz 
1: 1000Mhz 
2: 2000Mhz *

The driver has it locked to 2000Mhz. The only way you can alter it is like this:

echo 'm 2 300 750' > /sys/class/drm/card0/device/pp_od_clk_voltage
echo 'c' > /sys/class/drm/card0/device/pp_od_clk_voltage
cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 300Mhz *
1: 1000Mhz 
2: 300Mhz 
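
The two writes above (edit a state with 'm', then commit with 'c') can be wrapped as a small sketch. The function names are hypothetical, and the 'm STATE MHZ MV' form is the one used in this thread for an RX 580; other GPU generations may expect a different syntax. Each command is written to the file separately, as in the commands above.

```shell
#!/bin/sh
# Assumption: default path as used in this thread.
OD_FILE="${OD_FILE:-/sys/class/drm/card0/device/pp_od_clk_voltage}"

# od_mclk_commands STATE MHZ MV   (hypothetical helper)
# Prints the command lines needed to edit one mclk state and commit it.
od_mclk_commands() {
    for n in "$1" "$2" "$3"; do
        # Accept only plain non-negative integers.
        case "$n" in
            ''|*[!0-9]*)
                echo "usage: od_mclk_commands STATE MHZ MV" >&2
                return 2
                ;;
        esac
    done
    printf 'm %s %s %s\nc\n' "$1" "$2" "$3"
}

# apply_od [FILE]: write each line read from stdin to FILE as its own write.
apply_od() {
    od="${1:-$OD_FILE}"
    while IFS= read -r line; do
        echo "$line" > "$od" || return 1
    done
}
```

As root, `od_mclk_commands 2 300 750 | apply_od` would repeat the change shown above. Writing 'r' to the same file is the documented way to restore the default table.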

The default (auto) pp_od_clk_voltage file looks like this:

cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0:        300MHz        750mV
1:        600MHz        769mV
2:        900MHz        906mV
3:       1145MHz       1125mV
4:       1215MHz       1150mV
5:       1257MHz       1150mV
6:       1300MHz       1150mV
7:       1340MHz       1150mV
OD_MCLK:
0:        300MHz        750mV
1:       1000MHz        800mV
2:       2000MHz        950mV
OD_RANGE:
SCLK:     300MHz       2000MHz
MCLK:     300MHz       2250MHz
VDDC:     750mV        1200mV
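
The OD_RANGE section gives the limits for values written to this file. As a hedged sketch, a requested memory clock can be checked against the MCLK range before it is written. The function name is hypothetical and the parsing assumes the exact layout printed above.

```shell
#!/bin/sh
# Assumption: default path as used in this thread.
OD_FILE="${OD_FILE:-/sys/class/drm/card0/device/pp_od_clk_voltage}"

# mclk_in_range MHZ [FILE]   (hypothetical helper)
# Exits 0 if MHZ lies inside the MCLK line of the OD_RANGE section, 1 otherwise.
mclk_in_range() {
    awk -v want="$1" '
        /^OD_RANGE:/ { in_range = 1; next }
        in_range && $1 == "MCLK:" {
            lo = $2 + 0            # "300MHz" converts to 300
            hi = $3 + 0            # "2250MHz" converts to 2250
            found = 1
        }
        END { exit !(found && want + 0 >= lo && want + 0 <= hi) }
    ' "${2:-$OD_FILE}"
}
```

For example, `mclk_in_range 300 && echo ok` could be run before writing 'm 2 300 750'.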