Closed gardotd426 closed 4 years ago
Thanks for the report, I suspect this is related to the overdrive changes in stable kernel 5.5, will try to test it as soon as possible. Can you try to disable amdgpu.ppfeaturemask (as (power)upp does not rely on that)?
I also see a conflict between the pp_table settings and the values in pp_od_clk_voltage on Linux 5.5 with my 5700. Disabling OverDrive (via amdgpu.ppfeaturemask) fixes it for me. It seems the OverDrive API is not yet fully implemented as of Linux 5.5.
@azeam but if you disable amdgpu.ppfeaturemask, how can you even confirm that it's working? With amdgpu.ppfeaturemask disabled there is no /sys/class/drm/card0/device/pp_od_clk_voltage to check. You only get that file if amdgpu.ppfeaturemask is set.
And as far as radeonjet is concerned, no, it doesn't fix it:
sudo radeonjet get core table
0: 300Mhz *
1: 300Mhz
2: 300Mhz
Same with radeontop.
With amdgpu.ppfeaturemask disabled and with powerupp setting the core to 1780 or below, just like before, radeonjet again reports:
sudo radeonjet get core table
0: 300Mhz
1: 800Mhz *
2: 1780Mhz
Actually it doesn't even do anything at all without amdgpu.ppfeaturemask set. If you set it to anything below 1780, nothing happens: it doesn't apply and the peak core frequency stays at 1780. If you try to set it to anything above 1780, it drops to 300MHz.
At work now, will get back with a longer reply later tonight, but try reading the current values with (sudo) cat /sys/kernel/debug/dri/0/amdgpu_pm_info
Wait yeah it does. Still doesn't work man:
sudo cat /sys/kernel/debug/dri/0/amdgpu_pm_info
Clock Gating Flags Mask: 0x38099f05
Graphics Medium Grain Clock Gating: On
Graphics Medium Grain memory Light Sleep: Off
Graphics Coarse Grain Clock Gating: On
Graphics Coarse Grain memory Light Sleep: Off
Graphics Coarse Grain Tree Shader Clock Gating: Off
Graphics Coarse Grain Tree Shader Light Sleep: Off
Graphics Command Processor Light Sleep: Off
Graphics Run List Controller Light Sleep: Off
Graphics 3D Coarse Grain Clock Gating: Off
Graphics 3D Coarse Grain memory Light Sleep: Off
Memory Controller Light Sleep: On
Memory Controller Medium Grain Clock Gating: On
System Direct Memory Access Light Sleep: On
System Direct Memory Access Medium Grain Clock Gating: On
Bus Interface Medium Grain Clock Gating: On
Bus Interface Light Sleep: On
Unified Video Decoder Medium Grain Clock Gating: Off
Video Compression Engine Medium Grain Clock Gating: Off
Host Data Path Light Sleep: On
Host Data Path Medium Grain Clock Gating: On
Digital Right Management Medium Grain Clock Gating: Off
Digital Right Management Light Sleep: Off
Rom Medium Grain Clock Gating: Off
Data Fabric Medium Grain Clock Gating: Off
Address Translation Hub Medium Grain Clock Gating: On
Address Translation Hub Light Sleep: On
GFX Clocks and Power:
100 MHz (MCLK)
300 MHz (SCLK)
300 MHz (PSTATE_SCLK)
100 MHz (PSTATE_MCLK)
800 mV (VDDGFX)
11.0 W (average GPU)
GPU Temperature: 33 C
GPU Load: 0 %
MEM Load: 2 %
SMC Feature Mask: 0x00000622a3ddaffb
VCN: Disabled
It seems this program is just broken
I've now updated to stable kernel 5.5 (from rc2), but I'm not able to reproduce this on my 5700 XT (nor, as far as I can tell, any other issues, even with OverDrive enabled). I still don't get the:
VDDC_CURVE_SCLK[X]
VDDC_CURVE_VOLT[X]
values, like you have on your 5600, so something is different with the OverDrive implementation, either between our systems or the way the 5600/5700 XT cards are working. This is of less importance with the OverDrive turned off though, just a remark. But as @sibradzic noted above there still seem to be issues with the OverDrive settings in combination with the pp table, so keep OverDrive disabled.
Powerupp checks for the pp table revision number and the only one implemented is "12", so our pp tables should be constructed the same, and from the information you have given the application also seems to (read and) change the expected parameters (even if the results aren't).
A few notes:
Just to confirm, did you remember to run update-grub after removing the amdgpu.ppfeaturemask boot parameter (I tend to forget that...)?
The sudo cat /sys/kernel/debug/dri/0/amdgpu_pm_info output was taken without GPU load. Try running a windowed benchmark and see if the output changes when you run the command at the same time.
Have you checked the performance? If the card is dropping to 300 MHz there should be a noticeable performance drop when you change the value from 1780 to 1781.
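For the first note, a quick way to confirm whether the boot parameter is really gone is to look at the running kernel's command line. The following is only a minimal sketch: the helper names are made up for illustration, and the assumption that bit 0x4000 is the OverDrive bit (PP_OVERDRIVE_MASK in the amdgpu headers) should be checked against your kernel source.

```python
import re
from typing import Optional

# Assumption: bit 0x4000 is the OverDrive bit (PP_OVERDRIVE_MASK in the amdgpu headers).
PP_OVERDRIVE_MASK = 0x4000


def parse_ppfeaturemask(cmdline: str) -> Optional[int]:
    """Return the amdgpu.ppfeaturemask value from a kernel command line, or None if absent."""
    match = re.search(r"(?:^|\s)amdgpu\.ppfeaturemask=(\S+)", cmdline)
    if match is None:
        return None
    # int(..., 0) accepts both hex (0xffffffff) and decimal spellings.
    return int(match.group(1), 0)


def overdrive_requested(cmdline: str) -> bool:
    """True only when the mask is set explicitly and has the OverDrive bit on."""
    mask = parse_ppfeaturemask(cmdline)
    return mask is not None and bool(mask & PP_OVERDRIVE_MASK)
```

On a live system you would pass in the contents of /proc/cmdline, for example overdrive_requested(open("/proc/cmdline").read()). A False result only means the parameter was not set explicitly with that bit; it says nothing about the driver's built-in default.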
And the thing is, when this happens, everything reports the frequency at 300MHz, except powerupp.
Powerupp does not read any actual clock speeds, it only reads the values that are set in the pp table (using upp). It is however on top of my to-do list to add some simple monitoring feature.
If I try to apply a value of 1785MHz, click "Apply Current", type my password, and then hit "Load Current", everything in powerupp stays the same, so it's not properly reading /sys/class/drm/card0/pp_od_clk_voltage.
That is expected, if you successfully apply values and then load them they should appear the same. Powerupp only reads the values set in the pp table.
This sucks, I was super pumped to find such an easy-to-use GUI, and I tried to look at the code since kdesu said my password was needed to run /usr/bin/bash, so I figured it was a bash script. But /usr/bin/powerupp isn't a bash script and I can't read it (I'm assuming that powerupp executes a second bash script but I can't find it).
It is not actually a bash script as in a file on the system, but it sends a couple of bash commands under the same pkexec (kdesu) prompt (to avoid having to type the password multiple times): including the upp commands containing the values entered to write to the pp table and also a write to the hwmon power limit (as the pp table power limit is oddly implemented). If you do a "persistent save" it will however create a bash script (containing basically the same things as when applying) in /usr/bin/powerupp_startup_script_cardX.sh
With amdgpu.ppfeaturemask disabled and with powerupp setting the core to 1780 or below, just like before, again radeonjet reports: sudo radeonjet get core table 0: 300Mhz 1: 800Mhz * 2: 1780Mhz
This is actually expected when your GPU is idle. Are you sure you are actually putting your GPU under any load when you are checking these values? Try running this little monitoring script in a terminal, before running some game or 3D test, in a window (or full-screen on another monitor, in case you have more than one):
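cat > ~/monitorgpu.sh << 'EOF'
#!/bin/bash
# Refresh every 0.5 s: tail of amdgpu_pm_info, the DPM state tables, and sensor readings.
# The sensors chip name (amdgpu-pci-0c00) depends on the PCI slot; adjust it for your system.
watch -n0.5 "sudo tail -n16 /sys/kernel/debug/dri/0/amdgpu_pm_info && \
echo SCLK: && \
cat /sys/class/drm/card0/device/pp_dpm_sclk && \
echo MCLK: && \
cat /sys/class/drm/card0/device/pp_dpm_mclk && \
echo Temps: && \
sensors amdgpu-pci-0c00"
EOF
chmod +x ~/monitorgpu.sh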
and start it with cd && ./monitorgpu.sh
Now start some GPU load and watch those values change (and change they should, regardless of whether you have any over/under clock/volt applied). MCLK values should fluctuate even if you do simple things on your desktop, like moving a window around, for example...
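If you would rather log the readings than watch them, the clock and power lines of amdgpu_pm_info can be pulled out with a small parser. This is a rough sketch that assumes the "300 MHz (SCLK)" line layout shown in the output earlier in this thread; parse_pm_info is a hypothetical helper name, and the layout may differ on other kernels.

```python
import re
from typing import Dict

# Matches lines such as "300 MHz (SCLK)", "800 mV (VDDGFX)" or "11.0 W (average GPU)".
_READING = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s+(MHz|mV|W)\s+\((.+)\)\s*$")


def parse_pm_info(text: str) -> Dict[str, float]:
    """Map each label in parentheses to its numeric reading; other lines are ignored."""
    readings: Dict[str, float] = {}
    for line in text.splitlines():
        match = _READING.match(line)
        if match:
            readings[match.group(3)] = float(match.group(1))
    return readings
```

Called on the idle output posted above, this yields SCLK at 300 and MCLK at 100; under load the SCLK entry is the one to compare against the value you tried to set.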
It's not the load. All I have to do is run sudo sh -c "echo 'high' > /sys/class/drm/card0/device/power_dpm_force_performance_level", which forces it to run at the highest dpm state. I run that every time after I test anything to make sure that I'm running at the highest state. And I know powerupp doesn't read actual clock speeds, but it's not reading the pp table correctly when I try to set it to anything higher than 1780. And there's no difference between the 5700s and 5600s when it comes to this stuff; I just have the patch that the devs added to allow overclocking on Navi (they said it's supposed to be fixed for everyone on 5.5, but obviously it's not, because you don't get all those values in pp_od_clk_voltage). The way mine looks is how it's supposed to look for everyone. I got it straight from the dev; I was one of the people on the gitlab issue requesting overclocking on Navi. But anyway, the thing that doesn't make any sense is that I can overclock with radeon-profile; it's just powerupp that seems to do nothing. And yes, I remembered to update grub. I'll run that command you asked me to run in a few and give you the output.
Have you tried forcing performance level without any overclocking applied (regardless if it's powerupp, or just upp or pp_od_clk_voltage)?
Provided you have no tweaks applied, does any of this affect your GPU clocks at all:
echo low | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
?
The above totally work on my 5700 on any 5.5-rcX or the 5.5 final release, without any additional patches, and regardless of how I modify the pp_table or pp_od_clk_voltage settings, so it definitely isn't a kernel "Navi 10" issue.
Do you have the latest radeon firmware binaries deployed?
Can you share the contents of your 5600XT pp_table?
Changing the performance level to high in /sys/class/drm/card0/device/power_dpm_force_performance_level always forces the GPU to run at its highest frequency state. At stock, setting it to high forces it to run at 1780MHz. That's how I confirmed powerupp wasn't working in the first place: when I applied the powerupp config, ran sudo sh -c "echo 'high' > /sys/class/drm/card0/device/power_dpm_force_performance_level", and then queried the current clock speed, every program available (radeontop, radeonjet, radeon-profile, cat, etc.) showed 300.
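The same "force the level, then read it back" check can be scripted. Below is a minimal sketch; set_performance_level is a hypothetical helper, and the list of accepted level names is taken from the amdgpu sysfs documentation as I understand it, so verify it against your kernel version. Writing to the real sysfs file requires root.

```python
from pathlib import Path

DEFAULT_PATH = "/sys/class/drm/card0/device/power_dpm_force_performance_level"

# Assumption: these are the level names the amdgpu driver accepts.
VALID_LEVELS = {
    "auto", "low", "high", "manual",
    "profile_standard", "profile_min_sclk", "profile_min_mclk", "profile_peak",
}


def set_performance_level(level: str, path: str = DEFAULT_PATH) -> str:
    """Write a performance level to the given file and return what the file reports afterwards."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    target = Path(path)
    target.write_text(level + "\n")
    # Read back so the caller can see whether the write was accepted.
    return target.read_text().strip()
```

The path is a parameter so the function can be tried against an ordinary file first; run as root with the default path, set_performance_level("high") corresponds to the echo command quoted above.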
Now we're getting somewhere: I pulled down the upp repo, read the README to see how to use it, ran sudo ./upp.py dump, and got this:
./gits/upp/upp.py dump
Dumping the PP table from '/sys/class/drm/card0/device/pp_table' binary...
StructureSize: 1674
TableFormatRevision: 12
RevisionId: 1
TableSize: 482
GoldenPPId: 2292
GoldenRevision: 15418
FormatId: 125
PlatformCaps: 8
ThermalControllerType: 27
SmallPowerLimit1: 0
SmallPowerLimit2: 0
BoostPowerLimit: 0
ODTurboPowerLimit: 0
ODPowerSavePowerLimit: 0
SoftwareShutdownTemp: 118
Reserved0:
Reserved0 0: 0
Reserved0 1: 0
Reserved0 2: 0
Reserved0 3: 0
Reserved0 4: 0
Reserved0 5: 0
PowerSavingClockTable:
ucTableRevision: 1
Reserve:
Byte 0: 0
Byte 1: 0
Byte 2: 0
PowerSavingClockCount: 10
PowerSavingClockMax:
Frequency 0: 1780
Frequency 1: 1267
Frequency 2: 1086
Frequency 3: 1267
Frequency 4: 1267
Frequency 5: 750
Frequency 6: 1267
Frequency 7: 1284
Frequency 8: 1284
Frequency 9: 810
Frequency 10: 0
Frequency 11: 0
Frequency 12: 0
Frequency 13: 0
Frequency 14: 0
Frequency 15: 0
PowerSavingClockMin:
Frequency 0: 300
Frequency 1: 100
Frequency 2: 100
Frequency 3: 100
Frequency 4: 507
Frequency 5: 100
Frequency 6: 507
Frequency 7: 308
Frequency 8: 300
Frequency 9: 300
Frequency 10: 0
Frequency 11: 0
Frequency 12: 0
Frequency 13: 0
Frequency 14: 0
Frequency 15: 0
OverDrive8Table:
ucODTableRevision: 128
Reserve:
Byte 0: 0
Byte 1: 0
Byte 2: 0
ODFeatureCount: 14
ODFeatureCapabilities:
Capability 0: 30
Capability 1: 0
Capability 2: 0
Capability 3: 0
Capability 4: 1
Capability 5: 1
Capability 6: 1
Capability 7: 1
Capability 8: 1
Capability 9: 1
Capability 10: 1
Capability 11: 1
Capability 12: 1
Capability 13: 1
Capability 14: 1
Capability 15: 1
Capability 16: 1
Capability 17: 1
Capability 18: 0
Capability 19: 0
Capability 20: 0
Capability 21: 0
Capability 22: 0
Capability 23: 0
Capability 24: 0
Capability 25: 0
Capability 26: 0
Capability 27: 0
Capability 28: 0
Capability 29: 0
Capability 30: 0
Capability 31: 0
ODSettingCount: 0
ODSettingsMax:
Setting 0: 1820
Setting 1: 1820
Setting 2: 1820
Setting 3: 1050
Setting 4: 1820
Setting 5: 1050
Setting 6: 1820
Setting 7: 1050
Setting 8: 930
Setting 9: 20
Setting 10: 3200
Setting 11: 3200
Setting 12: 100
Setting 13: 110
Setting 14: 2
Setting 15: 1
Setting 16: 1
Setting 17: 1
Setting 18: 1
Setting 19: 100
Setting 20: 100
Setting 21: 100
Setting 22: 100
Setting 23: 100
Setting 24: 100
Setting 25: 100
Setting 26: 100
Setting 27: 100
Setting 28: 100
Setting 29: 0
Setting 30: 0
Setting 31: 0
ODSettingsMin:
Setting 0: 800
Setting 1: 800
Setting 2: 800
Setting 3: 800
Setting 4: 800
Setting 5: 800
Setting 6: 800
Setting 7: 800
Setting 8: 625
Setting 9: 50
Setting 10: 700
Setting 11: 700
Setting 12: 25
Setting 13: 50
Setting 14: 0
Setting 15: 0
Setting 16: 0
Setting 17: 0
Setting 18: 0
Setting 19: 25
Setting 20: 20
Setting 21: 25
Setting 22: 20
Setting 23: 25
Setting 24: 20
Setting 25: 25
Setting 26: 20
Setting 27: 25
Setting 28: 20
Setting 29: 0
Setting 30: 0
Setting 31: 0
smcPPTable:
TableVersion: 8
FeaturesToRun:
Features 0: 2749345791
Features 1: 1571
SocketPowerLimitAc:
Wattage 0: 160
Wattage 1: 0
Wattage 2: 0
Wattage 3: 0
SocketPowerLimitAcTau:
Time 0: 0
Time 1: 0
Time 2: 0
Time 3: 0
SocketPowerLimitDc:
Wattage 0: 160
Wattage 1: 0
Wattage 2: 0
Wattage 3: 0
SocketPowerLimitDcTau:
Time 0: 0
Time 1: 0
Time 2: 0
Time 3: 0
TdcLimitSoc: 14
TdcLimitSocTau: 0
TdcLimitGfx: 150
TdcLimitGfxTau: 0
TedgeLimit: 100
ThotspotLimit: 110
TmemLimit: 105
Tvr_gfxLimit: 115
Tvr_mem0Limit: 115
Tvr_mem1Limit: 115
Tvr_socLimit: 115
Tliquid0Limit: 0
Tliquid1Limit: 0
TplxLimit: 0
FitLimit: 0
PpmPowerLimit: 0
PpmTemperatureThreshold: 0
ThrottlerControlMask: 28926
FwDStateMask: 1
UlvVoltageOffsetSoc: 100
UlvVoltageOffsetGfx: 100
GceaLinkMgrIdleThreshold: 0
paddingRlcUlvParams0: 0
paddingRlcUlvParams1: 0
paddingRlcUlvParams2: 0
UlvSmnclkDid: 0
UlvMp1clkDid: 0
UlvGfxclkBypass: 0
Padding234: 0
MinVoltageUlvGfx: 3100
MinVoltageUlvSoc: 3100
MinVoltageGfx: 3200
MinVoltageSoc: 3200
MaxVoltageGfx: 4200
MaxVoltageSoc: 4200
LoadLineResistanceGfx: 76
LoadLineResistanceSoc: 0
DpmDescriptor:
DpmDescriptor 0:
VoltageMode: 1
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 0.0
b: 0.0
SsCurve:
a: 0.2542000114917755
b: -0.2162500023841858
c: 0.6957200169563293
DpmDescriptor 1:
VoltageMode: 1
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 1.0
b: 0.0
SsCurve:
a: 0.21750999987125397
b: -0.05852000042796135
c: 0.714680016040802
DpmDescriptor 2:
VoltageMode: 1
SnapToDiscrete: 1
NumDiscreteLevels: 4
padding: 0
ConversionToAvfsClk:
m: 1.0
b: 0.0
SsCurve:
a: 0.21750999987125397
b: -0.05852000042796135
c: 0.714680016040802
DpmDescriptor 3:
VoltageMode: 1
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 0.6442999839782715
b: 0.5349000096321106
SsCurve:
a: 0.0
b: 0.38510000705718994
c: 0.567799985408783
DpmDescriptor 4:
VoltageMode: 1
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 0.5094000101089478
b: 0.5924999713897705
SsCurve:
a: 0.0
b: 0.33070001006126404
c: 0.5684999823570251
DpmDescriptor 5:
VoltageMode: 1
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 1.25600004196167
b: -0.34380000829696655
SsCurve:
a: 0.0
b: 0.5343000292778015
c: 0.24529999494552612
DpmDescriptor 6:
VoltageMode: 1
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 0.8216000199317932
b: 0.014600000344216824
SsCurve:
a: 0.0
b: 0.47760000824928284
c: 0.2526000142097473
DpmDescriptor 7:
VoltageMode: 2
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 0.0
b: 0.0
SsCurve:
a: 0.0
b: 0.0
c: 0.0
DpmDescriptor 8:
VoltageMode: 2
SnapToDiscrete: 0
NumDiscreteLevels: 2
padding: 0
ConversionToAvfsClk:
m: 0.0
b: 0.0
SsCurve:
a: 0.0
b: 0.0
c: 0.0
FreqTableGfx:
Frequency 0: 300
Frequency 1: 1780
Frequency 2: 1400
Frequency 3: 1400
Frequency 4: 1400
Frequency 5: 1400
Frequency 6: 1400
Frequency 7: 1400
Frequency 8: 1400
Frequency 9: 1400
Frequency 10: 1400
Frequency 11: 1400
Frequency 12: 1400
Frequency 13: 1400
Frequency 14: 1400
Frequency 15: 1400
FreqTableVclk:
Frequency 0: 100
Frequency 1: 1267
Frequency 2: 1267
Frequency 3: 1267
Frequency 4: 1267
Frequency 5: 1267
Frequency 6: 1267
Frequency 7: 1267
FreqTableDclk:
Frequency 0: 100
Frequency 1: 1086
Frequency 2: 1086
Frequency 3: 1086
Frequency 4: 1086
Frequency 5: 1086
Frequency 6: 1086
Frequency 7: 1086
FreqTableSocclk:
Frequency 0: 507
Frequency 1: 1267
Frequency 2: 950
Frequency 3: 950
Frequency 4: 950
Frequency 5: 950
Frequency 6: 950
Frequency 7: 950
FreqTableUclk:
Frequency 0: 100
Frequency 1: 500
Frequency 2: 625
Frequency 3: 900
FreqTableDcefclk:
Frequency 0: 507
Frequency 1: 1267
Frequency 2: 1267
Frequency 3: 1267
Frequency 4: 1267
Frequency 5: 1267
Frequency 6: 1267
Frequency 7: 1267
FreqTableDispclk:
Frequency 0: 308
Frequency 1: 1284
Frequency 2: 1284
Frequency 3: 1284
Frequency 4: 1284
Frequency 5: 1284
Frequency 6: 1284
Frequency 7: 1284
FreqTablePixclk:
Frequency 0: 300
Frequency 1: 1284
Frequency 2: 1188
Frequency 3: 1188
Frequency 4: 1188
Frequency 5: 1188
Frequency 6: 1188
Frequency 7: 1188
FreqTablePhyclk:
Frequency 0: 300
Frequency 1: 810
Frequency 2: 810
Frequency 3: 810
Frequency 4: 810
Frequency 5: 810
Frequency 6: 810
Frequency 7: 810
Paddingclks:
Padding32 0: 30409168
Padding32 1: 30409168
Padding32 2: 30409168
Padding32 3: 30409168
Padding32 4: 30409168
Padding32 5: 30409168
Padding32 6: 30409168
Padding32 7: 30409168
Padding32 8: 30409168
Padding32 9: 30409168
Padding32 10: 30409168
Padding32 11: 30409168
Padding32 12: 30409168
Padding32 13: 30409168
Padding32 14: 30409168
Padding32 15: 30409168
DcModeMaxFreq:
Frequency 0: 1780
Frequency 1: 1267
Frequency 2: 875
Frequency 3: 1086
Frequency 4: 1267
Frequency 5: 1267
Frequency 6: 1284
Frequency 7: 1284
Frequency 8: 810
Padding8_Clks: 464
FreqTableUclkDiv:
Byte 0: 0
Byte 1: 3
Byte 2: 3
Byte 3: 3
Mp0clkFreq:
Frequency 0: 304
Frequency 1: 507
Mp0DpmVoltage:
Voltage 0: 3200
Voltage 1: 3200
MemVddciVoltage:
Voltage 0: 2700
Voltage 1: 3400
Voltage 2: 3400
Voltage 3: 3400
MemMvddVoltage:
Voltage 0: 5000
Voltage 1: 5400
Voltage 2: 5400
Voltage 3: 5400
GfxclkFgfxoffEntry: 800
GfxclkFinit: 800
GfxclkFidle: 800
GfxclkSlewRate: 0
GfxclkFopt: 0
Padding567:
Byte 0: 208
Byte 1: 1
GfxclkDsMaxFreq: 0
GfxclkSource: 1
Padding456: 2
LowestUclkReservedForUlv: 0
Padding8_Uclk:
Byte 0: 0
Byte 1: 91
Byte 2: 0
MemoryType: 0
MemoryChannels: 12
PaddingMem:
Byte 0: 0
Byte 1: 0
PcieGenSpeed:
Speed 0: 0
Speed 1: 3
PcieLaneCount:
Count 0: 6
Count 1: 6
LclkFreq:
Frequency 0: 81
Frequency 1: 619
EnableTdpm: 0
TdpmHighHystTemperature: 0
TdpmLowHystTemperature: 0
GfxclkFreqHighTempLimit: 0
FanStopTemp: 50
FanStartTemp: 60
FanGainEdge: 400
FanGainHotspot: 100
FanGainLiquid0: 400
FanGainLiquid1: 400
FanGainVrGfx: 400
FanGainVrSoc: 400
FanGainVrMem0: 400
FanGainVrMem1: 400
FanGainPlx: 400
FanGainMem: 400
FanPwmMin: 15
FanAcousticLimitRpm: 1000
FanThrottlingRpm: 2900
FanMaximumRpm: 3200
FanTargetTemperature: 81
FanTargetGfxclk: 800
FanTempInputSelect: 1
FanPadding: 0
FanZeroRpmEnable: 1
FanTachEdgePerRev: 2
FuzzyFan_ErrorSetDelta: 0
FuzzyFan_ErrorRateSetDelta: 0
FuzzyFan_PwmSetDelta: 0
FuzzyFan_Reserved: 0
OverrideAvfsGb:
Byte 0: 0
Byte 1: 0
Padding8_Avfs:
Byte 0: 0
Byte 1: 0
qAvfsGb:
qAvfsGb 0:
a: 0.017810000106692314
b: -0.047279998660087585
c: 0.054019998759031296
qAvfsGb 1:
a: 0.0
b: 0.0
c: 0.029999999329447746
dBtcGbGfxPll:
a: 0.0
b: 0.0
c: 0.0
dBtcGbGfxDfll:
a: 0.09754999727010727
b: 0.04839000105857849
c: -0.07373999804258347
dBtcGbSoc:
a: 0.0023399998899549246
b: -0.0023900000378489494
c: 0.09239000082015991
qAgingGb:
qAgingGb 0:
m: 0.0
b: 0.0
qAgingGb 1:
m: 0.0
b: 0.0
qStaticVoltageOffset:
qStaticVoltageOffset 0:
a: 0.0
b: 0.0
c: 0.0
qStaticVoltageOffset 1:
a: 0.0
b: 0.0
c: 0.0
DcTol:
Voltage 0: 160
Voltage 1: 160
DcBtcEnabled:
Byte 0: 1
Byte 1: 1
Padding8_GfxBtc:
Byte 0: 0
Byte 1: 0
DcBtcMin:
Voltage 0: 0
Voltage 1: 0
DcBtcMax:
Voltage 0: 160
Voltage 1: 160
DebugOverrides: 512
ReservedEquation0:
a: 0.0
b: 0.0
c: 0.0
ReservedEquation1:
a: 0.0
b: 0.0
c: 0.0
ReservedEquation2:
a: 0.0
b: 0.0
c: 0.0
ReservedEquation3:
a: 0.0
b: 0.0
c: 0.0
TotalPowerConfig: 1
TotalPowerSpare1: 0
TotalPowerSpare2: 0
PccThresholdLow: 0
PccThresholdHigh: 0
PaddingAPCC:
Padding32 0: 0
Padding32 1: 0
Padding32 2: 0
Padding32 3: 0
Padding32 4: 0
Padding32 5: 0
VDDGFX_TVmin: 0
VDDSOC_TVmin: 0
VDDGFX_Vmin_HiTemp: 0
VDDGFX_Vmin_LoTemp: 0
VDDSOC_Vmin_HiTemp: 0
VDDSOC_Vmin_LoTemp: 0
VDDGFX_TVminHystersis: 0
VDDSOC_TVminHystersis: 0
BtcConfig: 0
SsFmin:
Frequency 0: 425
Frequency 1: 135
Frequency 2: 135
Frequency 3: 0
Frequency 4: 0
Frequency 5: 0
Frequency 6: 0
Frequency 7: 0
Frequency 8: 0
Frequency 9: 0
DcBtcGb:
Voltage 0: 25
Voltage 1: 25
Reserved:
Padding32 0: 1130
Padding32 1: 1465
Padding32 2: 1560
Padding32 3: 0
Padding32 4: 0
Padding32 5: 0
Padding32 6: 0
Padding32 7: 0
I2cControllers:
I2cControllers 0:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
I2cControllers 1:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
I2cControllers 2:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
I2cControllers 3:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
I2cControllers 4:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
I2cControllers 5:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
I2cControllers 6:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
I2cControllers 7:
Enabled: 0
Speed: 0
Padding0: 0
Padding1: 0
SlaveAddress: 0
ControllerPort: 0
ControllerName: 0
ThermalThrottler: 0
I2cProtocol: 0
MaxVoltageStepGfx: 0
MaxVoltageStepSoc: 0
VddGfxVrMapping: 0
VddSocVrMapping: 0
VddMem0VrMapping: 0
VddMem1VrMapping: 0
GfxUlvPhaseSheddingMask: 0
SocUlvPhaseSheddingMask: 0
ExternalSensorPresent: 0
Padding8_V: 0
GfxMaxCurrent: 0
GfxOffset: 0
Padding_TelemetryGfx: 0
SocMaxCurrent: 0
SocOffset: 0
Padding_TelemetrySoc: 0
Mem0MaxCurrent: 0
Mem0Offset: 0
Padding_TelemetryMem0: 0
Mem1MaxCurrent: 0
Mem1Offset: 0
Padding_TelemetryMem1: 0
AcDcGpio: 0
AcDcPolarity: 0
VR0HotGpio: 0
VR0HotPolarity: 0
VR1HotGpio: 0
VR1HotPolarity: 0
GthrGpio: 0
GthrPolarity: 0
LedPin0: 0
LedPin1: 0
LedPin2: 0
padding8_4: 0
PllGfxclkSpreadEnabled: 0
PllGfxclkSpreadPercent: 0
PllGfxclkSpreadFreq: 0
DfllGfxclkSpreadEnabled: 0
DfllGfxclkSpreadPercent: 0
DfllGfxclkSpreadFreq: 0
UclkSpreadEnabled: 0
UclkSpreadPercent: 0
UclkSpreadFreq: 0
SoclkSpreadEnabled: 0
SocclkSpreadPercent: 0
SocclkSpreadFreq: 0
TotalBoardPower: 0
BoardPadding: 0
MvddRatio: 0
BoardReserved:
Padding32 0: 0
Padding32 1: 0
Padding32 2: 0
Padding32 3: 0
Padding32 4: 0
Padding32 5: 0
Padding32 6: 0
Padding32 7: 0
Padding32 8: 0
MmHubPadding:
Padding32 0: 0
Padding32 1: 0
Padding32 2: 0
Padding32 3: 0
Padding32 4: 0
Padding32 5: 0
Padding32 6: 0
Padding32 7: 0
TableContentRevision: 0
You'll notice up there that the frequency table starts with state 0 at 300MHz, then goes to state 1 at 1780, and down from there. This is everything at default: no amdgpu.ppfeaturemask, nothing. So, since 1780 is supposed to be the max (I don't even know why 300 is at state 0), I ran sudo ./upp.py set /smcPPTable/FreqTableGfx/1=1790 --write. Which immediately forced the clocks to 300MHz just like powerupp was doing. So therein must lie the issue. And yes, those are true clocks. Running ./upp.py dump after running the above command gives the same result as before, only it says 1790 for state 1 instead of 1780. So all should be good, right? You would think so, since the 1780 frequency state 1 is what it runs at normally under load. But nope. radeonjet get core table returns:
0: 300Mhz
1: 300Mhz
2: 300Mhz
as does radeon-profile. So, I took your advice and ran something to stress the GPU to make sure the clocks weren't just inaccurately reported for some reason. I ran Unigine Heaven, and unless you think 8 or 9 fps on an RX 5600 XT sounds right, then no, they're definitely being reported correctly and the clocks actually DO get set to 300MHz. I reapplied the default stock card settings and ran Unigine Heaven again, at the reported 1780MHz, and sure enough, I was back up to 78-79 fps average. So this isn't a reporting error: running powerupp (and upp as well, which would make sense) and setting anything above the stock clock frequency forces the clock to run at 300MHz.
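To compare a table such as FreqTableGfx before and after a write without eyeballing the whole dump, the entries can be extracted from saved dump text. This is only a sketch under the assumption that the dump looks like the one pasted above (a "Name:" header followed by "Label N: value" lines); extract_table is a hypothetical helper and not part of upp.

```python
import re
from typing import List

# Matches entry lines such as "Frequency 1: 1780" or "Padding32 0: 30409168".
_ENTRY = re.compile(r"^\S+ (\d+): (-?\d+)$")


def extract_table(dump_text: str, name: str) -> List[int]:
    """Return the integer entries listed directly under the 'name:' header, in order."""
    values: List[int] = []
    inside = False
    for raw in dump_text.splitlines():
        line = raw.strip()  # tolerate indented and flattened dumps alike
        if not inside:
            inside = line == name + ":"
            continue
        match = _ENTRY.match(line)
        if match is None:
            break  # first non-entry line ends the table
        values.append(int(match.group(2)))
    return values
```

Running it on a dump saved before the write and one saved after should differ only at index 1 (1780 versus 1790) for the experiment described above.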
So, naturally my next thought was to try to change the 300MHz state 0, right? Bad idea. Running sudo ./upp.py set /smcPPTable/FreqTableGfx/0=1790 --write threw a bunch of errors:
sudo ./upp.py set /smcPPTable/FreqTableGfx/0=1790 --write
Changing smcPPTable.FreqTableGfx.0 from 300 to 1790 at 0x32e
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
Traceback (most recent call last):
File "./upp.py", line 183, in <module>
cli(obj={})()
File "/usr/lib/python3.8/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.8/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.8/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "./upp.py", line 170, in set
decode._write_pp_tables_file(input_file, decode.pp_tbl_bytes)
File "/home/matt/gits/upp/decode.py", line 30, in _write_pp_tables_file
f.close()
OSError: [Errno 62] Timer expired
At which point the entire pp_table file had its contents erased, or at least that's how it acted. The system almost froze, radeon-profile reports NO clock speeds at all (everything is just blank), and radeonjet get core table returns nothing. At all.
So obviously that 0 state can't be modified, or something. Now again, all of this is WITHOUT amdgpu.ppfeaturemask enabled. With amdgpu.ppfeaturemask enabled, powerupp still doesn't work. I haven't yet tried upp, but surely it will be the same thing.
This is actually expected when your GPU is idle. Are you sure you are actually putting your GPU under any load when you are checking these values? Try running this little monitoring script in a terminal, before running some game or 3D test, in a window (or full-screen on another monitor, in case you have more than one):
Regarding that comment, I'm not sure what you're referring to. I wasn't trying to illustrate that the card was running at 800MHz in the quote you were replying to; I was pointing out that if you set the core clock to 1780MHz or below, it worked fine, and radeonjet reports the table as it should: in the example you'll see that there are 3 states, 300MHz, 800MHz, and 1780MHz. But trying to set it to anything above 1780 in powerupp sets all three of those states to 300MHz. So I'm not sure what the point of that quote was, but either way it doesn't matter now because we know that no, it has nothing to do with the GPU not being under load, and the clocks are in fact all forced to 300MHz.
Not really sure where to go from here. Maybe it's something to do with the new kernel patch that they added (which is why my pp_od_clk_voltage shows more info than yours; that's not because it's a 5600 vs 5700, other people with 5700s with that patch in effect show the exact same table as me). I just installed linux-mainline, which is a vanilla 5.5-1 kernel and should be the same one you're using (or effectively the same), and I'll see what happens. But as of right now, it seems that with the new kernel patch upp and powerupp are going to be broken, and I believe that patch is going to be mainlined; I just don't think it made it in time for 5.5-1.
UPDATE: This is a crazy coincidence, but I was commenting in the comments section on the Phoronix article about the 5600 XT and the new firmware, and I was asking about something to do with memory clocks completely unrelated to overclocking, and someone who is apparently an engineer replied "If you go outside the firmware's limits the clock defaults to 300 MHz. That matches the performance Michael was seeing."
So it sounds like for some reason the way that upp edits pp_table to change the clocks violates the firmware settings and forces the clocks to 300MHz, whereas if you use pp_od_clk_voltage like radeon-profile does, you can in fact go up to the 1820MHz firmware limit (but pp_od_clk_voltage requires amdgpu.ppfeaturemask to be set).
I ran sudo ./upp.py set /smcPPTable/FreqTableGfx/1=1790 --write. Which immediately forced the clocks to 300MHz just like powerupp was doing. So therein must lie the issue.
OK, so setting anything larger than 1780 is making the amdgpu driver power management go nuts for you? Have you tried to see if there is anything significant in the kernel log (dmesg) after you do that? How about setting the same thing to something lower than 1780, does that break the driver?
But nope. radeonjet get core table returns:
Sorry, I have no clue what radeonjet is, but I guess its output should match that of cat /sys/class/drm/card0/device/pp_dpm_sclk? If so, it totally seems that the driver goes nuts...
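For anyone who wants to check this without a third-party tool, the pp_dpm_sclk text can be parsed directly. A minimal sketch, assuming the "N: <clock>Mhz" layout with a trailing asterisk on the active state, as in the radeonjet output quoted earlier; DpmState, parse_dpm_states and looks_like_fallback are names invented for this example.

```python
import re
from typing import List, NamedTuple


class DpmState(NamedTuple):
    index: int
    mhz: int
    active: bool


# Matches lines such as "1: 800Mhz *" (the asterisk marks the active state).
_STATE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_states(text: str) -> List[DpmState]:
    """Parse pp_dpm_sclk-style text into a list of states; unrelated lines are skipped."""
    states: List[DpmState] = []
    for line in text.splitlines():
        match = _STATE.match(line)
        if match:
            states.append(
                DpmState(int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return states


def looks_like_fallback(states: List[DpmState], floor_mhz: int = 300) -> bool:
    """True when every listed state sits at or below the floor clock, as in the failing case."""
    return bool(states) and all(state.mhz <= floor_mhz for state in states)
```

Feeding it the contents of /sys/class/drm/card0/device/pp_dpm_sclk after a write gives a quick yes/no on whether the table has collapsed to 300MHz across all states.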
So, naturally my next thought was to try to change the 300MHz state 0, right? Bad idea.
Indeed :) Lowest state clocks for both GPU & VRAM are not meant to be changed at all. Hence the very unpredictable driver behaviour or just hang.
Not really sure where to go from here
It looks to me that your card firmware is blocking your max FreqTableGfx/1 clock. Try setting a lower clock to confirm that the pp_table interface works at all in the first place (you may also try changing all instances of 1780 to 1800, for example, just for the lulz). Then make sure you have the latest VBIOS as well as the latest firmware, check https://www.phoronix.com/scan.php?page=news_item&px=Ubuntu-19.10-Radeon-RX-5700. As I see no report of an issue similar to yours on any 5700 cards, my gut feeling is telling me this is totally about the 5600XT firmware / VBIOS. Are you running the factory VBIOS (the one with AMD's pre-announced lower clocks) or the one from after the card was released?
btw, can you please share your pp_table, in its raw form?
https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/new/navi10_smc.bin
oh, wait, you are there already...
https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/new/navi10_smc.bin
That's literally the link that I posted above, it's the same comments section. I already have that firmware, I got it from the devs days ago. Also, there IS no "after the card was released" vBIOS for the Sapphire Pulse, Sapphire actually flashed the new vBIOS on all of their 5600 XTs in North America before launch, which is why my stock frequency is 1780Mhz instead of 1650 or 1675 or whatever, which was the original one. But, I do have a copy of the original vBIOS but that would be useless because the new firmware is for the new vBIOS, and the old vBIOS has lower limits than the new one.
Also yes, if you read my original comments, like I said, if you set it to anything under 1780 in upp or powerupp it does in fact work. Going over 1780, though, does not. And from the engineer in the Phoronix forum's comments, it sounds like it's something to do with the way upp tries to change clock speeds which violates the firmware's settings, as opposed to radeon-profile which uses pp_od_clk_voltage and doesn't violate those settings, which would explain why radeon-profile works and upp doesn't.
Well, I guess you have your answer there, it's a firmware limitation. I agree that it's odd not being able to increase the clock to 1820 MHz with the pp table though. Anyway, this is not an issue with either powerupp or upp; they are doing what they are supposed to (i.e. reading and adjusting the pp table) afaict. But I find it interesting and would like to know more, so I'll keep the issue open for a while in case there's more information to be had.
If I raise the memory clock but keep the core at 1780 (or below), it actually applies correctly. And the thing is, when this happens, everything reports the frequency at 300MHz, except powerupp.
Does this mean that if you keep the Gfx clock at 1780 you can increase the memory clock without anything breaking or does the Gfx clock drop to 300 if you increase the memory clock? What about Gfx voltage, can that be increased (if you keep the clock at 1780)?
I noticed that you experienced similar issues earlier in Manjaro. Was this without any overclocking applied and did you solve that?
it sounds like it's something to do with the way upp tries to change clock speeds which violates the firmware's settings, as opposed to radeon-profile which uses pp_od_clk_voltage and doesn't violate those settings, which would explain why radeon-profile works and upp doesn't.
What upp does is simply changing a value in pp_table, and it does its job correctly, as you had already demonstrated. It is the amdgpu driver logic that processes the Power Play changes when tables are changed, and re-applies all the clock/voltage parameters from scratch (basically, modifying pp_table causes driver power management to be completely re-initialized). I guess this re-init fails with an "unexpected" setting in Power Play. On the other hand, the sysfs API clock change does not re-init everything, it just triggers a clock change in the driver logic, which is likely why setting the clock above 1780 via pp_od_clk_voltage succeeds. The way I see all of this, it is a firmware quirk very specific to the 5600XT; it was never an issue with powerupp or upp in the first place.
Since you have both old & new Sapphire Pulse 5600XT vBIOS files, can you please share? I totally need them for comparing Power Play tables and double-checking if upp works as expected on both.
Does this mean that if you keep the Gfx clock at 1780 you can increase the memory clock without anything breaking or does the Gfx clock drop to 300 if you increase the memory clock? What about Gfx voltage, can that be increased (if you keep the clock at 1780)?
Yes. Memory overclocking worked. I don't know about the voltages, because Navi doesn't have a voltage for each state, only a voltage curve, and I don't feel comfortable in my knowledge of the 5600 XT safe voltages to test out raising voltage limits. I've tried lowering them, and that works.
I noticed that you experienced similar issues earlier in Manjaro. Was this without any overclocking applied and did you solve that?
No, but I never tested the new firmware or anything like that. After that very initial testing, I just went back to Arch since the card was working fine there, and I haven't used Manjaro since, I've just been using Arch. I'll try it out later today though and use the new firmware and see if anything is fixed. I imagine it was the firmware issue though.
What upp does is simply changing a value in pp_table, and it does its job correctly, as you had already demonstrated.
That's what I'm saying. It looks like editing pp_table is the issue, as apparently that causes the firmware to freak out. It does seem like this is something due to the firmware, like I said, BUT I wouldn't say it's an "issue" with the card OR with upp/powerupp, it sounds like upp just isn't compatible with this card. But anyway, I'll upload the new and original versions of the performance vBIOS if you want:
Details on which is which are in the README in the zip vbios.zip
Ok, so possibly the only firmware limit is the maximum target frequency. It could be possible to do some workarounds only for 5600 XT in powerupp by setting the target frequency using OverDrive instead of the pp table, but, will consider it... Here is something you can try:
First enable OverDrive (amdgpu.ppfeaturemask=0xffffffff) and reboot.
In terminal (with proper path to upp, and yes it's supposed to be 1830, or anything above 1820 at least):
upp.py set --write OverDrive8Table/ODSettingsMax/0=1830
sudo sh -c "echo 's 1 1830' > /sys/class/drm/card0/device/pp_od_clk_voltage"
cat /sys/class/drm/card0/device/pp_od_clk_voltage
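For anyone who wants to script the OverDrive write above, here is a minimal Python sketch. It assumes the GPU is card0 and resolves the sysfs symlink to its real location before writing; the helper names `od_command` and `write_od_command` are made up for illustration and are not part of upp or powerupp. Writing to the real file needs root.

```python
import os

# Assumption: the GPU is card0; adjust on multi-GPU systems.
OD_PATH = "/sys/class/drm/card0/device/pp_od_clk_voltage"


def od_command(kind: str, state: int, mhz: int) -> str:
    """Build an 's' (core clock) or 'm' (memory clock) OverDrive command."""
    if kind not in ("s", "m"):
        raise ValueError("kind must be 's' or 'm'")
    return f"{kind} {state} {mhz}"


def write_od_command(command: str, path: str = OD_PATH) -> str:
    """Write one command to the real file behind the sysfs symlink.

    Returns the resolved path that was written to.
    """
    real_path = os.path.realpath(path)
    with open(real_path, "w") as handle:
        handle.write(command + "\n")
    return real_path
```

Run as root, `write_od_command(od_command("s", 1, 1830))` would issue the same request as the `echo 's 1 1830'` line above.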
Another thing to note is that the target frequency and the actual working frequency of the GPU are not (always) the same, meaning that in order to actually get the card running at clocks higher than 1820 you would probably have to increase the voltage (for example I can only run at a maximum of 30 MHz below the target frequency and 80 below the OverDrive max with stock settings on my 5700 XT, so a bit surprising that you can run at 1820 without increasing the voltage on your 5600 XT). But I think it would break to 300 MHz when increasing the OverDrive limit if it affects some firmware limit. In case it works (cat pp_od_clk_voltage shows 1830 MHz) can you also paste the output of:
glxinfo -B | egrep 'Device|OpenGL renderer'
Doing the above doesn't cause an error or anything, but it has no actual effect other than changing the max clock in OD_RANGE: for SCLK to 1830 instead of 1820 in pp_od_clk_voltage as well as changing state 1 for OD_SCLK from 1780 to 1830. But the card is still running at 1780 according to radeon-profile and radeonjet. Well, sudo sh -c "echo 's 1 1830' > /sys/class/drm/card0/device/pp_od_clk_voltage" fails with a Permission denied error, but that's because for some reason with some cards you can't write to the symlinked /sys/class/drm/card0/device/pp_od_clk_voltage, but you can instead run sudo sh -c "echo 's 1 1830' > /sys/devices/pci0000:00/0000:00:03.1/0000:07:00.0/0000:08:00.0/0000:09:00.0/pp_od_clk_voltage" which is the true location (on my MOBO, others may have different numbers I guess). So I ran that command, got no error. I then ran sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage and got this:
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1830Mhz
OD_MCLK:
1: 900MHz
OD_VDDC_CURVE:
0: 800MHz @ 0mV
1: 1290MHz @ 0mV
2: 1780MHz @ 0mV
OD_RANGE:
SCLK: 800Mhz 1830Mhz
MCLK: 625Mhz 930Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[0]: 800mV 1050mV
VDDC_CURVE_SCLK[1]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[1]: 800mV 1050mV
VDDC_CURVE_SCLK[2]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[2]: 800mV 1050mV
So the OD_SCLK and OD_RANGE values get changed, but the OD_VDDC_CURVE stays at the stock settings, and I'm not sure how to adjust that with a sudo sh -c "echo '.........' > /sys/class/blahblahblah". I know you can use 's <state number> <value>' and 'm <state number> <value>' for OD_SCLK and OD_MCLK, but that's all I know, as Polaris was completely different. Anyway, after seeing the 1830 properly set, I didn't notice that OD_VDDC_CURVE was still at 1780 so I thought it worked and set the clock speed to 1830. But then I checked and no, it's still at 1780 according to radeon-profile, radeontop, and radeonjet. So then I tried to use upp or powerupp to change the clock to something above 1780, not knowing if it would maybe have been fixed, but no, it still did the same exact thing and set the clocks to 300MHz.
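The mismatch described above (OD_SCLK state 1 at 1830 while OD_VDDC_CURVE point 2 stays at 1780) is easy to miss by eye. A rough Python sketch for checking it could look like this; it only assumes the text layout shown in the output above, and `parse_od_table` and `mhz` are hypothetical helper names.

```python
import re

# Section headers look like "OD_SCLK:"; entries look like "1: 1830Mhz"
# or "VDDC_CURVE_SCLK[0]: 800Mhz 1820Mhz".
SECTION = re.compile(r"^(OD_\w+):\s*$")
ENTRY = re.compile(r"^\s*([\w\[\]]+):\s*(.+?)\s*$")


def parse_od_table(text: str) -> dict:
    """Group 'key: value' lines under the OD_* section they follow."""
    table = {}
    current = None
    for line in text.splitlines():
        section = SECTION.match(line.strip())
        if section:
            current = table.setdefault(section.group(1), {})
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            current[entry.group(1)] = entry.group(2)
    return table


def mhz(value: str) -> int:
    """Extract the leading MHz figure from e.g. '1780MHz @ 0mV'."""
    return int(re.match(r"(\d+)\s*mhz", value, re.IGNORECASE).group(1))
```

Comparing `mhz(table["OD_SCLK"]["1"])` with `mhz(table["OD_VDDC_CURVE"]["2"])` then shows whether the curve followed the requested clock.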
As requested:
glxinfo -B | egrep 'Device|OpenGL renderer'
Device: AMD Radeon RX 5600 XT (NAVI10, DRM 3.36.0, 5.5.0-3-tkg-pds, LLVM 9.0.1) (0x731f)
OpenGL renderer string: AMD Radeon RX 5600 XT (NAVI10, DRM 3.36.0, 5.5.0-3-tkg-pds, LLVM 9.0.1)
And I would like to say I really appreciate you guys helping try to get this work, even if powerupp and upp seem to be incompatible with the 5600 XT. Maybe we can get it working but even if not, it's very much appreciated.
You can try increasing the OverDrive8Table/ODSettingsMax/2, 4 and 6 (I guess those are the limits for the VDDC_CURVE, not sure what 1 is). You can probably do the OverDrive overclocking in radeon-profile instead of terminal (perhaps restart the program after changing the values).
And if not, in terminal I believe it is "vc 2 1830 1050" (2 for point 2) to set the OD_VDDC_CURVE.
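A small sketch of building that curve command with a sanity check against the limits the card reports in OD_RANGE. The "vc point freq volt" layout is taken from this thread; the function name `vc_command` and the idea of passing the ranges in by hand are assumptions for illustration.

```python
def vc_command(point: int, mhz: int, millivolts: int,
               sclk_range: tuple, volt_range: tuple) -> str:
    """Build a 'vc <point> <MHz> <mV>' string, rejecting out-of-range values.

    sclk_range and volt_range are (min, max) pairs as listed under OD_RANGE
    for the chosen point (VDDC_CURVE_SCLK[n] and VDDC_CURVE_VOLT[n]).
    """
    if point not in (0, 1, 2):
        raise ValueError("the curve shown in this thread has points 0, 1 and 2")
    if not sclk_range[0] <= mhz <= sclk_range[1]:
        raise ValueError(f"{mhz} MHz is outside {sclk_range}")
    if not volt_range[0] <= millivolts <= volt_range[1]:
        raise ValueError(f"{millivolts} mV is outside {volt_range}")
    return f"vc {point} {mhz} {millivolts}"
```

With the stock 800 to 1820 MHz range, `vc_command(2, 1830, 1050, (800, 1820), (800, 1050))` raises, which is why the ODSettingsMax limits would have to be raised first.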
But, have you not been able to get the card running at 1820 MHz before (regardless of method)? I was under that impression but maybe I made that up myself. If not, I would guess that it needs more voltage to run higher than 1780 MHz.
Just FYI, both 5600XT VBIOSes pp_tables are fully decode-able and modifiable by upp, no issue there. Maybe @gardotd426 may find this diff between Power Play settings between old and new VBIOS interesting:
diff -u4 Sapphire.RX5600XT.6144.191209.rom.pp_table.dump Sapphire.RX5600XT.411EFMIU.X4E.pp_table.dump
--- Sapphire.RX5600XT.6144.191209.rom.pp_table.dump 2020-02-04 00:12:47.265947327 +0900
+++ Sapphire.RX5600XT.411EFMIU.X4E.pp_table.dump 2020-02-04 00:13:18.046248608 +0900
@@ -1,11 +1,11 @@
-Dumping the PP table from '../Sapphire.RX5600XT.6144.191209.rom.pp_table' binary...
+Dumping the PP table from '../Sapphire.RX5600XT.411EFMIU.X4E.pp_table' binary...
StructureSize: 1674
TableFormatRevision: 12
RevisionId: 1
TableSize: 482
GoldenPPId: 2292
-GoldenRevision: 15288
+GoldenRevision: 15418
FormatId: 125
PlatformCaps: 8
ThermalControllerType: 27
SmallPowerLimit1: 0
@@ -28,9 +28,9 @@
Byte 1: 0
Byte 2: 0
PowerSavingClockCount: 10
PowerSavingClockMax:
- Frequency 0: 1650
+ Frequency 0: 1780
Frequency 1: 1267
Frequency 2: 1086
Frequency 3: 1267
Frequency 4: 1267
@@ -103,15 +103,15 @@
Capability 30: 0
Capability 31: 0
ODSettingCount: 0
ODSettingsMax:
- Setting 0: 1725
- Setting 1: 1725
- Setting 2: 1725
+ Setting 0: 1820
+ Setting 1: 1820
+ Setting 2: 1820
Setting 3: 1050
- Setting 4: 1725
+ Setting 4: 1820
Setting 5: 1050
- Setting 6: 1725
+ Setting 6: 1820
Setting 7: 1050
Setting 8: 930
Setting 9: 20
Setting 10: 3200
@@ -174,9 +174,9 @@
FeaturesToRun:
Features 0: 2749345791
Features 1: 1571
SocketPowerLimitAc:
- Wattage 0: 150
+ Wattage 0: 160
Wattage 1: 0
Wattage 2: 0
Wattage 3: 0
SocketPowerLimitAcTau:
@@ -184,9 +184,9 @@
Time 1: 0
Time 2: 0
Time 3: 0
SocketPowerLimitDc:
- Wattage 0: 150
+ Wattage 0: 160
Wattage 1: 0
Wattage 2: 0
Wattage 3: 0
SocketPowerLimitDcTau:
@@ -195,9 +195,9 @@
Time 2: 0
Time 3: 0
TdcLimitSoc: 14
TdcLimitSocTau: 0
- TdcLimitGfx: 141
+ TdcLimitGfx: 150
TdcLimitGfxTau: 0
TedgeLimit: 100
ThotspotLimit: 110
TmemLimit: 105
@@ -341,9 +341,9 @@
b: 0.0
c: 0.0
FreqTableGfx:
Frequency 0: 300
- Frequency 1: 1650
+ Frequency 1: 1780
Frequency 2: 1400
Frequency 3: 1400
Frequency 4: 1400
Frequency 5: 1400
@@ -387,9 +387,9 @@
FreqTableUclk:
Frequency 0: 100
Frequency 1: 500
Frequency 2: 625
- Frequency 3: 750
+ Frequency 3: 875
FreqTableDcefclk:
Frequency 0: 507
Frequency 1: 1267
Frequency 2: 1267
@@ -442,11 +442,11 @@
Padding32 13: 30409168
Padding32 14: 30409168
Padding32 15: 30409168
DcModeMaxFreq:
- Frequency 0: 1650
+ Frequency 0: 1780
Frequency 1: 1267
- Frequency 2: 750
+ Frequency 2: 875
Frequency 3: 1086
Frequency 4: 1267
Frequency 5: 1267
Frequency 6: 1284
@@ -510,9 +510,9 @@
GfxclkFreqHighTempLimit: 0
FanStopTemp: 50
FanStartTemp: 60
FanGainEdge: 400
- FanGainHotspot: 400
+ FanGainHotspot: 100
FanGainLiquid0: 400
FanGainLiquid1: 400
FanGainVrGfx: 400
FanGainVrSoc: 400
@@ -520,12 +520,12 @@
FanGainVrMem1: 400
FanGainPlx: 400
FanGainMem: 400
FanPwmMin: 15
- FanAcousticLimitRpm: 1250
+ FanAcousticLimitRpm: 1000
FanThrottlingRpm: 2900
FanMaximumRpm: 3200
- FanTargetTemperature: 83
+ FanTargetTemperature: 81
FanTargetGfxclk: 800
FanTempInputSelect: 1
FanPadding: 0
FanZeroRpmEnable: 1
Note the PowerSavingClockMax/0 change from 1650 to 1780. Could this be the thing that is limiting your card? I'd suggest trying adjusting both PowerSavingClockMax/0 & FreqTableGfx/1 and see if it has any consequence (make sure changes are simultaneous)... That new VBIOS also has DcModeMaxFreq/0 set to 1780, I'd try changing that one as well...
There is another funny thing about FreqTableUclk/3, the new VBIOS default is 875 but the value in @gardotd426 's dump is 900.
@azeam, yes I've been able to get the overclocking working with other methods including corectrl and radeon-profile. Also, I found out the reason our /sys/class/drm/card0/device/pp_od_clk_voltage files look different despite Navi overclocking apparently being in the kernel now: the patches that I mentioned from the gitlab issue page were not upstreamed yet, I just found that out today from the dev. He also posted another patch to get the correct voltages (I'm assuming he's talking about where it says @0mV, I'm applying the patch to a kernel I'm building right now and will let you know).
@sibradzic, setting FreqTableGfx/1 forces the card to 300MHz again, PowerSavingClockMax doesn't though. I don't know what you mean by setting them simultaneously, can you give upp two commands in one string? If so, I haven't tried that, only running them one after another.
Also, the vBIOS memory dump thing might be because, with the new vBIOS that I'm currently using, I had a memory overclock to 900MHz set by default in radeon-profile. I unchecked "Restore Selected Overclock Profile on Start" and rebooted, and it defaults to 875 peak frequency state.
Funnily enough, it seems that OD_SCLK has no actual bearing on clock speeds. When I opened up corectrl and made the peak clock frequency 1820 and applied it, the frequency went up to 1820 in radeon-profile, but OD_SCLK state 1 in /sys/class/drm/card0/device/pp_od_clk_voltage was still listed as 1780. I have no idea what that's about, other than obviously the Navi OD implementation is in its infancy and hasn't matured in Linux the way Polaris's implementation had.
But yeah, if it's possible to run two commands at once with upp and that's what you meant, let me know how and I'll try it. But running them one after another, PowerSavingClockMax/0=1820 does nothing and then FreqTableGfx/1=1820 breaks it.
(One line)
upp.py set --write PowerSavingClockTable/PowerSavingClockMax/0=1820 smcPPTable/FreqTableGfx/1=1820 smcPPTable/DcModeMaxFreq/0=1820
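If you would rather keep the set of simultaneous changes in one place, a tiny Python sketch can assemble the same single invocation. The `set --write key=value ...` form is the one shown above; the helper name `upp_set_args` and the default `upp.py` path are assumptions.

```python
def upp_set_args(changes: dict, upp: str = "upp.py") -> list:
    """Return an argument list that applies every change in one upp call."""
    if not changes:
        raise ValueError("no changes given")
    pairs = [f"{path}={value}" for path, value in changes.items()]
    return [upp, "set", "--write", *pairs]
```

The list can be handed to `subprocess.run(["sudo", *args], check=True)`, so all values land in a single pp_table write.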
Also make sure there are no profiles auto-loaded with radeon-profile or CoreCtrl when testing upp. I don't think that is the cause here but I've noticed some weird things, even without patches and with OverDrive disabled, so it's safer to have them off for troubleshooting purposes. For example, with OverDrive disabled on my system I can set the CoreCtrl clock slider to 300 MHz and this will cause the card to lock to "manual" dpm performance level and it will stay at state 0 (300 MHz)/manual even if I try to overwrite the performance level manually or change the pp table.
As for how the OverDrive overclocking works I'm not very familiar with that, but did you try setting vc point freq volt in pp_od_clk_voltage? Had a quick glance at the CoreCtrl code and from what I can tell that seems to be how it is setting the curve as well.
Seems like new firmware was released today https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=b791e15d3e0ac2705eaa7965ed9b6d4c85fef2a2
It does absolutely nothing to help Manjaro, so it's not a firmware issue. Manjaro is still forced at 300MHz, and even trying to load defaults with like powerupp or anything doesn't work. I'm about to file a bug report with Manjaro because this has been an issue since I got the card, regardless of which vBIOS or firmware I used. And for some reason, the new firmware isn't available on vanilla Arch yet. But like I said I already downloaded the firmware from the link from the devs and it helped with performance on the new vBIOS but that's the only change it made, I've had the new firmware since before I even filed this issue.
Is it "the same kind of stuck at 300 MHz" you get in Manjaro as when setting the clock >1780 with upp, i.e. radeonjet etc. show all three states at 300 MHz?
Do try the one line triple upp command above (in Arch), in case it helps it would be great.
No, it's stuck at 300MHz in Manjaro right out of the box no matter what. Also no, I'm in Arch right now, but from what I remember from earlier it was like state 1 was 300MHz, state 2 was 850 or 800MHz, and state 3 was 300MHz. I'll check again here in a bit, I'm cloning my Manjaro install to back it up and then I'm gonna install Pop OS where Manjaro was, to see if it happens on Ubuntu-based distros as well, plus I wanna see if the same rendering issues I'm having in RE2 (and RE7, apparently. I just installed it today, and same thing happens) also happen in Pop.
Also, no such luck:
sudo ./upp.py set --write PowerSavingClockTable/PowerSavingClockMax/0=1820 smcPPTable/FreqTableGfx/1=1820 smcPPTable/DcModeMaxFreq/0=1820
Changing PowerSavingClockTable.PowerSavingClockMax.0 from 1780 to 1820 at 0x036
Changing smcPPTable.FreqTableGfx.1 from 1780 to 1820 at 0x330
Changing smcPPTable.DcModeMaxFreq.0 from 1820 to 1820 at 0x406
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 300Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 706mV
1: 550MHz @ 706mV
2: 300MHz @ 706mV
OD_RANGE:
SCLK: 800Mhz 1820Mhz
MCLK: 625Mhz 930Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[0]: 800mV 1050mV
VDDC_CURVE_SCLK[1]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[1]: 800mV 1050mV
VDDC_CURVE_SCLK[2]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[2]: 800mV 1050mV
The restrictive limits with the 5600 XT seem to apply under Windows as well. Did you try to increase the OverDrive limits above 1820
upp.py set --write OverDrive8Table/ODSettingsMax/0=1830 OverDrive8Table/ODSettingsMax/2=1830 OverDrive8Table/ODSettingsMax/4=1830 OverDrive8Table/ODSettingsMax/6=1830
and then overclock above 1820 in CoreCtrl/radeon-profile?
No luck:
sudo upp.py set --write OverDrive8Table/ODSettingsMax/0=1830 OverDrive8Table/ODSettingsMax/2=1830 OverDrive8Table/ODSettingsMax/4=1830 OverDrive8Table/ODSettingsMax/6=1830
[sudo] password for matt:
Changing OverDrive8Table.ODSettingsMax.0 from 1820 to 1830 at 0x0e2
Changing OverDrive8Table.ODSettingsMax.2 from 1820 to 1830 at 0x0ea
Changing OverDrive8Table.ODSettingsMax.4 from 1820 to 1830 at 0x0f2
Changing OverDrive8Table.ODSettingsMax.6 from 1820 to 1830 at 0x0fa
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1780Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 706mV
1: 1290MHz @ 738mV
2: 1780MHz @ 935mV
OD_RANGE:
SCLK: 800Mhz 1830Mhz
MCLK: 625Mhz 930Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[0]: 800mV 1050mV
VDDC_CURVE_SCLK[1]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[1]: 800mV 1050mV
VDDC_CURVE_SCLK[2]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[2]: 800mV 1050mV
Then after setting the overclock to 1830 in radeon-profile:
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1830Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 800mV
1: 1290MHz @ 800mV
2: 1780MHz @ 942mV
OD_RANGE:
SCLK: 800Mhz 1830Mhz
MCLK: 625Mhz 930Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[0]: 800mV 1050mV
VDDC_CURVE_SCLK[1]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[1]: 800mV 1050mV
VDDC_CURVE_SCLK[2]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[2]: 800mV 1050mV
But:
sudo radeonjet get core table
0: 300Mhz *
1: 300Mhz
2: 300Mhz
And that's confirmed, radeon-profile shows 300 as well (and again we've figured out that those are indeed accurate numbers already).
Also, that shouldn't even matter anyway because the issue was never that I couldn't overclock using upp or powerupp to anything above 1820, it's that you can't overclock with it at ALL. Remember, even 1781 causes it to error to 300 MHz. I was never trying to get it to go past 1820, I was just trying to get it to go past the stock 1780, which radeon-profile CAN do, and upp and powerupp can't.
And just to confirm, I lowered the range in radeon-profile back down to 1820, and now the card is running at 1820:
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1820Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 800mV
1: 1290MHz @ 800mV
2: 1780MHz @ 942mV
OD_RANGE:
SCLK: 800Mhz 1830Mhz
MCLK: 625Mhz 930Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[0]: 800mV 1050mV
VDDC_CURVE_SCLK[1]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[1]: 800mV 1050mV
VDDC_CURVE_SCLK[2]: 800Mhz 1830Mhz
VDDC_CURVE_VOLT[2]: 800mV 1050mV
sudo radeonjet get core table
0: 300Mhz
1: 1060Mhz
2: 1820Mhz *
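The two radeonjet tables above (all states at 300 vs. a normal 300/1060/1820 spread) can be told apart automatically. This is a sketch under the assumption that the output always looks like the lines quoted in this thread; `parse_states` and `looks_stuck` are hypothetical names.

```python
import re

# Matches lines such as "2: 1820Mhz *", where '*' marks the active state.
STATE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_states(text: str) -> list:
    """Return (index, MHz, is_active) tuples for each state line."""
    states = []
    for line in text.splitlines():
        match = STATE.match(line)
        if match:
            states.append((int(match.group(1)),
                           int(match.group(2)),
                           match.group(3) is not None))
    return states


def looks_stuck(states: list, floor_mhz: int = 300) -> bool:
    """True when every reported state sits at the floor clock."""
    return bool(states) and all(mhz == floor_mhz for _, mhz, _ in states)
```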
Thanks. Yes, I know this is a different matter, I was just curious if it is possible to increase the OverDrive limits under Linux, but it seems to be the same as in Windows (I believe what you did now is what MorePowerTool does), as suspected.
I don't know for sure why it won't allow the pp table to be set above 1780 but the explanation by @sibradzic makes sense.
My bad, I wasn't trying to insinuate that you like, didn't grasp it, it's just this has been such a long thread I didn't know if maybe it got lost in all the messages, and since as of late we've been trying all sorts of stuff it seemed like a possibility. I know you know what you're doing lol. I love Linux and open source, so I'm happy to try anything to help out, since the 5600 XT is so brand new and I'm probably one of the very few people that is using Linux, has a 5600 XT, AND is wanting to overclock, so I suppose in this instance I can actually be somewhat useful in my contributions, I just wish I knew more so I could try and help out more than I'm currently able to.
No worries, it's interesting to find out more about this card. It's a pity that it doesn't allow the full potential of the pp table, hopefully it will change in the future. I don't think I will add any workarounds by setting the clock in a different way in powerupp, at least for now. It would basically mean just as much hassle (if not more, by complicating the code maintenance and dependencies even for other cards, depending on implementation) as using some other software for setting the OverDrive clock frequency (and powerupp for the other things, to the extent they are adjustable), as is possible now.
If it would have been possible to increase the OverDrive limits it would have made more sense to do it, imho, but it seems like the OverDrive restrictions also apply to the pp table so it wouldn't add anything that is not possible to do with other software. I will add some of the information we've gathered in the readme at least. Closing this issue now but please let me know if there are any changes later on!
(On a totally unrelated note, I noticed in your initial screenshot that the memory dpm selection radiobuttons are not displayed on your system as intended. The positioning of certain GTK elements is for some reason different between different systems, and on your system the size of the radiobuttons are smaller than what they appear for me but I haven't figured out how to set it consistently yet. I'm opening an issue for that and will try to work something out).
I would hold off on looking into that, I'm using i3 and it's probably i3's fault. If I remember correctly, when I was in Plasma it didn't do that. I'll log into a Plasma session at some point today and make sure, at which point you can chalk it up to tiling WM weirdness. i3 has trouble with windows that are supposed to be floating like that, sometimes even if you set them to float.
I'm on kernel 5.5 on Arch Linux, and on my 5600 XT, if I set the core clock to anything below the stock boost (1780MHz), it correctly applies. sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage shows whichever value I set (as do radeonjet and radeon-profile). However, if I set it to ANYTHING above 1780, even 1781MHz, it breaks. If I have my settings like this:
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage gives me:
OD_SCLK:
0: 800Mhz
1: 300Mhz
OD_MCLK:
1: 900MHz
OD_VDDC_CURVE:
0: 800MHz @ 0mV
1: 550MHz @ 0mV
2: 300MHz @ 0mV
OD_RANGE:
SCLK: 800Mhz 1820Mhz
MCLK: 625Mhz 930Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[0]: 800mV 1050mV
VDDC_CURVE_SCLK[1]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[1]: 800mV 1050mV
VDDC_CURVE_SCLK[2]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[2]: 800mV 1050mV
You'll notice that it makes state 0 800MHz, and the "boost" state, state 1, is 300MHz. I've tried this with a dozen values going all the way up to 1820 (the max of the card). Same thing every time. And it's not a reporting error.
If I raise the memory clock but keep the core at 1780 (or below), it actually applies correctly. And the thing is, when this happens, everything reports the frequency at 300MHz, except powerupp. If I try to apply a value of 1785MHz, click "Apply Current", type my password, and then hit "Load Current", everything in powerupp stays the same, so it's not properly reading /sys/class/drm/card0/device/pp_od_clk_voltage. This sucks, I was super pumped to find such an easy-to-use GUI, and I tried to look at the code since kdesu said my password was needed to run /usr/bin/bash, so I figured it was a bash script. But /usr/bin/powerupp isn't a bash script and I can't read it (I'm assuming that powerupp executes a second bash script but I can't find it).
Yes, I have all the dependencies, and radeon-profile will correctly set frequency states.