azeam / powerupp

Simple GUI for UPP
GNU General Public License v3.0

This is Broken #1

Closed gardotd426 closed 4 years ago

gardotd426 commented 4 years ago

I'm on kernel 5.5 on Arch Linux with a 5600 XT. If I set the core clock to anything below the stock boost (1780MHz), it applies correctly: sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage shows whichever value I set (as do radeonjet and radeon-profile). However, if I set it to ANYTHING above 1780, even 1781MHz, it breaks. If I have my settings like this: [screenshot: Screenshot_20200131_230224]

sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage gives me:

OD_SCLK:
0: 800Mhz
1: 300Mhz
OD_MCLK:
1: 900MHz
OD_VDDC_CURVE:
0: 800MHz @ 0mV
1: 550MHz @ 0mV
2: 300MHz @ 0mV
OD_RANGE:
SCLK: 800Mhz 1820Mhz
MCLK: 625Mhz 930Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[0]: 800mV 1050mV
VDDC_CURVE_SCLK[1]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[1]: 800mV 1050mV
VDDC_CURVE_SCLK[2]: 800Mhz 1820Mhz
VDDC_CURVE_VOLT[2]: 800mV 1050mV

You'll notice that it makes state 0 800MHz, and the "boost" state, state 1 is 300MHz. I've tried this with a dozen values going all the way up to 1820 (the max of the card). Same thing every time. And it's not a reporting error.
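For anyone who wants to check this without eyeballing the file, here is a minimal Python sketch that parses the OD_SCLK and OD_MCLK lines quoted above and flags a table where a higher-numbered state has a lower clock. The sample text is copied from this report; the function names are made up for illustration and are not part of powerupp or upp.

```python
import re

# Sample copied from the report above (only the OD_SCLK/OD_MCLK part).
SAMPLE = """\
OD_SCLK:
0: 800Mhz
1: 300Mhz
OD_MCLK:
1: 900MHz
"""

def parse_od_states(text):
    """Return {section: {state_index: mhz}} for 'N: <freq>Mhz' lines."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(OD_\w+):", line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        state = re.match(r"(\d+):\s*(\d+)\s*mhz", line, re.IGNORECASE)
        if state and current is not None:
            current[int(state.group(1))] = int(state.group(2))
    return sections

def is_inverted(states):
    """True if a higher-numbered state has a lower clock than the one before it."""
    clocks = [states[i] for i in sorted(states)]
    return any(b < a for a, b in zip(clocks, clocks[1:]))

od = parse_od_states(SAMPLE)
print(od["OD_SCLK"], "inverted:", is_inverted(od["OD_SCLK"]))
```

On the real system the text would come from reading /sys/class/drm/card0/device/pp_od_clk_voltage instead of the embedded sample.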

If I raise the memory clock but keep the core at 1780 (or below), it actually applies correctly. And the thing is, when the core clock breaks like this, everything reports the frequency at 300MHz, except powerupp. If I try to apply a value of 1785MHz, click "Apply Current", type my password, and then hit "Load Current", everything in powerupp stays the same, so it's not properly reading /sys/class/drm/card0/device/pp_od_clk_voltage. This sucks, I was super pumped to find such an easy-to-use GUI, and I tried to look at the code since kdesu said my password was needed to run /usr/bin/bash, so I figured it was a bash script. But /usr/bin/powerupp isn't a bash script and I can't read it (I'm assuming that powerupp executes a second bash script but I can't find it).

Yes, I have all the dependencies, and radeon-profile will correctly set frequency states.

azeam commented 4 years ago

Thanks for the report, I suspect this is related to the overdrive changes in stable kernel 5.5, will try to test it as soon as possible. Can you try to disable amdgpu.ppfeaturemask (as (power)upp does not rely on that)?

sibradzic commented 4 years ago

I also see a conflict between pp_table settings and stuff in pp_od_clk_voltage in Linux 5.5 on my 5700. Disabling OverDrive (via amdgpu.ppfeaturemask) fixes it for me. It seems like the OverDrive API is not yet fully implemented as of Linux 5.5.

gardotd426 commented 4 years ago

@azeam but if you disable amdgpu.ppfeaturemask how can you even confirm that it's working, because when you disable amdgpu.ppfeaturemask there is no /sys/class/drm/card0/device/pp_od_clk_voltage to check to confirm? You only get that file if amdgpu.ppfeaturemask is set.

gardotd426 commented 4 years ago

And as far as radeonjet is concerned, no, it doesn't fix it:

sudo radeonjet get core table                 
0: 300Mhz *
1: 300Mhz 
2: 300Mhz 
gardotd426 commented 4 years ago

Same with radeontop.

gardotd426 commented 4 years ago

With amdgpu.ppfeaturemask disabled and with powerupp setting the core to 1780 or below, just like before, again radeonjet reports:

sudo radeonjet get core table
0: 300Mhz 
1: 800Mhz *
2: 1780Mhz 
gardotd426 commented 4 years ago

Actually it doesn't even do anything at all without amdgpu.ppfeaturemask set. If you set it to anything below 1780, nothing happens, it doesn't apply and the peak core freq stays at 1780. If you try to set it to anything above 1780, it drops it to 300MHz

azeam commented 4 years ago

At work now, will get back with a longer reply later tonight, but try reading the current values with (sudo) cat /sys/kernel/debug/dri/0/amdgpu_pm_info

gardotd426 commented 4 years ago

Wait yeah it does. Still doesn't work man:

sudo cat /sys/kernel/debug/dri/0/amdgpu_pm_info
Clock Gating Flags Mask: 0x38099f05
    Graphics Medium Grain Clock Gating: On
    Graphics Medium Grain memory Light Sleep: Off
    Graphics Coarse Grain Clock Gating: On
    Graphics Coarse Grain memory Light Sleep: Off
    Graphics Coarse Grain Tree Shader Clock Gating: Off
    Graphics Coarse Grain Tree Shader Light Sleep: Off
    Graphics Command Processor Light Sleep: Off
    Graphics Run List Controller Light Sleep: Off
    Graphics 3D Coarse Grain Clock Gating: Off
    Graphics 3D Coarse Grain memory Light Sleep: Off
    Memory Controller Light Sleep: On
    Memory Controller Medium Grain Clock Gating: On
    System Direct Memory Access Light Sleep: On
    System Direct Memory Access Medium Grain Clock Gating: On
    Bus Interface Medium Grain Clock Gating: On
    Bus Interface Light Sleep: On
    Unified Video Decoder Medium Grain Clock Gating: Off
    Video Compression Engine Medium Grain Clock Gating: Off
    Host Data Path Light Sleep: On
    Host Data Path Medium Grain Clock Gating: On
    Digital Right Management Medium Grain Clock Gating: Off
    Digital Right Management Light Sleep: Off
    Rom Medium Grain Clock Gating: Off
    Data Fabric Medium Grain Clock Gating: Off
    Address Translation Hub Medium Grain Clock Gating: On
    Address Translation Hub Light Sleep: On

GFX Clocks and Power:
    100 MHz (MCLK)
    300 MHz (SCLK)
    300 MHz (PSTATE_SCLK)
    100 MHz (PSTATE_MCLK)
    800 mV (VDDGFX)
    11.0 W (average GPU)

GPU Temperature: 33 C
GPU Load: 0 %
MEM Load: 2 %

SMC Feature Mask: 0x00000622a3ddaffb
VCN: Disabled

It seems this program is just broken

azeam commented 4 years ago

I've now updated to stable kernel 5.5 (from rc2), but I'm not able to reproduce this on my 5700 XT (nor, as far as I can tell, any other issues, even with OverDrive enabled). I still don't get the:

VDDC_CURVE_SCLK[X] 
VDDC_CURVE_VOLT[X] 

values, like you have on your 5600, so something is different with the OverDrive implementation, either between our systems or the way the 5600/5700 XT cards are working. This is of less importance with the OverDrive turned off though, just a remark. But as @sibradzic noted above there still seem to be issues with the OverDrive settings in combination with the pp table, so keep OverDrive disabled.

Powerupp checks for the pp table revision number and the only one implemented is "12", so our pp tables should be constructed the same, and from the information you have given the application also seems to (read and) change the expected parameters (even if the results aren't).

A few notes: Just to confirm, did you remember to do an update-grub after removing the amdgpu.ppfeaturemask boot parameter (I tend to forget that...)?

The sudo cat /sys/kernel/debug/dri/0/amdgpu_pm_info output is without GPU load, try to run a windowed benchmark and see if it changes when running the command simultaneously.

Have you checked the performance? If the card is dropping to 300 MHz there should be a noticeable performance drop when you change the value from 1780 to 1781.
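To compare idle and loaded readings without scrolling through the whole file, a small Python sketch like the following could pull out just the "GFX Clocks and Power" values. The sample is the idle output posted earlier in this thread; parse_pm_info is a hypothetical helper name, and the line layout is assumed to stay as shown here.

```python
import re

# Idle readings copied from the amdgpu_pm_info output posted above.
SAMPLE = """\
GFX Clocks and Power:
    100 MHz (MCLK)
    300 MHz (SCLK)
    300 MHz (PSTATE_SCLK)
    100 MHz (PSTATE_MCLK)
    800 mV (VDDGFX)
    11.0 W (average GPU)
"""

def parse_pm_info(text):
    """Map each '(LABEL)' to (value, unit) for lines like '300 MHz (SCLK)'."""
    readings = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d.]+)\s+(\S+)\s+\((.+)\)\s*$", line)
        if m:
            readings[m.group(3)] = (float(m.group(1)), m.group(2))
    return readings

idle = parse_pm_info(SAMPLE)
print(idle["SCLK"])
```

Running the same parser on a second capture taken during a benchmark and comparing the two SCLK entries would show whether the clock really stays at 300 MHz under load.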

And the thing is, when this happens, everything reports the frequency at 300MHz, except powerupp.

Powerupp does not read any actual clock speeds, it only reads the values that are set in the pp table (using upp). It is however on top of my to-do list to add some simple monitoring feature.

If I try to apply a value of 1785MHz, click "Apply Current", type my password, and then hit "Load Current", everything in powerupp stays the same, so it's not properly reading /sys/class/drm/card0/device/pp_od_clk_voltage.

That is expected, if you successfully apply values and then load them they should appear the same. Powerupp only reads the values set in the pp table.

This sucks, I was super pumped to find such an easy-to-use GUI, and I tried to look at the code since kdesu said my password was needed to run /usr/bin/bash, so I figured it was a bash script. But /usr/bin/powerupp isn't a bash script and I can't read it (I'm assuming that powerupp executes a second bash script but I can't find it).

It is not actually a bash script as in a file on the system, but it sends a couple of bash commands under the same pkexec (kdesu) prompt (to avoid having to type the password multiple times): including the upp commands containing the values entered to write to the pp table and also a write to the hwmon power limit (as the pp table power limit is oddly implemented). If you do a "persistent save" it will however create a bash script (containing basically the same things as when applying) in /usr/bin/powerupp_startup_script_cardX.sh
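As a rough illustration of the "several commands under one prompt" idea, here is a Python sketch that joins a few upp set commands (and optionally a power limit write) into a single string for one pkexec bash -c call. This is not powerupp's actual code: the function name is invented, the ./upp.py invocation is taken from the command format used in this thread, and the hwmon file path and its unit are placeholders the caller has to supply.

```python
import shlex

def build_privileged_batch(table_values, power_cap=None):
    """Build one 'pkexec bash -c' argv that runs every step in sequence.

    table_values: {pp table path: value}, e.g. "/smcPPTable/FreqTableGfx/1".
    power_cap: optional (hwmon_file, value) pair; both are caller-supplied
               assumptions, nothing here knows the real file name or unit.
    """
    steps = [
        shlex.join(["./upp.py", "set", f"{path}={value}", "--write"])
        for path, value in table_values.items()
    ]
    if power_cap is not None:
        hwmon_file, value = power_cap
        steps.append(f"echo {int(value)} > {shlex.quote(hwmon_file)}")
    # '&&' stops at the first failing step, so a failed table write
    # is not followed by a power limit change.
    return ["pkexec", "bash", "-c", " && ".join(steps)]

argv = build_privileged_batch({"/smcPPTable/FreqTableGfx/1": 1780})
print(argv[-1])
```

The returned list only describes the command; passing it to something like subprocess.run is what would actually trigger the single password prompt.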

sibradzic commented 4 years ago

With amdgpu.ppfeaturemask disabled and with powerupp setting the core to 1780 or below, just like before, again radeonjet reports:

sudo radeonjet get core table
0: 300Mhz 
1: 800Mhz *
2: 1780Mhz 

This is actually expected when your GPU is idle. Are you sure you are actually putting your GPU under any load when you are checking these values? Try running this little monitoring script in a terminal, before running some game or 3D test, in a window (or full-screen on another monitor, in case you have more than one):

cat > ~/monitorgpu.sh << 'EOF'
#!/bin/bash
watch -n0.5 "sudo tail -n16 /sys/kernel/debug/dri/0/amdgpu_pm_info && \
             echo SCLK: && \
             cat /sys/class/drm/card0/device/pp_dpm_sclk && \
             echo MCLK: && \
             cat /sys/class/drm/card0/device/pp_dpm_mclk && \
             echo Temps: && \
             sensors amdgpu-pci-0c00"
EOF
chmod +x ~/monitorgpu.sh

and start it with cd && ./monitorgpu.sh

Now start some GPU load and check those values changing (and change they should, regardless of whether you have any over/under clock/volt applied). MCLK values should fluctuate even if you do simple things on your desktop, like moving some window around for example...
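The state tables printed by radeonjet in this thread use an "N: freq *" layout, with the asterisk marking the active state. A minimal Python sketch for reading such a table and spotting the "everything is 300 MHz" case might look like this. The helper name is hypothetical, and it is an assumption that pp_dpm_sclk output follows the same layout.

```python
import re

# Healthy and broken tables copied from the radeonjet output in this thread.
HEALTHY = """\
0: 300Mhz
1: 800Mhz *
2: 1780Mhz
"""
BROKEN = """\
0: 300Mhz *
1: 300Mhz
2: 300Mhz
"""

def parse_state_table(text):
    """Return ([mhz per state], active index or None) from 'N: <freq>Mhz [*]' lines."""
    clocks, active = [], None
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?", line, re.IGNORECASE)
        if not m:
            continue
        clocks.append(int(m.group(2)))
        if m.group(3):
            active = int(m.group(1))
    return clocks, active

for name, table in (("healthy", HEALTHY), ("broken", BROKEN)):
    clocks, active = parse_state_table(table)
    collapsed = len(set(clocks)) == 1  # every state reports the same clock
    print(name, clocks, "active:", active, "collapsed:", collapsed)
```

An idle GPU sitting in a low state is normal, so the active index alone says little; the collapsed flag is the part that separates the two cases.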

gardotd426 commented 4 years ago

It's not the load. All I have to do is run sudo sh -c "echo 'high' > /sys/class/drm/card0/device/power_dpm_force_performance_level", which forces it to run at the highest dpm state. I run that every time after I test anything to make sure that I'm running at the highest state. And I know powerupp doesn't read actual clock speeds, but it's not reading the pp table correctly when I try to set it to anything higher than 1780. And there's no difference between the 5700s and 5600s when it comes to this stuff; I just have the patch that the devs added to allow overclocking on Navi (they said it's supposed to be fixed for everyone on 5.5, but obviously it's not, because you don't get all those values in pp_od_clk_voltage). The way mine looks is how it's supposed to look for everyone. I got it straight from the dev; I was one of the people actually on the gitlab issue requesting overclocking on Navi. But anyway, the thing that doesn't make any sense is that I can overclock with radeon-profile; it's just powerupp that seems to do nothing. And yes, I remembered to update grub. I'll run that command you asked me to run here in a few and give you the output.

sibradzic commented 4 years ago

Have you tried forcing performance level without any overclocking applied (regardless if it's powerupp, or just upp or pp_od_clk_voltage)? Provided you have no tweaks applied, does any of this affect your GPU clocks at all?

echo low | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level

The above totally do work on my 5700 in any 5.5rcX or 5.5 final release, without any additional patches, and regardless of how I modify pp_table or pp_od_clk_voltage settings, so it definitely ain't a kernel "Navi 10" issue.

Do you have the latest radeon firmware binaries deployed? Can you share contents of your 5600XT pp_table?

gardotd426 commented 4 years ago

Changing the performance level to high in /sys/class/drm/card0/device/power_dpm_force_performance_level always forces the GPU to run at its highest frequency state. If it's at stock, setting it to high forces it to run at 1780MHz. That's how I was making sure powerupp wasn't working in the first place: when I applied the powerupp config, then ran sudo sh -c "echo 'high' > /sys/class/drm/card0/device/power_dpm_force_performance_level", and then queried the current clock speed, it was shown to be 300 by every program available (radeontop, radeonjet, radeon-profile, cat, etc).
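The "force a level, then read it back" step described above can be sketched in Python as follows. The sysfs path and the low/high values are the ones used in this thread; "auto" is added on the assumption that it is the driver's default level, and force_level is just an illustrative name. The demo writes to a scratch file, so it does not touch the GPU; pointing it at the real file would need root.

```python
import tempfile
from pathlib import Path

LEVEL_FILE = "/sys/class/drm/card0/device/power_dpm_force_performance_level"
# "low" and "high" appear in this thread; "auto" is an assumption.
KNOWN_LEVELS = ("auto", "low", "high")

def force_level(level, path=LEVEL_FILE):
    """Write a performance level and return what the file reads back."""
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    target = Path(path)
    target.write_text(level + "\n")
    return target.read_text().strip()

# Dry run against a scratch file so nothing touches the GPU.
with tempfile.TemporaryDirectory() as scratch:
    fake = Path(scratch) / "power_dpm_force_performance_level"
    print(force_level("high", fake))
```

Reading the value back after writing is the useful part here: it confirms the level was accepted before any clock readings are taken.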

Now we're getting somewhere: I pulled down the upp repo and read the README to see how to use it and ran sudo ./upp.py dump, and got this:

 ./gits/upp/upp.py dump
Dumping the PP table from '/sys/class/drm/card0/device/pp_table' binary...
StructureSize: 1674
TableFormatRevision: 12
RevisionId: 1
TableSize: 482
GoldenPPId: 2292
GoldenRevision: 15418
FormatId: 125
PlatformCaps: 8
ThermalControllerType: 27
SmallPowerLimit1: 0
SmallPowerLimit2: 0
BoostPowerLimit: 0
ODTurboPowerLimit: 0
ODPowerSavePowerLimit: 0
SoftwareShutdownTemp: 118
Reserved0:
  Reserved0 0: 0
  Reserved0 1: 0
  Reserved0 2: 0
  Reserved0 3: 0
  Reserved0 4: 0
  Reserved0 5: 0
PowerSavingClockTable:
  ucTableRevision: 1
  Reserve:
    Byte 0: 0
    Byte 1: 0
    Byte 2: 0
  PowerSavingClockCount: 10
  PowerSavingClockMax:
    Frequency 0: 1780
    Frequency 1: 1267
    Frequency 2: 1086
    Frequency 3: 1267
    Frequency 4: 1267
    Frequency 5: 750
    Frequency 6: 1267
    Frequency 7: 1284
    Frequency 8: 1284
    Frequency 9: 810
    Frequency 10: 0
    Frequency 11: 0
    Frequency 12: 0
    Frequency 13: 0
    Frequency 14: 0
    Frequency 15: 0
  PowerSavingClockMin:
    Frequency 0: 300
    Frequency 1: 100
    Frequency 2: 100
    Frequency 3: 100
    Frequency 4: 507
    Frequency 5: 100
    Frequency 6: 507
    Frequency 7: 308
    Frequency 8: 300
    Frequency 9: 300
    Frequency 10: 0
    Frequency 11: 0
    Frequency 12: 0
    Frequency 13: 0
    Frequency 14: 0
    Frequency 15: 0
OverDrive8Table:
  ucODTableRevision: 128
  Reserve:
    Byte 0: 0
    Byte 1: 0
    Byte 2: 0
  ODFeatureCount: 14
  ODFeatureCapabilities:
    Capability 0: 30
    Capability 1: 0
    Capability 2: 0
    Capability 3: 0
    Capability 4: 1
    Capability 5: 1
    Capability 6: 1
    Capability 7: 1
    Capability 8: 1
    Capability 9: 1
    Capability 10: 1
    Capability 11: 1
    Capability 12: 1
    Capability 13: 1
    Capability 14: 1
    Capability 15: 1
    Capability 16: 1
    Capability 17: 1
    Capability 18: 0
    Capability 19: 0
    Capability 20: 0
    Capability 21: 0
    Capability 22: 0
    Capability 23: 0
    Capability 24: 0
    Capability 25: 0
    Capability 26: 0
    Capability 27: 0
    Capability 28: 0
    Capability 29: 0
    Capability 30: 0
    Capability 31: 0
  ODSettingCount: 0
  ODSettingsMax:
    Setting 0: 1820
    Setting 1: 1820
    Setting 2: 1820
    Setting 3: 1050
    Setting 4: 1820
    Setting 5: 1050
    Setting 6: 1820
    Setting 7: 1050
    Setting 8: 930
    Setting 9: 20
    Setting 10: 3200
    Setting 11: 3200
    Setting 12: 100
    Setting 13: 110
    Setting 14: 2
    Setting 15: 1
    Setting 16: 1
    Setting 17: 1
    Setting 18: 1
    Setting 19: 100
    Setting 20: 100
    Setting 21: 100
    Setting 22: 100
    Setting 23: 100
    Setting 24: 100
    Setting 25: 100
    Setting 26: 100
    Setting 27: 100
    Setting 28: 100
    Setting 29: 0
    Setting 30: 0
    Setting 31: 0
  ODSettingsMin:
    Setting 0: 800
    Setting 1: 800
    Setting 2: 800
    Setting 3: 800
    Setting 4: 800
    Setting 5: 800
    Setting 6: 800
    Setting 7: 800
    Setting 8: 625
    Setting 9: 50
    Setting 10: 700
    Setting 11: 700
    Setting 12: 25
    Setting 13: 50
    Setting 14: 0
    Setting 15: 0
    Setting 16: 0
    Setting 17: 0
    Setting 18: 0
    Setting 19: 25
    Setting 20: 20
    Setting 21: 25
    Setting 22: 20
    Setting 23: 25
    Setting 24: 20
    Setting 25: 25
    Setting 26: 20
    Setting 27: 25
    Setting 28: 20
    Setting 29: 0
    Setting 30: 0
    Setting 31: 0
smcPPTable:
  TableVersion: 8
  FeaturesToRun:
    Features 0: 2749345791
    Features 1: 1571
  SocketPowerLimitAc:
    Wattage 0: 160
    Wattage 1: 0
    Wattage 2: 0
    Wattage 3: 0
  SocketPowerLimitAcTau:
    Time 0: 0
    Time 1: 0
    Time 2: 0
    Time 3: 0
  SocketPowerLimitDc:
    Wattage 0: 160
    Wattage 1: 0
    Wattage 2: 0
    Wattage 3: 0
  SocketPowerLimitDcTau:
    Time 0: 0
    Time 1: 0
    Time 2: 0
    Time 3: 0
  TdcLimitSoc: 14
  TdcLimitSocTau: 0
  TdcLimitGfx: 150
  TdcLimitGfxTau: 0
  TedgeLimit: 100
  ThotspotLimit: 110
  TmemLimit: 105
  Tvr_gfxLimit: 115
  Tvr_mem0Limit: 115
  Tvr_mem1Limit: 115
  Tvr_socLimit: 115
  Tliquid0Limit: 0
  Tliquid1Limit: 0
  TplxLimit: 0
  FitLimit: 0
  PpmPowerLimit: 0
  PpmTemperatureThreshold: 0
  ThrottlerControlMask: 28926
  FwDStateMask: 1
  UlvVoltageOffsetSoc: 100
  UlvVoltageOffsetGfx: 100
  GceaLinkMgrIdleThreshold: 0
  paddingRlcUlvParams0: 0
  paddingRlcUlvParams1: 0
  paddingRlcUlvParams2: 0
  UlvSmnclkDid: 0
  UlvMp1clkDid: 0
  UlvGfxclkBypass: 0
  Padding234: 0
  MinVoltageUlvGfx: 3100
  MinVoltageUlvSoc: 3100
  MinVoltageGfx: 3200
  MinVoltageSoc: 3200
  MaxVoltageGfx: 4200
  MaxVoltageSoc: 4200
  LoadLineResistanceGfx: 76
  LoadLineResistanceSoc: 0
  DpmDescriptor:
    DpmDescriptor 0:
      VoltageMode: 1
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 0.0
        b: 0.0
      SsCurve:
        a: 0.2542000114917755
        b: -0.2162500023841858
        c: 0.6957200169563293
    DpmDescriptor 1:
      VoltageMode: 1
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 1.0
        b: 0.0
      SsCurve:
        a: 0.21750999987125397
        b: -0.05852000042796135
        c: 0.714680016040802
    DpmDescriptor 2:
      VoltageMode: 1
      SnapToDiscrete: 1
      NumDiscreteLevels: 4
      padding: 0
      ConversionToAvfsClk:
        m: 1.0
        b: 0.0
      SsCurve:
        a: 0.21750999987125397
        b: -0.05852000042796135
        c: 0.714680016040802
    DpmDescriptor 3:
      VoltageMode: 1
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 0.6442999839782715
        b: 0.5349000096321106
      SsCurve:
        a: 0.0
        b: 0.38510000705718994
        c: 0.567799985408783
    DpmDescriptor 4:
      VoltageMode: 1
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 0.5094000101089478
        b: 0.5924999713897705
      SsCurve:
        a: 0.0
        b: 0.33070001006126404
        c: 0.5684999823570251
    DpmDescriptor 5:
      VoltageMode: 1
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 1.25600004196167
        b: -0.34380000829696655
      SsCurve:
        a: 0.0
        b: 0.5343000292778015
        c: 0.24529999494552612
    DpmDescriptor 6:
      VoltageMode: 1
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 0.8216000199317932
        b: 0.014600000344216824
      SsCurve:
        a: 0.0
        b: 0.47760000824928284
        c: 0.2526000142097473
    DpmDescriptor 7:
      VoltageMode: 2
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 0.0
        b: 0.0
      SsCurve:
        a: 0.0
        b: 0.0
        c: 0.0
    DpmDescriptor 8:
      VoltageMode: 2
      SnapToDiscrete: 0
      NumDiscreteLevels: 2
      padding: 0
      ConversionToAvfsClk:
        m: 0.0
        b: 0.0
      SsCurve:
        a: 0.0
        b: 0.0
        c: 0.0
  FreqTableGfx:
    Frequency 0: 300
    Frequency 1: 1780
    Frequency 2: 1400
    Frequency 3: 1400
    Frequency 4: 1400
    Frequency 5: 1400
    Frequency 6: 1400
    Frequency 7: 1400
    Frequency 8: 1400
    Frequency 9: 1400
    Frequency 10: 1400
    Frequency 11: 1400
    Frequency 12: 1400
    Frequency 13: 1400
    Frequency 14: 1400
    Frequency 15: 1400
  FreqTableVclk:
    Frequency 0: 100
    Frequency 1: 1267
    Frequency 2: 1267
    Frequency 3: 1267
    Frequency 4: 1267
    Frequency 5: 1267
    Frequency 6: 1267
    Frequency 7: 1267
  FreqTableDclk:
    Frequency 0: 100
    Frequency 1: 1086
    Frequency 2: 1086
    Frequency 3: 1086
    Frequency 4: 1086
    Frequency 5: 1086
    Frequency 6: 1086
    Frequency 7: 1086
  FreqTableSocclk:
    Frequency 0: 507
    Frequency 1: 1267
    Frequency 2: 950
    Frequency 3: 950
    Frequency 4: 950
    Frequency 5: 950
    Frequency 6: 950
    Frequency 7: 950
  FreqTableUclk:
    Frequency 0: 100
    Frequency 1: 500
    Frequency 2: 625
    Frequency 3: 900
  FreqTableDcefclk:
    Frequency 0: 507
    Frequency 1: 1267
    Frequency 2: 1267
    Frequency 3: 1267
    Frequency 4: 1267
    Frequency 5: 1267
    Frequency 6: 1267
    Frequency 7: 1267
  FreqTableDispclk:
    Frequency 0: 308
    Frequency 1: 1284
    Frequency 2: 1284
    Frequency 3: 1284
    Frequency 4: 1284
    Frequency 5: 1284
    Frequency 6: 1284
    Frequency 7: 1284
  FreqTablePixclk:
    Frequency 0: 300
    Frequency 1: 1284
    Frequency 2: 1188
    Frequency 3: 1188
    Frequency 4: 1188
    Frequency 5: 1188
    Frequency 6: 1188
    Frequency 7: 1188
  FreqTablePhyclk:
    Frequency 0: 300
    Frequency 1: 810
    Frequency 2: 810
    Frequency 3: 810
    Frequency 4: 810
    Frequency 5: 810
    Frequency 6: 810
    Frequency 7: 810
  Paddingclks:
    Padding32 0: 30409168
    Padding32 1: 30409168
    Padding32 2: 30409168
    Padding32 3: 30409168
    Padding32 4: 30409168
    Padding32 5: 30409168
    Padding32 6: 30409168
    Padding32 7: 30409168
    Padding32 8: 30409168
    Padding32 9: 30409168
    Padding32 10: 30409168
    Padding32 11: 30409168
    Padding32 12: 30409168
    Padding32 13: 30409168
    Padding32 14: 30409168
    Padding32 15: 30409168
  DcModeMaxFreq:
    Frequency 0: 1780
    Frequency 1: 1267
    Frequency 2: 875
    Frequency 3: 1086
    Frequency 4: 1267
    Frequency 5: 1267
    Frequency 6: 1284
    Frequency 7: 1284
    Frequency 8: 810
  Padding8_Clks: 464
  FreqTableUclkDiv:
    Byte 0: 0
    Byte 1: 3
    Byte 2: 3
    Byte 3: 3
  Mp0clkFreq:
    Frequency 0: 304
    Frequency 1: 507
  Mp0DpmVoltage:
    Voltage 0: 3200
    Voltage 1: 3200
  MemVddciVoltage:
    Voltage 0: 2700
    Voltage 1: 3400
    Voltage 2: 3400
    Voltage 3: 3400
  MemMvddVoltage:
    Voltage 0: 5000
    Voltage 1: 5400
    Voltage 2: 5400
    Voltage 3: 5400
  GfxclkFgfxoffEntry: 800
  GfxclkFinit: 800
  GfxclkFidle: 800
  GfxclkSlewRate: 0
  GfxclkFopt: 0
  Padding567:
    Byte 0: 208
    Byte 1: 1
  GfxclkDsMaxFreq: 0
  GfxclkSource: 1
  Padding456: 2
  LowestUclkReservedForUlv: 0
  Padding8_Uclk:
    Byte 0: 0
    Byte 1: 91
    Byte 2: 0
  MemoryType: 0
  MemoryChannels: 12
  PaddingMem:
    Byte 0: 0
    Byte 1: 0
  PcieGenSpeed:
    Speed 0: 0
    Speed 1: 3
  PcieLaneCount:
    Count 0: 6
    Count 1: 6
  LclkFreq:
    Frequency 0: 81
    Frequency 1: 619
  EnableTdpm: 0
  TdpmHighHystTemperature: 0
  TdpmLowHystTemperature: 0
  GfxclkFreqHighTempLimit: 0
  FanStopTemp: 50
  FanStartTemp: 60
  FanGainEdge: 400
  FanGainHotspot: 100
  FanGainLiquid0: 400
  FanGainLiquid1: 400
  FanGainVrGfx: 400
  FanGainVrSoc: 400
  FanGainVrMem0: 400
  FanGainVrMem1: 400
  FanGainPlx: 400
  FanGainMem: 400
  FanPwmMin: 15
  FanAcousticLimitRpm: 1000
  FanThrottlingRpm: 2900
  FanMaximumRpm: 3200
  FanTargetTemperature: 81
  FanTargetGfxclk: 800
  FanTempInputSelect: 1
  FanPadding: 0
  FanZeroRpmEnable: 1
  FanTachEdgePerRev: 2
  FuzzyFan_ErrorSetDelta: 0
  FuzzyFan_ErrorRateSetDelta: 0
  FuzzyFan_PwmSetDelta: 0
  FuzzyFan_Reserved: 0
  OverrideAvfsGb:
    Byte 0: 0
    Byte 1: 0
  Padding8_Avfs:
    Byte 0: 0
    Byte 1: 0
  qAvfsGb:
    qAvfsGb 0:
      a: 0.017810000106692314
      b: -0.047279998660087585
      c: 0.054019998759031296
    qAvfsGb 1:
      a: 0.0
      b: 0.0
      c: 0.029999999329447746
  dBtcGbGfxPll:
    a: 0.0
    b: 0.0
    c: 0.0
  dBtcGbGfxDfll:
    a: 0.09754999727010727
    b: 0.04839000105857849
    c: -0.07373999804258347
  dBtcGbSoc:
    a: 0.0023399998899549246
    b: -0.0023900000378489494
    c: 0.09239000082015991
  qAgingGb:
    qAgingGb 0:
      m: 0.0
      b: 0.0
    qAgingGb 1:
      m: 0.0
      b: 0.0
  qStaticVoltageOffset:
    qStaticVoltageOffset 0:
      a: 0.0
      b: 0.0
      c: 0.0
    qStaticVoltageOffset 1:
      a: 0.0
      b: 0.0
      c: 0.0
  DcTol:
    Voltage 0: 160
    Voltage 1: 160
  DcBtcEnabled:
    Byte 0: 1
    Byte 1: 1
  Padding8_GfxBtc:
    Byte 0: 0
    Byte 1: 0
  DcBtcMin:
    Voltage 0: 0
    Voltage 1: 0
  DcBtcMax:
    Voltage 0: 160
    Voltage 1: 160
  DebugOverrides: 512
  ReservedEquation0:
    a: 0.0
    b: 0.0
    c: 0.0
  ReservedEquation1:
    a: 0.0
    b: 0.0
    c: 0.0
  ReservedEquation2:
    a: 0.0
    b: 0.0
    c: 0.0
  ReservedEquation3:
    a: 0.0
    b: 0.0
    c: 0.0
  TotalPowerConfig: 1
  TotalPowerSpare1: 0
  TotalPowerSpare2: 0
  PccThresholdLow: 0
  PccThresholdHigh: 0
  PaddingAPCC:
    Padding32 0: 0
    Padding32 1: 0
    Padding32 2: 0
    Padding32 3: 0
    Padding32 4: 0
    Padding32 5: 0
  VDDGFX_TVmin: 0
  VDDSOC_TVmin: 0
  VDDGFX_Vmin_HiTemp: 0
  VDDGFX_Vmin_LoTemp: 0
  VDDSOC_Vmin_HiTemp: 0
  VDDSOC_Vmin_LoTemp: 0
  VDDGFX_TVminHystersis: 0
  VDDSOC_TVminHystersis: 0
  BtcConfig: 0
  SsFmin:
    Frequency 0: 425
    Frequency 1: 135
    Frequency 2: 135
    Frequency 3: 0
    Frequency 4: 0
    Frequency 5: 0
    Frequency 6: 0
    Frequency 7: 0
    Frequency 8: 0
    Frequency 9: 0
  DcBtcGb:
    Voltage 0: 25
    Voltage 1: 25
  Reserved:
    Padding32 0: 1130
    Padding32 1: 1465
    Padding32 2: 1560
    Padding32 3: 0
    Padding32 4: 0
    Padding32 5: 0
    Padding32 6: 0
    Padding32 7: 0
  I2cControllers:
    I2cControllers 0:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
    I2cControllers 1:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
    I2cControllers 2:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
    I2cControllers 3:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
    I2cControllers 4:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
    I2cControllers 5:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
    I2cControllers 6:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
    I2cControllers 7:
      Enabled: 0
      Speed: 0
      Padding0: 0
      Padding1: 0
      SlaveAddress: 0
      ControllerPort: 0
      ControllerName: 0
      ThermalThrottler: 0
      I2cProtocol: 0
  MaxVoltageStepGfx: 0
  MaxVoltageStepSoc: 0
  VddGfxVrMapping: 0
  VddSocVrMapping: 0
  VddMem0VrMapping: 0
  VddMem1VrMapping: 0
  GfxUlvPhaseSheddingMask: 0
  SocUlvPhaseSheddingMask: 0
  ExternalSensorPresent: 0
  Padding8_V: 0
  GfxMaxCurrent: 0
  GfxOffset: 0
  Padding_TelemetryGfx: 0
  SocMaxCurrent: 0
  SocOffset: 0
  Padding_TelemetrySoc: 0
  Mem0MaxCurrent: 0
  Mem0Offset: 0
  Padding_TelemetryMem0: 0
  Mem1MaxCurrent: 0
  Mem1Offset: 0
  Padding_TelemetryMem1: 0
  AcDcGpio: 0
  AcDcPolarity: 0
  VR0HotGpio: 0
  VR0HotPolarity: 0
  VR1HotGpio: 0
  VR1HotPolarity: 0
  GthrGpio: 0
  GthrPolarity: 0
  LedPin0: 0
  LedPin1: 0
  LedPin2: 0
  padding8_4: 0
  PllGfxclkSpreadEnabled: 0
  PllGfxclkSpreadPercent: 0
  PllGfxclkSpreadFreq: 0
  DfllGfxclkSpreadEnabled: 0
  DfllGfxclkSpreadPercent: 0
  DfllGfxclkSpreadFreq: 0
  UclkSpreadEnabled: 0
  UclkSpreadPercent: 0
  UclkSpreadFreq: 0
  SoclkSpreadEnabled: 0
  SocclkSpreadPercent: 0
  SocclkSpreadFreq: 0
  TotalBoardPower: 0
  BoardPadding: 0
  MvddRatio: 0
  BoardReserved:
    Padding32 0: 0
    Padding32 1: 0
    Padding32 2: 0
    Padding32 3: 0
    Padding32 4: 0
    Padding32 5: 0
    Padding32 6: 0
    Padding32 7: 0
    Padding32 8: 0
  MmHubPadding:
    Padding32 0: 0
    Padding32 1: 0
    Padding32 2: 0
    Padding32 3: 0
    Padding32 4: 0
    Padding32 5: 0
    Padding32 6: 0
    Padding32 7: 0
TableContentRevision: 0

You'll notice up there, that the frequency table starts with state 0 at 300Mhz, then goes to state 1 at 1780, and down from there. This is everything default, no amdgpu.ppfeaturemask, nothing. So, since 1780 is supposed to be the max (I don't even know why 300 is at state 0), I ran sudo ./upp.py set /smcPPTable/FreqTableGfx/1=1790 --write. Which immediately forced the clocks to 300MHz just like powerupp was doing. So therein must lie the issue. And yes, those are true clocks. Running ./upp.py dump after running the above command gives the same result as before, only it says 1790 for state 1 instead of 1780. So all should be good, right? You would think so, since the 1780 frequency state 1 is what it runs at normally under load. But nope. radeonjet get core table returns:

0: 300Mhz
1: 300Mhz
2: 300Mhz

as does radeon-profile. So, I took your advice and ran something to stress the GPU to make sure the clocks weren't just being reported inaccurately for some reason. I ran Unigine Heaven, and unless you think 8 or 9 fps on an RX 5600 XT sounds right, then no, they're definitely being reported correctly and the clocks actually DO get set to 300MHz. I reapplied the default stock card settings and ran Unigine Heaven again, at the reported 1780MHz, and sure enough, I was back up to 78-79 fps average. So this isn't a reporting error: running powerupp (and upp as well, which would make sense) and setting the clock to anything above the stock frequency forces the clock to run at 300MHz.
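To line up the numbers from the dump, here is a small Python sketch that reads three lists out of an excerpt of the upp.py dump output and compares a requested clock against them. The excerpt is trimmed from the dump above, and read_list is a made-up helper. Which of these values (if any) the firmware actually enforces is not established in this thread, so treat the comparison as a way to organize the observations and not as the driver's logic.

```python
import re

# Trimmed excerpt of the upp.py dump posted above.
DUMP_EXCERPT = """\
PowerSavingClockTable:
  PowerSavingClockMax:
    Frequency 0: 1780
    Frequency 1: 1267
OverDrive8Table:
  ODSettingsMax:
    Setting 0: 1820
    Setting 1: 1820
smcPPTable:
  FreqTableGfx:
    Frequency 0: 300
    Frequency 1: 1780
    Frequency 2: 1400
"""

def read_list(dump, name):
    """Collect the integer entries indented under the line '<name>:'."""
    values, indent = [], None
    for line in dump.splitlines():
        stripped = line.strip()
        depth = len(line) - len(line.lstrip())
        if indent is None:
            if stripped == name + ":":
                indent = depth
            continue
        if depth <= indent:  # left the block we were reading
            break
        m = re.fullmatch(r"\w+ \d+: (-?\d+)", stripped)
        if m:
            values.append(int(m.group(1)))
    return values

gfx = read_list(DUMP_EXCERPT, "FreqTableGfx")
stock_peak = read_list(DUMP_EXCERPT, "PowerSavingClockMax")[0]
od_max = read_list(DUMP_EXCERPT, "ODSettingsMax")[0]

for requested in (1780, 1790):
    print(requested,
          "| above stock peak:", requested > stock_peak,
          "| within OD max:", requested <= od_max)
```

The point of the comparison is that 1790 sits inside the advertised OverDrive maximum yet above the stock peak, which matches the observation that staying under 1820 is not enough to avoid the 300MHz fallback.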

So, naturally my next thought was to try to change the 300MHz state 0, right? Bad idea. Running sudo ./upp.py set /smcPPTable/FreqTableGfx/0=1790 --write threw a bunch of errors:

sudo ./upp.py set /smcPPTable/FreqTableGfx/0=1790 --write
Changing smcPPTable.FreqTableGfx.0 from 300 to 1790 at 0x32e
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
Traceback (most recent call last):
  File "./upp.py", line 183, in <module>
    cli(obj={})()
  File "/usr/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "./upp.py", line 170, in set
    decode._write_pp_tables_file(input_file, decode.pp_tbl_bytes)
  File "/home/matt/gits/upp/decode.py", line 30, in _write_pp_tables_file
    f.close()
OSError: [Errno 62] Timer expired

At which point the entire pp_table file had its contents erased, or at least that's how it acted. The system almost froze, and radeon-profile reports NO clock speeds at all (everything is just blank), and radeonjet get core table returns nothing. At all.

So obviously that 0 state can't be modified, or something. Now again, all of this is WITHOUT amdgpu.ppfeaturemask enabled. With amdgpu.ppfeaturemask enabled, powerupp still doesn't work, I haven't yet tried upp but surely it will be the same thing.

This is actually expected when your GPU is idle. Are you sure you are actually putting your GPU under any load when you are checking these values? Try running this little monitoring script in a terminal, before running some game or 3D test, in a window (or full-screen on another monitor, in case you have more than one):

Regarding that comment, I'm not sure what you're referring to. I wasn't trying to illustrate that the card was running at 800MHz in the quote you were replying to there, I was pointing out that if you set the core clock to anything 1780MHz and BELOW, then it worked fine, and radeonjet reports the table as it should, as in the example you'll see that there are 3 states, 300MHz, 800MHz, and 1780MHz. But trying to set it to anything above 1780 in powerupp sets all three of those states to 300MHz. So I'm not sure what the point was of that quote, but either way it doesn't matter now because we know that no, it's not anything to do with the gpu not being under load, and the clocks are in fact all forced to 300MHz.

Not really sure where to go from here, I mean maybe it's something to do with the new kernel patch that they added (which is why my pp_od_clk_voltage shows more info than yours, that's not because it's a 5600 vs 5700, other people with 5700s with that patch in effect show the exact same table as me). I just installed linux-mainline, which is a vanilla 5.5-1 kernel and should be the same one you're using (or effectively the same), and I'll see what happens. But as of right now, it seems that with the new kernel patch upp and powerupp are going to be broken, and I believe that patch is going to be mainlined, I just don't think it made it in time for 5.5-1.

gardotd426 commented 4 years ago

UPDATE: This is a crazy coincidence, but I was commenting in the comments section on the Phoronix article about the 5600 XT and the new firmware, and I was asking about something to do with memory clocks completely unrelated to overclocking, and someone who is apparently an engineer replied "If you go outside the firmware's limits the clock defaults to 300 MHz. That matches the performance Michael was seeing."

So it sounds like for some reason the way that upp tries to edit pp_table to change the clocks violates the firmware settings and forces the clocks to 300MHz, whereas if you use pp_od_clk_voltage like radeon-profile, you can in fact go up to the 1820MHz firmware limit (but pp_od_clk_voltage requires amdgpu.ppfeaturemask set)
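To make the "firmware limits" idea concrete, here is a minimal Python sketch that reads the OD_RANGE block out of pp_od_clk_voltage text and checks a requested clock against it. The sample text is abridged from the outputs quoted in this thread, and the function names are made up for illustration. It only checks the OverDrive range that the driver reports; whatever limit the firmware enforces on pp_table edits may well be different, so treat it as a sanity check and nothing more.

```python
import re

# Abridged sample, based on the pp_od_clk_voltage output quoted in this thread.
SAMPLE = """\
OD_SCLK:
0: 800Mhz
1: 1780Mhz
OD_MCLK:
1: 875MHz
OD_RANGE:
SCLK:     800Mhz       1820Mhz
MCLK:     625Mhz        930Mhz
"""

# Matches lines such as "SCLK:     800Mhz       1820Mhz" (unit case varies).
RANGE_LINE = re.compile(
    r"^(\S+):\s+(\d+)\s*(?:MHz|mV)\s+(\d+)\s*(?:MHz|mV)$", re.IGNORECASE
)


def parse_od_range(text):
    """Return {name: (minimum, maximum)} for the lines after 'OD_RANGE:'."""
    ranges = {}
    in_range = False
    for line in text.splitlines():
        line = line.strip()
        if line == "OD_RANGE:":
            in_range = True
            continue
        if not in_range:
            continue
        match = RANGE_LINE.match(line)
        if match:
            ranges[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return ranges


def within_range(text, name, value):
    """True if value lies inside the reported range for name (e.g. 'SCLK')."""
    low, high = parse_od_range(text)[name]
    return low <= value <= high


if __name__ == "__main__":
    for clock in (1780, 1820, 1830):
        print(clock, within_range(SAMPLE, "SCLK", clock))
```

On a real system the text would come from reading /sys/class/drm/card0/device/pp_od_clk_voltage instead of the embedded sample.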

sibradzic commented 4 years ago

I ran sudo ./upp.py set /smcPPTable/FreqTableGfx/1=1790 --write. Which immediately forced the clocks to 300MHz just like powerupp was doing. So therein must lie the issue.

OK, so setting anything larger than 1780 is making the amdgpu driver power-management go nuts for you? Have you tried to see if there is anything significant in the kernel log (dmesg) after you do that? How about setting the same thing to something lower than 1780, does that break the driver?

But nope. radeonjet get core table returns:

Sorry, I have no clue what radeonjet is, but I guess that should match the output of cat /sys/class/drm/card0/device/pp_dpm_sclk? If so, it totally seems that the driver goes nuts...

So, naturally my next thought was to try to change the 300MHz state 0, right? Bad idea.

Indeed :) Lowest state clocks for both GPU & VRAM are not meant to be changed at all. Hence the very unpredictable driver behaviour or just hang.

Not really sure where to go from here

It looks to me that your card firmware is blocking your max FreqTableGfx/1 clock. Try setting a lower clock to confirm that the pp_table interface works at all in the first place (you may also try changing all instances of 1780 to 1800 for example, just for the lulz). Then make sure you have the latest VBIOS as well as the latest firmware, check https://www.phoronix.com/scan.php?page=news_item&px=Ubuntu-19.10-Radeon-RX-5700. As I see no report of an issue similar to yours on any 5700 cards, my gut feeling is telling me this is totally about 5600XT firmware / VBIOS. Are you running the factory VBIOS (the one with the AMD pre-announced lower clocks) or the one from after the card was released?

btw, can you please share your pp_table, in its raw form?

sibradzic commented 4 years ago

check https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1154834-radeon-rx-5600-xt-with-new-vbios-offering-better-linux-performance-following-fix/page3

https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/new/navi10_smc.bin

oh, wait, you are there already...

gardotd426 commented 4 years ago

check https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1154834-radeon-rx-5600-xt-with-new-vbios-offering-better-linux-performance-following-fix/page3

https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/new/navi10_smc.bin

That's literally the link that I posted above, it's the same comments section. I already have that firmware, I got it from the devs days ago. Also, there IS no "after the card was released" vBIOS for the Sapphire Pulse, Sapphire actually flashed the new vBIOS on all of their 5600 XTs in North America before launch, which is why my stock frequency is 1780Mhz instead of 1650 or 1675 or whatever, which was the original one. But, I do have a copy of the original vBIOS but that would be useless because the new firmware is for the new vBIOS, and the old vBIOS has lower limits than the new one.

Also yes, if you read my original comments, like I said if you set it to anything under 1780 in upp or powerupp it does in fact work. Going over 1780, though, does not. And from the engineer in the Phoronix forum's comments, it sounds like it's something to do with the way upp tries to change clock speeds which violates the firmware's settings, as opposed to radeon-profile which uses pp_od_clk_voltage and doesn't violate those settings, which would explain why radeon-profile works and upp doesn't.

azeam commented 4 years ago

Well, I guess you have your answer there, it's a firmware limitation. I agree that it's odd not being able to increase the clock to 1820 MHz with pp table though. Anyway this is not an issue with either powerupp or upp, they are doing what they are supposed to (i.e. reading and adjusting the pp table) afaict, but I find it interesting and would like to know more so I'll keep the issue open for a while if there's more information to be had.

If I raise the memory clock but keep the core at 1780 (or below), it actually applies correctly. And the thing is, when this happens, everything reports the frequency at 300MHz, except powerupp.

Does this mean that if you keep the Gfx clock at 1780 you can increase the memory clock without anything breaking or does the Gfx clock drop to 300 if you increase the memory clock? What about Gfx voltage, can that be increased (if you keep the clock at 1780)?

I noticed that you experienced similar issues earlier in Manjaro. Was this without any overclocking applied and did you solve that?

sibradzic commented 4 years ago

it sounds like it's something to do with the way upp tries to change clock speeds which violates the firmware's settings, as opposed to radeon-profile which uses pp_od_clk_voltage and doesn't violate those settings, which would explain why radeon-profile works and upp doesn't.

What upp does is simply change a value in pp_table, and it does its job correctly, as you have already demonstrated. It is the amdgpu driver logic that processes the Power Play changes when tables are changed, and re-applies all the clock/voltage parameters from scratch (basically, modifying pp_table causes driver power management to be completely re-initialized). I guess this re-init fails with an "unexpected" setting in Power Play. On the other hand, the sysfs API clock change does not re-init everything, it just triggers a clock change in the driver logic, which is likely the reason setting a clock above 1780 via pp_od_clk_voltage succeeds. The way I see it, all of this is a firmware quirk very specific to the 5600XT; it was never an issue with powerupp or upp in the first place.

Since you have both old & new Sapphire Pulse 5600XT vBIOS files, can you please share? I totally need them for comparing Power Play tables and double-checking if upp works as expected on both.

gardotd426 commented 4 years ago

Does this mean that if you keep the Gfx clock at 1780 you can increase the memory clock without anything breaking or does the Gfx clock drop to 300 if you increase the memory clock? What about Gfx voltage, can that be increased (if you keep the clock at 1780)?

Yes. Memory overclocking worked. I don't know about the voltages, because Navi doesn't have a voltage for each state, only a voltage curve, and I don't feel comfortable in my knowledge of the 5600 XT safe voltages to test out raising voltage limits. I've tried lowering them, and that works.

I noticed that you experienced similar issues earlier in Manjaro. Was this without any overclocking applied and did you solve that?

No, but I never tested the new firmware or anything like that. After that very initial testing, I just went back to Arch since the card was working fine there, and I haven't used Manjaro since, I've just been using Arch. I'll try it out later today though and use the new firmware and see if anything is fixed. I imagine it was the firmware issue though.

What upp does is simply changing a value in pp_table, and it does its job correctly, as you had already demonstrated.

That's what I'm saying. It looks like editing pp_table is the issue, as apparently that causes the firmware to freak out. It does seem like this is something due to the firmware, like I said, BUT I wouldn't say it's an "issue" with the card OR with upp/powerupp, it sounds like upp just isn't compatible with this card. But anyway, I'll upload the new and original versions of the performance vBIOS if you want:

Details on which is which are in the README in the zip vbios.zip

azeam commented 4 years ago

Ok, so possibly the only firmware limit is the maximum target frequency. It could be possible to add a workaround only for the 5600 XT in powerupp by setting the target frequency using OverDrive instead of the pp table; I will consider it. Here is something you can try:

First enable OverDrive (amdgpu.ppfeaturemask=0xffffffff) and reboot.

In terminal (with proper path to upp, and yes it's supposed to be 1830, or anything above 1820 at least):

upp.py set --write OverDrive8Table/ODSettingsMax/0=1830
sudo sh -c "echo 's 1 1830' > /sys/class/drm/card0/device/pp_od_clk_voltage"
cat /sys/class/drm/card0/device/pp_od_clk_voltage

Another thing to note is that the target frequency and the actual working frequency of the GPU are not (always) the same, meaning that in order to actually get the card running at clocks higher than 1820 you would probably have to increase the voltage (for example I can only run at a maximum of 30 MHz below the target frequency and 80 below the OverDrive max with stock settings on my 5700 XT, so a bit surprising that you can run at 1820 without increasing the voltage on your 5600 XT). But I think it would break to 300 MHz when increasing the OverDrive limit if it affects some firmware limit. In case it works (cat pp_od_clk_voltage shows 1830 MHz) can you also paste the output of:

glxinfo -B | egrep 'Device|OpenGL renderer'

gardotd426 commented 4 years ago

Doing the above doesn't cause an error or anything, but it has no actual effect other than changing the max clock in OD_RANGE: for SCLK to 1830 instead of 1820 in pp_od_clk_voltage as well as changing state 1 for OD_SCLK from 1780 to 1830. But the card is still running at 1780 according to radeon-profile and radeonjet. Well, sudo sh -c "echo 's 1 1830' > /sys/class/drm/card0/device/pp_od_clk_voltage" fails with a Permission denied error, but that's because for some reason with some cards you can't write to the symlinked /sys/class/drm/card0/device/pp_od_clk_voltage, but you can instead run sudo sh -c "echo 's 1 1830' > /sys/devices/pci0000:00/0000:00:03.1/0000:07:00.0/0000:08:00.0/0000:09:00.0/pp_od_clk_voltage" which is the true location (on my MOBO, others may have different numbers I guess). So I ran that command, got no error. I then ran sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage and got this:

sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1830Mhz
OD_MCLK:
1: 900MHz
OD_VDDC_CURVE:
0: 800MHz @ 0mV
1: 1290MHz @ 0mV
2: 1780MHz @ 0mV
OD_RANGE:
SCLK:     800Mhz       1830Mhz
MCLK:     625Mhz        930Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[0]:     800mV        1050mV
VDDC_CURVE_SCLK[1]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[1]:     800mV        1050mV
VDDC_CURVE_SCLK[2]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[2]:     800mV        1050mV

So the OD_SCLK and OD_RANGE values get changed, but the OD_VDDC_CURVE stays at the stock settings, and I'm not sure how to adjust that with a sudo sh -c "echo '.........' > /sys/class/blahblahblah". I know you can use 's <state number> <value>' and 'm <state number> <value>' for OD_SCLK and OD_MCLK, but that's all I know, as Polaris was completely different. Anyway, after seeing the 1830 properly set, I didn't notice that OD_VDDC_CURVE was still at 1780 so I thought it worked and set the clock speed to 1830. But then I checked and no, it's still at 1780 according to radeon-profile, radeontop, and radeonjet. So then I tried to use upp or powerupp to change the clock to something above 1780, not knowing if it would maybe have been fixed, but no, it still did the same exact thing and set the clocks to 300MHz.

As requested:

glxinfo -B | egrep 'Device|OpenGL renderer'
    Device: AMD Radeon RX 5600 XT (NAVI10, DRM 3.36.0, 5.5.0-3-tkg-pds, LLVM 9.0.1) (0x731f)
OpenGL renderer string: AMD Radeon RX 5600 XT (NAVI10, DRM 3.36.0, 5.5.0-3-tkg-pds, LLVM 9.0.1)

And I would like to say I really appreciate you guys helping try to get this work, even if powerupp and upp seem to be incompatible with the 5600 XT. Maybe we can get it working but even if not, it's very much appreciated.

azeam commented 4 years ago

You can try increasing the OverDrive8Table/ODSettingsMax/2, 4 and 6 (I guess those are the limits for the VDDC_CURVE, not sure what 1 is). You can probably do the OverDrive overclocking in radeon-profile instead of terminal (perhaps restart the program after changing the values).

azeam commented 4 years ago

And if not, in terminal I believe it is "vc 2 1830 1050" (2 for point 2) to set the OD_VDDC_CURVE.
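For reference, the command strings mentioned so far in this thread ('s', 'm' and 'vc') could be put together with a small Python helper like the sketch below. The function names are hypothetical, and the formats are only the ones quoted in the comments here, not checked against the driver documentation. The sketch builds strings only and writes nothing to sysfs.

```python
def _check(*values):
    """Reject anything that is not a non-negative integer."""
    for value in values:
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            raise ValueError(f"expected a non-negative integer, got {value!r}")


def sclk_cmd(state, mhz):
    """Core clock command as used in this thread, e.g. 's 1 1830'."""
    _check(state, mhz)
    return f"s {state} {mhz}"


def mclk_cmd(state, mhz):
    """Memory clock command as used in this thread, e.g. 'm 1 900'."""
    _check(state, mhz)
    return f"m {state} {mhz}"


def vddc_curve_cmd(point, mhz, millivolts):
    """Voltage curve command as suggested above, e.g. 'vc 2 1830 1050'."""
    _check(point, mhz, millivolts)
    return f"vc {point} {mhz} {millivolts}"


if __name__ == "__main__":
    # Print the strings; echoing them into pp_od_clk_voltage is left to the user.
    print(sclk_cmd(1, 1830))
    print(vddc_curve_cmd(2, 1830, 1050))
```

Each string would then be echoed into pp_od_clk_voltage as root, in the same way as the 's 1 1830' example earlier.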

azeam commented 4 years ago

But, have you not been able to get the card running at 1820 MHz before (regardless of method)? I was under that impression but maybe I made that up myself. If not, I would guess that it needs more voltage to run higher than 1780 MHz.

sibradzic commented 4 years ago

Just FYI, both 5600XT VBIOSes pp_tables are fully decode-able and modifiable by upp, no issue there. @gardotd426 may find this diff of the Power Play settings between the old and new VBIOS interesting:

diff -u4 Sapphire.RX5600XT.6144.191209.rom.pp_table.dump Sapphire.RX5600XT.411EFMIU.X4E.pp_table.dump 
--- Sapphire.RX5600XT.6144.191209.rom.pp_table.dump 2020-02-04 00:12:47.265947327 +0900
+++ Sapphire.RX5600XT.411EFMIU.X4E.pp_table.dump    2020-02-04 00:13:18.046248608 +0900
@@ -1,11 +1,11 @@
-Dumping the PP table from '../Sapphire.RX5600XT.6144.191209.rom.pp_table' binary...
+Dumping the PP table from '../Sapphire.RX5600XT.411EFMIU.X4E.pp_table' binary...
 StructureSize: 1674
 TableFormatRevision: 12
 RevisionId: 1
 TableSize: 482
 GoldenPPId: 2292
-GoldenRevision: 15288
+GoldenRevision: 15418
 FormatId: 125
 PlatformCaps: 8
 ThermalControllerType: 27
 SmallPowerLimit1: 0
@@ -28,9 +28,9 @@
     Byte 1: 0
     Byte 2: 0
   PowerSavingClockCount: 10
   PowerSavingClockMax:
-    Frequency 0: 1650
+    Frequency 0: 1780
     Frequency 1: 1267
     Frequency 2: 1086
     Frequency 3: 1267
     Frequency 4: 1267
@@ -103,15 +103,15 @@
     Capability 30: 0
     Capability 31: 0
   ODSettingCount: 0
   ODSettingsMax:
-    Setting 0: 1725
-    Setting 1: 1725
-    Setting 2: 1725
+    Setting 0: 1820
+    Setting 1: 1820
+    Setting 2: 1820
     Setting 3: 1050
-    Setting 4: 1725
+    Setting 4: 1820
     Setting 5: 1050
-    Setting 6: 1725
+    Setting 6: 1820
     Setting 7: 1050
     Setting 8: 930
     Setting 9: 20
     Setting 10: 3200
@@ -174,9 +174,9 @@
   FeaturesToRun:
     Features 0: 2749345791
     Features 1: 1571
   SocketPowerLimitAc:
-    Wattage 0: 150
+    Wattage 0: 160
     Wattage 1: 0
     Wattage 2: 0
     Wattage 3: 0
   SocketPowerLimitAcTau:
@@ -184,9 +184,9 @@
     Time 1: 0
     Time 2: 0
     Time 3: 0
   SocketPowerLimitDc:
-    Wattage 0: 150
+    Wattage 0: 160
     Wattage 1: 0
     Wattage 2: 0
     Wattage 3: 0
   SocketPowerLimitDcTau:
@@ -195,9 +195,9 @@
     Time 2: 0
     Time 3: 0
   TdcLimitSoc: 14
   TdcLimitSocTau: 0
-  TdcLimitGfx: 141
+  TdcLimitGfx: 150
   TdcLimitGfxTau: 0
   TedgeLimit: 100
   ThotspotLimit: 110
   TmemLimit: 105
@@ -341,9 +341,9 @@
         b: 0.0
         c: 0.0
   FreqTableGfx:
     Frequency 0: 300
-    Frequency 1: 1650
+    Frequency 1: 1780
     Frequency 2: 1400
     Frequency 3: 1400
     Frequency 4: 1400
     Frequency 5: 1400
@@ -387,9 +387,9 @@
   FreqTableUclk:
     Frequency 0: 100
     Frequency 1: 500
     Frequency 2: 625
-    Frequency 3: 750
+    Frequency 3: 875
   FreqTableDcefclk:
     Frequency 0: 507
     Frequency 1: 1267
     Frequency 2: 1267
@@ -442,11 +442,11 @@
     Padding32 13: 30409168
     Padding32 14: 30409168
     Padding32 15: 30409168
   DcModeMaxFreq:
-    Frequency 0: 1650
+    Frequency 0: 1780
     Frequency 1: 1267
-    Frequency 2: 750
+    Frequency 2: 875
     Frequency 3: 1086
     Frequency 4: 1267
     Frequency 5: 1267
     Frequency 6: 1284
@@ -510,9 +510,9 @@
   GfxclkFreqHighTempLimit: 0
   FanStopTemp: 50
   FanStartTemp: 60
   FanGainEdge: 400
-  FanGainHotspot: 400
+  FanGainHotspot: 100
   FanGainLiquid0: 400
   FanGainLiquid1: 400
   FanGainVrGfx: 400
   FanGainVrSoc: 400
@@ -520,12 +520,12 @@
   FanGainVrMem1: 400
   FanGainPlx: 400
   FanGainMem: 400
   FanPwmMin: 15
-  FanAcousticLimitRpm: 1250
+  FanAcousticLimitRpm: 1000
   FanThrottlingRpm: 2900
   FanMaximumRpm: 3200
-  FanTargetTemperature: 83
+  FanTargetTemperature: 81
   FanTargetGfxclk: 800
   FanTempInputSelect: 1
   FanPadding: 0
   FanZeroRpmEnable: 1

Note the PowerSavingClockMax/0 change from 1650 to 1780. Could this be the thing that is limiting your card? I'd suggest adjusting both PowerSavingClockMax/0 & FreqTableGfx/1 to see if it has any consequence (make sure the changes are simultaneous)... That new VBIOS also has DcModeMaxFreq/0 set to 1780, I'd try changing that one as well... There is another funny thing about FreqTableUclk/3, the new VBIOS default is 875 but the value in @gardotd426 's dump is 900.
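If someone wants to repeat this kind of comparison without the diff tool, a rough Python equivalent using the standard difflib module could look like the sketch below. The two embedded snippets are a few lines taken from the dumps above; the function names and the old.dump / new.dump labels are placeholders chosen for this example.

```python
import difflib

# A few lines from the old and new VBIOS dumps shown above.
OLD_DUMP = """\
  FreqTableGfx:
    Frequency 0: 300
    Frequency 1: 1650
    Frequency 2: 1400
"""

NEW_DUMP = """\
  FreqTableGfx:
    Frequency 0: 300
    Frequency 1: 1780
    Frequency 2: 1400
"""


def diff_dumps(old_text, new_text, old_name="old.dump", new_name="new.dump",
               context=4):
    """Unified diff of two dump texts, similar in spirit to 'diff -u4'."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, n=context, lineterm=""))


def changed_lines(diff):
    """Keep only added/removed lines, skipping the '---' and '+++' headers.

    Note: a changed line whose own text starts with '--' or '++' would be
    skipped too; the dumps in this thread contain no such lines.
    """
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


if __name__ == "__main__":
    for line in changed_lines(diff_dumps(OLD_DUMP, NEW_DUMP)):
        print(line)
```

With real files, the two texts would be read from the dump files and passed in the same way.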

gardotd426 commented 4 years ago

@azeam, yes I've been able to get the overclocking working with other methods including corectrl and radeon-profile. Also, I found out the reason our /sys/class/drm/card0/pp_od_clk_voltage files look different despite Navi overclocking apparently being in the kernel now, is that the patches that I mentioned from the gitlab issue page were not upstreamed yet, I just now found out today from the dev. He also posted another patch to get the correct voltages (I'm assuming he's talking about where it says @0mV, I'm applying the patch to a kernel I'm building right now and will let you know).

@sibradzic, setting FreqTableGfx/1 forces the card to 300MHz again, PowerSavingClockMax doesn't though. I don't know what you mean by setting them simultaneously, can you give upp two commands in one string? If so, I haven't tried that, only running them one after another.

Also, the vBIOS memory dump thing might be because, with the new vBIOS that I'm currently using, I had a memory overclock to 900MHz set by default in radeon-profile. I unchecked "Restore Selected Overclock Profile on Start" and rebooted, and it now defaults to an 875MHz peak frequency state.

Funnily enough, it seems that OD_SCLK has no actual bearing on clock speeds. When I opened up corectrl and made the peak clock frequency 1820 and applied it, the frequency went up to 1820 in radeon-profile, but OD_SCLK state 1 in /sys/class/drm/card0/pp_od_clk_voltage was still listed as 1780. I have no idea what that's about, other than obviously the Navi OD implementation is in its infancy and hasn't matured in Linux the way Polaris's implementation had.

But yeah, if it's possible to run two commands at once with upp and that's what you meant, let me know how and I'll try it. But running them one after another, PowerSavingClockMax/0=1820 does nothing and then FreqTableGfx/1=1820 breaks it.

azeam commented 4 years ago

(One line)
upp.py set --write PowerSavingClockTable/PowerSavingClockMax/0=1820 smcPPTable/FreqTableGfx/1=1820 smcPPTable/DcModeMaxFreq/0=1820

Also make sure there are no profiles auto-loaded with radeon-profile or CoreCtrl when testing upp. I don't think that is the cause here but I've noticed some weird things, even without patches and with OverDrive disabled, so safer with them off for trouble-shooting purposes. For example, with OverDrive disabled on my system I can set the CoreCtrl clock slider to 300 MHz and this will cause the card to lock to "manual" dpm performance level and it will stay at state 0 (300 MHz)/manual even if I try to overwrite the performance level manually or change the pp table.

As for how the OverDrive overclocking works I'm not very familiar with that, but did you try setting vc point freq volt in pp_od_clk_voltage? Had a quick glance at the CoreCtrl code and from what I can tell that seems to be how it is setting the curve as well.

azeam commented 4 years ago

Seems like new firmware was released today https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=b791e15d3e0ac2705eaa7965ed9b6d4c85fef2a2

gardotd426 commented 4 years ago

It does absolutely nothing to help Manjaro, so it's not a firmware issue. Manjaro is still forced at 300MHz, and even trying to load defaults with like powerupp or anything doesn't work. I'm about to file a bug report with Manjaro because this has been an issue since I got the card, regardless of which vBIOS or firmware I used. And for some reason, the new firmware isn't available on vanilla Arch yet. But like I said I already downloaded the firmware from the link from the devs and it helped with performance on the new vBIOS but that's the only change it made, I've had the new firmware since before I even filed this issue.

azeam commented 4 years ago

Is it "the same kind of stuck at 300 MHz" you get in Manjaro as when setting the clock >1780 with upp, i.e. radeonjet etc. show all three states at 300 MHz?

Do try the one line triple upp command above (in Arch), in case it helps it would be great.

gardotd426 commented 4 years ago

No, it's stuck at 300MHz in Manjaro right out of the box no matter what. Also no, I'm in Arch right now, but from what I remember from earlier it was like state 1 was 300MHz, state 2 was 850 or 800MHz, and state 3 was 300MHz. I'll check again here in a bit, I'm cloning my Manjaro install to back it up and then I'm gonna install Pop OS where Manjaro was, to see if it happens on Ubuntu-based distros as well, plus I wanna see if the same rendering issues I'm having in RE2 (and RE7, apparently. I just installed it today, and same thing happens) also happen in Pop.

Also, no such luck:

 sudo ./upp.py set --write PowerSavingClockTable/PowerSavingClockMax/0=1820 smcPPTable/FreqTableGfx/1=1820 smcPPTable/DcModeMaxFreq/0=1820
Changing PowerSavingClockTable.PowerSavingClockMax.0 from 1780 to 1820 at 0x036
Changing smcPPTable.FreqTableGfx.1 from 1780 to 1820 at 0x330
Changing smcPPTable.DcModeMaxFreq.0 from 1820 to 1820 at 0x406
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 300Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 706mV
1: 550MHz @ 706mV
2: 300MHz @ 706mV
OD_RANGE:
SCLK:     800Mhz       1820Mhz
MCLK:     625Mhz        930Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[0]:     800mV        1050mV
VDDC_CURVE_SCLK[1]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[1]:     800mV        1050mV
VDDC_CURVE_SCLK[2]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[2]:     800mV        1050mV

azeam commented 4 years ago

The restrictive limits with the 5600 XT seem to apply under Windows as well. Did you try to increase the OverDrive limits above 1820

upp.py set --write OverDrive8Table/ODSettingsMax/0=1830 OverDrive8Table/ODSettingsMax/2=1830 OverDrive8Table/ODSettingsMax/4=1830 OverDrive8Table/ODSettingsMax/6=1830

and then overclock above 1820 in CoreCtrl/radeon-profile?

gardotd426 commented 4 years ago

No luck:

sudo upp.py set --write OverDrive8Table/ODSettingsMax/0=1830 OverDrive8Table/ODSettingsMax/2=1830 OverDrive8Table/ODSettingsMax/4=1830 OverDrive8Table/ODSettingsMax/6=1830
[sudo] password for matt:
Changing OverDrive8Table.ODSettingsMax.0 from 1820 to 1830 at 0x0e2
Changing OverDrive8Table.ODSettingsMax.2 from 1820 to 1830 at 0x0ea
Changing OverDrive8Table.ODSettingsMax.4 from 1820 to 1830 at 0x0f2
Changing OverDrive8Table.ODSettingsMax.6 from 1820 to 1830 at 0x0fa
Commiting changes to '/sys/class/drm/card0/device/pp_table'.
sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1780Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 706mV
1: 1290MHz @ 738mV
2: 1780MHz @ 935mV
OD_RANGE:
SCLK:     800Mhz       1830Mhz
MCLK:     625Mhz        930Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[0]:     800mV        1050mV
VDDC_CURVE_SCLK[1]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[1]:     800mV        1050mV
VDDC_CURVE_SCLK[2]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[2]:     800mV        1050mV

Then after setting the overclock to 1830 in radeon-profile:

sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1830Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 800mV
1: 1290MHz @ 800mV
2: 1780MHz @ 942mV
OD_RANGE:
SCLK:     800Mhz       1830Mhz
MCLK:     625Mhz        930Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[0]:     800mV        1050mV
VDDC_CURVE_SCLK[1]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[1]:     800mV        1050mV
VDDC_CURVE_SCLK[2]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[2]:     800mV        1050mV

But:

sudo radeonjet get core table
0: 300Mhz *
1: 300Mhz
2: 300Mhz

And that's confirmed, radeon-profile shows 300 as well (and again we've figured out that those are indeed accurate numbers already).
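For anyone who wants to spot this broken state automatically, here is a rough Python sketch that parses a state table in the format shown above ("index: frequency", with "*" marking the active state) and reports whether every state has collapsed to the same frequency. The function names are invented for this example, and the two sample tables are the outputs quoted in this thread.

```python
import re

# State tables as quoted in this thread.
BROKEN = """\
0: 300Mhz *
1: 300Mhz
2: 300Mhz
"""

HEALTHY = """\
0: 300Mhz
1: 1060Mhz
2: 1820Mhz *
"""

# Matches lines such as "2: 1820Mhz *"; the trailing '*' is optional.
STATE_LINE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_states(text):
    """Return a list of (index, mhz, is_active) tuples."""
    states = []
    for line in text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            states.append((int(match.group(1)), int(match.group(2)),
                           match.group(3) is not None))
    return states


def all_states_collapsed(text):
    """True if there are several states and all share one frequency."""
    states = parse_states(text)
    return len(states) > 1 and len({mhz for _, mhz, _ in states}) == 1


if __name__ == "__main__":
    print("broken table collapsed:", all_states_collapsed(BROKEN))
    print("healthy table collapsed:", all_states_collapsed(HEALTHY))
```

The same check should work on the contents of pp_dpm_sclk, assuming it uses this layout as suggested earlier in the thread.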

Also, that shouldn't even matter anyway because the issue was never that I couldn't overclock using upp or powerupp to anything above 1820, it's that you can't overclock with it at ALL. Remember, even 1781 causes it to error to 300 MHz. I was never trying to get it to go past 1820, I was just trying to get it to go past anything other than the stock 1780, which radeon-profile CAN do, and upp and powerupp can't.

gardotd426 commented 4 years ago

And just to confirm, I lowered the range in radeon-profile back down to 1820, and now the card is running at 1820:

sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 1820Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz @ 800mV
1: 1290MHz @ 800mV
2: 1780MHz @ 942mV
OD_RANGE:
SCLK:     800Mhz       1830Mhz
MCLK:     625Mhz        930Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[0]:     800mV        1050mV
VDDC_CURVE_SCLK[1]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[1]:     800mV        1050mV
VDDC_CURVE_SCLK[2]:     800Mhz       1830Mhz
VDDC_CURVE_VOLT[2]:     800mV        1050mV
sudo radeonjet get core table
0: 300Mhz
1: 1060Mhz
2: 1820Mhz *

azeam commented 4 years ago

Thanks. Yes, I know this is a different matter, I was just curious if it is possible to increase the OverDrive limits under Linux, but it seems to be the same as in Windows (I believe what you did now is what MorePowerTool does), as suspected.

I don't know for sure why it won't allow the pp table to be set above 1780 but the explanation by @sibradzic makes sense.

gardotd426 commented 4 years ago

My bad, I wasn't trying to insinuate that you like, didn't grasp it, it's just this has been such a long thread I didn't know if maybe it got lost in all the messages, and since as of late we've been trying all sorts of stuff it seemed like a possibility. I know you know what you're doing lol. I love Linux and open source, so I'm happy to try anything to help out, since the 5600 XT is so brand new and I'm probably one of the very few people that is using Linux, has a 5600 XT, AND is wanting to overclock, so I suppose in this instance I can actually be somewhat useful in my contributions, I just wish I knew more so I could try and help out more than I'm currently able to.

azeam commented 4 years ago

No worries, it's interesting to find out more about this card. It's a pity that it doesn't allow the full potential of the pp table, hopefully it will change in the future. I don't think I will add any workarounds by setting the clock in a different way in powerupp, at least for now. It would basically mean just as much hassle (if not more, by complicating the code maintenance and dependencies even for other cards, depending on implementation) as using some other software for setting the OverDrive clock frequency (and powerupp for the other things, to the extent they are adjustable), as is possible now.

If it would have been possible to increase the OverDrive limits it would have made more sense to do it, imho, but it seems like the OverDrive restrictions also apply to the pp table so it wouldn't add anything that is not possible to do with other software. I will add some of the information we've gathered in the readme at least. Closing this issue now but please let me know if there are any changes later on!

(On a totally unrelated note, I noticed in your initial screenshot that the memory dpm selection radiobuttons are not displayed on your system as intended. The positioning of certain GTK elements is for some reason different between different systems, and on your system the radiobuttons are smaller than they appear for me, but I haven't figured out how to set the size consistently yet. I'm opening an issue for that and will try to work something out).

gardotd426 commented 4 years ago

I would hold off on looking into that, I'm using i3 and it's probably i3's fault. If I remember correctly, when I was in Plasma it didn't do that. I'll log into a Plasma session at some point today and make sure, at which point you can chalk it up to tiling WM weirdness. i3 has trouble with windows that are supposed to be floating like that, sometimes even if you set them to float.