FlyGoat / RyzenAdj

Adjust power management settings for Ryzen APUs
GNU Lesser General Public License v3.0

tctl-temp reported 0.000 on --info and on HWiNFO #151

Closed: bagusnl closed this issue 3 years ago

bagusnl commented 3 years ago

Hi,

I am currently using a Ryzen 3 3200U with RyzenAdj 0.8.1. When I run ryzenadj -i, the tctl-temp limit is shown correctly as 95.000, but the actual value is 0.000, which causes the Tctl temperature reported in HWiNFO to be 0.000 as well.

.\ryzenadj -i
CPU Family: Picasso
SMU BIOS Interface Version: 8
Version: v0.8.1
PM Table Version: 1e0004
|       Name       |   Value   |      Paramter      |
|------------------|-----------|--------------------|
| STAPM LIMIT      |    25.000 | stapm-limit        |
| STAPM VALUE      |    10.092 |                    |
| PPT LIMIT FAST   |    21.000 | fast-limit         |
| PPT VALUE FAST   |     7.930 |                    |
| PPT LIMIT SLOW   |    25.000 | slow-limit         |
| PPT VALUE SLOW   |     9.829 |                    |
| StapmTimeConst   |   200.000 | stapm-time         |
| SlowPPTTimeConst |    60.000 | slow-time          |
| PPT LIMIT APU    | -nan(ind) | apu-slow-limit     |
| PPT VALUE APU    | -nan(ind) |                    |
| TDC LIMIT VDD    |    45.000 | vrm-current        |
| TDC VALUE VDD    |     1.809 |                    |
| TDC LIMIT SOC    |    15.000 | vrmsoc-current     |
| TDC VALUE SOC    |     3.307 |                    |
| EDC LIMIT VDD    |    55.000 | vrmmax-current     |
| EDC VALUE VDD    |    37.494 |                    |
| EDC LIMIT SOC    |    20.000 | vrmsocmax-current  |
| EDC VALUE SOC    |     5.233 |                    |
| THM LIMIT CORE   |    95.000 | tctl-temp          |
| THM VALUE CORE   |     0.000 |                    |
| STT LIMIT APU    | -nan(ind) | apu-skin-temp      |
| STT VALUE APU    | -nan(ind) |                    |
| STT LIMIT dGPU   | -nan(ind) | dgpu-skin-temp     |
| STT VALUE dGPU   | -nan(ind) |                    |

Full power table dump attached: dump-table.log

HWiNFO sensor screenshot attached.

Edit: added the HWiNFO sensor screenshot.

Falcosc commented 3 years ago

I will take one of the other values:

| Offset | Data       | Value  |
|--------|------------|--------|
| 0x0050 | 0x42BE0000 | 95.000 |
| 0x0054 | 0x00000000 |  0.000 |
| 0x0058 | 0x42BE0000 | 95.000 |
| 0x005C | 0x423CFCBA | 47.247 |
| 0x0060 | 0x42BE0000 | 95.000 |
| 0x0064 | 0x00000000 |  0.000 |
| 0x0068 | 0x42BE0000 | 95.000 |
| 0x006C | 0x423F23EF | 47.785 |
| 0x0070 | 0x42BE0000 | 95.000 |
| 0x0074 | 0x423BABFA | 46.918 |
| 0x0078 | 0x42BE0000 | 95.000 |
| 0x007C | 0x4238BF01 | 46.187 |
| 0x0080 | 0x42BE0000 | 95.000 |
| 0x0084 | 0x423F0E1C | 47.764 |

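The Value column in this dump can be reproduced from the Data column: each 32-bit word is the raw bit pattern of an IEEE-754 single-precision float (0x42BE0000 is exactly 95.0). The dump also alternates a 95.000 word with a plausible temperature reading, which looks like limit/value pairs; that pairing is an interpretation, not something RyzenAdj documents. A minimal Python sketch of the decoding, using only the standard struct module, with a few rows copied from the dump:

```python
import struct

def pm_word_to_float(word: int) -> float:
    """Reinterpret a 32-bit PM table word as an IEEE-754 single-precision float."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

# (offset, data) pairs copied from the dump in this comment
ROWS = [
    (0x0050, 0x42BE0000),  # 95.000, looks like a limit slot
    (0x0054, 0x00000000),  # 0.000, the value slot RyzenAdj 0.8.1 reads as THM VALUE CORE
    (0x005C, 0x423CFCBA),  # 47.247, a plausible temperature reading
    (0x006C, 0x423F23EF),  # 47.785
]

if __name__ == "__main__":
    for offset, word in ROWS:
        # Should match the Value column of the dump
        print(f"0x{offset:04X}  0x{word:08X}  {pm_word_to_float(word):.3f}")
```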
bagusnl commented 3 years ago

> I will take one of the other values

I see, those values look about right for my current temperature. But my concern is that the tctl max value is not being applied correctly, because I experience massive frequency drops and lag at random times. Here is my current config:

    adjust "fast_limit" 20000
    adjust "slow_limit" 25000
    adjust "slow_time" 60
    adjust "tctl_temp" 95
    adjust "vrm_current" 45000
    adjust "vrmmax_current" 55000
    adjust "vrmsoc_current" 15000
    adjust "vrmsocmax_current" 20000
    adjust "stapm_limit" 20000
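For comparison with the ryzenadj -i tables in this thread: the power values in this config are in milliwatts and the current values in milliamps (tctl_temp is in °C and slow_time in seconds), which is why stapm_limit 20000 shows up as STAPM LIMIT 20.000. A small Python sketch that turns the same config into a one-off ryzenadj command line, assuming each underscore field name maps to the hyphenated name in the Paramter column of ryzenadj -i; to_cli_args is a hypothetical helper, not part of RyzenAdj or its PowerShell script:

```python
import subprocess

# adjust "<field>" <value> pairs from the config above
CONFIG = [
    ("fast_limit", 20000),         # mW -> PPT LIMIT FAST 20.000
    ("slow_limit", 25000),         # mW -> PPT LIMIT SLOW 25.000
    ("slow_time", 60),             # s  -> SlowPPTTimeConst 60.000
    ("tctl_temp", 95),             # °C -> THM LIMIT CORE 95.000
    ("vrm_current", 45000),        # mA -> TDC LIMIT VDD 45.000
    ("vrmmax_current", 55000),     # mA -> EDC LIMIT VDD 55.000
    ("vrmsoc_current", 15000),     # mA -> TDC LIMIT SOC 15.000
    ("vrmsocmax_current", 20000),  # mA -> EDC LIMIT SOC 20.000
    ("stapm_limit", 20000),        # mW -> STAPM LIMIT 20.000
]

def to_cli_args(pairs, exe="ryzenadj"):
    """Turn script field names (underscores) into ryzenadj flags (hyphens)."""
    return [exe] + [f"--{field.replace('_', '-')}={value}" for field, value in pairs]

if __name__ == "__main__":
    # Prints e.g. ryzenadj --fast-limit=20000 --slow-limit=25000 ...
    print(subprocess.list2cmdline(to_cli_args(CONFIG)))
```

Unlike the PowerShell script, a one-off command line initializes RyzenAdj on every call, so it is only meant for testing a configuration, not for reapplying it in a loop.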
bagusnl commented 3 years ago

The issue seems to be that something sets vrm_current to 0.000 when the Tctl temperature reported by HWiNFO reaches 75 °C:

CPU Family: Picasso
SMU BIOS Interface Version: 8
Version: v0.8.1
PM Table Version: 1e0004
|       Name       |   Value   |      Paramter      |
|------------------|-----------|--------------------|
| STAPM LIMIT      |    20.000 | stapm-limit        |
| STAPM VALUE      |    12.830 |                    |
| PPT LIMIT FAST   |    20.000 | fast-limit         |
| PPT VALUE FAST   |     4.301 |                    |
| PPT LIMIT SLOW   |    25.000 | slow-limit         |
| PPT VALUE SLOW   |    13.336 |                    |
| StapmTimeConst   |   200.000 | stapm-time         |
| SlowPPTTimeConst |    60.000 | slow-time          |
| PPT LIMIT APU    | -nan(ind) | apu-slow-limit     |
| PPT VALUE APU    | -nan(ind) |                    |
| TDC LIMIT VDD    |     0.000 | vrm-current        |
| TDC VALUE VDD    |     1.306 |                    |
| TDC LIMIT SOC    |    15.000 | vrmsoc-current     |
| TDC VALUE SOC    |     1.646 |                    |
| EDC LIMIT VDD    |    55.000 | vrmmax-current     |
| EDC VALUE VDD    |     2.915 |                    |
| EDC LIMIT SOC    |    20.000 | vrmsocmax-current  |
| EDC VALUE SOC    |     2.003 |                    |
| THM LIMIT CORE   |    95.000 | tctl-temp          |
| THM VALUE CORE   |     0.000 |                    |
| STT LIMIT APU    | -nan(ind) | apu-skin-temp      |
| STT VALUE APU    | -nan(ind) |                    |
| STT LIMIT dGPU   | -nan(ind) | dgpu-skin-temp     |
| STT VALUE dGPU   | -nan(ind) |                    |
Falcosc commented 3 years ago

I had PROCHOT issues on some workloads and had to go down to 93 °C.

But your discovery that TDC LIMIT VDD drops to 0 A is new to me, and it is most likely not PROCHOT-related if it happens at 75 °C.

Maybe this is what is limiting some other users.

Just to double-check: is Thermal Throttling (PROCHOT) active during this 0 A state? How many seconds does the 0 A limit last? It may make sense to check whether the limit is 0 A and, if so, try to overwrite it.
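Checking whether the limit is 0 A can be done by parsing the pipe table that ryzenadj -i prints. This is a minimal Python sketch, assuming the v0.8.1 output format shown in this thread; parse_info_table, vdd_tdc_collapsed, read_info and the 1 A threshold are hypothetical choices, not part of RyzenAdj:

```python
import subprocess

def parse_info_table(text):
    """Parse the pipe table printed by 'ryzenadj -i' into {name: float}."""
    values = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue  # non-table lines such as "CPU Family: Picasso"
        try:
            values[cells[0]] = float(cells[1])
        except ValueError:
            pass  # column header, separator row, or "-nan(ind)"
    return values

def vdd_tdc_collapsed(values, threshold_amps=1.0):
    """True if TDC LIMIT VDD (the vrm-current limit, in A) has dropped to ~0 A."""
    limit = values.get("TDC LIMIT VDD")
    return limit is not None and limit < threshold_amps

def read_info(exe="ryzenadj"):
    # Runs the CLI and returns its stdout; needs ryzenadj on PATH
    # (typically with administrator rights).
    return subprocess.run([exe, "-i"], capture_output=True, text=True, check=True).stdout

# Excerpt of the 0 A dump from this thread
SAMPLE = """|       Name       |   Value   |      Paramter      |
|------------------|-----------|--------------------|
| TDC LIMIT VDD    |     0.000 | vrm-current        |
| TDC VALUE VDD    |     1.306 |                    |
| PPT LIMIT APU    | -nan(ind) | apu-slow-limit     |"""

if __name__ == "__main__":
    print(vdd_tdc_collapsed(parse_info_table(SAMPLE)))  # True
```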

bagusnl commented 3 years ago

PROCHOT doesn't seem to be active during the 0 A state, but that might just be because the CPU throttles so hard that HWiNFO fails to read the value; the system is almost frozen during the 0 A state. The 0 A state seems to last for 10 to 20 seconds.

bagusnl commented 3 years ago

I tried to dump the table while it was lagging, but it partly failed because the lag slows down reading the PM table; the lag was gone halfway through the read. Attached: tabledump-half-lag.txt

Falcosc commented 3 years ago

The PowerShell script is tuned to use as few CPU cycles as possible because it initializes RyzenAdj only once. In contrast, the well-known GUI tools initialize RyzenAdj on every call. You can use the faster PowerShell script to set your affected current limit in a loop.

If your system only sets the 0 A limit once every X seconds, you may be able to work around it by overwriting it.
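The overwrite workaround boils down to a watchdog loop: read the limit, and if something has reverted it, write it again. A minimal, language-neutral sketch in Python; read_limit and apply_limit are hypothetical callbacks (for example, parsing ryzenadj -i and running ryzenadj --vrm-current=45000), which is an assumption and not how the PowerShell script is implemented. Recording the time of each revert also helps find the unknown X:

```python
import time
from typing import Callable, List, Optional

def watchdog(read_limit: Callable[[], float],
             apply_limit: Callable[[], None],
             expected: float,
             interval_s: float = 1.0,
             cycles: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Re-apply a limit whenever the reading no longer matches `expected`.

    Runs forever if `cycles` is None. Returns the time.monotonic()
    timestamps of every re-apply; the gaps between them estimate how
    often something reverts the value.
    """
    reverts: List[float] = []
    done = 0
    while cycles is None or done < cycles:
        if abs(read_limit() - expected) > 1e-3:
            apply_limit()
            reverts.append(time.monotonic())
        done += 1
        sleep(interval_s)
    return reverts

if __name__ == "__main__":
    readings = iter([45.0, 0.0, 45.0, 0.0])  # fake readings: 0 A twice
    applied = []
    hits = watchdog(lambda: next(readings), lambda: applied.append(45000),
                    expected=45.0, cycles=4, sleep=lambda s: None)
    print(len(hits), applied)  # 2 [45000, 45000]
```

Only re-applying when the value actually changed keeps the number of writes to the power management low; a shorter interval_s reacts faster at the cost of more reads.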

bagusnl commented 3 years ago

Yeah, I'm currently using the PowerShell script with inpoutx64.dll deleted because of an anti-cheat issue. I'm checking the power table from a different directory that contains the same RyzenAdj version.

I'm not the best at PowerShell and might need help setting it up. Is it on line 33? The current line is $Script:repeatWaitTimeSeconds = 1. I haven't changed anything in the PowerShell script other than the config values.

EDIT: I changed line 26 to $monitorField = "vrm_current". I hope that helps; I will post updates.

Falcosc commented 3 years ago

No, something else is setting 0 A every X seconds.

X is unknown; you have to find it out by testing.

$Script:repeatWaitTimeSeconds = 1 only controls how often the script does its work.

Using the monitorField feature is already a good start, but it may be too slow, because each time it triggers, it reapplies your whole configuration. Your 9 adjust calls translate to about 50 calls to your CPU's power management.

I guess you can improve your results by setting the current limit first, so move it to the top of your configuration.

bagusnl commented 3 years ago

Alright, I have set up the config. It gave me a warning that vrm_current was not set in my profile, so I added the adjustment to the battery profile as well. Here is my current configuration:

    ################################################################################
    #### Configuration Start
    ################################################################################
    # WARNING: Use at your own risk!

    $pathToRyzenAdjDlls = Split-Path -Parent $PSCommandPath #script path is the DLL path; needs to be an absolute path if you define something else

    $showErrorPopupsDuringInit = $true
    # debug mode prints adjust success messages too, instead of errors only
    $debugMode = $true
    # if monitorField is set, this script only adjusts values if something reverted your monitored value. Clear the monitorField string to disable monitoring
    # This needs to be a value which actually gets overwritten by your device firmware/software; if no changes get detected, your settings will not be reapplied
    $monitorField = "vrm_current"
    # Reapplies adjustments if the power slider changed position; check $Script:acSlider or $Script:dcSlider to apply slider-specific values
    $monitorPowerSlider = $true
    # HWiNFO needs to be restarted after this script has run for the first time with this option
    $updateHWINFOSensors = $true

    function doAdjust_ACmode {
        $Script:repeatWaitTimeSeconds = 1   #only use values below 5s if you are using $monitorField
        adjust "vrm_current" 45000
        adjust "fast_limit" 20000
        adjust "slow_limit" 25000
        adjust "slow_time" 60
        adjust "tctl_temp" 95
        adjust "vrmmax_current" 55000
        adjust "vrmsoc_current" 15000
        adjust "vrmsocmax_current" 20000
        adjust "stapm_limit" 20000
        #adjust "<any_other_field>" 1234

        #custom code, for example set fan control back to auto
        #values (WriteRegister: 47, FanSpeedResetValue:128) extracted from similar devices at https://github.com/hirschmann/nbfc/blob/master/Configs/
        #Start-Process -NoNewWindow -Wait -filePath "C:\Program Files (x86)\NoteBook FanControl\ec-probe.exe" -ArgumentList("write", "47", "128")

        if($Script:acSlider -eq $Script:betterBattery){
            #put adjustments for energy-saving slider position here:
            enable "power_saving" #add 10s boost delay for usage on cable to reduce idle power consumption
        }
    }

    function doAdjust_BatteryMode {
        $Script:repeatWaitTimeSeconds = 10   #do fewer reapplies and fewer HWiNFO updates to save power
        adjust "vrm_current" 40000
        adjust "fast_limit" 26000
        adjust "slow_limit" 10000
        #adjust "<any_other_field>" 1234

        if($Script:dcSlider -eq $Script:betterBattery){
            #put adjustments for energy-saving slider position here: for example disable fan to save power
            #Start-Process -NoNewWindow -Wait -filePath "C:\Program Files (x86)\NoteBook FanControl\ec-probe.exe" -ArgumentList("write", "47", "0")
        }

        if($Script:dcSlider -eq $Script:bestPerformance){
            #put adjustments for highest performance slider position here:
            enable "max_performance" #removes 10s boost delay on battery
            doAdjust_ACmode #set limits from cable mode on battery
        }
    }
    ################################################################################
    #### Configuration End
    ################################################################################
bagusnl commented 3 years ago

Wait, a question: since I deleted inpoutx64.dll, is that going to affect the monitorField feature?

Falcosc commented 3 years ago

Yes, it doesn't monitor anymore 😄 I updated the FAQ: https://github.com/FlyGoat/RyzenAdj/wiki/FAQ#inpoutx64-got-blocked-by-anti-cheat-software

Falcosc commented 3 years ago

If you don't mind the overhead, you can skip monitoring and spam your CPU power management with all your adjustments every second. You need to disable monitorPowerSlider as well, because the script only adjusts on every cycle if all monitoring is disabled.

The output tells you what it is doing: it is either monitoring every second or applying every second.

bagusnl commented 3 years ago

I see. At this point I think there is no other option than doing that, since it's either risking a hard lock while gaming or working due to the 0 A state, or not being able to play some games because of the inpout DLL.

I don't see much CPU usage from either PowerShell or the ryzenadj calls, so I think it's okay for now. I'm still kind of upset that the 0 A state always happens at 75 °C regardless of the tctl_temp override, though. Maybe there is a way to disable that?

Falcosc commented 3 years ago

If you confirm that the same issue happens on a Linux live system booted from USB, then you need a BIOS mod.

If the problem doesn't happen on Linux, you could try to find the Windows tool or driver that is causing it. But since you don't have any clues, you would need to disable things blindly. For that reason, a Linux live system should be used to test whether it is related to software or firmware.

Let me know whether this workaround works at all; I would like to add it to the wiki. What kind of device do you use? If you really want to, you can try to improve the workaround by applying it twice per second, but then you should only set your current limit, not all settings. Doing about 50 power management writes every second for your 9 adjustments is already beyond what this interface was designed for.

If you can confirm this issue on Linux too, and share some results of the workaround, I may think about how we could add it for other users of the same device.

bagusnl commented 3 years ago

I don't recall installing anything that could change the value other than the basic AMD drivers for both GPU and chipset (including the Ryzen Power Plan), which I don't think can cause any issues.

For the workaround, I think I'll do that but with 2 scripts: one with only vrm_current that applies every 0.5 s, and one with the rest of the settings that only applies on boot.

As for testing on Linux, I'll work on it. It might take a while since my schedule is fairly full, but I'll try to find time to do it.

My laptop is an Acer Aspire A315-41-R69L:
- AMD Ryzen 3 3200U
- iGPU: Vega 3, 2 GB (default)
- 16 GB RAM, 2x8 GB (13.9 GB usable)
- 240 GB M.2 SATA SSD + 1 TB HDD

Falcosc commented 3 years ago

Be careful with running 2 scripts; this could hit a known issue: https://github.com/FlyGoat/RyzenAdj/issues/138

About 1 in 1000 calls could fail if both run at the same time. Pretty slim odds, but still possible.
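If two tools may call RyzenAdj at the same time, one cheap mitigation is to retry a failed call. This does not remove the underlying conflict from issue #138; it only makes a rare failure less likely to stick. A minimal sketch, assuming a failure can be detected (for example, by ryzenadj exiting with a non-zero status, which is an assumption here) and that retries fail independently: at 1 in 1000, three attempts would all fail with probability about (1/1000)^3 = 10^-9. call_with_retry is a hypothetical helper, not part of RyzenAdj:

```python
import time
from typing import Callable

def call_with_retry(call: Callable[[], bool],
                    attempts: int = 3,
                    delay_s: float = 0.05,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `call` until it reports success (True) or attempts run out."""
    for attempt in range(attempts):
        if call():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # short pause so the other tool can finish its call
    return False

# Hypothetical usage, assuming a non-zero exit code signals failure:
#   import subprocess
#   ok = call_with_retry(
#       lambda: subprocess.run(["ryzenadj", "--vrm-current=45000"]).returncode == 0)
```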

bagusnl commented 3 years ago

Thank you for making me aware of that issue. For now, I might use the PowerShell script for vrm_current and AATU for everything else, for the sake of simplicity and easier maintenance; if anything fails to apply, I can reapply it fairly easily.

Just a random question, though: is it possible to get something like Easy Anti-Cheat to allow this exact software to use the inpout driver, say by contacting them? RyzenAdj (and its derivatives) is getting more popular as more AMD Ryzen laptops make their way to customers.

Falcosc commented 3 years ago

Yes, it is possible to get on a whitelist, but this may be release-dependent.

Mixing AATU and PowerShell RyzenAdj calls has the same issue of about 1 in 1000 calls failing ;)

Falcosc commented 3 years ago

@bagusnl please have a look at https://github.com/FlyGoat/RyzenAdj/actions/runs/832192125 and check whether temperature reading works.

bagusnl commented 3 years ago

> @bagusnl please have a look on https://github.com/FlyGoat/RyzenAdj/actions/runs/832192125 and check if temp reading does work.

Temperature reading looks okay with that artifact:

 .\ryzenadj -i
CPU Family: Picasso
SMU BIOS Interface Version: 8
Version: v0.8.2
PM Table Version: 1e0004
|       Name       |   Value   |      Paramter      |
|------------------|-----------|--------------------|
| STAPM LIMIT      |    20.000 | stapm-limit        |
| STAPM VALUE      |     9.223 |                    |
| PPT LIMIT FAST   |    20.000 | fast-limit         |
| PPT VALUE FAST   |    11.155 |                    |
| PPT LIMIT SLOW   |    25.000 | slow-limit         |
| PPT VALUE SLOW   |     9.176 |                    |
| StapmTimeConst   |   200.000 | stapm-time         |
| SlowPPTTimeConst |    60.000 | slow-time          |
| PPT LIMIT APU    | -nan(ind) | apu-slow-limit     |
| PPT VALUE APU    | -nan(ind) |                    |
| TDC LIMIT VDD    |    45.000 | vrm-current        |
| TDC VALUE VDD    |     1.902 |                    |
| TDC LIMIT SOC    |    15.000 | vrmsoc-current     |
| TDC VALUE SOC    |     1.087 |                    |
| EDC LIMIT VDD    |    55.000 | vrmmax-current     |
| EDC VALUE VDD    |    37.219 |                    |
| EDC LIMIT SOC    |    20.000 | vrmsocmax-current  |
| EDC VALUE SOC    |     2.001 |                    |
| THM LIMIT CORE   |    95.000 | tctl-temp          |
| THM VALUE CORE   |    47.158 |                    |
| STT LIMIT APU    | -nan(ind) | apu-skin-temp      |
| STT VALUE APU    | -nan(ind) |                    |
| STT LIMIT dGPU   | -nan(ind) | dgpu-skin-temp     |
| STT VALUE dGPU   | -nan(ind) |                    |
Falcosc commented 3 years ago

@bagusnl please pause your work on the workaround script. Somebody with a similar issue shared a solution.

You need to install NitroSense: https://global-download.acer.com/GDFiles/Application/Nitro%20Sense/Nitro%20Sense_Acer_3.01.3016_W10x64_A.zip?acerid=637399032918120303&Step1=NOTEBOOK&Step2=NITRO&Step3=NITRO%20AN515-43&OS=ALL&LC=en&BC=ACER&SC=PA_6

After installation, you should have access to a power plan named "Balanced (Acer Optimized)".

According to 2 other users, this plan fixes the limit issue.

Please give us feedback: the 2 other users didn't confirm the 0 A limit, so I am not sure whether this solves your issue.

You may not be able to install the application, but if you are lucky, somebody will share the power plan with you :)

Falcosc commented 3 years ago

Somebody was kind enough to share the plan with you:

Acer Powerplan from Nitrosense.zip: https://github.com/FlyGoat/RyzenAdj/files/6465677/Acer.Powerplan.from.Nitrosense.zip

How to import a power plan: https://winaero.com/export-import-power-plan-windows-10/
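Importing a power plan comes down to Windows' built-in powercfg tool, whose /import option registers a plan previously saved with /export. A minimal Python sketch of that step, assuming the shared zip contains an exported .pow file; the file name below is hypothetical. Afterwards, powercfg /list shows the imported plan and powercfg /setactive with its GUID activates it:

```python
import subprocess
from pathlib import Path
from typing import List

def build_import_command(pow_file: Path) -> List[str]:
    """Command line that registers an exported power plan (.pow) with Windows."""
    return ["powercfg", "/import", str(pow_file)]

def import_power_plan(pow_file: Path) -> str:
    """Run powercfg /import and return its console output (Windows only)."""
    result = subprocess.run(build_import_command(pow_file),
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Hypothetical file name; use whatever .pow file the shared zip contains.
    plan = Path("Acer Optimized.pow")
    print(subprocess.list2cmdline(build_import_command(plan)))
    # On Windows: print(import_power_plan(plan))
```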

This user shared it: https://discord.com/channels/772105072720871435/772118974640685076/841991393123893248

bagusnl commented 3 years ago

That looks cool, will give it a try. Thanks!

> On Wed, May 12, 2021, 17:56 Falcosc wrote:
> Somebody was so kind to share the plan for you

Falcosc commented 3 years ago

You should first try it without any adjustments after a fresh reboot. Maybe your tuning triggers some failsafe mechanism from Acer.

bagusnl commented 3 years ago

The thing is, it only happens under certain workloads. Last time I tested it, I needed a friend to stream at 1080p60 on Discord while I watched, and then it would just happen randomly. I'll see if I can get my friend to stream again so I can test it.

I also happened to change my thermal paste today to a much better one (CM MasterGel Maker), so I might have some trouble triggering it again. But we'll see.