Closed h1z1 closed 3 years ago
Just noticed this, not sure if it's a bug in CoreFreq or the platform?
Not a bug but experimental with your Zen model: https://github.com/cyring/CoreFreq/blob/717e444e89caa0c74038f9c0d09ae68b725007cd/corefreqk.c#L5682
So these SMU addresses only work with Zen 2 & 3: https://github.com/cyring/CoreFreq/blob/717e444e89caa0c74038f9c0d09ae68b725007cd/corefreqk.c#L5618
Zen1: I believe TDP has to be queried from the PM table via the mailbox protocol. Do you know of any single SMU address to read your TDP from?
What about the others (Min, Max, PPT, EDC, TDC): can you confirm whether those values are also wrong?
No idea tbh. Wasn't aware TDP could be read, thought it was inferred and even that varied with vendors, implementations and phases of the moon. AMD goes so far as to conflate Thermal TDP with Electrical. GN did an excellent job covering it iirc.
Hello,
Can you please show me what values you are getting from this project?
Hi, sorry not sure why this didn't send a notification... hmm
Which values are you looking for, from syslog?
[1343745.127347] ryzen_smu: CPUID: family 0x17, model 0x1, stepping 0x1, package 0x7
[1343745.127917] ryzen_smu: SMU v4.25.118.0
[1343745.128734] ryzen_smu: SMU v4.25.118.0
[1343745.128759] sysfs: cannot create duplicate filename '/kernel/ryzen_smu_drv'
[1343745.128761] CPU: 8 PID: 19460 Comm: insmod Tainted: P O 5.4.70 #1
[1343745.128763] Hardware name: Gigabyte Technology Co., Ltd. X399 AORUS Gaming 7/X399 AORUS Gaming 7, BIOS F12 12/11/2019
/sys/kernel/ryzen_smu_drv/codename:05
/sys/kernel/ryzen_smu_drv/drv_version:0.1.0
/sys/kernel/ryzen_smu_drv/mp1_if_version:0
/sys/kernel/ryzen_smu_drv/version:4.25.118.0
Because ryzen_smu implements the mailbox protocol, I'm interested in the TDP, PPT, EDC and TDC. In fact, the whole output from its monitor UI would be fine.
But apparently you are having trouble starting it up?
Not sure what you mean by start up. Module appeared to load despite the error though none of the utils run...
# ./userspace/monitor_cpu
rd_buf: 0.1.0
PM Tables are not supported on this platform.
# ./scripts/test.py
Failed to write SMU arguments
# ./scripts/
cpuid.py dump_pm_table.py monitor_cpu.py __pycache__/ read_dump.py test.py
# ./scripts/monitor_cpu.py
PM Table: Unsupported
# ./scripts/dump_pm_table.py
PM Tables are not supported for this model of processor.
#
Edit: Just realized what you mean, some sysfs files did not in fact get created in ryzen_smu_drv, pm_table being one.
drwxr-xr-x 2 root root 0 May 19 01:27 .
drwxr-xr-x 14 root root 0 May 3 11:35 ..
-r-------- 1 root root 4096 May 19 01:27 codename
-r-------- 1 root root 4096 May 19 01:27 drv_version
-r-------- 1 root root 4096 May 19 01:27 mp1_if_version
-rw------- 1 root root 4096 May 19 01:27 mp1_smu_cmd
-rw------- 1 root root 4096 May 19 00:52 rsmu_cmd
-rw------- 1 root root 4096 May 19 01:27 smn
-rw------- 1 root root 4096 May 19 01:30 smu_args
-r-------- 1 root root 4096 May 19 01:27 version
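A quick way to confirm which entries the driver created is to compare the directory against the names the userspace tools need. The sketch below is only an illustration: the helper name `missing_entries` is hypothetical, and the only required name it assumes is pm_table, the file noted as absent above.

```python
import os

# Directory taken from the listing above.
SYSFS_DIR = "/sys/kernel/ryzen_smu_drv"


def missing_entries(directory, required=("pm_table",)):
    """Return the required names that are not present in `directory`.

    `required` is an assumption for illustration; extend it with whatever
    entries your tool actually reads.
    """
    try:
        present = set(os.listdir(directory))
    except OSError:
        # Directory absent or unreadable (driver not loaded, or not root).
        return list(required)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    print(missing_entries(SYSFS_DIR))
```

Run as root, since the entries in the listing are readable by root only.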
Couple lines were truncated above too sigh
[1343745.127347] ryzen_smu: CPUID: family 0x17, model 0x1, stepping 0x1, package 0x7
[1343745.127917] ryzen_smu: SMU v4.25.118.0
[1343745.128734] ryzen_smu: SMU v4.25.118.0
[1343745.128759] sysfs: cannot create duplicate filename '/kernel/ryzen_smu_drv'
[1343745.128789] ryzen_smu_probe+0x114/0x390 [ryzen_smu]
[1343745.128812] ryzen_smu_driver_init+0x23/0x1000 [ryzen_smu]
[1343745.128841] kobject_add_internal failed for ryzen_smu_drv with -EEXIST, don't try to register things with the same name in the same directory.
[1343745.128843] ryzen_smu: Unable to create sysfs interface
[1343745.128846] ryzen_smu: probe of 0000:40:00.0 failed with error -12
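The kernel reports these failures as negative errno values, so the symbolic names can be looked up directly. A minimal sketch, assuming a Linux host (errno numbering is platform-specific); `errno_name` is a hypothetical helper, not part of ryzen_smu:

```python
import errno
import os


def errno_name(code):
    """Map a kernel-style negative return code to its symbolic errno name."""
    return errno.errorcode.get(abs(code), "UNKNOWN")


# The probe failure code from the log above.
print(errno_name(-12), "-", os.strerror(12))
# The kobject failure is already printed symbolically as -EEXIST.
print(errno_name(-errno.EEXIST))
```

On Linux, -12 is ENOMEM, which is the code the probe returned after the sysfs registration failed with -EEXIST.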
With my 3950X I have to force-load the software with the -f option.
Make sure no other instance of its driver is running and that no other SMU software is already present.
insmod ryzen_smu.ko
ryzen_smu: CPUID: family 0x17, model 0x71, stepping 0x0, package 0x2
./userspace/monitor_cpu -f
╭────────────────────────────────────────────────┬─────────────────────────────────────────────────╮
│ CPU Model │ AMD Ryzen 9 3950X 16-Core Processor │
│ Processor Code Name │ Matisse │
│ Cores │ 16 │
│ Core CCDs │ 2 │
│ Core CCXs │ 4 │
│ Cores Per CCX │ 4 │
│ SMU FW Version │ v46.63.0 │
│ MP1 IF Version │ v11 │
╰────────────────────────────────────────────────┴─────────────────────────────────────────────────╯
╭─────────┬────────────────┬─────────┬─────────┬─────────┬─────────────┬─────────────┬─────────────╮
│ Core 0 │ 1 MHz | 0.003 W | 0.200 V | 30.51 C | C0: 0.0 % | C1: 0.0 % | C6: 100.0 % │
│ Core 1 │ 0 MHz | 0.002 W | 0.200 V | 30.32 C | C0: 0.0 % | C1: 0.0 % | C6: 100.0 % │
│ Core 2 │ 3 MHz | 0.004 W | 0.201 V | 30.57 C | C0: 0.1 % | C1: 0.0 % | C6: 99.9 % │
│ Core 3 │ 1 MHz | 0.003 W | 0.200 V | 30.45 C | C0: 0.0 % | C1: 0.0 % | C6: 100.0 % │
│ Core 4 │ 3 MHz | 0.004 W | 0.201 V | 30.43 C | C0: 0.1 % | C1: 0.0 % | C6: 99.9 % │
│ Core 5 │ 5 MHz | 0.006 W | 0.201 V | 30.52 C | C0: 0.1 % | C1: 0.0 % | C6: 99.9 % │
│ Core 6 │ 0 MHz | 0.003 W | 0.200 V | 30.50 C | C0: 0.0 % | C1: 0.0 % | C6: 100.0 % │
│ Core 7 │ 1 MHz | 0.003 W | 0.200 V | 30.42 C | C0: 0.0 % | C1: 0.0 % | C6: 100.0 % │
│ Core 8 │ 2 MHz | 0.004 W | 0.201 V | 31.41 C | C0: 0.1 % | C1: 0.0 % | C6: 99.9 % │
│ Core 9 │ 2 MHz | 0.008 W | 0.200 V | 31.59 C | C0: 0.1 % | C1: 0.0 % | C6: 99.9 % │
│ Core 10 │ 1 MHz | 0.004 W | 0.200 V | 31.40 C | C0: 0.0 % | C1: 0.0 % | C6: 100.0 % │
│ Core 11 │ 0 MHz | 0.002 W | 0.200 V | 31.41 C | C0: 0.0 % | C1: 0.0 % | C6: 100.0 % │
│ Core 12 │ 2 MHz | 0.002 W | 0.201 V | 31.56 C | C0: 0.1 % | C1: 0.0 % | C6: 99.9 % │
│ Core 13 │ 26 MHz | 0.059 W | 0.204 V | 31.77 C | C0: 0.6 % | C1: 0.1 % | C6: 99.4 % │
│ Core 14 │ 2 MHz | 0.003 W | 0.201 V | 31.61 C | C0: 0.1 % | C1: 0.0 % | C6: 99.9 % │
│ Core 15 │ 12 MHz | 0.013 W | 0.202 V | 31.63 C | C0: 0.3 % | C1: 0.1 % | C6: 99.7 % │
╰─────────┴────────────────┴─────────┴─────────┴─────────┴─────────────┴─────────────┴─────────────╯
╭────────────────────────────────────────────────┬─────────────────────────────────────────────────╮
│ Peak Core Frequency │ 26 MHz │
│ Peak Temperature │ 45.25 C │
│ Package Power │ 16.8100 W │
│ Peak Core(s) Voltage │ 0.255874 V │
│ Average Core Voltage │ 0.200765 V │
│ Package CC6 │ 92.028168 % │
│ Core CC6 │ 99.890877 % │
╰────────────────────────────────────────────────┴─────────────────────────────────────────────────╯
╭────────────────────────────────────────────────┬─────────────────────────────────────────────────╮
│ Thermal Junction Limit │ 95.00 C │
│ Current Temperature │ 32.13 C │
│ SoC Temperature │ 28.75 C │
│ Core Power │ 0.1460 W │
│ SoC Power │ 8.3634 W | 7.6601 A | 1.092350 V │
│ PPT │ 16.8060 W | 142 W | 11.84 % │
│ TDC │ 0.1376 A | 95 A | 0.14 % │
│ EDC │ 0.1376 A | 140 A | 0.10 % │
│ FIT Limit │ 0.243201 % │
╰────────────────────────────────────────────────┴─────────────────────────────────────────────────╯
╭────────────────────────────────────────────────┬─────────────────────────────────────────────────╮
│ Coupled Mode │ ON │
│ Fabric Clock (Average) │ 293 MHz │
│ Fabric Clock │ 1833 MHz │
│ Uncore Clock │ 1833 MHz │
│ Memory Clock │ 1833 MHz │
│ VDDCR_Mem │ 6.5007 W │
│ VDDCR_SoC │ 1.1000 V │
│ cLDO_VDDM │ 0.9504 V │
│ cLDO_VDDP │ 0.9976 V │
│ cLDO_VDDG │ 1.0477 V │
╰────────────────────────────────────────────────┴─────────────────────────────────────────────────╯
Power, Current & Thermal
|- Junction Temperature TjMax [49: 95C]
|- Digital Thermal Sensor DTS [Capable]
|- Power Limit Notification PLN [Missing]
|- Package Thermal Management PTM [Missing]
|- Thermal Monitor 1 TTP [Capable]
|- Thermal Monitor 2 HTC [Capable]
|- Thermal Design Power TDP [ 105 W]
|- Minimum Power Min [ 105 W]
|- Maximum Power Max [ 105 W]
|- Package Power Tracking PPT [ 142 W]
|- Electrical Design Current EDC [ 140 A]
|- Thermal Design Current TDC [ 95 A]
|- Units
|- Power watt [ 0.125000000]
|- Energy joule [ 0.000015259]
|- Window second [ 0.000976562]
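The three unit values printed above are negative powers of two (1/8 W, 1/65536 J, 1/1024 s). That matches the usual RAPL power-unit register encoding, where each unit is stored as an exponent. The sketch below assumes the field layout documented for that register (power in bits 3:0, energy in bits 12:8, time in bits 19:16); the raw value 0xA1003 is an illustrative input chosen to reproduce the figures above, not a reading taken from this thread.

```python
def decode_rapl_units(raw):
    """Decode a RAPL power-unit register value into (watt, joule, second) units.

    Each field holds an exponent n, and the unit is 1 / 2**n.
    """
    power_exp = raw & 0xF            # bits 3:0
    energy_exp = (raw >> 8) & 0x1F   # bits 12:8
    time_exp = (raw >> 16) & 0xF     # bits 19:16
    return (
        1.0 / (1 << power_exp),
        1.0 / (1 << energy_exp),
        1.0 / (1 << time_exp),
    )


# Illustrative raw value (assumption), not read from real hardware here.
watt, joule, second = decode_rapl_units(0xA1003)
print(f"{watt:.9f} {joule:.9f} {second:.9f}")
```

Raw energy counters are then converted to joules by multiplying the counter delta by the decoded energy unit.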
No change with -f. It doesn't appear to support Threadripper? There is no case for it in smu_init
- I'm finding two CODENAME_THREADRIPPER cases of SMU interface detection:
kobject_add_internal failed for ryzen_smu_drv with -EEXIST, don't try to register things with the same name in the same directory.
ryzen_smu: Unable to create sysfs interface
Presuming "ryzen_smu_drv" was uncleanly unloaded (a reboot may be required).
Various causes can lead to those error messages.
Perhaps you will have better luck with the original ryzen_smu project?
* [`CODENAME_THREADRIPPER`](https://gitlab.com/CyrIng/ryzen_smu/-/blob/smu_debug/smu.c#L212)
Neither of the references I could find actually initialize it though: int smu_resolve_cpu_class
smu_init has no case:
    switch (g_smu.codename) {
        case CODENAME_SUMMITRIDGE:
        case CODENAME_THREADRIPPER:
        case CODENAME_PINNACLERIDGE:
            g_smu.addr_rsmu_mb_cmd = 0x3B1051C;
            g_smu.addr_rsmu_mb_rsp = 0x3B10568;
            g_smu.addr_rsmu_mb_args = 0x3B10590;
            goto LOG_RSMU;
Hello,
The latest develop branch brings the HSMP mailbox, which is specified for AMD Family 19h Model 01h.
Fortunately it also works with my 3950X. See this discussion.
I would like to know if it works with your Threadripper?
Hi
Would depend on what you mean by works :) TDP is reported as missing now
FWIW AMD, as part of their boosting, intentionally over-reports temperatures so the OS/BIOS preemptively increases cooling.
Thermal Design Power Platform [Disable]
Power Limit PL1 [Missing]
Power Limit PL2 [Missing]
Package Power Tracking PPT [Missing]
Electrical Design Current EDC [Missing]
Thermal Design Current TDC [Missing]
Sorry, I forgot to add that you need to start the driver in Experimental mode.
insmod corefreqk.ko Experimental=1
Eeep thought I did but nope. Sorry.
Values still appear backward though
I really can't find any good registers for Ryzen, Threadripper or EPYC of the Zen1 and Zen+ generations.
I was hoping that the HSMP protocol to the SMU would provide a universal means to query those Power values. Unfortunately, HSMP is only specified in the Zen3 PPR documentation. I'm lucky it works with Zen2/Matisse; however, I still have not received any report from CoreFreq users who own other APU, TR or EPYC parts of generations 2 and 3.
It's up to you whether to close this issue, but I have to admit that I'm hitting the limits of the AMD public specifications.
I'm still searching on the subject: do you see any option in your BIOS to activate HSMP, or names like BMC?
According to this readme https://github.com/AMDESE/libhsmp/commit/a3bb6efac327645a18835b2efcc542a12d49012c can you confirm the HSMP state?
I don't see anything about those but then again it's not exactly a server board (especially wrt the BMC).
I have updated BIOS to 3703 and I now see a value change in PL2
Which motherboard is that with?
This 3950X setup https://github.com/cyring/CoreFreq/wiki/CoreFreq-Lab
Today I tried Ryzen Master (Windows) and learned from it that 395 W is the Max power PPT.
This confirmed for my 3950X that PL2 = 395 W is the value read from HSMP when PPT is left as AUTO in the BIOS.
Your WhiteHaven should have other Max value(s) when PPT is AUTO. You may read this using Ryzen Master. Please let me know what you get?
I have noticed that with PBO enabled and Power Limits set to AUTO, my 3950X shows specific constant values: 395 W for PL1 and PL2.
I wonder if you are getting the same behavior but with different constants?
HSMP is now a per model feature.
Still haven't had a chance to install Windows on these boards to test, sorry. With latest head though they appear the same?
Was chasing some crashes that I'm not entirely sure are directly related to CoreFreq or not, but I can confirm at least that no amount of disabling C6/power states appears to have any impact. Values are still inverted *shrug*
The other crashes happen with kernel tracing and CoreFreq. It's unfortunately a hard lock and only appears to happen in X. Maybe a race somewhere
I have been chasing an incompatibility with kdump. See the Wiki to check whether it is part of the issue.
Adding crashkernel=0 to the boot command line solved the problem, but I have not found its root cause.
perf top is also killed if started after CoreFreq, but not the reverse.
selinux is also a no-go with CoreFreq.
nmi_watchdog and the fixed performance counters cannot be re-enabled.
About the inverted values, HSMP is not doing great with the Zen v1 families. I'm developing the RMI protocol for Naples, but I'm not sure if it is available in Threadripper?
Don't think so, from what I gather of variorum and a tool from AMD. NGL, CoreFreq runs circles around both. https://github.com/amd/esmi_oob_library looks neat too but limited.
Can you please post the output you're getting from those tools?