Closed LexJackson closed 5 years ago
Hello,
Thanks a lot for trying CoreFreq. These are the very first results from the Opteron architecture that I'm seeing.
I have to dig into the specs to find the thermal registers (and, by the way, the voltage ID).
I'm also noticing that the base clock estimation is pretty low compared to the factory 100 MHz.
I'm also wondering whether this architecture provides energy counters (current, power), such as the Intel RAPL registers?
The MC and DRAM will also be a subject of work. I don't believe you get anything from the memory controller view.
Can you list the PCI IDs:
lspci -nn
Regards CyrIng
Thank you for the rapid reply!
Here is the requested output.
I'm unsure about the base clock, but the CPUs tend to clock pretty low when idle. Here's the output of cat /proc/cpuinfo | grep MHz. They get pretty sleepy.
Also adding, looks like detailed temp per core can be obtained. I'm using Turion Power Control to get this info. (tpc -temp). I believe it's using the cpuid kernel module for that info.
Hello,
For your testing, new code is available to read the processor temperature.
Remarks:
(sensor x 5) / 40
Thanks for the work! For some reason I'm not seeing any temp data now.
I loaded corefreq with:
insmod /var/lib/dkms/corefreqk/1.39/4.20.0-arch1-1-ARCH/x86_64/module/corefreqk.ko Experimental=1
systemctl start corefreqd
Here's what I'm seeing (currently running an x265 encode btw)
Can you try the new code?
You should also read Bulldozer/Piledriver
as the architecture name.
Sorry, I thought I did, did a git clone on the repository and built it. I may have done something wrong. So sorry.
In the working directory, just git pull
to get the latest source code, then make clean all
to rebuild
Will building from this PKGBUILD not pull the latest code?
https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=corefreq-git
That's how I've been installing it.
The PKGBUILD may pull the latest code and rebuild with makepkg -sif; however, for our testing session, which involves many small changes, I recommend cloning the source code directly from GitHub, then pulling and rebuilding whenever I announce a new push.
make clean all
Using the latest code, please post screenshots of the temperature, because I need to know whether the SMU queries are working. Thanks.
OK done. Still not seeing temperature data for some reason. Happy to keep testing, please let me know what I can do to help. THANK YOU!
Can you confirm that Experimental mode was activated before reading the temperature?
insmod corefreqk.ko Experimental=1
So I need to debug:
Can you edit corefreqk.c
and replace these 2 functions with the code below
https://github.com/cyring/CoreFreq/blob/9142c6a404d5b128ec64bc77f7727252675862d1/corefreqk.c#L5382
#define Core_AMD_SMU_Thermal(Core, TctlRegister, \
                             SMU_IndexRegister, \
                             SMU_DataRegister) \
({ \
    TCTL_REGISTER TctlSensor = {0}; \
    \
    WRPCI(TctlRegister, SMU_IndexRegister); \
    RDPCI(TctlSensor, SMU_DataRegister); \
    \
    Core->PowerThermal.Sensor = TctlSensor.CurTmp; \
    \
    printk(KERN_INFO "CoreFreq[%d]: PowerThermal.Sensor[%d]\n", \
           Core->Bind, Core->PowerThermal.Sensor); \
})
void Core_AMD_Family_15h_Temp(CORE *Core)
{
    Core_AMD_SMU_Thermal(Core, SMU_AMD_THM_TCTL_REGISTER_F15H,
                         SMU_AMD_INDEX_REGISTER_F15H,
                         SMU_AMD_DATA_REGISTER_F15H);

    printk(KERN_INFO "CoreFreq[%d]: Experimental[%d] " \
           "SMU_AMD_THM_TCTL_REGISTER_F15H[%X] " \
           "SMU_AMD_INDEX_REGISTER_F15H[%X] " \
           "SMU_AMD_DATA_REGISTER_F15H[%X]\n",
           Core->Bind,
           Proc->Registration.Experimental,
           SMU_AMD_THM_TCTL_REGISTER_F15H,
           SMU_AMD_INDEX_REGISTER_F15H,
           SMU_AMD_DATA_REGISTER_F15H);

    if (Proc->Registration.Experimental) {
        printk(KERN_INFO "CoreFreq[%d]: AdvPower.EDX.TTP[%d]\n",
               Core->Bind,
               Proc->Features.AdvPower.EDX.TTP);

        if (Proc->Features.AdvPower.EDX.TTP == 1) {
            THERMTRIP_STATUS ThermTrip = {0};

            WRPCI(SMU_AMD_THM_TRIP_REGISTER_F15H,
                  SMU_AMD_INDEX_REGISTER_F15H);
            RDPCI(ThermTrip, SMU_AMD_DATA_REGISTER_F15H);

            Core->PowerThermal.Events = ThermTrip.SensorTrip << 0;

            printk(KERN_INFO "CoreFreq[%d]: PowerThermal.Events[%d]\n",
                   Core->Bind, Core->PowerThermal.Events);
        }
    }
}
Then please rebuild, load the driver, dmesg
and post all lines starting with CoreFreq
Here's the confirmation on the Experimental=1 flag and result.
I am compiling the new code now. Thanks!
Can you replace and try with the following functions:
corefreqk.c
void Core_AMD_Family_15h_Temp(CORE *Core)
{
    TCTL_REGISTER TctlSensor = {0};

    RDPCI(TctlSensor, PCI_CONFIG_ADDRESS(0, 0x18, 0x3, 0xa4));
    Core->PowerThermal.Sensor = TctlSensor.CurTmp;

    if (Proc->Features.AdvPower.EDX.TTP == 1) {
        THERMTRIP_STATUS ThermTrip = {0};

        RDPCI(ThermTrip, PCI_CONFIG_ADDRESS(0, 0x18, 0x3, 0xe4));
        Core->PowerThermal.Events = ThermTrip.SensorTrip << 0;
    }
}
coretypes.h
#define COMPUTE_THERMAL_AMD_15h(Temp, Param, Sensor) \
(Temp = (Sensor * 5 / 40) - 49)
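For readers following along, here is a minimal sketch of what an address such as PCI_CONFIG_ADDRESS(0, 0x18, 0x3, 0xa4) plausibly encodes, assuming the standard PCI Configuration Mechanism #1 layout (enable bit, bus, device, function, dword-aligned register). The helper name pci_config_address is hypothetical and is not CoreFreq's macro; its actual definition may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not CoreFreq's macro): build a PCI Configuration
 * Mechanism #1 address, i.e. the value written to I/O port 0xCF8. */
uint32_t pci_config_address(uint32_t bus, uint32_t dev,
                            uint32_t fn, uint32_t reg)
{
    /* Field widths of mechanism #1: 8-bit bus, 5-bit device,
     * 3-bit function, 8-bit register offset. */
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);

    return 0x80000000u      /* bit 31: enable configuration access */
         | (bus << 16)      /* bits 23..16: bus number */
         | (dev << 11)      /* bits 15..11: device number */
         | (fn << 8)        /* bits 10..8: function number */
         | (reg & 0xFCu);   /* bits 7..2: dword-aligned register */
}
```

Under that assumption, bus 0, device 0x18, function 3, register 0xa4 encodes to 0x8000C3A4, and register 0xe4 of the same function encodes to 0x8000C3E4.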
Looks like it's showing something.. Not sure what. ;-)
dmesg | grep Core
[ 0.969195] ACPI: Core revision 20181003
[ 4.976436] systemd[1]: Listening on Process Core Dump Socket.
[ 9920.687007] CoreFreq(31:-1): Processor [ 6F_02] Architecture [Bulldozer/Piledriver] CPU [32/32]
It's showing a negative value; please roll back to the formula below.
#define COMPUTE_THERMAL_AMD_15h(Temp, Param, Sensor) \
(Temp = Sensor * 5 / 40)
You should get ~ 35 C
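To make the arithmetic concrete, here is a small sketch of the two formulas discussed in this thread. Since 5 / 40 = 1 / 8, each raw count is worth 0.125 degree C. The function names are hypothetical, the 11-bit range check is an assumption about the CurTmp field, and the raw value 280 used in the note below is purely illustrative (it was not reported in this thread).

```c
#include <assert.h>

/* Hypothetical helper mirroring (Sensor * 5 / 40): one count = 0.125 C.
 * Integer division truncates the fractional part. */
int tctl_to_celsius(unsigned int cur_tmp)
{
    assert(cur_tmp < 2048);  /* assumption: CurTmp is an 11-bit field */
    return (int)(cur_tmp * 5 / 40);
}

/* The earlier variant with the -49 offset, kept only to show how
 * a low raw reading turns into a negative temperature. */
int tctl_to_celsius_offset(unsigned int cur_tmp)
{
    assert(cur_tmp < 2048);  /* same assumption as above */
    return (int)(cur_tmp * 5 / 40) - 49;
}
```

With an illustrative raw reading of 280, the plain formula gives 35, while the offset variant gives -14, which matches the kind of negative value seen above.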
Looks like that worked! What do I need to tweak to get all CPU's reporting? ;-) Thanks for your help!
According to the datasheet, there is one sensor per socket. To my understanding, there is no temperature per Core. But if your system is dual socket, we could read two sensors.
Can you print the topology discovered by CoreFreq
corefreq-cli -m
I'm only seeing temp data for one of the 32 CPUs. In that case, shouldn't at least two temps be displayed, since there are 2 CPUs?
So far only one sensor is queried (on the service thread, which is driven by the CPU shown with a highlighted number in the UI).
The remaining work consists of improving the AMD topology to differentiate the two sockets: Node ID, Pkg, Module, Core, and so on. Then, set the CPU affinity of the service thread to each socket in turn. Finally, collect the PCI sensor from the selected CPUs.
Perfect, thank you very much for your help!
By the way, do you know the size of the caches? The L3 looks wrong (due to a unit change between Zen and previous architectures).
Looks like:
According to AMD:
Total L1 Cache 48KB
Total L2 Cache 16MB
Total L3 Cache 16MB
I have pushed the source code written above. Remark: Experimental mode not required.
Thanks for working on it!! Looking good so far.
I'm able to see temps pretty well with tpc for now. Oh, and I have 4 CPU's now (as of yesterday)
Amazing setup !
64 CPU SMT threads is so far the CoreFreq limit.
I see a zero minimum temp history issue in the corefreq-cli screenshot
What should I understand from the Turion Power Control screenshot: is the temperature granularity at least per Node?
Can you show the full topology with the 4 processors
Yes with the Turion screenshot I was simply showing that it's possible to see per node temp. How would you like me to show the topology? Happy to do it.
Just copy/paste the output of corefreq-cli -m
with the Markdown code format
This last version 1.39.11 will compute the L3 cache size, including the Sub Caches configured by the Probe Filter when enabled. Please refresh the source code and post the topology back. Regards, CyrIng
I was expecting to read 16384 KB of L3 cache. Can you modify the source as below at these lines: https://github.com/cyring/CoreFreq/blob/1554e89f4b77fe80bd7a50a3df51e97c4208fd2e/corefreqk.c#L1194
case AMD_Family_15h:
    if ((Proc->Features.Std.EAX.ExtModel == 0x0)
     && (Proc->Features.Std.EAX.Model >= 0x0)
     && (Proc->Features.Std.EAX.Model <= 0xf))
    {
        PROBE_FILTER_CTRL PF;
        RDPCI(PF, PCI_AMD_PROBE_FILTER_CTRL);
/*      if (PF.Mode != 0b00) {*/
        /* Add to L3 the Sub Caches in 512 KB unit size.
         * Each ?: term is parenthesized because + binds tighter than ?: */
        Core->T.Cache[3].Size = Core->T.Cache[3].Size
          + (PF.SubCache0En ? (1 << (1 + (PF.SubCacheSize0 & 0b01))) : 0)
          + (PF.SubCache1En ? (1 << (1 + (PF.SubCacheSize1 & 0b01))) : 0)
          + (PF.SubCache2En ? (1 << (1 + (PF.SubCacheSize2 & 0b01))) : 0)
          + (PF.SubCache3En ? (1 << (1 + (PF.SubCacheSize3 & 0b01))) : 0);
/*      }*/
    }
The purpose above is to mute the condition on PF.Mode
Indeed, the AMD family 15h specs explain how an MP system employs part of the L3 cache as sub-caches of 1 or 2 MB each.
Thus with 4 processors, L3 should equal:
(4 x 1) + 12 = 16 MB
Then please rebuild and test again
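As a worked version of the (4 x 1) + 12 = 16 MB arithmetic, here is a minimal standalone sketch. It assumes, based only on the snippet in this thread, that each enabled sub-cache contributes 1 << (1 + (size & 1)) units of 512 KB, i.e. 1 MB when bit 0 of the size code is clear and 2 MB when it is set. The struct and function names are hypothetical stand-ins, not CoreFreq types.

```c
#include <assert.h>

/* Hypothetical stand-in for the probe filter fields used in the thread. */
struct sub_cache {
    unsigned int enabled;    /* 1 when the sub-cache is enabled */
    unsigned int size_code;  /* bit 0 clear: 1 MB, bit 0 set: 2 MB (assumed) */
};

/* Sum the enabled sub-caches in 512 KB units (2 units = 1 MB, 4 = 2 MB).
 * Accumulating term by term avoids any + versus ?: grouping ambiguity. */
unsigned int sub_cache_units(const struct sub_cache *sc, unsigned int count)
{
    unsigned int units = 0;

    assert(sc != 0);
    for (unsigned int i = 0; i < count; i++) {
        if (sc[i].enabled)
            units += 1u << (1 + (sc[i].size_code & 1u));
    }
    return units;
}

/* Total L3 in KB: the reported size plus the sub-caches converted to KB. */
unsigned int l3_total_kb(unsigned int reported_kb,
                         const struct sub_cache *sc, unsigned int count)
{
    return reported_kb + sub_cache_units(sc, count) * 512u;
}
```

With four enabled 1 MB sub-caches and a reported L3 of 12288 KB, this yields 16384 KB, the value expected above.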
Bump
Thanks for the bump, so sorry, work and life! Looks like I did something wrong, I get build errors on the make clean all.
Is this pasted in correctly?
New version pushed. Please refresh source code then post:
corefreq-cli -m
corefreq-cli -u
Here you go! Thanks!
Hello, version 1.39.15 focuses on the L3 Cache size. If your Piledriver processor is detected, then every L3 Sub-Cache should be added to the L3 size.
Thanks I will rebuild and post results. What would you like to see?
The size of the L3 Cache in the Topology, please.
Hello, Do you have any result of the L3 cache size in Topology ? Regards, CyrIng
Hope this helps! Thanks!
Thanks for your feedback.
The Core ID topology looks better, but the L3 cache size remains at 12MB despite the PCI queries of the Sub-Caching data.
The next coding step will be to implement the temperature sensor reading for each core with an ID of 0, thus four sensors (one per processor package), if available in this architecture.
CPU Pkg Apic Core Thread Caches (w)rite-Back (i)nclusive
# ID ID ID ID L1-Inst Way L1-Data Way L2 Way L3 Way
00: BSP 0 0 -1 64 2 16 4 2048 16 12288 14
...
16: 1 32 0 -1 64 2 16 4 2048 16 12288 14
...
32: 2 64 0 -1 64 2 16 4 2048 16 12288 14
...
48: 3 96 0 -1 64 2 16 4 2048 16 12288 14
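The selection rule described above (one sensor read per package, on the CPU whose Core ID is 0) can be sketched as follows. The cpu_topo struct and sensor_cpus function are hypothetical and only illustrate the rule; they do not reflect how CoreFreq stores its topology.

```c
#include <assert.h>

/* Hypothetical topology record; field names are illustrative only. */
struct cpu_topo {
    int pkg_id;   /* package (socket) the CPU belongs to */
    int core_id;  /* core number within that package */
};

/* Store in out[] the index of every CPU whose Core ID is 0 and
 * return how many were found (never more than max_out). */
int sensor_cpus(const struct cpu_topo *topo, int cpu_count,
                int *out, int max_out)
{
    int found = 0;

    assert(topo != 0 && out != 0);
    for (int cpu = 0; cpu < cpu_count && found < max_out; cpu++) {
        if (topo[cpu].core_id == 0)
            out[found++] = cpu;
    }
    return found;
}
```

For a layout like the excerpt above, with 64 CPUs and 16 per package, this would select CPUs 0, 16, 32 and 48.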
Hello, In the last code, you will get the temperature for each Core ID 0 (per Package) and the Voltage ID for any CPU.
I can't find a precise formula for converting the VID to a voltage, either in the specs or in this AMD FX tuning guide.
The same formula as the Zen architecture has been copy-pasted into the function COMPUTE_VOLTAGE_AMD_15h
https://github.com/cyring/CoreFreq/blob/24bbcfe1dce4c580d8637b2bb7ef157154164788/coretypes.h#L284
You will get wrong voltage results but the VID should be correct. Feel free to modify the formula.
In the UI, please send back screenshots of the "Power & Voltage" view.
The VID is specified to be per P-State, but you will have to experiment to determine whether the VID is per Package, Node, or Core. For example, stress Cores individually or in groups and observe how the VID differs.
Hello, any results from the above request? Regards, CyrIng
My apologies Cyring I must have misunderstood your request. My fault. I added the line above to coretypes.h as seen here:
I did a "git pull" Then "make clean all"
The make did not build due to errors in corefreqk.c and .h. Let me know what I need to do next. THANK YOU!
[lex@LexBeast CoreFreq]$ make clean all
rm -f corefreqd corefreq-cli
make -j1 -C /lib/modules/4.20.1-arch1-1-ARCH/build M=/home/lex/pkgbuilds/CoreFreq clean
make[1]: Entering directory '/usr/lib/modules/4.20.1-arch1-1-ARCH/build'
CLEAN /home/lex/pkgbuilds/CoreFreq/.tmp_versions
make[1]: Leaving directory '/usr/lib/modules/4.20.1-arch1-1-ARCH/build'
cc -Wall -pthread -c corefreqd.c \
  -D FEAT_DBG=1 -o corefreqd.o
cc -Wall -c corefreqm.c -o corefreqm.o
cc -Wall corefreqd.c corefreqm.c \
  -D FEAT_DBG=1 -o corefreqd -lpthread -lm -lrt
cc -Wall -c corefreq-cli.c -o corefreq-cli.o
cc -Wall -c corefreq-ui.c -o corefreq-ui.o
cc -Wall -c corefreq-cli-rsc.c \
  -o corefreq-cli-rsc.o
cc -Wall -c corefreq-cli-json.c \
  -o corefreq-cli-json.o
cc -Wall -c corefreq-cli-extra.c \
  -o corefreq-cli-extra.o
cc -Wall \
  corefreq-cli.c corefreq-ui.c corefreq-cli-rsc.c \
  corefreq-cli-json.c corefreq-cli-extra.c \
  -o corefreq-cli -lm -lrt
make -j1 -C /lib/modules/4.20.1-arch1-1-ARCH/build M=/home/lex/pkgbuilds/CoreFreq modules
make[1]: Entering directory '/usr/lib/modules/4.20.1-arch1-1-ARCH/build'
CC [M] /home/lex/pkgbuilds/CoreFreq/corefreqk.o
/home/lex/pkgbuilds/CoreFreq/corefreqk.c: In function ‘Map_AMD_Topology’:
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1191:3: error: unknown type name ‘PROBE_FILTER_CTRL’
  PROBE_FILTER_CTRL PF;
  ^~~~
In file included from /home/lex/pkgbuilds/CoreFreq/corefreqk.c:34:
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1192:13: error: ‘PCI_AMD_PROBE_FILTER_CTRL’ undeclared (first use in this function); did you mean ‘UPROBE_FILTER_MMAP’?
  RDPCI(PF, PCI_AMD_PROBE_FILTER_CTRL);
  ^~~~~~~~~
/home/lex/pkgbuilds/CoreFreq/corefreqk.h:467:11: note: in definition of macro ‘RDPCI’
  : "ir" (_reg) \
  ^~~~
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1192:13: note: each undeclared identifier is reported only once for each function it appears in
  RDPCI(PF, PCI_AMD_PROBE_FILTER_CTRL);
  ^~~~~~~~~
/home/lex/pkgbuilds/CoreFreq/corefreqk.h:467:11: note: in definition of macro ‘RDPCI’
  : "ir" (_reg) \
  ^~~~
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1196:7: error: request for member ‘SubCache0En’ in something not a structure or union
  + PF.SubCache0En ? (1 << (1 + (PF.SubCacheSize0 & 0b01))) : 0
  ^
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1196:36: error: request for member ‘SubCacheSize0’ in something not a structure or union
  + PF.SubCache0En ? (1 << (1 + (PF.SubCacheSize0 & 0b01))) : 0
  ^
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1197:7: error: request for member ‘SubCache1En’ in something not a structure or union
  + PF.SubCache1En ? (1 << (1 + (PF.SubCacheSize1 & 0b01))) : 0
  ^
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1197:36: error: request for member ‘SubCacheSize1’ in something not a structure or union
  + PF.SubCache1En ? (1 << (1 + (PF.SubCacheSize1 & 0b01))) : 0
  ^
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1198:7: error: request for member ‘SubCache2En’ in something not a structure or union
  + PF.SubCache2En ? (1 << (1 + (PF.SubCacheSize2 & 0b01))) : 0
  ^
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1198:36: error: request for member ‘SubCacheSize2’ in something not a structure or union
  + PF.SubCache2En ? (1 << (1 + (PF.SubCacheSize2 & 0b01))) : 0
  ^
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1199:7: error: request for member ‘SubCache3En’ in something not a structure or union
  + PF.SubCache3En ? (1 << (1 + (PF.SubCacheSize3 & 0b01))) : 0;
  ^
/home/lex/pkgbuilds/CoreFreq/corefreqk.c:1199:36: error: request for member ‘SubCacheSize3’ in something not a structure or union
  + PF.SubCache3En ? (1 << (1 + (PF.SubCacheSize3 & 0b01))) : 0;
  ^
make[2]: [scripts/Makefile.build:298: /home/lex/pkgbuilds/CoreFreq/corefreqk.o] Error 1
make[1]: [Makefile:1563: module/home/lex/pkgbuilds/CoreFreq] Error 2
make[1]: Leaving directory '/usr/lib/modules/4.20.1-arch1-1-ARCH/build'
make: *** [Makefile:68: all] Error 2
I have done successful non-regression builds on several Linux distributions (Arch, Ubuntu, SUSE, CentOS).
You have to git clone
from scratch again.
No code to edit. Everything has been back-ported; in the latest version, 1.39.18, you should be able to read the temperature and voltage ID for each processor.
First, this is fantastic software and exactly what I was looking for! Thank you for writing this!
I read through the thread about no temperature report from Ryzen and suspect I have a similar issue.
With an insmod /var/lib/dkms/corefreqk/1.39/4.20.0-arch1-1-ARCH/x86_64/module/corefreqk.ko Experimental=1
Output of corefreq-cli -s is:
I see no thermal data of course.
The output of sensors does show the CPU Temps here:
At your leisure, let me know what I should try. THANK YOU!