cyring / CoreFreq

CoreFreq : CPU monitoring and tuning software designed for 64-bit processors.
https://www.cyring.fr
GNU General Public License v2.0

[SOLVED] No temperature report on Ryzen 2700X #54

Closed adatum closed 4 years ago

adatum commented 6 years ago

First I'd like to say I'm really impressed by this software. I've been looking for a lightweight program to monitor frequency (and hopefully temperature, vcore, etc) while I set up a new system, and CoreFreq works well and has a ton of features. Plus, I like that it runs in the terminal.

Now for the issue: there is currently no reporting of temperature, Vcore, or power on the Ryzen 2700X as can be seen in the following screenshots.

corefreq_load

cpufreq_vcore

The 2700X is a new CPU so it is understandable. If you need info or want me to try something, just let me know.

Even in the current state, CoreFreq is useful to me. Again, really impressed. It deserves more publicity!

adatum commented 6 years ago

I'll try the new push later. Anything specific you'd like me to check?

For RAPL I just found it mentioned, but not defined, in an AMD developer document you might have seen.

This article mentions RAPL, and this hardware monitoring project on GitHub may contain some hints.

adatum commented 6 years ago

I see exactly two unique values: (VID, Vcore) = { (118, 0.8125), (54, 1.2125) }, where the first pair is at idle and the second pair under load.

cyring commented 6 years ago

Thank you.

Do the Vcore values match your BIOS data ?

Do you read the same Vcore when only one or many CPUs are fully loaded ? (you can press F3 in the UI to apply CPU burning: random or round robin)

What about the Vcore when the OC is manual and 1 or many CPUs are loaded ?

Does the RAPL hardware monitoring project match your BIOS ?

adatum commented 6 years ago

The Vcore readings in BIOS and sensors (with the it87 module) vary a lot more, so CoreFreq does not match them: it always shows one of only those two values.

The Vcore mostly remains at 0.8125 under low or single core load, occasionally jumping to 1.2125. Under full load it stays at 1.2125. I used mprime to stress (especially all cores).

I haven't tried the manual OC again.

Sorry I'm not familiar with RAPL. What would you like me to check?

cyring commented 6 years ago

Thanks.

With a manual OC, I would like to check whether I can stick with the VID of the non-boosted P-States or have to switch to the VID of the Boosted register. In the latter case the algorithm will just test the CPB bit, as we did with temperature.

For RAPL, we need a strong reference to compare with. Any AMD or motherboard tool for instance.

cyring commented 6 years ago

This last RAPL commit tries the energy accumulators in cumulative mode. Results are unlikely to be accurate, but I expect to read values above zero.

Please let me also know about Vcore in manual OC.

Regards CyrIng

adatum commented 6 years ago

| Ratio | BIOS CPB setting | BIOS Vcore setting | CoreFreq readings (VID, Vcore) | it87 Vcore | UI Max Ratio | TURBO |
|---|---|---|---|---|---|---|
| 40 | Enabled | Auto | { (118, 0.8125), (39, 1.3062) } | ?* | 48 | Green |
| 40 | Enabled | Manual 1.35V | { (118, 0.8125), (31, 1.3562) } | 1.34V (1.33V ACL**) | 48 | Green |
| 40 | Disabled | Manual 1.30V | { (118, 0.8125), (39, 1.3062) } | 1.29V | 40 | Grey |
| Auto | Enabled | Auto | { (118, 0.8125), (54, 1.2125) } | 0.80V to 1.51V (1.24V ACL**) | 45 | Green |

* I did not note the it87 voltage reading for this test
** ACL = all core load, mprime small FFT


When setting the ratio manually I believe CPB is effectively disabled. Yet the indicator was green until I specifically disabled CPB in BIOS. A few days ago I wrote

Yes, TURBO is grey and not green when CPB, PE, PBO are disabled and using manual OC.

I think I had disabled all those settings manually in BIOS for that test.


I might get a multimeter eventually and see if I can reach the voltage measurement points on the motherboard (no guarantee).

cyring commented 6 years ago

Excellent report.

Because the CoreFreq driver reads the P-States, we get only the VID programmed into these registers. That's why the computed Vcore is static; the VIDs set in BIOS are the values read later by the driver. My understanding is that the Processor switches to another voltage plan by selecting another P-State. However, a VID is a target voltage, not the effective voltage. It can be the same or close to it.
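The four distinct (VID, Vcore) pairs reported in this thread are all consistent with a linear mapping, Vcore = 1.55 - 0.00625 x VID. A minimal Python sketch of that conversion is given here; the function name is hypothetical and the constants are inferred from the reported pairs, not taken from the CoreFreq source.

```python
def vid_to_vcore(vid: int) -> float:
    """Convert a P-State VID code to a target voltage in volts.

    Assumption: 1.55 V base and 6.25 mV per VID step, which reproduces
    the (VID, Vcore) pairs posted in this thread.
    """
    return 1.55 - 0.00625 * vid


if __name__ == "__main__":
    # VID codes seen in the thread: idle, auto load, and two manual OC settings.
    for vid in (118, 54, 39, 31):
        print(vid, round(vid_to_vcore(vid), 4))
```

Since the VID is only the programmed target, this value would not be expected to follow the fluctuating readings from BIOS or the it87 sensor.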

Same rule is applied with the FID, Frequency Identifier. This time however CoreFreq computes and shows the relative frequency (based on performance counters rather than FID)

In the CoreFreq UI, the max ratio of 48 is too high; the driver's Ryzen table needs to be adjusted.

Do you get some values in the Power view ?

adatum commented 6 years ago

Would it be possible to have effective voltage readings too?

For frequency, CoreFreq shows similar results to turbostat which I recently discovered. These frequencies are as low as ~0 MHz at idle. In contrast, scaling_cur_freq shows maybe a minimum of 1.8GHz at the same time. Clearly I'm not familiar with the distinction between these two measurements.

Still zero values for Power, Energy, exactly as in the previous screenshot.

cyring commented 6 years ago

I don't have any specifications to read voltage.

0 MHz is not for real. It is a frequency tendency based on usage (in fact, the number of clock cycles elapsed while the processor is working). Of course, one can read the effective frequency by looping around scaling_cur_freq; but with a GHz processor, what would be the best sampling interval ? Every second, you won't record data at the right time. Every nanosecond, the measurement itself will put load on the processor ! => performance counters have been created to solve this case.
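To make the "frequency tendency" idea concrete, here is a minimal Python sketch. It assumes a counter that only advances while the core is executing (not halted); the function and parameter names are hypothetical, and CoreFreq's actual computation may differ.

```python
def relative_frequency_mhz(cycles_start: int, cycles_end: int,
                           interval_s: float) -> float:
    """Usage-based frequency over one sampling interval.

    Assumption: the counter counts clock cycles only while the core is
    working, so an idle core accumulates very few cycles.
    """
    delta_cycles = cycles_end - cycles_start
    return delta_cycles / interval_s / 1e6


if __name__ == "__main__":
    # A core clocked at 3.7 GHz but busy for only 1% of a 1 s interval
    # accumulates about 37 million cycles.
    print(relative_frequency_mhz(0, 37_000_000, 1.0))
```

A mostly halted core therefore shows a value near 0 MHz even though its clock, when it does run, is in the GHz range.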

RAPL: your previous msr readings show that we get values. I need to debug the scaling computation based on the retrieved power unit.

cyring commented 6 years ago

Can you please try the last RAPL commit ?

adatum commented 6 years ago

corefreq_power

Non-zero energy and power values show up, though they are static. I could compare very roughly with readings from the UPS (total system + monitor power, not just CPU).

What is the significance of the usage based frequency and when might someone want to know it instead of effective frequency?

I've been using watch -n1 "cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq" but of course there's the issue of sampling rate. ( -n0 is crazy)

A tool with min/max/avg/current values like HWiNFO seems lacking for linux so far; do you know what approach it takes on sampling and readings? It exists so it must be possible.

cyring commented 6 years ago

Can't tell about HWiNFO. Does it show, at the same time, the frequency increase/decrease ?

In the first version of XFreq, I used to read the effective frequency without reaching the Turbo. See this thread at Intel.

adatum commented 6 years ago

Not sure what you mean by frequency increase/decrease, but you can search for screenshots where it shows current/min/max/avg values for each core.

I would think that generally end users want to know effective values inclusive of Turbo and any other enhancements. Like in Task Manager, etc.

Very interesting thread. So much to learn from it.

cyring commented 6 years ago

Definitely this is an idea to add to the roadmap. I have programmed the fixed frequency and I suggest combining them with the performance monitoring counters (pmc): in the UI, the histogram bars would be limited by the min, max and turbo frequency ratios; and with the help of the pmc, the bars could slide progressively between each limit. This would be an innovation above the task mgr ? I don't find motivation to clone other software ;-)

We have however to complete the current CoreFreq features for Ryzen ...

cyring commented 6 years ago

Hello, can you script 2 loops with a 1 second interval, around 0xc001029a and 0xc001029b respectively ?

I want to see how these msr registers are moving.

cyring commented 6 years ago

AMD uProf to compare values with CoreFreq

adatum commented 6 years ago

Energy and Power values are updating for Package, Cores, and Uncore:

corefreq_power2

This is after I replaced the motherboard (with an identical one) and at default BIOS settings. Note that my system isn't ideal for testing since I'm changing settings to tune it.

I can understand not wanting to clone other software. On the other hand, there's probably a reason why the programs that have those features are popular, and conversely, why the programs that are popular have those features.

Here's a text file with 100 pairs of values which was generated by:

#!/usr/bin/bash

MSR_A=0xc001029a
MSR_B=0xc001029b

echo $MSR_A $MSR_B > msr.txt

for ((i = 1; i <= 100; i++)); do
  # Read both MSRs once per second and log them as one space-separated pair.
  echo "$(rdmsr "$MSR_A") $(rdmsr "$MSR_B")" | tee -a msr.txt
  sleep 1
done
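Successive readings like the ones logged by this script can be turned into power figures. The Python sketch below is only an illustration under stated assumptions: the counters are taken to be 32 bits wide, the energy unit is taken to be 2^-ESU joules with ESU passed in by the caller, and all function names are hypothetical.

```python
def parse_msr_line(line: str) -> tuple[int, int]:
    """Parse one logged line holding two hex values printed by rdmsr."""
    first, second = line.split()
    return int(first, 16), int(second, 16)


def power_watts(prev_ticks: int, curr_ticks: int, interval_s: float,
                esu: int = 16, counter_bits: int = 32) -> float:
    """Average power between two energy counter readings.

    Assumptions: the counter wraps at 2**counter_bits and one tick is
    2**-esu joules; neither value is confirmed by the thread at this point.
    """
    delta = (curr_ticks - prev_ticks) % (1 << counter_bits)
    return delta * 2.0 ** -esu / interval_s


if __name__ == "__main__":
    prev, _ = parse_msr_line("1000 2000")
    curr, _ = parse_msr_line("11000 2400")
    print(power_watts(prev, curr, 1.0))
```

The modulo keeps the delta positive when the counter wraps between two samples, which matters for a cumulative counter read once per second.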

I'll have to figure out how to use AMDuProf. I installed the rpm, but the documentation didn't even specify how to launch the program or where it installs to... I had to find that myself.

cyring commented 6 years ago

Great, thanks for the msr values file.

Uncore Energy & Power must be a bug; there is no such msr.

Based on your motherboard setup, I will roll back to the previous Pkg & Cores RAPL formulas.

adatum commented 6 years ago

I just found a possible explanation for why Energy and Power were zero before. This is about the Performance Enhancer (Precision Boost + Asus tweaks ?) setting on my motherboard:

Level 3 (OC)

Tweak from The Stilt which disables the power and current calculation, you might see the SMU calculated power/current in HWInfo showing 0 when using it.

I had it set to PE3 previously. For now I will avoid it.

cyring commented 6 years ago

For your testing, I have put back the RAPL code.

adatum commented 6 years ago

corefreq_power_nope3

cyring commented 6 years ago

RAPL additional code, tested w/ an i5-7500, below lines: https://github.com/cyring/CoreFreq/blob/43bbd9c8cbfc4488bff8b0798b393fd12e7a635a/corefreqd.c#L505

Shm->Proc.Power.Unit.Times *= 1000.0 / (double) (Shm->Sleep.Interval);

cyring commented 6 years ago

Hello, Using last commit, the L3 cache size should be equal to 16384 KB and the max boosted ratio equal to 44

adatum commented 6 years ago

corefreq_20180602_2

At the time of this screenshot, the UPS reported ~60W for the total system+display, sometimes spiking to above 100W, or down to ~50W.

cyring commented 6 years ago

Oops, I've messed up all the cache sizes. Will fix asap...

cyring commented 6 years ago

L3 fixed: can you display the topology ?

Can you also monitor Power and Voltage using corefreq-cli -V in idle and load usage ?

cyring commented 6 years ago

In addition to the above requests, could you also check the Hyper-Threading state (in the Technologies view), and dump the full CPUID table using corefreq-cli -u

Both tests to be executed with Hyper-Threading, first enabled in BIOS, next disabled.

You have to download the last commit once again.

Regards

adatum commented 6 years ago

Topology w/SMT: corefreq_topology

Topology w/o SMT: corefreq_topology_nosmt

Idle corefreq-cli -V: corefreq_v_idle

Load corefreq-cli -V: corefreq_v_load

Technologies w/SMT: corefreq_tech_htt

corefreq -u w/SMT

Technologies w/o SMT: corefreq_technologies_nosmt

corefreq -u w/o SMT

Note that Technologies shows Virtualization=OFF, but I do have SVM enabled in BIOS.

cyring commented 6 years ago

Thanks a lot. That's great, the HTT status is queried correctly. I'll rename it SMT when an AMD processor is present, and keep HTT for Intel.

Virtualisation indicates that CoreFreq is running inside a VM. A CPUID processor bit can be queried for this.

SVM may be reflected by the VMX flag you will find in the Features view.

I'm not sure if PowerNow, Cool'n Quiet still make sense with Ryzen ? Nothing about them in the specs.

adatum commented 6 years ago

VME or VMX?

corefreq_vmx

I'm not sure if PowerNow, Cool'n Quiet still make sense with Ryzen ? Nothing about them in the specs.

I've read some obscure comment by overclockers about these technologies even with Ryzen, but don't recall seeing them in BIOS. Next time I'm in BIOS I will search for them.

cyring commented 6 years ago

Checking the code, it is neither VME nor VMX. SVM capability can be decoded from the CPUID view at 80000001, register ECX, bit position 2, which is present in your dump. SVM enablement should be stated in the System-Registers view at EFER. The other AMD msr registers have not been implemented.
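As a small illustration of those two checks, the Python sketch below tests the capability bit in ECX of CPUID 80000001 and the enable bit in EFER. The helper names are hypothetical; the EFER bit position (SVME at bit 12) is not stated in this thread and is taken from the AMD64 architecture definition.

```python
SVM_CPUID_ECX_BIT = 2   # CPUID 80000001, ECX bit 2 (from the comment above)
EFER_SVME_BIT = 12      # assumption: SVME bit position per the AMD64 manuals


def svm_supported(cpuid_80000001_ecx: int) -> bool:
    """True when the processor advertises SVM capability."""
    return bool((cpuid_80000001_ecx >> SVM_CPUID_ECX_BIT) & 1)


def svm_enabled(efer: int) -> bool:
    """True when EFER.SVME is currently set."""
    return bool((efer >> EFER_SVME_BIT) & 1)


if __name__ == "__main__":
    print(svm_supported(0b100), svm_enabled(0x1000))
```

Capability and enablement are separate questions: the first is a fixed CPUID feature bit, the second is a register bit that software sets when it actually uses SVM.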

adatum commented 6 years ago

corefreq_efer

cyring commented 6 years ago

Sorry it requires more work. For the roadmap, I need to query the msr VM_CR at 0xc0010114 [Virtual Machine Control] to read:

adatum commented 6 years ago

No worries. Did you want those msr values?

$ sudo rdmsr 0xc0010114
8
$ sudo rdmsr 0xc0010118
0

cyring commented 6 years ago

According to the msr 0xc0010114

| Bits | Description | Value |
|---|---|---|
| 63:32 | Reserved. | 0 |
| 31:5 | Reserved. Read-only,Error-on-write-1. Reset: 0. | 0 |
| 4 | SvmeDisable: SVME disable. Configurable. Reset: 0. 0=Core::X86::Msr::EFER[SVME] is read-write. 1=Core::X86::Msr::EFER[SVME] is Read-only,Error-on-write-1. See Lock for the access type of this field. Attempting to set this field when (Core::X86::Msr::EFER[SVME]==1) causes a #GP fault, regardless of the state of Lock. See the APM2 section titled "Enabling SVM" for software use of this field. | 0 |
| 3 | Lock: SVM lock. Read-only,Write-1-only,Volatile. Reset: 0. 0=SvmeDisable is read-write. 1=SvmeDisable is read-only. See Core::X86::Msr::SvmLockKey[SvmLockKey] for the condition that causes hardware to clear this field. | 1 |
| 2 | Reserved. | 0 |
| 1 | InterceptInit: intercept INIT. Read-write,Volatile. Reset: 0. 0=INIT delivered normally. 1=INIT translated into a SX interrupt. This bit controls how INIT is delivered in host mode. This bit is set by hardware when the SKINIT instruction is executed. | 0 |
| 0 | Reserved. | 0 |

SVME is not disabled and SVM is locked with a key of value 0, thus AMD Virtualization is ON in BIOS.
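The same reading of msr 0xc0010114 can be sketched in Python. The bit positions follow the field descriptions quoted above (InterceptInit at bit 1, Lock at bit 3, SvmeDisable at bit 4); the function name is hypothetical.

```python
def decode_vm_cr(value: int) -> dict[str, bool]:
    """Split a VM_CR value into the three documented flag fields."""
    return {
        "InterceptInit": bool((value >> 1) & 1),
        "Lock": bool((value >> 3) & 1),
        "SvmeDisable": bool((value >> 4) & 1),
    }


if __name__ == "__main__":
    # rdmsr printed "8" for 0xc0010114; rdmsr output is hexadecimal.
    print(decode_vm_cr(int("8", 16)))
```

For the reported value, only Lock is set and SvmeDisable stays clear, which matches the conclusion drawn above.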

cyring commented 6 years ago

Based on the CPUID dump files, below the extracted extended topology to compute the thread id

t = a AND h

t = h x (a - (c x 2 x p))

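| CPU# | (a) APIC ID | (c) Core ID | Node ID | (p) Threads / Core | (t) Thread ID |
|---|---|---|---|---|---|
| 00 | 0 | 0 | 0 | 1 | 0 |
| 01 | 1 | 0 | 0 | 1 | 1 |
| 02 | 2 | 1 | 0 | 1 | 0 |
| 03 | 3 | 1 | 0 | 1 | 1 |
| 04 | 4 | 2 | 0 | 1 | 0 |
| 05 | 5 | 2 | 0 | 1 | 1 |
| 06 | 6 | 3 | 0 | 1 | 0 |
| 07 | 7 | 3 | 0 | 1 | 1 |
| 08 | 8 | 4 | 0 | 1 | 0 |
| 09 | 9 | 4 | 0 | 1 | 1 |
| 10 | 10 | 5 | 0 | 1 | 0 |
| 11 | 11 | 5 | 0 | 1 | 1 |
| 12 | 12 | 6 | 0 | 1 | 0 |
| 13 | 13 | 6 | 0 | 1 | 1 |
| 14 | 14 | 7 | 0 | 1 | 0 |
| 15 | 15 | 7 | 0 | 1 | 1 |

| CPU# | APIC ID | Core ID | Node ID | Threads / Core | (t) Thread ID |
|---|---|---|---|---|---|
| 00 | 0 | 0 | 0 | 0 | 0 |
| 01 | 1 | 1 | 0 | 0 | 0 |
| 02 | 2 | 2 | 0 | 0 | 0 |
| 03 | 3 | 3 | 0 | 0 | 0 |
| 04 | 8 | 8 | 0 | 0 | 0 |
| 05 | 9 | 9 | 0 | 0 | 0 |
| 06 | 10 | 10 | 0 | 0 | 0 |
| 07 | 11 | 11 | 0 | 0 | 0 |

cyring commented 6 years ago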

The code to map the Ryzen topology is committed. You should get the results as the 2 tables above.

--- EDIT --- I'm simplifying the thread id to a bitwise operation: t = a AND h. HTT detection is also optimized.
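A minimal Python sketch of both thread id formulas, using the symbols from the tables (a = APIC ID, c = Core ID, p = Threads / Core column). It assumes h is a 0/1 flag that is 1 when SMT is enabled; that meaning is inferred from the context and the function names are hypothetical.

```python
def thread_id(apic_id: int, htt: int) -> int:
    """Simplified form: t = a AND h."""
    return apic_id & htt


def thread_id_long_form(apic_id: int, core_id: int, p: int, htt: int) -> int:
    """Earlier form: t = h x (a - (c x 2 x p))."""
    return htt * (apic_id - core_id * 2 * p)


if __name__ == "__main__":
    # SMT enabled: APIC IDs 0..15, two consecutive APIC IDs per core, p = 1.
    for a in range(16):
        print(a, thread_id(a, 1), thread_id_long_form(a, a // 2, 1, 1))
```

With SMT disabled, h is 0 and both forms give a thread id of 0 for every CPU, as in the second table.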

cyring commented 6 years ago

Can you also try the RAPL project at djselbeck/rapl-read-ryzen ? I don't find many algorithmic differences from CoreFreq in it besides the sampling time. Please also print the msr 0xc0010299.

adatum commented 6 years ago

$ sudo rdmsr 0xc0010299
a1003

SMT disabled in BIOS: corefreq_ryzen corefreq-cli -u

SMT enabled in BIOS: corefreq_ryzen_smt corefreq-cli -u
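The value a1003 read from 0xc0010299 can be split into unit fields. The Python sketch below assumes the usual RAPL power unit layout (power unit in bits 3:0, energy unit in bits 12:8, time unit in bits 19:16, each meaning 1/2^n); the layout is not spelled out in this thread and the function name is hypothetical.

```python
def decode_rapl_power_unit(value: int) -> dict[str, float]:
    """Decode a RAPL power unit register into physical units.

    Assumption: PU = bits 3:0, ESU = bits 12:8, TU = bits 19:16,
    each encoding a unit of 1 / 2**field.
    """
    pu = value & 0xF
    esu = (value >> 8) & 0x1F
    tu = (value >> 16) & 0xF
    return {
        "power_unit_w": 2.0 ** -pu,
        "energy_unit_j": 2.0 ** -esu,
        "time_unit_s": 2.0 ** -tu,
    }


if __name__ == "__main__":
    # rdmsr printed "a1003" (hexadecimal) for 0xc0010299.
    print(decode_rapl_power_unit(int("a1003", 16)))
```

Under that layout the energy field is 16, so one counter tick would be 2^-16 J, roughly 15.3 microjoules.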

cyring commented 6 years ago

Can you print corefreq-cli -m ?

adatum commented 6 years ago

$ ./corefreq-cli -m
CPU Pkg  Apic  Core Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID    ID     ID  L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
00: BSP     0     0      0      64  4        32  8       512  8     16384  8  
01:   0     1     0      1      64  4        32  8       512  8     16384  8  
02:   0     2     1      0      64  4        32  8       512  8     16384  8  
03:   0     3     1      1      64  4        32  8       512  8     16384  8  
04:   0     4     2      0      64  4        32  8       512  8     16384  8  
05:   0     5     2      1      64  4        32  8       512  8     16384  8  
06:   0     6     3      0      64  4        32  8       512  8     16384  8  
07:   0     7     3      1      64  4        32  8       512  8     16384  8  
08:   0     8     4      0      64  4        32  8       512  8     16384  8  
09:   0     9     4      1      64  4        32  8       512  8     16384  8  
10:   0    10     5      0      64  4        32  8       512  8     16384  8  
11:   0    11     5      1      64  4        32  8       512  8     16384  8  
12:   0    12     6      0      64  4        32  8       512  8     16384  8  
13:   0    13     6      1      64  4        32  8       512  8     16384  8  
14:   0    14     7      0      64  4        32  8       512  8     16384  8  
15:   0    15     7      1      64  4        32  8       512  8     16384  8

cyring commented 6 years ago

Hyper-Threading detection

| ECX [10:8] | NpP |
|---|---|
| 000b | 1 node per processor. |
| 001b | 2 nodes per processor. |
| 010b | Reserved. |
| 011b | 4 nodes per processor. |
| 111b-100b | Reserved. |
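The NpP field above can be decoded with a small lookup. This Python sketch takes an ECX value as input; which CPUID leaf supplies it is not stated here, and the function name is hypothetical.

```python
# Encodings from the table above; every other 3-bit value is reserved.
NODES_PER_PROCESSOR = {0b000: 1, 0b001: 2, 0b011: 4}


def nodes_per_processor(ecx: int):
    """Return the node count encoded in ECX[10:8], or None if reserved."""
    field = (ecx >> 8) & 0x7
    return NODES_PER_PROCESSOR.get(field)


if __name__ == "__main__":
    for ecx in (0x000, 0x100, 0x200, 0x300):
        print(hex(ecx), nodes_per_processor(ecx))
```

Returning None for reserved encodings keeps the caller from silently treating an unknown value as a valid node count.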
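| Model | HTT | LC | NC | TpC | NpP |
|---|---|---|---|---|---|
| AMD Ryzen 7 2700X | OFF | 8 | 8 | 1 | 1 |
| AMD Ryzen 7 2700X | ON | 8 | 16 | 2 | 1 |
| AMD Ryzen 5 2500U | ? | 8 | 8 | 2 | 1 |
| AMD Ryzen 3 2200G | ? | 8 | 4 | 1 | 1 |
| AMD Ryzen 7 1700X | ? | 8 | 16 | 2 | 1 |
| AMD Ryzen Threadripper 1950X | ? | 8 | 32 | 2 | 2 |

cyring commented 6 years ago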

Hello, Hyper-Threading detection has been enhanced. Can you print the topology (including the visual HTT indicator) in these two BIOS cases: SMT ON, SMT OFF

-- EDIT -- Could you also add scenarios where some Cores are deactivated in BIOS and verify if CoreFreq is matching the HTT state and topology.

cyring commented 6 years ago

Looking at the RAPL measurements from the Tom's Hardware review, I notice that CoreFreq's Package power is pretty close to them if the value is divided by the number of physical cores (8). Based on your previous screenshots:

| Load | Pkg (W) | div by 8 | THR |
|---|---|---|---|
| High | 869.6832275 | 108.71 | 104.7 |
| Idle | 99.68884277 | 12.46 | 12.7 |

adatum commented 6 years ago

SMT ON: corefreq_topology_smt

SMT OFF: corefreq_topology_nosmt2


SMT OFF, cores loaded: corefreq_vs_rapl_power

I noticed the readings from CoreFreq and the RAPL project were similar and just off by some factor.

CoreFreq's cores energy reading essentially matches RAPL's individual core energy readings (though they note watts instead of joules, maybe a typo).

I was about to say CoreFreq's package energy and cores power readings are similar, but then I saw this (also, is the 9 at uncore power an artifact?): corefreq_vs_rapl_power2

cyring commented 6 years ago

Thank you for your screenshots.

Unfortunately I have to refactor the code: with Ryzen, the energy consumed is reported in a per-Core msr register.

MSRC001_029A [Core Energy Status] (CORE_ENERGY_STAT) Core::X86::Msr::CORE_ENERGY_STAT_lthree[1:0]_core[3:0]; MSRC001_029A

cyring commented 6 years ago

What units do you get in Power & Thermal ?

Concerning Vcore, can you confirm that you get only those two results when load testing on 1 or 2 random cores ? (feel free to use the CoreFreq tool to apply random & round robin turbo stress)

Did you start in Experimental ? If true then reading the legacy THERMTRIP register does not crash the Ryzen. However we can't tell if it is meaningful until the threshold is reached.