The memory controller can be queried using the driver option Experimental=1.
So far, tested successfully with i7-920 Nehalem QPI, some Core2 and Turion HyperTransport. Other architectures have been programmed blindly, based on datasheets, and are untested.
Besides the UI, these are the reasons CoreFreq differs from other tools. Its driver aims to provide a framework to query the processor registers, PCI configuration space and other instructions through a low-latency path, for accuracy.
I think this is in the "uncore" not the "offcore" (where I think the memory controller stuff lives). That is, the clock that the L3 ring is on?
I'm also excited about the driver. Currently I'm using libpfc, which offers user-space reads of the PMCs, but it would be great to have a driver to read the things for which there is no user-space access at all.
Here is what I can provide for the Nehalem IMC.
And here is the documentation
The Uncore fixed counter has been implemented for the Nehalem architecture.
Slowly in progress for SNB-EP & HSW-EP.
Alpha testers are missing for SNB, IVB & similar μArch.
@cyring - I am on SKL, would that be useful testing for you?
Yes, it will be helpful. However, I have committed code for Nehalem only, because my attempts to enable some MSRs for the Xeon Uncore crashed servers. So you will have to start the kernel module with the NHM architecture identifier:
insmod corefreqk.ko ArchID=19
Then run dmesg to verify that the architecture is acknowledged by the driver.
Next, in corefreq-cli, use the "Pkg. cycles" view and look at the UNCORE counter.
I tried it, but unfortunately, immediately upon loading the kernel module with ArchID=19, I got a hard lockup and had to reboot with the power button, so I wasn't able to run the further tests.
The module loaded fine without ArchID=19, though.
So the SKL Uncore counter is not programmed the same way as on NHM. Can you please send me the output of corefreq-cli -s?
Here's what I got:
Processor [Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz]
|- Architecture [Skylake/S]
|- Vendor ID [GenuineIntel]
|- Signature [06_5E]
|- Stepping [ 3]
|- Microcode [ 132]
|- Online CPU [ 4/4 ]
|- Base Clock [ 98.9]
|- Core Boost Min Max 8C 7C 6C 5C 4C 3C 2C 1C
|- ratio : 8 26 - - - - 31 32 33 35
|- freq : 791 2571 3066 3165 3264 3461
Instruction set:
|- 3DNow!/Ext [N,N] AES [Y] AVX/AVX2 [Y/Y] BMI1/BMI2 [Y/Y]
|- CLFSH [Y] CMOV [Y] CMPXCH8 [Y] CMPXCH16 [Y]
|- F16C [Y] FPU [Y] FXSR [Y] LAHF/SAHF [Y]
|- MMX/Ext [Y/N] MONITOR [Y] MOVBE [Y] PCLMULDQ [Y]
|- POPCNT [Y] RDRAND [Y] RDTSCP [Y] SEP [Y]
|- SSE [Y] SSE2 [Y] SSE3 [Y] SSSE3 [Y]
|- SSE4.1/4A [Y/N] SSE4.2 [Y] SYSCALL [Y]
Features:
|- 1 GB Pages Support 1GB-PAGES [Present]
|- 100 MHz multiplier Control 100MHzSteps [Missing]
|- Advanced Configuration & Power Interface ACPI [Present]
|- Advanced Programmable Interrupt Controller APIC [Present]
|- Core Multi-Processing CMP Legacy [Missing]
|- L1 Data Cache Context ID CNXT-ID [Missing]
|- Direct Cache Access DCA [Missing]
|- Debugging Extension DE [Present]
|- Debug Store & Precise Event Based Sampling DS, PEBS [Present]
|- CPL Qualified Debug Store DS-CPL [Present]
|- 64-Bit Debug Store DTES64 [Present]
|- Fast-String Operation Fast-Strings [Present]
|- Fused Multiply Add FMA|FMA4 [Present]
|- Hardware Lock Elision HLE [Present]
|- Long Mode 64 bits IA64|LM [Present]
|- LightWeight Profiling LWP [Missing]
|- Machine-Check Architecture MCA [Present]
|- Model Specific Registers MSR [Present]
|- Memory Type Range Registers MTRR [Present]
|- OS-Enabled Ext. State Management OSXSAVE [Present]
|- Physical Address Extension PAE [Present]
|- Page Attribute Table PAT [Present]
|- Pending Break Enable PBE [Present]
|- Process Context Identifiers PCID [Present]
|- Perfmon and Debug Capability PDCM [Present]
|- Page Global Enable PGE [Present]
|- Page Size Extension PSE [Present]
|- 36-bit Page Size Extension PSE36 [Present]
|- Processor Serial Number PSN [Missing]
|- Restricted Transactional Memory RTM [Present]
|- Safer Mode Extensions SMX [Missing]
|- Self-Snoop SS [Present]
|- Time Stamp Counter TSC [Invariant]
|- Time Stamp Counter Deadline TSC-DEADLINE [Present]
|- Virtual Mode Extension VME [Present]
|- Virtual Machine Extensions VMX [Present]
|- Extended xAPIC Support x2APIC [ xAPIC]
|- Execution Disable Bit Support XD-Bit [Present]
|- XSAVE/XSTOR States XSAVE [Present]
|- xTPR Update Control xTPR [Present]
Technologies:
|- Hyper-Threading HTT [OFF]
|- SpeedStep EIST [ ON]
|- PowerNow! PowerNow [OFF]
|- Dynamic Acceleration IDA [ ON]
|- Turbo Boost TURBO|CPB [ ON]
|- Virtualization HYPERVISOR [OFF]
Performance Monitoring:
|- Version PM [ 4]
|- Counters: General Fixed
| 8 x 48 bits 3 x 48 bits
|- Enhanced Halt State C1E [ ON]
|- C1 Auto Demotion C1A [ ON]
|- C3 Auto Demotion C3A [ ON]
|- C1 UnDemotion C1U [ ON]
|- C3 UnDemotion C3U [ ON]
|- Frequency ID control FID [OFF]
|- Voltage ID control VID [OFF]
|- P-State Hardware Coordination Feedback MPERF/APERF [ ON]
|- Hardware-Controlled Performance States HWP [ ON]
|- Hardware Duty Cycling HDC [ ON]
|- Package C-State
|- Configuration Control CONFIG [ LOCK]
|- Lowest C-State LIMIT [ 0]
|- I/O MWAIT Redirection IOMWAIT [ ENABLE]
|- Max C-State Inclusion RANGE [ 0]
|- MWAIT States: C0 C1 C2 C3 C4
| 0 2 1 2 4
|- Core Cycles [Present]
|- Instructions Retired [Present]
|- Reference Cycles [Present]
|- Last Level Cache References [Present]
|- Last Level Cache Misses [Present]
|- Branch Instructions Retired [Present]
|- Branch Mispredicts Retired [Present]
Power & Thermal Monitoring:
|- Clock Modulation ODCM [Disable]
|- DutyCycle [ 87.50%]
|- Power Management PWR MGMT [ LOCK]
|- Energy Policy Bias Hint [ 15]
|- Junction Temperature TjMax [ 100]
|- Digital Thermal Sensor DTS [Present]
|- Power Limit Notification PLN [Present]
|- Package Thermal Management PTM [Present]
|- Thermal Monitor 1 TM1|TTP [ Enable]
|- Thermal Monitor 2 TM2|HTC [Present]
Thank you. I'm programming an algorithm for Skylake architectures (desktop, mobile, Xeon).
Hello, I have programmed the Uncore fixed counter for SandyBridge and later architectures. It has been tested OK with a Broadwell [06_3D]. For Skylake, the algorithm is the same but the MSRs are different: can you give it a try, please?
@cyring - do I still need the explicit ArchID=19 specifier? What would you like me to test?
Where can I see the core frequency? I only found it on the "dashboard" tab, but it is unreadable to me because of the large ASCII-art letters. Anyway, there seems to be a display problem:
Note the numbers in between the fields and what appears to be an "E" following the uncore clock.
In my experience the uncore clock varied between 0000 and 1844 when idle and around 0030 under heavy load, which doesn't seem right.
Thanks for this quick reply. The good news is that your processor handles the Uncore readings without crashing ;-) With Broadwell, I have the same issue of large numbers. It could be an overflow of the Uncore counter; I'm working on this. In the UI menu, you can also follow "View" -> "Package cycles", where the Uncore frequency is displayed.
It's a relative frequency, whereas the Nehalem Uncore frequency is constant. Thus the counter delta was negative over the period. I have committed a workaround (absolute difference).
You will need to apply some load in parallel, because the Uncore fixed counter does not count during stalled cycles (such as C-states). You may also notice a brief erratic value when transitioning from load to idle: it's a side effect of my formula.
Hello, please let me know whether Uncore is showing up with your processor.
@cyring is there a fix for the display issue? I can try again. Right, I also recently read that when the socket is in C1 the uncore doesn't tick...
@travisdowns: I have experimented with a Broadwell/mobile processor: the Uncore fixed performance counter (FC0) counts cycles only while in C0. Thus during idle states, from C1 down to the lowest Cx, FC0 does not increment, and the measurement (current FC0 - previous FC0 over a 1-second interval) goes down to near zero. This is confirmed in the Intel SDM.
In short, you have to keep the processor in C0 to read its Uncore frequency. My way is to run "sha1sum /dev/zero" in another terminal.
I have committed UI fixes & changes; can you please show me which display issue you have?
Also tested OK with an IVB i7-3770K.
@cyring - I tested this on my Skylake using the "package cycles" view and indeed the "uncore" shows some number, but it doesn't seem correct.
With the system under load (intel_pstate governor set to performance) the value fluctuates between 50,000,000 and 150,000,000. There are no units - is that in Hz? The true uncore frequency should be similar to the CPU frequency of around 3 GHz, at least when loaded down like this.
@travisdowns: issue reopened. Could you post a photo of the BIOS showing the Uncore value? To my understanding, the intel_pstate performance governor applies a profile, but not a load. Did you execute a load using a command such as "taskset -c 0 sha1sum /dev/zero"?
My BIOS doesn't show uncore clocks, sorry. You can find plenty of references indicating that the uncore clocks have the same range as the core clocks, however - e.g., on a 3.5 GHz CPU the max uncore clock is also 3.5 GHz. Under load that isn't core-local (i.e., use a load that touches enough memory to hit at least the L3), I'd expect it to be near the maximum almost all the time.
Sorry for the confusion: I was reporting my intel_pstate setting because it matters for various power-saving behaviors that greatly affect things like uncore clock rates (i.e., "memory efficient turbo"), but I applied load separately, in four different ways:
sha1sum /dev/zero
stress -c 1
stress -c 4
stress -vm
Hello travisdowns: indeed, screenshots found on the Web show that the i7-6700HQ uncore clocks in the same range as the core clocks.
I'm reviewing the code for Skylake/S [06_5E] against Intel's reference manual to check whether the Uncore fixed performance counter is correctly set.
In the macro Uncore_Counters_Set(), I touch the following bits of the SKL structures:
EN_FIXED_CTR0
EN_CTR0
Clear_Ovf_CTR0 (if overflowed during driver startup)
https://github.com/cyring/CoreFreq/blob/f1bc1c4f6db8155f47109a981df6ab24b06a9e38/corefreqk.c#L3165
Because these are read-modify-write operations, I save and leave the other register bits unchanged.
For example, I don't check whether interrupt bits such as FRZ_ON_PMI or OVF_EN were activated by other software.
Could you trace the values with printk in the macro?
@cyring - yes, I can. Can you make a branch with the printk tracing you want to see, or paste it here? I'll add it and try it.
You can use a program like HWiNFO on Windows (if you have it), which shows the uncore clock: it is usually the same as the highest-clocked core on the socket.
@travisdowns: please try the latest commit. At the bottom of the Processor view (or using corefreq-cli -s), you should read your processor's Uncore ratios, such as:
|- Uncore
Min 1830.07 [ 12 ]
Max 2440.09 [ 16 ]
corefreq-cli -s doesn't report anything at all for uncore:
|- Turbo Boost
1C 3461.46 < 35 >
2C 3263.66 < 33 >
3C 3164.76 < 32 >
4C 3065.86 < 31 >
|- Uncore
ISA Extensions:
I'm also getting some page allocation oopses in dmesg from CoreFreq, e.g.:
[57261.076025] corefreqd: page allocation failure: order:10, mode:0x14040c0(GFP_KERNEL|__GFP_COMP)
[57261.076037] CPU: 1 PID: 2515 Comm: corefreqd Tainted: G W OE 4.10.0-42-generic #46~16.04.1-Ubuntu
[57261.076039] Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 01.02.00 04/07/2016
[57261.076040] Call Trace:
[57261.076049] dump_stack+0x63/0x90
[57261.076054] warn_alloc+0x13a/0x170
[57261.076058] ? __alloc_pages_direct_compact+0x4e/0x110
[57261.076062] __alloc_pages_slowpath+0x2ba/0xb30
[57261.076066] __alloc_pages_nodemask+0x21a/0x2a0
[57261.076073] alloc_pages_current+0x95/0x140
[57261.076076] kmalloc_order+0x18/0x40
[57261.076079] kmalloc_order_trace+0x24/0xa0
[57261.076089] SysGate_OnDemand+0x37/0x70 [corefreqk]
[57261.076095] CoreFreqK_mmap+0xb8/0x101 [corefreqk]
[57261.076099] mmap_region+0x384/0x600
[57261.076102] do_mmap+0x463/0x550
[57261.076106] ? common_mmap+0x4b/0x50
[57261.076110] ? apparmor_mmap_file+0x18/0x20
[57261.076114] ? security_mmap_file+0xda/0xf0
[57261.076118] vm_mmap_pgoff+0xba/0xf0
[57261.076121] SyS_mmap_pgoff+0x1c1/0x290
[57261.076125] SyS_mmap+0x1b/0x30
[57261.076131] entry_SYSCALL_64_fastpath+0x1e/0xad
[57261.076134] RIP: 0033:0x7fd7e65ab67a
[57261.076136] RSP: 002b:00007ffc76286b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[57261.076139] RAX: ffffffffffffffda RBX: 00007fd7e62a5700 RCX: 00007fd7e65ab67a
[57261.076141] RDX: 0000000000000003 RSI: 0000000000201000 RDI: 0000000000000000
[57261.076143] RBP: 00007ffc76286c40 R08: 0000000000000003 R09: 0000000000001000
[57261.076144] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[57261.076146] R13: 00007ffc76286c3f R14: 00007fd7e62a59c0 R15: 0000000000000000
[57261.076148] Mem-Info:
[57261.076155] active_anon:2034997 inactive_anon:264676 isolated_anon:0
active_file:758176 inactive_file:401475 isolated_file:0
unevictable:8 dirty:2372 writeback:0 unstable:0
slab_reclaimable:91920 slab_unreclaimable:12469
mapped:157058 shmem:140149 pagetables:14770 bounce:0
free:466178 free_pcp:0 free_cma:0
In the package cycles view I'm seeing uncore cycles around 100,000,000 under load. Perhaps this is the "base" value, like BCLK, and it needs to be scaled up by a multiplier, like 35, to get 3.5 GHz? Similar to the way core frequencies work?
Your processor with a 06_5E signature is not listed for the UNCORE_RATIO_LIMIT MSR: https://github.com/cyring/CoreFreq/blob/0e3a767baff2166504c43068420f903c608b0849/intelmsr.h#L559 The Uncore ratios have to be queried by other means, perhaps through PCI configuration space, as I did recently for the Nehalem architecture.
EDIT: Once UCLK is known, it would be a good idea to multiply it by the number of Uncore cycles.
The oopses are due to issues with mapping a shared memory region I have named SysGate. It provides kernel execution data such as task and memory usage, and the current idle driver and governor.
As a workaround, you can disable SysGate starting the daemon with the following argument:
corefreqd -goff
Because it's an on-demand algorithm, the corefreqk driver will then not receive an ioctl request to map the SysGate. However, the kernel data won't be available and the UI will deactivate the associated features.
Have you noticed these oopses before? Has the kernel been updated or compiled differently?
Regards CyrIng
Yeah my CPU is Skylake i7-6700HQ, aka "skylake client". It's going to be a pretty common configuration since KBL also shares this uarch.
I haven't noticed the oopses before. It seems the allocation failed, maybe because it is too big? Order 10 (4 MiB, i.e. 1024 contiguous 4 KiB pages) is a pretty big ask for kmalloc, since physical memory may be fairly fragmented.
I have used several different kernels with CoreFreq, since Ubuntu updates them pretty frequently. Currently I'm on 4.10.0-42-generic.
In the "6th Generation Intel® Processor Datasheet for S-Platforms, Volume 2 of 2", I'm reading: SA: System Agent Performance Status. Indicates current SA PLL ratios.
Offset: [B:0, D:0, F:0] + 5918h
31:24 UCLK_RATIO: RING UCLK RATIO. Reference=100MHz
23:16 ICLK_RATIO: IMGU ICLK RATIO. Reference=25MHz
15:8 FCLK_RATIO: SA FCLK RATIO. Reference=100MHz
7 QCLK_REFERENCE: DDR QCLK REFERENCE. 0=133MHz, 1=100MHz
6:0 QCLK_RATIO: DDR QCLK RATIO. Reference determined by QCLK_REFERENCE
The document provides two PCI DIDs for the "Host and DRAM Controller":
DID (S-Processor Line)
Dual Core - 190Fh
Quad Core - 191Fh
Could you please confirm you have one of those? (see lspci -nn)
Here's my full lspci -nn output:
00:00.0 Host bridge [0600]: Intel Corporation Sky Lake Host Bridge/DRAM Registers [8086:1910] (rev 07)
00:01.0 PCI bridge [0604]: Intel Corporation Sky Lake PCIe Controller (x16) [8086:1901] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation Skylake Integrated Graphics [8086:191b] (rev 06)
00:04.0 Signal processing controller [1180]: Intel Corporation Skylake Processor Thermal Subsystem [8086:1903] (rev 07)
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-H Thermal subsystem [8086:a131] (rev 31)
00:15.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-H LPSS I2C Controller #0 [8086:a160] (rev 31)
00:15.1 Signal processing controller [1180]: Intel Corporation Sunrise Point-H LPSS I2C Controller #1 [8086:a161] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-H CSME HECI #1 [8086:a13a] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA Controller [AHCI mode] [8086:a103] (rev 31)
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #1 [8086:a110] (rev f1)
00:1c.1 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #2 [8086:a111] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #9 [8086:a118] (rev f1)
00:1d.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #13 [8086:a11c] (rev f1)
00:1d.6 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #15 [8086:a11e] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-H LPC Controller [8086:a14e] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-H PMC [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-H HD Audio [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-H SMBus [8086:a123] (rev 31)
01:00.0 3D controller [0302]: NVIDIA Corporation GM107M [GeForce GTX 960M] [10de:139b] (rev a2)
02:00.0 Network controller [0280]: Broadcom Corporation BCM43602 802.11ac Wireless LAN SoC [14e4:43ba] (rev 01)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller [144d:a802] (rev 01)
Thank you, I'm implementing the register for [8086:1910].
I have also noticed that I mistakenly labeled the i7-6700HQ architecture "Skylake/S". In fact, DID 1910 is listed in the datasheet for H-Platforms!
EDIT: If the SysGate shared memory issue can't be solved, feel free to create another issue; I will debug it next. issue #45
Hello,
Code has been committed as version 1.17.5. You need to start the driver with the following argument:
insmod corefreqk.ko Experimental=1
With the new changes, -s says this:
|- Uncore
Max 25218.44 [ 255 ]
The value doesn't change between runs.
Looking at the package cycles dashboard, the value bounces around a fair bit, but under stress a common value is around 15,000,000.
I didn't see any more oopses.
This still requires more work, because 255 is not the UCLK ratio I expected from the PCI space!
Uncore ratios and counters have been verified OK on Kaby/Coffee Lake.
Just to check: you mention OK on Kaby/Coffee, so should it also be OK on Skylake? Kaby and Skylake are pretty much microarchitecturally identical as far as I know.
I could only verify uncore ratios with a KBL/CFL i5-7500, on which I'm getting the min and max ratios and the fixed perf counter. Two algorithms are implemented: SKL_SA and SKL_IMC. In corefreqk.h, if the first one (SKL_SA) does not work, please replace it with the other (SKL_IMC) and try again:
{
PCI_DEVICE(PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SKYLAKE_H_IMC_HAQ),
.driver_data = (kernel_ulong_t) SKL_SA
},
Hello, there has been additional code for Skylake. Do you get correct min and max Uncore frequencies?
@cyring - can you be specific about what I should check? ./corefreq-cli -s seems to report unchanging max and min for uncore.
On the frequency screen I see a value above UNCORE x26, which varies from 1.00 or so up to 4000, but spends most of its time between 20 and 60. So I'm not sure what is going on or how to interpret the uncore figure.
Can you show me your screen like this one ?
Here you go:
Those numbers look reasonable: but are they dynamic or read from an MSR or something?
All ratios are read from an MSR, and your chosen ratio is written back to the same MSR if the processor has Turbo or Uncore unlocked.
However, the MSR bit may report the opposite (unlocked) status, which crashes the processor when trying to OC.
Thus I have to add an exception...
Please let me know: do you OC your 6700HQ?
I do not OC my chip and it is not unlocked (I don't even think you can OC it as it is a mobile chip).
Are the two min and max values on that screen the (mostly fixed) configured min and max from the MSRs?
What I am mostly interested in is seeing the dynamic current value of the uncore frequency, just like CoreFreq shows the dynamic core frequency on many screens. This I have never seen working properly yet. Do you agree?
Did we try this MSR in the past? https://github.com/cyring/CoreFreq/blob/5b6b4f61dbd9c1c1d2e407eed7c534a444749a0e/intelmsr.h#L69 Can you test it with the rdmsr command?
Dynamic: you mean the Uncore counter ?
$ sudo rdmsr 0x00000620
823
By "dynamic" I mean the current (specifically: over some very short interval ending with now) frequency. Just like CoreFreq shows my current CPU frequency(ies) like 2.592 GHz, bouncing all over the place as cores go up and down in p-states and c-states, I would like to see the (single) uncore frequency in the same way. Perhaps using uncore counting and converting it to GHz as necessary.
Great. Using the following bit-field, we can decode the min and max Uncore ratios. https://github.com/cyring/CoreFreq/blob/5b6b4f61dbd9c1c1d2e407eed7c534a444749a0e/intelmsr.h#L751
Max=35
Min= 8
I have not solved the differences between Nehalem and later architectures when computing the Uncore's relative frequency: with two samples of the fixed counter spaced by a one-second interval, Nehalem gives me a value directly in Hz, whereas the others, I believe, return a delta of clocks that I should scale with the Uncore base clock?
I have pushed a fix to decode the Uncore min and max. Can you please start the driver with:
insmod corefreqk.ko Experimental=1
and post back the results of corefreq-cli -s
Thank you
@cyring - yes, those min/max ratios look good for my machine (in fact they are the same ratios as the core CPU freq: 800 MHz to 3500 MHz).
What MSR are you reading for the uncore fixed counter? Or is it in PCI space? We could check the documentation.
Here's my -s output:
Processor [Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz]
|- Architecture [Skylake/S]
|- Vendor ID [GenuineIntel]
|- Signature [ 06_5E]
|- Stepping [ 3]
|- Microcode [ 194]
|- Online CPU [ 4/4 ]
|- Base Clock [ 98.90]
|- Frequency (Mhz) Ratio
Min 791.18 [ 8 ]
Max 2571.33 [ 26 ]
|- Factory [100.00]
2600 [ 26 ]
|- Turbo Boost [UNLOCK]
1C 3461.40 < 35 >
2C 3263.61 < 33 >
3C 3164.71 < 32 >
4C 3065.81 < 31 >
|- Uncore [UNLOCK]
Min 791.18 < 8 >
Max 3461.40 < 35 >
|- TDP Level [ 0:3 ]
|- Programmable [UNLOCK]
|- Configuration [UNLOCK]
|- Turbo Activation [UNLOCK]
Nominal 2571.33 [ 26 ]
Level1 2076.84 [ 21 ]
Turbo 3362.50 [ 34 ]
ISA Extensions:
|- 3DNow!/Ext [N,N] AES [Y] AVX/AVX2 [Y/Y] BMI1/BMI2 [Y/Y]
|- CLFSH [Y] CMOV [Y] CMPXCH8 [Y] CMPXCH16 [Y]
|- F16C [Y] FPU [Y] FXSR [Y] LAHF/SAHF [Y]
|- MMX/Ext [Y/N] MONITOR [Y] MOVBE [Y] PCLMULDQ [Y]
|- POPCNT [Y] RDRAND [Y] RDTSCP [Y] SEP [Y]
|- SGX [Y] SSE [Y] SSE2 [Y] SSE3 [Y]
|- SSSE3 [Y] SSE4.1/4A [Y/N] SSE4.2 [Y] SYSCALL [Y]
Features:
|- 1 GB Pages Support 1GB-PAGES [Present]
|- 100 MHz multiplier Control 100MHzSteps [Missing]
|- Advanced Configuration & Power Interface ACPI [Present]
|- Advanced Programmable Interrupt Controller APIC [Present]
|- Core Multi-Processing CMP Legacy [Missing]
|- L1 Data Cache Context ID CNXT-ID [Missing]
|- Direct Cache Access DCA [Missing]
|- Debugging Extension DE [Present]
|- Debug Store & Precise Event Based Sampling DS, PEBS [Present]
|- CPL Qualified Debug Store DS-CPL [Present]
|- 64-Bit Debug Store DTES64 [Present]
|- Fast-String Operation Fast-Strings [Present]
|- Fused Multiply Add FMA|FMA4 [Present]
|- Hardware Lock Elision HLE [Present]
|- Long Mode 64 bits IA64|LM [Present]
|- LightWeight Profiling LWP [Missing]
|- Machine-Check Architecture MCA [Present]
|- Model Specific Registers MSR [Present]
|- Memory Type Range Registers MTRR [Present]
|- OS-Enabled Ext. State Management OSXSAVE [Present]
|- Physical Address Extension PAE [Present]
|- Page Attribute Table PAT [Present]
|- Pending Break Enable PBE [Present]
|- Process Context Identifiers PCID [Present]
|- Perfmon and Debug Capability PDCM [Present]
|- Page Global Enable PGE [Present]
|- Page Size Extension PSE [Present]
|- 36-bit Page Size Extension PSE36 [Present]
|- Processor Serial Number PSN [Missing]
|- Restricted Transactional Memory RTM [Present]
|- Safer Mode Extensions SMX [Missing]
|- Self-Snoop SS [Present]
|- Time Stamp Counter TSC [Invariant]
|- Time Stamp Counter Deadline TSC-DEADLINE [Present]
|- Virtual Mode Extension VME [Present]
|- Virtual Machine Extensions VMX [Present]
|- Extended xAPIC Support x2APIC [ xAPIC]
|- Execution Disable Bit Support XD-Bit [Present]
|- XSAVE/XSTOR States XSAVE [Present]
|- xTPR Update Control xTPR [Present]
Technologies:
|- System Management Mode SMM-Dual [ ON]
|- Hyper-Threading HTT [OFF]
|- SpeedStep EIST < ON>
|- Dynamic Acceleration IDA [OFF]
|- Turbo Boost TURBO <OFF>
|- Virtualization VMX [ ON]
|- I/O MMU VT-d [ ON]
|- Hypervisor [OFF]
Performance Monitoring:
|- Version PM [ 4]
|- Counters: General Fixed
| 8 x 48 bits 3 x 48 bits
|- Enhanced Halt State C1E < ON>
|- C1 Auto Demotion C1A < ON>
|- C3 Auto Demotion C3A < ON>
|- C1 UnDemotion C1U < ON>
|- C3 UnDemotion C3U < ON>
|- Frequency ID control FID [OFF]
|- Voltage ID control VID [OFF]
|- P-State Hardware Coordination Feedback MPERF/APERF [ ON]
|- Hardware-Controlled Performance States HWP [ ON]
|- Hardware Duty Cycling HDC [ ON]
|- Package C-State
|- Configuration Control CONFIG [ LOCK]
|- Lowest C-State LIMIT [ 0]
|- I/O MWAIT Redirection IOMWAIT [DISABLE]
|- Max C-State Inclusion RANGE [ 0]
|- MWAIT States: C0 C1 C2 C3 C4
| 0 2 1 2 4
|- Core Cycles [Present]
|- Instructions Retired [Present]
|- Reference Cycles [Present]
|- Last Level Cache References [Present]
|- Last Level Cache Misses [Present]
|- Branch Instructions Retired [Present]
|- Branch Mispredicts Retired [Present]
Power & Thermal Monitoring:
|- Clock Modulation ODCM <Disable>
|- DutyCycle < 6.25%>
|- Power Management PWR MGMT [ LOCK]
|- Energy Policy Bias Hint [ 0]
|- Junction Temperature TjMax [ 100]
|- Digital Thermal Sensor DTS [Present]
|- Power Limit Notification PLN [Present]
|- Package Thermal Management PTM [Present]
|- Thermal Monitor 1 TM1|TTP [ Enable]
|- Thermal Monitor 2 TM2|HTC [Present]
|- Units
|- Power watt [ 0.125000000]
|- Energy joule [ 0.000061035]
|- Window second [ 0.000976562]
Uncore ratio min/max look pretty good, but all frequencies seem a bit off (including CPU) because the base clock is calculated at 98.9 MHz, whereas AFAIK it should be 100.0. How is the base clock calculated?
Much of what this tool shows is already available in the turbostat tool shipped with most distributions (but the UI is much nicer!), but showing the uncore clock(s) would be something awesome and new.