Will fix ASAP. Some DMI fields might only be available with the latest 5.x kernels.
Great :-)
Just checked (on my smartphone): the minimum kernel version is 4.18.x for those missing definitions (DMI_PRODUCT_SKU, ...).
I will refactor the code with conditions so that previous kernel versions still build.
Also thinking about writing parts of the DMI parsing myself to offer the remaining strings.
I usually install the Ubuntu LTS Enablement Stacks once they are out (newer kernel and Xorg/Wayland/Mesa/etc for desktops and just newer kernel for servers).
Pretty sure the one in the repo now is > 4.18. On my phone now.
https://wiki.ubuntu.com/Kernel/LTSEnablementStack
The 18.04.2 and newer point releases will ship with an updated kernel and X stack by default for the desktop. Server installations will default to the GA kernel and provide the enablement kernel as optional. The 18.04 HWE Stacks will follow a new Rolling Update Model as documented at the following location: https://wiki.ubuntu.com/Kernel/RollingLTSEnablementStack
Installing the HWE stack is simple (DESKTOP):
sudo apt-get install --install-recommends linux-generic-hwe-18.04 xserver-xorg-hwe-18.04
ANYWAY, most Ubuntu LTS users don't do this manually.
And my installation has been totally stable. The open amdgpu driver shipped with the kernel, which AMD maintains together with the community, is actually faster than installing their latest open driver from their own repo for exactly Ubuntu 18.04, so either Ubuntu optimizes more, or there is a regression in the later Mesa packages or other libs the AMD repo updates (it updates a ton of packages as well, including Xorg stuff IIRC). The Pro driver is constantly beaten by the open one, too. I really don't need the Pro one anyway unless you do some specific stuff. I don't even need a dedicated GPU as I don't play games or use GPU-heavy software. Most of that is handled by my Raspberry Pi's GPU, in the form of HW acceleration of the terabytes of video stored on this exact workstation. Well, the Android Emulator is one example, but the iGPU from Intel is waaaay good enough. The only difference I see is in glmark2 :P And of course vulkan-smoketest.
Basically I bought it intrigued by reading about how open and good AMD GPUs have become on Linux (especially the former, meaning it will only get better), with everything plug and play, open enough to ship with the kernel, and actually fully working with the community, which NVIDIA does not (just randomly sometimes; the open driver is a mess if you want to use it, you have to check that table of what is supported and what is not for everything - the closed one is really good and still faster, but must be installed as a separate package, and Wayland compatibility, well, dunno).
Basically supporting AMD :-)
And Intel, since all their HW is dead stable out of the box on Linux, including the iGPU, basically much easier than on Windows; they actively contribute to the kernel so that people can make tools such as your own, and they have even created their own super fast distro. Not for your daily driver, but if supported by your CPU it is fun to check out! It has its own package system, more and more packages, including a full GNOME desktop, all vanilla packages compiled for max speed on Intel chipset MBs and CPUs. Rolling release IIRC.
BTW will install the above and check if I get to it before you do :) Don't worry, no rush for me.
Just gotta say, CoreFreq, although praised in the article I found it in, was way above my expectations! You should try to get it into the Debian/Ubuntu repos (you can use dkms to allow updates to the latest tested stable version of your code - or a fixed version, which is probably preferred by distro maintainers and updated when the distro is, as long as you make a well tested deb package from the latest code, as it is already feature rich as heck - and probably a systemd service to get approval I suppose, but that isn't that hard when FOSS...).
EDIT: Found CoreFreq here: https://medium.com/@datamove/over-clocking-under-linux-67d04cfb0974
Hello, the fix is committed, ready for your tests.
Thanks for responding so fast and fixing. I will test soon, with the kernel mentioned in the issue.
Confirmed building as before and working on Linux 4.15.0-52-generic, without installing the LTS Enablement Stacks, and with the intel-microcode package installed. Thanks again!
I have a suggestion: basically toggling NMI Watchdog on demand, because NMI Watchdog is there for a reason, and from what I have read I do not want it off permanently on such an important workstation + server.
My solution:
I have created a script like this as /usr/local/bin/cf. I have added /usr/local/bin/cf to the whitelist using visudo so it does not require a password to run, so I can simply run sudo cf.
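For reference, the visudo entry could look something like this (a sketch; the username is the same placeholder used in the script below, adjust to your own):
# /etc/sudoers.d/corefreq (edit with: visudo -f /etc/sudoers.d/corefreq)
# Allow the wrapper script to be run via sudo without a password prompt
MYDESKTOPUSERNAME ALL=(root) NOPASSWD: /usr/local/bin/cf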
#!/bin/bash
### CF SCRIPT IS RUN
bash -c "echo 0 > /proc/sys/kernel/nmi_watchdog"
sleep 1
insmod /opt/corefreq/corefreqk.ko
sleep 1
/opt/corefreq/corefreqd &
sleep 2
su -l -c "/opt/corefreq/corefreq-cli" MYDESKTOPUSERNAME
### CTRL+C IS PRESSED TO STOP THE CLI
sleep 4
rmmod corefreqk.ko
sleep 1
bash -c "echo 1 > /proc/sys/kernel/nmi_watchdog"
The sleeps are necessary to give some time before starting and stopping. I have added a little extra because you never know. Sometimes it may fail to rmmod the module if it is done only 2 seconds after quitting the CLI, for instance. As you can see, I copy the 3 necessary files to /opt/corefreq/ after make.
BTW: The reason for bash -c is that running echo X > Y does not work with sudo in these cases. Pages often suggest running sudo su first, but that is ugly, not for scripts made to avoid prompts, and just plain wrong. Those using sudo su should run sudo -s or sudo -i, depending on need. That is the right way, NEVER sudo su (some exceptions may exist, usually the first time doing something and then never again).
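A common alternative that avoids the bash -c wrapper entirely is tee, since the write then happens inside the elevated process, e.g.:
# The redirect in `sudo echo 0 > file` is performed by the calling shell, not by sudo;
# piping into `sudo tee` lets the privileged process do the write instead.
echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog > /dev/null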
The key point is the NMI Watchdog toggle - on demand - like this:
When starting, first run:
bash -c "echo 0 > /proc/sys/kernel/nmi_watchdog"
When everything is stopped/unloaded, run as last command:
bash -c "echo 1 > /proc/sys/kernel/nmi_watchdog"
The kernel confirms in dmesg and journalctl -b --no-pager that NMI Watchdog is enabled again with:
kernel: NMI watchdog: Enabled
Pretty sure you could deactivate/reactivate NMI Watchdog in the corefreqd daemon on start/stop? At least it could be an option? People should at least know the consequences of having NMI Watchdog off permanently, and that it can be toggled on demand instead of via a kernel parameter.
Add it to README.md?
Thanks, I will work through your feedback.
Meanwhile, nmi_watchdog is not the source of the incompatibility issue; it just periodically rewrites the fixed counter register, and that biases the readings. Evidence can be found in the kernel source code.
As a workaround you can also build CoreFreq with directives to use other registers: APERF and MPERF. Just type make help. You can then keep the nmi_watchdog while corefreqk.ko reads MSR_IA32_APERF and MSR_IA32_MPERF.
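For reference, the build invocation that the later comments in this thread settle on looks like this:
# Build corefreqk.ko against APERF/MPERF instead of the fixed performance counters,
# so the NMI watchdog can stay enabled (run `make help` to list the available directives).
make MSR_CORE_PERF_UCC=MSR_IA32_APERF MSR_CORE_PERF_URC=MSR_IA32_MPERF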
Thanks. Just adding a little information about this to README.md would seem enough. It's a tool for power users anyway. Maybe a link to what the NMI Watchdog does. You can maybe add my script to the repo as an example if you want to. My solution works just fine, so I won't touch the build process. I don't use CF that often; it does replace a lot of other Bash aliases I've created for frequencies and temperature, updated every X seconds. I still use them to verify (frequencies polled from the kernel and temperature from sensors). Also having the GUI PSensor program is nice. Lots of flexibility: it uses lm-sensors, the AMD/NVIDIA APIs if found, and udisks or udev for disk temperatures, so there is no need for the hard disk monitoring daemon package. One nice thing is that you can set alarms if a certain part, say the CPU package or an individual core, goes above a certain temperature. A safe way to avoid parts being destroyed if the HW and kernel modules' built-in protection fails and a heavy process has frozen and causes 100 % usage, especially if overclocked and basic protection like the voltage regulator is disabled in UEFI. That heats up the CPU extremely fast under stress -c numcores. Just set a command to run if a part goes above X °C, for instance bash -c reboot. Could be something for this tool, the daemon at least. Well, AFAIK CF doesn't write any configuration, so it would need that. htop does. Nice, since I always change the settings. It is basically the only basic monitoring tool CF can't replace, but CF isn't really meant to either, I suppose.
BTW, I can probably just host CF as a deb in my repo (only for packages that are never in the Debian/Ubuntu repos): build it from the latest source (tested by me first on Debian and Ubuntu) on first install and add it to dkms, or simply ask the user to run apt install --reinstall corefreq after a kernel update. I saw a PPA that it seems you made, but it's totally out of date, and there is no link to it from here. I could use it as a base for my deb, though. Of course only if you're interested, like having a link to an up-to-date CF repo for debs.
Getting your support for a distribution repo will be a great help. Indeed, I want to focus on R&D.
CoreFreq is monitoring thermal thresholds, those which are processor internal, such as thermal throttling. For instance, if the temperature reaches TjMax, HOT will light up. Press the H key to clear the associated event register bit.
I don't plan to program what other monitoring software already does much better. The goal of CoreFreq is more to offer features which are not really centralized anywhere in the Linux space: Processor + BIOS + OS + Performance. Although I'm noticing that CoreFreq is used in areas I was not expecting to see it: trading, production, ...; CoreFreq remains an experimental open source project and I have tons of ideas not developed yet.
In your script I don't see where the daemon is requested to close. It may explain why the driver has a shutdown issue: the shared memory is not cleanly released.
Please take a look at corefreqd.service: the daemon is listening for signals.
The daemon then sends a shutdown signal to the client ==> it's a cascade procedure.
Disabling the nmi_watchdog from user-space won't guarantee the counter register is freed by the kernel. That's why I recommend blacklisting the counter usage straight from the kernel boot command line. However, as stated previously, you can build CoreFreq with the APERF/MPERF counters and let nmi_watchdog run.
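For reference, the boot-time variant on a GRUB-based distro looks roughly like this (the file path assumes Debian/Ubuntu):
# /etc/default/grub - disable the NMI watchdog at every boot
GRUB_CMDLINE_LINUX_DEFAULT="quiet nmi_watchdog=0"
# then regenerate the GRUB configuration:
sudo update-grub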
Take a look at the other monitoring GUI tools' documentation and source code for hardware probing incompatibilities. I have started a list of CoreFreq conflicts in the Wiki - Software incompatibility
Regards Cyril
First of all, with this script, the daemon, for some reason, shuts itself down like immediately after pressing Ctrl+C, BUT only when the commands are run in a script like that, even when the CLI program is run as the desktop user. Easy fix to be 100 % sure on all systems anyway if you read below.
EDIT: Without using any *.service file, just those 3 files. No signals then, I suppose.
HOWEVER, when the commands are run separately one by one - all commands as root except the CLI program, which is run by the desktop user either using su -l -c or simply in a desktop terminal - it does NOT shut itself down. Interesting. Running this script on my system, this way, always makes the daemon shut itself down before the module is unloaded, which matters because the module of course won't unload while corefreqd is still running; rmmod then refuses (still in use). But that never happens when run as a script this way on my system, only when the commands are run one by one in 2 terminals (one for the root commands and one for the CLI program), or when using su -l -c in the same terminal after starting the daemon.
Easy to fix to be 100 % sure: before rmmod, add the command:
pkill corefreqd
Note: pkill does not give any output if the process you want to kill is not running either, so there is no need to send unnecessary output to /dev/null or anything, but one can use pkill || command or pkill && command and such, or combine it with pgrep.
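A small illustration of the combinations mentioned above (the daemon name matches the script earlier in this thread):
# Only send the signal when the daemon is actually running (-x = exact process name match)
pgrep -x corefreqd > /dev/null && pkill -x corefreqd
# Or rely on pkill's exit status directly
pkill -x corefreqd || echo "corefreqd was not running"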
This I discovered and have now confirmed, WITHOUT the pkill command, by running ps aux | grep coref and also lsmod: when closing the CLI using Ctrl+C, both the CLI program and the daemon are no longer running, NEITHER of them, and it happens so fast that I can not even switch to the other terminal using Alt+Tab to run ps and see it - it is ALREADY shut down. The module, as per the script, is not unloaded until after 4 seconds, so that is a "safety buffer" to 100 % avoid any error. lsmod confirms it is unloaded. It will say still in use while the daemon is running so...
I can not explain why, but that is why I have not added a command to shut down the daemon - because it does it itself on my system. The sleep time of 4 seconds was because a few times I got the message that the module was still in use with a sleep time of just 1 second, for instance. But 4 seconds is a sure "safety buffer", more than enough, added just to NEVER ever get that the module is still in use.
Now my script is like this and seems to work just fine:
#!/bin/bash
bash -c "echo 0 > /proc/sys/kernel/nmi_watchdog"
insmod /opt/corefreq/corefreqk.ko
/opt/corefreq/corefreqd &
sleep 2
su -l -c "/opt/corefreq/corefreq-cli" MYDESKTOPUSERNAME
pkill corefreqd
sleep 2
rmmod corefreqk.ko
bash -c "echo 1 > /proc/sys/kernel/nmi_watchdog"
Both ps and lsmod show that everything is killed and unloaded fine. No errors from the kernel or the terminal.
Only output from terminal (run and exit):
CoreFreq Daemon 1.57.0 Copyright (C) 2015-2019 CYRIL INGENIERIE
...terminated.
Only output from dmesg (run and exit):
CoreFreq(2:-1): Processor [ 06_5E] Architecture [Skylake/S] CPU [4/4]
CoreFreq: Unload
NMI watchdog: Enabled [again]. Permanently consumes one hw-PMU counter.
Note 1: The kernel does not report when NMI Watchdog is disabled, only when it is enabled, but it can of course be verified as disabled by checking the value of /proc/sys/kernel/nmi_watchdog, which is also 0 when using the kernel parameter (see the one-liner after Note 2).
Note 2: As you can see, I have removed some unnecessary sleep commands (on my system at least), and shortened the sleep time before the CLI to 2 seconds and before exit to 2 seconds only. Not annoying for such a tool. This should maybe be tested on a slow system, slow in every way; I am not sure it is enough to tweak UEFI and the kernel to "make a slow system", and since CoreFreq is logically not made for VMs, that makes it harder. Basically, having both more sleep commands and more sleep seconds makes sure it should work on slow systems as well.
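The quick check mentioned in Note 1 is simply:
# 1 = NMI watchdog enabled, 0 = disabled (also 0 when booted with nmi_watchdog=0)
cat /proc/sys/kernel/nmi_watchdog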
Does it work correctly on 32-bit? If so I may test on a really old laptop... Maybe CoreFreq does not even support that CPU, like there is not enough "info and toggles" in the kernel for CoreFreq to use, or CoreFreq is aimed at newer CPUs than that one.
Regarding NMI Watchdog: I was not aware of that, hmm... And I do not have the high level of understanding of CPUs and the Linux kernel that you have. I have just seen that many tutorials, blog posts, wikis etc. around monitoring tools and overclocking tend to use on-demand toggles, like my command, or disable NMI Watchdog in sysctl or something, in a way that makes it reversible without a reboot, not as a kernel boot parameter.
In MY case I do not see any difference whether it is enabled or disabled. I will double check. It will be especially interesting to compare CoreFreq values when NMI Watchdog is disabled as a kernel boot parameter versus not disabled in any way.
Regarding the deb package: I have a pretty good idea. Instead of putting it in a repo, you simply host it here on GitHub, since updating it will basically never be needed after it is well tested (2 of the 4 dependency packages in question may change names on newer versions of Debian/Ubuntu, but not like every year, and it is easy peasy to fix then, because you can add packages with alternative names).
- In the control file we only need 4 small dependencies on Debian/Ubuntu derivatives. One is build-essential, which is a metapackage that depends on the basic build-from-source tools, but does not bloat the system or use much bandwidth or space.
- (The package itself only ships a few files in /usr/bin or /usr/sbin, and the copyright file standard for debs.)
- The postinst script does all the magic, including acting as an upgrade script if CoreFreq is already found to be installed (an easy check), cloning the latest from GitHub and compiling into a suitable folder like /opt/corefreq.
- The prerm script simply deletes /opt/corefreq. The 2 other files are automatically deleted as they are part of the package.
Note: This is EXACTLY what the VirtualBox Guest Additions install script for VMs under VirtualBox does: it puts files in /opt, compiles the Guest Modules, and I think it now adds them to DKMS. BUT when a new version of the VB Guest Additions is available, always when a new version of VirtualBox is out, you get a warning to install the new VB Guest Additions (and reboot the VM). I always use the CLI way on Linux VMs: run the script directly as root and reboot. IDK if the software you are asked to run under certain Desktop Environments, when inserting the virtual CD with the Guest Additions, actually works, and if so I am sure it just opens the default terminal and runs the script.
Not like on Windows, where it has a standard GUI installer with "Next > Next > Bla bla > Compiling drivers > Windows reacts to new drivers > Allow > Bla bla > Next > Reboot", or something in that typical order. I do not use VirtualBox anymore on Linux, I use KVM+QEMU+libvirt, which is way better of course. Just on macOS I use VB.
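To make the deb idea a bit more concrete, a minimal, hypothetical postinst sketch could look like this (not an actual package; the repo URL and paths are assumptions):
#!/bin/sh
# Hypothetical maintainer script: clone and build CoreFreq into /opt/corefreq,
# treating an existing install as an upgrade (wipe and rebuild).
set -e
PREFIX=/opt/corefreq
rm -rf "$PREFIX"
git clone https://github.com/cyring/CoreFreq.git "$PREFIX"
make -C "$PREFIX"    # or the APERF/MPERF build variant mentioned earlier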
EDIT: Some changes, typos, formatting etc in post above, sure there are more typos :P
And yes, that's what's great about CoreFreq: it does not try to do things other tools already do as well as possible. It pulls together things that otherwise need various commands/scripts/tools.
I have seen small projects I made like a decade ago and put on the Internet being used in ways I never imagined. Even pages where people offer > 500 USD to any freelance developer taking such jobs - to modify it to do something more, or an integration... Always fun to find, especially small web apps I have made, and by searching finding them hosted at some university loooong away from Norway :P Even though I stated they are not supported in any way beyond reading the docs, and that the code and the docs will become deprecated with newer libs, server software, web standards etc. People find me through all sorts of channels and basically want "free help" to rewrite a ton of the code for just their use case... NOPE! And darn that Facebook Messenger allows non-friends to send a super long "message request", just like an e-mail basically, but more intrusive/pushy IMO...
My bad, regarding asking about 32-bit, didn't even read the main description - that it is made for 64-bit CPUs.
I ran make (with the APERF/MPERF directives) so that NMI Watchdog can stay on. At a glance it seems to show the same information as before. "SMBIOS data" shows "To be filled by O.E.M.", but I am pretty sure it was like that before, and if I understood correctly it could show the info if the kernel is >= 4.18, which can be achieved just by installing the LTS Enablement Stacks (one command). Basically all other data shows. Although my RAM is running at 3200 MHz (Intel [Certified] XMP mode), it shows the standard DDR4 value - 2166 MHz.
Q: That data does not really matter for me anyway, but does it make CoreFreq more accurate if available?
SMBIOS data is just informative in CoreFreq.
It's a work in progress, as the main idea is to backport some code from my previous project CoreMod (see my repo).
Because the DMI data are not fully filled in by motherboard manufacturers, I don't rely on them to compute the various indicators.
As I understand it you have a Ryzen processor, and I don't produce DRAM information for it yet.
I am currently searching for the family 17h IMC registers, specifications, and so on.
Ok. No, as written in the first post:
- Straightforward unlocked Skylake workstation CPU: Intel Core i5-6600K
Great, have you been able to overclock or downclock your i5-6600K as you wish?
Can you also post screenshots showing at least:
corefreq-cli -s -m -M -k -B
Can you please post these screenshots and outputs for a new section in CPU support?
Thanks for helping CoreFreq. Cyril
Yes, I can do that for you, and also for a MacBook Air Mid-Summer 2013 13" with the top-specced CPU and RAM, running Fedora 30, where CoreFreq works.
BUT, to do what you ask, I need your help with something I wanted to ask you about anyway, given your expertise. I am a pro user of Linux, but I do not have the deep understanding of CPUs + the Linux kernel that you have. I will divide this into 2 sections, one for "Strange Phenomenon" and one for "What I Have Done and Tried".
Strange Phenomenon and the problem:
- Intel XMP activates an overclock that always works (CPU & RAM).
- After a random number of hours, the overclock falls back to the Intel XMP defaults, which is STILL an overclock, EVEN WHEN Intel XMP is disabled / not enabled in UEFI: it resets to the Intel XMP "standards", meaning the CPU is STILL overclocked so that ALL cores Turbo Boost to 3.9 GHz, but no more to 4.4 GHz for instance, or even 4.1 GHz (Sync All Cores to Value = 44/41), or whatever much lower value I choose manually; it also happens, for example, when set to Value = 41 (4.1 GHz). All tools show it can go to 4.4 but it goes to 3.9, dead stable and with very low temperatures, always below 70 - and that X hours after boot.
- When Intel XMP is activated, it sets "Sync All Cores" = "Auto" for the CPU, meaning all cores can Turbo Boost at the same time to the max standard Turbo Boost, and they have no problem keeping it up constantly when running stress -c 4, because the temperature stays well below 80 Celsius, actually around 60-69, way below the point where CPU throttling happens, which on Intel CPUs is at 100 Celsius. That is seen very well using CoreFreq on my MacBook Air: it throttles immediately when 1 core reaches 100 Celsius, which it does super fast just maxing out 1 core with stress -c 1, as that laptop, like all MacBooks, does not have good cooling at all; they all throttle, as you may have seen on YouTube recently with the new MacBook Pros and even iMacs (Pro included). Apple's specs are total lies, never close to achieving the advertised Turbo Boost GHz, which can be up to 5 GHz; it spikes like crazy on something that costs thousands of USD and may reach max Turbo for 1 second once in a while... I understand it on my Air, but...
- The ASUS built-in OC Tuner II, which checks my cooling and suggests a max overclock, sets "Sync All Cores" at 44 for me, meaning all cores can Turbo Boost to 4.4 GHz, and this works fine with temperatures between 75-83 Celsius (checked with sensors and CoreFreq etc.), WELL within the safe zone, nowhere near the 100 Celsius that should cause throttling; I have never managed to reach 100 Celsius (maybe with the CPU SVID disabled I could, but I never dared...).
- Intel XMP off or on with a manual overclock gives the same result; on just means it automatically adjusts UEFI to my correct RAM voltage and MHz, 1.35 V and 3200 MHz DDR4, it does not adjust the CPU settings.
- turbostat, i7z etc. all confirm this, also looking at /sys and /proc values: they all show everywhere (the tools as well) that the max frequency is 4.4 GHz, it just does not get there, but stays dead stable on 3.9 GHz, with temperatures so low it does not make sense for any system or BIOS component to do this (stays at 60-70 Celsius, fans barely make noise).
Why the heck does this happen after a random time when temperatures are all good, and the same, according to all reliable tools available?
What I have Done and Tried:
- Removed thermald, of course. A stupid thing to have on a PC without a battery anyway. CPUs and the kernel have plenty of safeguards against overheating, like my MacBook throttling all cores immediately when just 1 core reaches 100 Celsius. AFAIK Intel allows 1 CPU core to stay a little while at 100 or over, then "relax it a little bit extra" and send the job to another core, but this happens so quickly one can not see it without changing the timing in UEFI.
- Set (in /sys or /proc) the command to shut down the system, poweroff (systemd), so the kernel knows how to cleanly shut down if it is overheating.
- Nothing in dmesg, and no values in /sys or /proc related to the CPU change when this happens. It still says it can go to 4.4 GHz (or whatever manually set ratio).
- Monitoring with turbostat and my alias freq, which runs watch -n 2 cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq, can run for longer than I have bothered at 4.4 GHz, never throttling.
- Tried blacklisting the modules intel_rapl, intel_powerclamp, intel_cstate, intel_rapl_perf.
- A setting tied to the intel_cstate module that can not be 0: it does nothing on my systems, and on some systems it disables C-states completely, so the CPU is always at C0.
- Noticed lower temperatures when running stress -c 4 than on kernel 4.15, while maintaining the same performance... W00t? From like 70-80 to just above 60 Celsius fully stressed on all cores. The fans are so silent that I can not comprehend why the system thinks anything above 3.9 GHz should be clocked down to 3.9 GHz when even 4.4 GHz stays mostly under 80 Celsius.
I suspect these:
intel_pstate and intel_idle. I suspect intel_pstate, which has excellent documentation for kernel 4.18, although I suspected both before; I checked things in /sys and/or /proc and could not find anything that intel_idle should be doing "wrong". The thing about intel_pstate is that, since it is not a module, I can not just blacklist it, but it has A TON of kernel parameters, all documented; the language is unclear to me though, and no parameter seems to completely disable it either, it just does X or Y which I do not understand or want to try them all, and some must be used in combination... Maybe you do? This PC runs server services, so as little downtime as possible basically, if possible.
In CoreFreq, the < toggle > values and things like "Hardware-Controlled Performance States" make me interested; it says ON. I guess these are C-states? Anyway, maybe there is a toggle you can recommend to try in CoreFreq, that simple? It seems to be stuff I have found in /sys, but yeah, I do not know exactly what they do, and using CoreFreq is easier than messing with echo X > /blabla/bla.
If it is the CPU protecting itself, why does it do so when barely hitting above 80 Celsius, and only after being stressed for a while? CoreFreq never says "HOT". On the MacBook it clearly does when hitting 100, as it should. Never hitting that here.
Remember: to reproduce this I must reboot, which immediately makes a manual overclock to ANY ratio work again, and then I basically just wait and do a test now and then, using stress -c 4 and checking one of turbostat, my alias freq, or CoreFreq. They ALL show the same thing, both while it is working and later when it is underclocked without any trace or reason.
There are some log entries I have found that happen at boot and have to do with the CPU, but none of these modules or kernel components logs anything when it just blindly downclocks my overclock to what is still an overclock, always the default Intel XMP standard overclock (as said, XMP can be off in UEFI, but those are the values it falls back to). I can post them if interested.
I hoped to write this more structured, but things keep popping into my head. I hope you at least understand the problem, and may have suggestions. Surely I have forgotten things I have tried, but at least everything in UEFI; the combinations might have been wrong, and it is impossible to try them all, but I have tried those that could be a reason for this, in different combinations (and yes, it has the latest BIOS/UEFI). No success so far. It should not have to do with the DE (GNOME), as it is like this before login as well (just GDM running), with no session active when SSH-ing in.
Many things to read in your post!
First, I don't have an XMP kit to reproduce your issue, but I'll experiment as soon as I get my hands on one... Meanwhile, what you describe can be linked to HWP management.
If my CLI says HWP is ON, then change its profile to low (zero value) and stress a single core again (press F3).
In the Processor window, try to play with HWP Min, HWP Max and HWP Target.
Intel says to set Min=Max=Highest HWP
HWP can only be enabled once (it cannot be disabled again), which may explain why you are recovering after a reboot.
As long as HWP is disabled, the legacy OSPM, its profile (from 0 to 15), and its target ratio are in effect.
Edit: See these Processor screenshots about the various HWP settings.
The Energy Policy governs the HWP profile; select 0.
Copy the HWP capabilities into the HWP operating ratios: Min=Lowest, Max=Highest, Target=Highest.
Keep an eye on the Vcore while setting the policy: you should notice the immediate effect
It may also help to let CoreFreq fully manage the processor with the corefreqk.ko arguments Register_CPU_Freq=1 Register_CPU_Idle=1.
These require blacklisting other drivers. See the Q&A section of the README.
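i.e., loading the driver along these lines (a sketch; the parameter names are the ones given above):
# Let corefreqk also register itself as the CPU-Freq / CPU-Idle driver.
# The conflicting in-kernel drivers must be blacklisted first (see the README Q&A).
insmod corefreqk.ko Register_CPU_Freq=1 Register_CPU_Idle=1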
I have now disabled intel_pstate completely at boot with a kernel parameter. One can also just disable HWP, and set that the CPU can not go below C1. If that works, then I'll play with it more, e.g. whether just disabling HWP in the intel_pstate module is enough, or that plus active vs passive mode (the latter requires HWP off as well) - basically play with the intel_pstate kernel parameters. I don't want it completely disabled if not necessary; it has its benefits (for example, right now there is no info from it in /sys).
Because now it uses acpi-cpufreq as the cpufreq driver (IIRC, I'm on my phone, but anyway another driver), not the Intel one, hence the "old standard governors".
intel_idle is set to disabled (confirmed by the kernel at boot). This enables all the usual governors; it now defaults to "ondemand", while a modern Intel CPU system has just "powersave" and "performance" and defaults to "powersave", which basically works like ondemand but actually better from what I read, more optimized, at least where power is a concern. The only other option by default is "performance"; I doubt changing to that will help, and it must be changed for each core as well. There is another thing when intel_pstate is enabled regarding "energy efficiency"; those toggles are not available for me now with intel_pstate disabled.
Keeping the C-state from ever going below C1 is no concern for me. Yes, it uses more power, but it's not a laptop and Norway is 100 % hydro so... It also speeds up system response, as going from a deeper C-state to C0 takes more time (it's actually noticeable). Pages about low-latency servers, like the Red Hat docs etc., recommend it. It still allows the CPU to idle at minimum speed (800 MHz), although it is somewhat more likely to spike to ~1,000 MHz+, probably due to the less optimized ondemand governor. Doesn't matter for me though.
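For reference, the two intel_pstate variants discussed here end up on the kernel command line like this (only one of them at a time, of course):
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=disable"   # drop intel_pstate entirely, fall back to acpi-cpufreq
# or:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=no_hwp"    # keep intel_pstate but leave HWP off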
Please see my previous answer, I've edited the HWP instructions
BTW: Don't bother with Intel XMP. It's just "an Intel-certified overclock plus detection of faster-than-standard DDR4 RAM chips", as far as I understand it. My RAM chips, which support 3200 MHz, way above the DDR4 default, say "Intel XMP certified", which was what led me to enable XMP in UEFI in the first place. It is a nice way to have the UEFI adjust voltage and MHz to exactly what it says on the chips themselves. What was unknown to me was that it also overclocks the CPU by default (with the warning about sufficient cooling, it's logical), but as said this sets Sync All Cores = Auto. So it basically leaves it to the UEFI, and what it knows about the CPU I guess, to set a supported overclock, which at least in my case defaults to all cores being able to Turbo Boost to the max advertised TB speed, instead of just one.
Will do after seeing if this "total disabling" works. See my post above regarding XMP.
Anyway, if it's HWP, as said I'll just disable it, as one can with an intel_pstate kernel parameter. Although it would be cool to play with it at runtime: according to the Intel docs in the kernel docs, if HWP is enabled at boot it can not be fully "played with" at runtime. But I'll check; it would be cool if it could, and to know which kernel toggles CoreFreq uses to achieve this.
HWP is a processor feature that your UEFI BIOS can leave disabled. Be sure no other driver will enable this register afterwards. Then the processor is in OSPM, where you can set the BIAS HINT for the Energy Policy and the Target P-State for the desired frequency.
Next:
Test:
CoreFreq brings you all these features
Hooray! After an uptime of over 8 hours, running stress -c 4 (still the tool that best stresses my cores to the max) shows that it gets to 4.4 GHz without any problems!
turbostat (which I've made a tbs Bash alias of, to show only the rows I'm interested in, and whitelisted for sudo) is the most reliable tool there for me, since it's developed by Intel and FOSS, but all tools show the same, including CoreFreq and probing /sys/xxx regularly as with my alias described in a previous post.
I'm pretty sure there's no setting for HWP in UEFI (which on a decent ASUS MB has a lot of settings, all you need, in "Advanced Mode"), although it seems like it should be a setting; if it's actually there, it's hiding under a weird name with a bad description in a weird place in the settings.
As with C-states, that's no problem; Linux would surely ignore the UEFI setting there as well. Since intel_pstate has a parameter for exactly that, disabling HWP, no problem then...
I have, as I would anyway, gone through all the UEFI Advanced options, and of the ones having to do with the CPU, which are in various places, none mentions HWP in its name or description. I would probably recognize something similar (a different name for it), but I can't think of any.
SpeedStep and Turbo are of course on. As long as HWP is set to off as a kernel parameter, I doubt the rest of what you write is necessary at all. I read the docs for intel_pstate and yes, it uses BIAS hints etc. (very complicated docs), but as I understand it these are just my OC settings/values read from what is set in UEFI/BIOS anyway (where it is of course set more intuitively, with logical names and good descriptions).
With "Sync All Cores" one can't set "Turbo ratios" in plural, just one that goes for all cores. Everything above 35 multiplier by default become Turbo, as it should since default core frequency without is 3.5 GHz (35).
So, if running totally stock Intel Turbo Boost, a tool like turbostat (preferred, to be sure, since it is basically a tool developed by Intel) shows it is in 39/38/37/36 multiplier mode, meaning if all cores are stressed it can go to 3.6 GHz max, and if just 1 core is stressed it can go to 3.9 GHz, which is what Intel etc. advertise as their "Turbo Boost speed", although it is really only possible for a single thread at a time. In real-life use, unless doing single-threaded work, since most applications take advantage of all the cores, these turbo spikes are shared between cores, so no core ever reaches full Turbo, but they "share the extra 0.4 GHz between them".
This of course all changes when Sync All Cores is set, or Per Core is set with the same multiplier for all, or where I can set say 44/43/42/41 instead. But syncing all cores to 44 (same as 44/44/44/44) is of course better when the CPU can handle it with the cooling I have, which is nothing spectacular, but a nicely designed big tower chassis + the pretty massive CoolerMaster heatsink + fan that came with the CPU.
There is no Intel stock cooling solution for the i5 and i7, for this CPU generation at least, just for the i3 model, which I have lying around, both the CPU and the Intel stock cooling solution. It is a 2-core with Hyper-Threading to make it seem like 4. It runs by default at a higher clock speed as well, but that is normal for an i3 compared to an i5, or sometimes even an i7. As you know, Hyper-Threading heats up the CPU a lot, but there are no problems with only 2 cores. There is a bigger change if one goes from an i5, typically without Hyper-Threading, to an i7: usually not more cores, just a little higher max speed and max turbo speed, but it also has Hyper-Threading, so for such a family it would have 4 cores + 4 HT, and those heat up pretty quickly when running say stress -c 8, while that does not stress an i5 without HT any more than running the same command with -c 4.
These multipliers I can change individually, which is "Per Core" (also the default for stock Turbo Boost), instead of "Sync All Cores" (which I prefer, since it can OC that much without any problems).
Sync All Cores means the other 3 values are greyed out, and when I set 44 as the (first) multiplier, for instance, they all change graphically in UEFI, and a tool like turbostat confirms at startup that all are on the 44 multiplier.
They can all be the same value, or the first value is the max and each of the following ones must be the same as or lower than the previous one going down. For example, of the ASUS built-in OC Tuners (I mentioned OC Tuner II suggesting Sync All Cores = 44 multiplier), OC Tuner I suggests a "safer" overclock, typically 43/42/41/41.
In real-life usage one wouldn't notice that; the TB is by default shared among the cores, and when necessary one or more of them can spike.
But maybe it's not there since, just as with C-states, the OS can ignore the BIOS on that, which Linux does, always. Several pages say, and they are correct, that for example disabling C-states, or setting the CPU to not be able to go deeper than C1, will not work on Linux when set in UEFI, although all the settings are there. One must use the various kernel parameters for intel_idle (and possibly some for intel_cstate, which is a module; blacklisting it does not make Linux honor the UEFI settings, since it's really up to the intel_idle parameters), which do just the same.
In UEFI I can fine-tune things like disabling a specific C-state, say C6, or C-state support altogether, or the max deep level, but as said the Linux kernel ignores them while Windows follows them. According to one page, "Changing C-state settings in BIOS/UEFI only has effect on Windows", which is exactly what I'm seeing, not that it's a problem. I even changed all the C-state settings in UEFI to "do something ridiculous" (something that, if it took effect, should be clearly visible in tools and even noticeable in usage). But yeah, the page is right: Linux does not honor these, but according to the page Windows does.
NO PROBLEM again, since for Linux these can be adjusted in/by the kernel.
Well, how I want my C-states is up to me, no? I like it staying at C1 when idle, TBH. Everything is even snappier, when I thought it couldn't get any more so. It already booted in 2 seconds etc., but it is noticeable in tiny operations where otherwise the CPU would have to go all the way from C7 to C0.
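For completeness, the kernel-side way to cap C-states mentioned above is usually a pair of boot parameters like these (values are just an example, not a recommendation):
# Don't let the idle drivers go deeper than C1, regardless of the UEFI settings:
#   intel_idle.max_cstate=1   caps the intel_idle driver
#   processor.max_cstate=1    caps the ACPI idle fallback driver
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1 processor.max_cstate=1"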
What do you mean by disabling clock modulation?
What do you mean by disabling clock modulation?
This ODCM setting: if enabled in conjunction with a DutyCycle percentage, it will reduce performance.
BTW: do you really want to help CoreFreq?
Yes, I do. You asked whether I can overclock my CPU freely, and for screenshots, and now I can give you such screenshots, after finding the culprit, HWP. IDK why I should touch Clock Modulation if the overclock works as it should? I am on my phone but will check what it says.
I also want to make that deb package for you. Should be pretty easy. I'm used to packaging those + Arch Linux packages (but that you already have in AUR).
But yeah, first I needed some help here, around intel_pstate and HWP; although I seem to have fixed it myself, I don't like it without an explanation from you, as the docs are too complicated or obscure...
Although it's completely disabled now, I'm pretty sure I can enable it again, also intel_idle and the default C-states, and it will keep my overclock. That will be my next test. Well, actually, first I will remove all kernel parameters and see if using CoreFreq to disable HWP at runtime, even with HWP enabled at boot, fixes it, which it seemed you meant it should be able to? Or did I misunderstand that one? At least it can be toggled there so... It kind of contradicts what I've read in the docs, but I might have misunderstood them.
So, to add: thanks for the explanations! Why this is not on any page, and why googling "right" (which I'm pretty good at, usually finding an answer with the first search even for technical stuff - my previous Googlers over at Silicon Valley would laugh if I wasn't :P) turns up nothing, is beyond me. I only found out about intel_pstate while reading about one of the other modules in the kernel source (while pstate is built in), or maybe I just saw it mentioned in a post about OC on Linux, but without anything more about possibly having to tune it, and nothing about HWP for instance.
The point is, I can't be the only one experiencing this on such a popular motherboard and CPU, and no search shows anyone else seeing the same thing, or a solution to it. Weird! Maybe because it happens in silence, and people don't check their actual CPU performance again after having checked it after boot, since this occurs X hours later without any trace/hints in the logs, kernel, etc.
This ODCM setting: if enabled in conjunction with a DutyCycle percentage, it will reduce performance.
- Clock Modulation already says ODCM <Disable>. IDK if that is because of intel_pstate being disabled, or a result of a setting in UEFI. I will revert back to defaults, no kernel parameters, meaning enabling intel_pstate and intel_idle and all C-states, and see what the relevant values are then.
- Hardware-Controlled Performance States now says HWP <OFF>. This I know said <ON> before, so disabling intel_pstate sure did that.
Of course going back to defaults will cause the problem again, but I will see if it can be fixed just using the toggles in CoreFreq. If not, I will simply see if it is enough to boot with the kernel parameter intel_pstate=no_hwp.
Hope these settings to reach Turbo (even w/ HWP) will help.
Processor [Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz]
|- Architecture [Skylake/S]
|- Vendor ID [GenuineIntel]
|- Microcode [ 186]
|- Signature [ 06_5E]
|- Stepping [ 3]
|- Online CPU [ 8/8 ]
|- Base Clock [100.25]
|- Frequency (MHz) Ratio
Min 802.00 [ 8 ]
Max 3408.50 [ 34 ]
|- Factory [100.00]
3400 [ 34 ]
|- Performance
|- OSPM
TGT 4010.00 < 40 >
|- HWP
Min 100.25 < 1 >
Max 4010.00 < 40 >
TGT 4010.00 < 40 >
|- Turbo Boost [UNLOCK]
1C 4010.00 < 40 >
2C 3909.75 < 39 >
3C 3809.50 < 38 >
4C 3709.25 < 37 >
|- Uncore [UNLOCK]
Min 802.00 < 8 >
Max 4010.00 < 40 >
|- TDP Level [ 0:3 ]
|- Programmable [UNLOCK]
|- Configuration [ LOCK]
|- Turbo Activation [UNLOCK]
Nominal 3408.50 [ 34 ]
Instruction Set Extensions
|- 3DNow!/Ext [N,N] ADX [Y] AES [Y] AVX/AVX2 [Y/Y]
|- AVX-512 [N] BMI1/BMI2 [Y/Y] CLFSH [Y] CMOV [Y]
|- CMPXCH8 [Y] CMPXCH16 [Y] F16C [Y] FPU [Y]
|- FXSR [Y] LAHF/SAHF [Y] MMX/Ext [Y/N] MONITOR [Y]
|- MOVBE [Y] MPX [Y] PCLMULDQ [Y] POPCNT [Y]
|- RDRAND [Y] RDSEED [Y] RDTSCP [Y] SEP [Y]
|- SGX [Y] SSE [Y] SSE2 [Y] SSE3 [Y]
|- SSSE3 [Y] SSE4.1/4A [Y/N] SSE4.2 [Y] SYSCALL [Y]
Features
|- 1 GB Pages Support 1GB-PAGES [Present]
|- Advanced Configuration & Power Interface ACPI [Present]
|- Advanced Programmable Interrupt Controller APIC [Present]
|- Core Multi-Processing CMP Legacy [Missing]
|- L1 Data Cache Context ID CNXT-ID [Missing]
|- Direct Cache Access DCA [Missing]
|- Debugging Extension DE [Present]
|- Debug Store & Precise Event Based Sampling DS, PEBS [Present]
|- CPL Qualified Debug Store DS-CPL [Present]
|- 64-Bit Debug Store DTES64 [Present]
|- Fast-String Operation Fast-Strings [Present]
|- Fused Multiply Add FMA|FMA4 [Present]
|- Hardware Lock Elision HLE [Present]
|- Long Mode 64 bits IA64|LM [Present]
|- LightWeight Profiling LWP [Missing]
|- Machine-Check Architecture MCA [Present]
|- Model Specific Registers MSR [Present]
|- Memory Type Range Registers MTRR [Present]
|- OS-Enabled Ext. State Management OSXSAVE [Present]
|- Physical Address Extension PAE [Present]
|- Page Attribute Table PAT [Present]
|- Pending Break Enable PBE [Present]
|- Process Context Identifiers PCID [Present]
|- Perfmon and Debug Capability PDCM [Present]
|- Page Global Enable PGE [Present]
|- Page Size Extension PSE [Present]
|- 36-bit Page Size Extension PSE36 [Present]
|- Processor Serial Number PSN [Missing]
|- Restricted Transactional Memory RTM [Present]
|- Safer Mode Extensions SMX [Present]
|- Self-Snoop SS [Present]
|- Time Stamp Counter TSC [Invariant]
|- Time Stamp Counter Deadline TSC-DEADLINE [Present]
|- Virtual Mode Extension VME [Present]
|- Virtual Machine Extensions VMX [Present]
|- Extended xAPIC Support x2APIC [ xAPIC]
|- Execution Disable Bit Support XD-Bit [Present]
|- XSAVE/XSTOR States XSAVE [Present]
|- xTPR Update Control xTPR [Present]
Technologies
|- System Management Mode SMM-Dual [ ON]
|- Hyper-Threading HTT [ ON]
|- SpeedStep EIST < ON>
|- Dynamic Acceleration IDA [ ON]
|- Turbo Boost TURBO < ON>
|- Virtualization VMX [ ON]
|- I/O MMU VT-d [ ON]
|- Hypervisor [OFF]
Performance Monitoring
|- Version PM [ 4]
|- Counters: General Fixed
| 4 x 48 bits 3 x 48 bits
|- Enhanced Halt State C1E < ON>
|- C1 Auto Demotion C1A < ON>
|- C3 Auto Demotion C3A < ON>
|- C1 UnDemotion C1U < ON>
|- C3 UnDemotion C3U < ON>
|- Frequency ID control FID [OFF]
|- Voltage ID control VID [OFF]
|- P-State Hardware Coordination Feedback MPERF/APERF [ ON]
|- Hardware-Controlled Performance States HWP < ON>
|- Capabilities (MHz) Ratio
Lowest 100.25 [ 1 ]
Efficient 1303.25 [ 13 ]
Guaranteed 3408.50 [ 34 ]
Highest 4010.00 [ 40 ]
|- Hardware Duty Cycling HDC [ ON]
|- Package C-State
|- Configuration Control CONFIG [ LOCK]
|- Lowest C-State LIMIT [ 8]
|- I/O MWAIT Redirection IOMWAIT [Disable]
|- Max C-State Inclusion RANGE [ 8]
|- MONITOR/MWAIT
|- State index: #0 #1 #2 #3 #4 #5 #6 #7
|- Sub C-State: 0 2 1 2 4 1 0 0
|- Core Cycles [Present]
|- Instructions Retired [Present]
|- Reference Cycles [Present]
|- Last Level Cache References [Present]
|- Last Level Cache Misses [Present]
|- Branch Instructions Retired [Present]
|- Branch Mispredicts Retired [Present]
Power & Thermal
|- Clock Modulation ODCM <Disable>
|- DutyCycle [ 0.00%]
|- Power Management PWR MGMT [ LOCK]
|- Energy Policy Bias Hint < 0>
|- Energy Policy HWP EPP < 0>
|- Junction Temperature TjMax [ 0:100]
|- Digital Thermal Sensor DTS [Present]
|- Power Limit Notification PLN [Present]
|- Package Thermal Management PTM [Present]
|- Thermal Monitor 1 TM1|TTP [ Enable]
|- Thermal Monitor 2 TM2|HTC [Present]
|- Units
|- Power watt [ 0.125000000]
|- Energy joule [ 0.000061035]
|- Window second [ 0.000976562]
I have taken the screenshots and captured the output you asked for. I was a little unsure since you wrote:
Package C-States (view [g])
... while view [g] is named "Package cycles". I guess you meant that anyway, since you can already see C-states in the other views.
Can you please verify them for me? Found here:
https://www.olejon.net/files/CF/
If you need more, or something changed please tell me.
Conditions:
- Stressed with stress -c numcores; with stress I can choose the number of cores and many other non-CPU loads to run at the same time, like memory, disk etc., which might raise or lower the temperature, depending. In my experience, the most stressful thing on my system is a long Super HQ video conversion using HandBrake.
- Kernel command line: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_pstate=no_hwp nmi_watchdog=0"
EDIT: Added info regarding which F3-button stress test I used. I just saw your comment; I will check it out later. Although, is there any disadvantage of HWP = OFF for me? I saw you have it OFF in your Skylake CPU screenshot too.
Regarding the deb package:
- Building with make MSR_CORE_PERF_UCC=MSR_IA32_APERF MSR_CORE_PERF_URC=MSR_IA32_MPERF seems to work fine with NMI Watchdog ON here. Should I make it the default, or is it too uncertain to say it will work correctly on all supported systems?
- Or warn (pointing to README.md) if the script finds that NMI Watchdog is on (basically it always will be, so maybe add the warning anyway).
- Or build with make as above, and warn that it might not be accurate on all systems (if that's true?).
Right now I have NMI Watchdog disabled as a kernel parameter for your screenshots, and built it normally, just make, but before that I had built it as above. I haven't spotted any difference on my system; CoreFreq really seems to work fine with the normal build and NMI Watchdog on too, at least when using the on-demand toggles as in my script in the earlier comment, but after your comment on that I changed to make with APERF/MPERF.
Can you please verify them for me? Found here:
Thanks for your screenshots! The Memory Controller output shows that I have to improve the decoding. Indeed, a CAS latency decoded as 5 seems wrong. Also, the DRAM geometry and size don't show up, which means the decoding is incomplete.
Regarding the deb package:
I presume that APERF/MPERF works with any x86 architecture (especially w/ AMD).
I've not found an accuracy difference w/ the UCC + URC counters.
It's just that I had started the project using those counters, a long time ago.
APERF/MPERF should be safe in all cases; I just lack the hardware for non-regression tests.
Can ask the user if he/she prefers to build CoreFreq "with NMI Watchdog support"
Following the Arch Linux PKGBUILD guidelines, I avoided requesting user actions.
An Arch package has to be fully automated and has to take care of dependencies.
I can't really tell about the Debian/Ubuntu package requirements, so I would suggest an automatic build based on the APERF/MPERF registers.
The Memory Controller output shows that I have to improve the decoding. Indeed, a CAS latency decoded as 5 seems wrong. Also, the DRAM geometry and size don't show up, which means the decoding is incomplete.
Well, a little technical for me :P Tell me if there is something I can check against the UEFI monitor/values for comparison that could help.
APERF/MPERF should be safe in all cases
OK, then I'll just use that and not ask the user.
Following the Arch Linux PKGBUILD guidelines, I avoided requesting user actions.
deb packages totally support advanced user interaction upon install/upgrade/uninstall, pretty advanced too, with input fields, multiple choice, yes/no etc. There is a reason why dpkg-reconfigure exists ;-) It can set up, heck, let's say phpMyAdmin for you, and if no web server or MySQL/MariaDB is installed, it will install those and do the basic setup for a working secure system, then configure phpMyAdmin to connect to the created root DB user, and ask which web server alias to use (which web path should point to the install path).
BUT: I would not do it that way in a small package like this one; I have never used that feature. In this case a simple Bash read prompt would be enough for input, with (yes/no)... And since you say I can just compile with APERF/MPERF by default, there is no need for asking. Although IMO a text with a link to your GitHub repo for documentation should be shown when the installation has finished, maybe with the most important reminders. BUT when CoreFreq notices the kernel has changed, it must tell the user that it must be recompiled; the user is warned and must agree that, to keep working, it will recompile itself, first downloading the latest version (if the user does not accept, they get the same warning the next time they try to run CoreFreq), and then simply run the postinst script again, which does it all (download, compile, take the wanted files if the compile succeeds and place them where we want them, then delete the source files since they are no longer needed).
Of course not many packages use this advanced setup, but some must, like accepting a license in a package from the official Ubuntu repos that, say, downloads MS fonts; some really make it easier to set up the basics (various server services), and some just print an info text, e.g. when updating the timezone data package, something like "Timezone set to: 'Europe/Oslo'. To change it run: sudo dpkg-reconfigure tzdata". Usually an upgrade will never trigger "action required", as things are already configured, but an info text like this is shown when it's a good idea.
I know Arch Linux "guidelines", and how the hostile the community can be even though one tries his best, from the forum to packages. Say a person upload the first ever package to AUR that this person knows only a few people will use, and receives very good feedback from those (they even bother to comment on how much they like it), but THEN comes some stranger that I'm 99 % sure does not use the package - nor intend to - but just checks the PKGBUILD and download the package and looks at it just to find something to nitpick on.
Erhm which is stupid because according to the Arch Wiki, AUR is not and will not be supported in any way by Arch directly, meaning unless a package make its way into their main (not community) repos, and pacman
will not support it natively, and the wiki tells people to use e.g. yaourt
if they want a package manager for AUR. I guess some just have a hobby of hunting packages with 5-20 votes, to completely trash the user who have tried to contribute for the first time ever (like a little tool/app that is great but has a limited user base), even when there is NO way installing/upgrading/uninstalling the package can harm the system in ANY way. The problem for these guys? It's not done 100 % the, and everyone seems to have their own version of, "The Arch Way".
Seen the most stupid comments from such people on their high horse just complaining, with no suggestion for a "better way", even though people comment back that it is their first upload to AUR and are wide open for suggestions to fix/improve the package, like a link to a relevant section on the wiki, but answer is always "read the wiki stupid", which most people has already done, but not all guidelines and best practices are actually explained with examples. Basically you must download tens of in-repo packages doing something similar as yours, and try to understand them, they can be complicated or simple, but often they do things slightly different, so you must just go for the common things and then for the different approaches choose the one that seems cleanest or you understand what does at least. But NO NO, on AUR & the forum such knowledge is no guarantee that you're not 2 seconds away from being called "stupid", "ugly code" or "messy package" etc.
Those people are just there to keep your spirit down, when you try to contribute... I mean what the heck? Arch is all about breaking your system doing something stupid or simply because it's bleeding edge so even the main packages breaks stuff sometimes. So why a little package with like 5 votes on AUR gets bashed by senior Arch people when they constantly tell people "Arch is a rolling bleeding edge distro so expect your system to break once in a while, and learn how to find the problem and fix it, often it can be found on the forum". At least that was what they said before, about their OWN system, meaning packages THEY have pushed into the MAIN repos.
The Arch Wiki is where those pretty harsh forum people contribute best, and they should stick to it. I have never asked anything there, but of course I have found threads after googling, with people discussing what I have searched for, just hoping the thread has a solution/explanation, since many just end up in /dev/null for the poor person asking something "stupid" (according to some guy, and nobody dares to disagree if he is "kind of right", so...).
Don't get me wrong, I LOVE the Arch Wiki, since it is the place to find stuff, examples, tweaks etc. that you can find in other places but without any good explanation of what it does or why it works, like a random blog post saying "I found that adding/changing X made Y better!". Then google it, and usually the Arch Wiki comes up and has an explanation. Just know your distro and you'll know where to apply stuff that lives in different locations etc.
TL;DR: I will create a deb package. I will make it as stable as possible, with all required deps, set to support only x86_64, with checks in place, and make sure it works on Ubuntu 18.04 LTS (both the GA kernel on my server hosting various Ubuntu barebone cloud VMs, and my workstation with the HWE kernel). If you have Debian running on HW (not a VM) that would be nice. I guess I can always create a VM with Debian and check that the package installs and compiles correctly; running CoreFreq may at least launch it, although with incorrect values, IDK. As long as it builds in a VM it should work.
Thanks a lot !
To face any kernel upgrade I use dkms to rebuild corefreqk.ko
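For the deb, the dkms side could be something along these lines (a hypothetical sketch; dkms only covers corefreqk.ko, the daemon and CLI still have to be installed separately, and the MAKE/CLEAN targets are assumptions):
# Hypothetical /usr/src/corefreq-1.57.0/dkms.conf
PACKAGE_NAME="corefreq"
PACKAGE_VERSION="1.57.0"
BUILT_MODULE_NAME[0]="corefreqk"
DEST_MODULE_LOCATION[0]="/updates/dkms"
MAKE[0]="make"
CLEAN="make clean"
AUTOINSTALL="yes"
# Register and build/install against the running kernel:
sudo dkms add -m corefreq -v 1.57.0
sudo dkms install -m corefreq -v 1.57.0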
About the IMC, do you have a screenshot of corefreq-cli showing the Memory Controller window?
Hello,
New entry for the i5-6600K added to the Wiki CPU support page. Thanks a lot
Although is there any disadvantage of HWP = OFF for me?
Look rather at this screenshot; you will notice that HWP is ON.
To reach the Turbo frequency with HWP:
1- Activate HWP
|- Hardware-Controlled Performance States HWP < ON>
|- Capabilities (MHz) Ratio
Lowest 100.25 [ 1 ]
Efficient 1303.25 [ 13 ]
Guaranteed 3408.50 [ 34 ]
Highest 4010.00 [ 40 ]
2- Set the HWP ratios according to the capabilities
|- Performance
|- OSPM
TGT 4010.00 < 40 >
|- HWP
Min 100.25 < 1 >
Max 4010.00 < 40 >
TGT 4010.00 < 40 >
3- Set the EPP policy to zero
|- Power Management PWR MGMT [ LOCK]
|- Energy Policy Bias Hint < 0>
|- Energy Policy HWP EPP < 0>
4- Stress a single Core
Thanks for making the Gist and hosting it here, so I can delete them from that server later. You know better than me what structure you prefer, so I hoped you'd do it; thanks again.
Are you also interested in the same screenshots and output from my MacBook Air with Fedora 30?
Here is the screenshot you asked for - although with incorrect values; the clock speed seems to just default to standard DDR4. I don't know if that's something you can fix/control (like detecting that in UEFI the DRAM voltage and speed are actually set to 1.35 V / DDR4-3200 MHz, easiest by enabling Intel XMP; as said a few comments above, XMP just adjusts those settings, it isn't a "system/service/module" that gets enabled, just a convenience setting for applying certified overclock values):
https://www.olejon.net/files/CF/CF-MC-Window.png
Quote from previous comment:
The intel_pstate interface is there in /sys/devices/system/cpu/intel_pstate, it only seems to be missing one "point" that has the name "hwp" in it when booting with HWP enabled, and I doubt that point is writeable. I see you also use corefreqk as both CPU-Freq and CPU-Idle driver. I'm not very interested in that (and I wonder what happens if I set that and then update the kernel, forgetting to compile CoreFreq against it and add the module before starting the new kernel). I use intel_idle.max_cstate=1 to not let the system go below C1. It still allows idle at Min MHz (800), while only consuming a little more power when idle (according to CoreFreq), but the system is snappier. The intel_pstate notes added to the kernel docs are right: if booting with HWP disabled, it can be activated at runtime, with CoreFreq for instance (it shows both "enable" and "disable"), but as soon as you do that, you cannot disable it again (it shows just "enable", which it already is). If booting with HWP enabled, you cannot disable it in any way at runtime; CoreFreq then just shows "enable", which it already is. Now I may be ahead of you: if using corefreqk as driver it is possible(?), but TBH I don't want that, though others probably do, we all have different use cases for CoreFreq. For me it's the best monitoring tool with all the stuff I otherwise use 2-3 commands at the same time to check, and CoreFreq adds more details as well in the standard Frequency View, which is always nice!
Possible bug?
CoreFreq sometimes shows TURBO BOOST TURBO <OFF> and I can turn it on again, BUT both other tools and stress tests, even CF's own, contradict that: the value of cat /sys/devices/system/cpu/intel_pstate/no_turbo is always 0, and that means TURBO = ON, because if it says 1 Turbo is actually off, as stress tests and other tools confirm. Also the Intel-made turbostat shows that Turbo is on when CoreFreq indicates it is not. ALSO stress tests, even in CoreFreq, show that when CoreFreq indicates Turbo is off it still goes to 4.4 GHz as usual, AND CoreFreq says in the Frequency View on the X axis before the C-states: Turbo: 125%, and of course C0: 100%, meaning it also shows Turbo going on, right(?). A little "hmm" for me.
BTW: The "Energy Policy" thing makes sense. I mentioned it in a previous comment, that it is in the intel_pstate docs (and thought it could be a possible solution to allow HWP on without it downclocking after X time):
https://www.kernel.org/doc/html/latest/admin-guide/pm/intel_pstate.html#energy-vs-performance-hints
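For completeness, the same knobs the docs describe can be queried straight from sysfs (paths per the intel_pstate documentation; the EPP files only exist in active mode with HWP):

```
cat /sys/devices/system/cpu/intel_pstate/no_turbo        # 0 = Turbo allowed, 1 = Turbo off
cat /sys/devices/system/cpu/intel_pstate/status          # active / passive / off
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences
```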
Note: every change requires a reboot and X hours to see if the solution works, meaning to see if the overclock holds up over time, which it now does, so basically testing and failing can take up an entire day per check... That's why I'm not "jumping in and testing straight away" (since now HWP is off and, well, turning it on, if your suggestion does not work and the underclocking happens again, will require a reboot). As said, the PC runs server services for others as well. Not so ideal for testing that may require several reboots (which it will here if it comes to testing and failing).
If intel_pstate works in the active mode with the HWP feature enabled in the processor, additional attributes are present in every CPUFreq policy directory in sysfs. They are intended to allow user space to help intel_pstate to adjust the processor’s internal P-state selection logic by focusing it on performance or on energy-efficiency, or somewhere between the two extremes
Anyway, if I try this through CoreFreq and it works, of course I want an official way to do it, meaning through sysfs as described in the docs, or something I can set on boot, rather than opening CoreFreq after every boot :P Though I'm pretty happy with HWP off and intel_idle.max_cstate=1. It never underclocks and is darn snappy.
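For the "set it on boot" part, a minimal sketch under the assumption that the sysfs EPP knob is the thing to pin (run as root, e.g. from a small systemd oneshot unit or rc.local):

```
# Apply the same EPP hint to every cpufreq policy at boot
for p in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do
    echo performance > "$p"   # or balance_performance, balance_power, power
done
```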
Meanwhile, read my comment above; it's more important.
Thanks for your feedback.
That will be interesting to test with a MacBook: it should be x86 compatible, shouldn't it? However, be prepared for a processor crash; save your files before launching the driver.
HWP is a technology. Hard to tell if it's better than the legacy PM across the capable architectures. I have however found that the Vcore is lower when the CPU is idling. HWP is managed by hardware and not controlled by the OS as with OSPM: I believe transitions are "smoother", and you can still give Hint and Target frequencies (incl. Min, Max); also other settings such as the Activity Window (not implemented yet).
To quote the SDM:
14.4 ... Hardware-Controlled Performance States (HWP), which autonomously selects performance states while utilizing OS supplied performance guidance hints. ... When HWP is enabled, the processor autonomously selects performance states as deemed appropriate for the applied workload and with consideration of constraining hints that are programmed by the OS. ... preference towards energy efficiency or performance, and the specification of a relevant workload history observation time window. ...
14.4.2 Enabling HWP IA32_PM_ENABLE MSR (bit 0, R/W1Once) ... can only be enabled once from the default value. Once set, writes to the HWP_ENABLE bit are ignored. Only RESET will clear this bit. Default = zero (0).
That's why I only offer an Enable option. But you can escape the Pop-up ;-)
Memory Controller will be a long road to debug. I have access to the same architecture, but a different Processor model. The DDR speed is based on bits decoding (from the datasheet): standard values, no XMP involved.
CoreFreq's Idle and Freq sub-drivers are a means to take control over the P-States and the C-States. My idea is to offer deterministic tunings which are controlled by users. Imagine a system which wants full perf or full savings: you can dynamically ask the kernel to maintain C-States as below. Changes don't require a reboot. Remember, CoreFreq aims to be a BIOS straight from the Linux session.
Many other use cases you will make your own, such as: 1- disabling all Cores beside one for overclocking purposes; 2- selecting the coldest Cores on which to run intensive algorithms.
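Not necessarily how CoreFreq does it internally, but for the first use case the generic kernel interface is the sysfs online switch (cpu0 usually cannot be offlined):

```
# Park a core, run the workload on the remaining one(s), then bring it back
echo 0 | sudo tee /sys/devices/system/cpu/cpu2/online
echo 1 | sudo tee /sys/devices/system/cpu/cpu2/online
```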
For me it's the best monitoring tool with all the stuff
Appreciate it a lot; thank you
rather than opening CoreFreq after every boot
I try to understand your need, but what you ask is one of my goals:
change -> tune -> test -> stress -> decide (if happy or not with results)
And repeat the cycle without leaving the UI
For those "control" reasons, CoreFreq implements its own drivers: corefreqk-perf and corefreqk-idle try however to be compatible with the /sys interface; at most to query the current values, but changes have to go through the UI (the spirit of the software)
Edit: forgot to mention that the corefreqk.ko driver can be started with arguments if you want to make your settings permanent (without opening the Daemon and the Client)
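A quick way to discover which arguments the module actually accepts (the parameter name on the second line is purely illustrative, not a real one):

```
modinfo ./corefreqk.ko | grep '^parm'          # list the module's real parameters
sudo insmod ./corefreqk.ko SomeSetting=1       # hypothetical parameter; use one from modinfo
```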
Yes, I totally understand that. Well, I don't exactly use it for seeing my RAM info. With the incorrect values, the Dashboard View then shows 2133 MHz and of course not 16-18-18-36, which XMP, as said, sets automatically in UEFI when you enable it (actually UEFI detects and sets the timings even without XMP).
When enabling XMP in UEFI, upon "Save changes and exit" it shows a confirmation dialogue of the values changed, so you see what it has done. After that you can basically set XMP back to disabled, which will reset these values, and then set them manually, which will give the same results. So XMP is just a quick way to detect a certified overclock for RAM and CPU and set those values based on the HW found.
The CPU is then always, AFAIK, set to Sync All Cores (it only lets you modify one multiplier as it syncs it to all cores) = Auto (the multiplier of the first of X cores), which in my case enables the full CPU-spec Turbo Boost, but on all cores instead of just one at a time (in my case 3.9 GHz, or a multiplier of 39, although it says "Auto", confirmed by turbostat, so I could overclock the CPU manually in a certified way by just setting Sync All Cores = 39).
Same goes for the memory: the timings of 16-18-18-36 (which the UEFI actually seems to detect/set anyway, without XMP enabled), but setting the DRAM voltage to 1.35 V and the DRAM speed to DDR4-3200 must then be done manually. I've done this with XMP off, same result (one of the things I tried when I saw the downclocking was to keep XMP disabled, as per default, but set those values manually).
The MacBook Air (mid-summer 2013) 13", with the top-specced CPU (which gives a better iGPU) and RAM, of course chosen on purchase from apple.no (with student discount), I've already run CoreFreq on several times, no crashes or anything.
It's very standard x86 HW. After install it just needs USB tethering through an Android phone, instantly recognized by Fedora/NetworkManager as an Ethernet connection > Wi-Fi through my phone; install RPM Fusion and then one can install the driver for the cheap Broadcom chip Apple still uses instead of Intel's, which they could have used as it can be part of the SoC. So install the wl driver (a package in RPM Fusion) and akmods (like dkms) and it works; disconnect the phone and you'll have Wi-Fi in the GNOME Control Center. I actually set it to boot as the primary OS too (some weird key + mouse combo to do that, but easier than GRUB for sure); booting macOS needs Alt held down to show Apple's multiboot menu heh (this runs straight on the metal, not Bootcamp or anything, that only supports Windows).
CoreFreq builds fine as soon as kernel-devel and the necessary build tools and libs are installed on Fedora (although I think I only needed to install kernel-devel; Fedora now even includes git and wget etc. by default, not before, just curl, but still not nano, ehh).
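For anyone following along, roughly what that amounts to on Fedora (package list is my assumption; kernel-devel must match the running kernel):

```
sudo dnf install kernel-devel gcc make git
git clone https://github.com/cyring/CoreFreq.git
cd CoreFreq && make
```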
You saw the weird TURBO bug, if it is one. Seems so, since it's the only tool reporting it off; more official tools and sysfs report it on. The pattern now with NMI-W off at boot is that at first launch CoreFreq says TURBO is on. Then after shutdown/unload, every time it says it's off, and it can be turned on, but even during a full stress test it doesn't affect ANYTHING. So the TURBO indicator and toggle in CoreFreq on my system is clearly "false" or "unusable" in the sense that it indicates wrong and activating it does nothing. I think deactivating it when using the standard XMP overclock did actually deactivate it, but now the toggle does nothing. Hard to say it's because HWP is disabled, since intel_pstate says turbo is on in sysfs...
Perhaps by booting with the standard timings and speed of the DRAM (no XMP and other auto-tunings), you could confirm whether CoreFreq's Memory Controller output matches the BIOS?
MacBook Air: maybe you could boot it with the CoreFreq Live image for a quick test
I've already run CoreFreq on several times, no crashes or anything.
I love to read this. Thanks
You saw the weird TURBO bug
I'm not sure I have noticed a Turbo bug. Can you please point to it?
It's in my comment a few comments above, in bold as "Possible bug?". In case you just read the emails: I do tend to edit the comments afterwards with more info if you haven't responded yet, so please read the comment here and not just the email. Although the part about the possible turbo bug was not edited in. Doesn't matter, as you can see. It's a false reading, as you can see as well: other tools, sysfs and stress tests, even in CoreFreq, show it goes full turbo while CoreFreq reports turbo as off, like randomly.
OK I see. But maybe not a bug.
CoreFreq reacts to some bits changed by other software.
W/ your architecture, the Turbo conditions are:
If the combination is not set, Turbo is marked OFF
Thus if Turbo toggles, it means that another logic, not CoreFreq, is changing those bits
And that's another reason I am now programming the CPU-Freq and CPU-Idle sub-drivers
Give it a try, you will understand.
Can't wait to see your results, but it's late in Paris.
Regards Cyril
BUT: How do you explain this then?
Run conditions:
Load Module > Launch Daemon > Launch CLI (as desktop user) > Ctrl+C > Stop Daemon > Unload Module
ps, lsmod and dmesg confirm this. Nothing is staying running/loaded afterwards.
Results: the TURBO reading still flips to OFF at the next launch, even though nothing stays loaded in between, and even with just stress -c 4 running (which does not change any CPU settings). turbostat without --quiet shows a lot of info about the CPU before it starts monitoring, much of it with the keyword "MSR". I can check tomorrow if I see any difference in turbostat when CoreFreq has TURBO = ON/OFF, and paste the 2 outputs here if I don't see any difference (you may spot something I don't).
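If it helps, a rough sketch of how I'd capture those two outputs for comparison (flag spellings per recent turbostat; older builds only take the short -i/-n options):

```
sudo turbostat --num_iterations 1 --interval 5 > turbostat_turbo_on.txt 2>&1
# ...toggle TURBO in corefreq-cli, then:
sudo turbostat --num_iterations 1 --interval 5 > turbostat_turbo_off.txt 2>&1
diff -u turbostat_turbo_on.txt turbostat_turbo_off.txt
```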
PS1: Not sure if it was this "jumping around" when NMI-W was off as a kernel parameter and CoreFreq was only compiled with make, because now NMI-W is back on and CoreFreq is compiled with APERF/MPERF. Can test, but cannot reboot this often as lately :-)
PS2: Remember, every test you suggest (big thanks!) involves turning on HWP, which then cannot be undone. If your suggestion does not fix the underclocking happening after X hours, I will have to reboot again, and people on both LAN and WAN rely on my server services having as much uptime as possible. EVERY test & fail then requires reboot > make changes in CoreFreq > wait X hours and see if it has been underclocked or not. That's the problem. Time consuming, and I see no disadvantage with my system now... But rest assured, I will try, as I am way too curious not to :P
BTW: Can also give you screenshots and output from my Intel NUC running Ubuntu Server 18.04 LTS 64-bit (with the LTS HWE stack installed, meaning kernel 4.18). It does run VMs (KVM + QEMU using Libvirt), but I can shut them down without problems if that messes up results.
Regarding this approach to the deb package and its advantages:
- corefreq (or whatever you want the main command named), say installed to /usr/sbin by the deb package (that BASH script + copyright file will be the only files installed by the actual package), will automatically check when run whether CoreFreq is already fully installed (latest code downloaded + built) in /opt (really just corefreqk.ko, corefreqd, corefreq-cli).
- A prerm script will remove the directory with the files in /opt, and the 2 other files (copyright + run script) will be removed automatically as they are part of the actual package.
- corefreq -u can download and build the latest code even though the kernel module loads, just so that users can easily run the latest version.
- Only a few build dependencies are needed (the variable one being libpthread-stubs0-dev), but those I can easily check on packages.ubuntu.com for all Ubuntu versions supported today, and add alternative names if I see the 3rd one having changed name (or for some reason being different on Debian). BUT, to be 100% sure, I will add dependencies for linux-headers, wget and unzip. The other 2 are build-essential and libelf-dev (which make recommends because without it it cannot check something).
TLDR regarding below: So Livepatch + (probably not) HWE Stacks + users running with Secure Boot = Enabled (typically without knowing it) can run into problems, whether using DKMS or not (must be tested).
Arch Linux users rarely run with Secure Boot = Enabled; heck, it's hard to find one dual-booting with Windows :P I doubt there is a signed kernel with signed modules in the Arch repos.
I will try to check how Ubuntu reacts if I try to insmod corefreqk.ko with Secure Boot enabled vs modprobe it.
Now "Ubuntu Livepatch" is so easy and promoted on Ubuntu, especially for beginners who read the easy explanation "With this [hot-swapping in a new kernel update], you will almost never be asked to reboot after system updates". IDK how this works with third-party modules relying on DKMS either. Another problem with DKMS is that, I suppose, very many Ubuntu users install it on a system with Windows already installed, meaning Secure Boot is Enabled. That is not a problem for Ubuntu itself, as it ships a Signed Kernel + included modules and uses it by default. No problem for me enabling Secure Boot, as long as I'm not using third-party modules, which I don't (except corefreqk.ko on demand of course), thanks to the excellent amdgpu driver which ships with the kernel, which just shows the progress AMD + the community make together. Updating from kernel 4.15 to HWE 4.18 basically made the only advantage of installing the same open driver from the AMD repo obsolete (more sensor data), even though that updates a ton of Mesa/Xorg/Wayland/libs+++ packages as well, but with a logical naming system so it's easy to find packages installed by it, and an uninstall script is included so it's easy to revert.
The Ubuntu 4.18 kernel driver just got even better, and the 4.15 one already crushed the AMD-repo one, even though they come from the same open code base. IDK why. AMD also has a "Pro" driver, using the open one with some proprietary stuff; it is just as easy to install and uninstall, but basically it is slower or equal in most tests I've seen on Phoronix, on all kinds of older and newer AMD cards (Polaris and newer). AFAIK it has an advantage in OpenCL, but not generally (computing stuff or something, don't remember). Certainly no test suggests it's worth it for gaming (with a few exceptions, but minor in tests), Vulkan works just as well with the open one, and certainly not for regular non-gamers like me.
Point is, Ubuntu has a system in place for when Secure Boot is enabled and a third-party module is installed. This was especially needed because of the NVIDIA driver. Basically (IDK exactly how it works, whether it's automatic or the package or DKMS must trigger it) it asks you to sign the module, basically a prompt to enter a password, which can be just "12345678", then reboots into a special UEFI menu where, depending on the manufacturer, you must find the place to enter this password. Even I had to "guess a little", but was right, then just typed "12345678" and it booted as "signed and secure". This was when installing the amdgpu driver from the AMD repo, as that is then third-party (it replaces the signed one that comes with the Ubuntu Signed Kernel). IDK if this system recognises the "standard easy way" of adding something like the CoreFreq module to DKMS, meaning whether it asks for this procedure to be done.
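For reference, the manual version of that procedure (the usual Ubuntu MOK flow; key file names and the CN are my own choices, and DKMS normally automates this):

```
# Create a Machine Owner Key, enroll it, and sign the module with it
openssl req -new -x509 -newkey rsa:2048 -nodes -days 36500 \
    -subj "/CN=Local module signing/" -keyout MOK.priv -outform DER -out MOK.der
sudo mokutil --import MOK.der    # set a one-time password, confirm in the blue MOK screen at next boot
sudo /usr/src/linux-headers-$(uname -r)/scripts/sign-file \
    sha256 MOK.priv MOK.der corefreqk.ko
```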
My system has been dead stable since it was built, and even though Ubuntu 18.10 was released then, no way I will use a non-LTS, especially now that rolling LTS HWE Stacks have become available. They've been around for a long time and I've always used them, but since 16.04 they have changed it so they only have to focus on 2 kernels, the GA and a single HWE Stack kernel, not like before. See the charts comparing 14.04 to 16.04 and later here, and the detailed reason why (under "Justification"). LTS HWE stacks typically become available about 3 months after the next Ubuntu release (xx.10 release in late October + 3 months to bulletproof it for LTS), so typically 9 months after the LTS release, in February the year after. Desktop users waiting until typically the 2nd point release, if released then, get onto the rolling HWE Stack by default. People installing Desktop 18.04 before this can install it with one command. Ubuntu Server defaults to the GA kernel anyway it seems, but as written, "these newer enablement stacks are meant for desktop and server and even recommended for cloud or virtual images", so I have installed it on my host Ubuntu Server and the barebone on-command-install Cloud VMs running on it, with the Cloud package uninstalled as soon as I have set a password for the default user (where the package then is linux-virtual-hwe-18.04). For desktops it keeps the kernel and all Xorg/Wayland stuff much more up to date, and for servers only the kernel. For example when 19.04 is released, always in late April as with October, the next scheduled HWE Stack according to the chart is in August (when I say 3 months after, you may think July, but since it's always late April, almost May, it becomes August, unless they have serious problems; they also offer an edge package so users can try the HWE stack for 19.04 already now, so both the devs and users make sure it is bulletproof when August comes, and of course any bug found afterwards is, as with all packages, fixed with a minor update ASAP).
Start the daemon in debug mode and watch its output
corefreqd -d
I'm interested in the traces happening when Turbo flips.
Also be aware that ACPI can modify the Target frequency. An ACPI kernel module may be running in your system
Yes, a NUC run, w/ screenshots, will help me to cover this architecture.
My non-regression build test environment is made of virtualized Ubuntu, CentOS, openSUSE. All of them in console mode only.
Whereas ArchLinux is my day-to-day developer environment. Updated every week. Stable for 12 years. A bare-metal boot without unused drivers and no virtualisation
Xen w/ same Arch Kernel as dom0 is occasionally booted however.
awesome is my WM.
I don't make use of Ubuntu: too much CPU overhead for my taste. Can't really help with this distribution.
Your script is promising. A few recommendations:
- try to suffix it with .sh
- the corefreq name is already reserved
- -u is already used in the Cli as the option to dump CPUID; use --update instead
- dkms is indeed not mandatory but helpful for kernel module beginners
- git is the favorite way to sync source code; wget may help if the user is behind an http proxy
- think also about a Systemd unit to manage the daemon (a minimal sketch follows after this list)
- take care that the driver, daemon and client are based on the same version; or you will face crashes due to unaligned shared memory
- sync file systems before starting the driver
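On the Systemd point, a minimal sketch of what such a unit could look like (unit name, binary paths and the modprobe steps are assumptions, not the project's official unit; the module must already be installed where modprobe can find it):

```
sudo tee /etc/systemd/system/corefreqd.service >/dev/null <<'EOF'
[Unit]
Description=CoreFreq daemon
After=multi-user.target

[Service]
ExecStartPre=/sbin/modprobe corefreqk
ExecStart=/usr/sbin/corefreqd
ExecStopPost=/sbin/modprobe -r corefreqk
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && sudo systemctl enable --now corefreqd
```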
Here come screenshots from the MacBook Air:
https://www.olejon.net/files/CF-MacBook-Air/
DEB: *.sh in a bin directory is wrong IMO; it should be executable, with #!/bin/bash as the first line, and without an extension. If you still have a problem with it, better to choose another name, like corefreq-cli or cf-cli - none of those are taken either. Whatever you like. The user will be told how to run it after successful installation, e.g. "Installation completed. Run CoreFreq with: sudo cf-cli".
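To make the idea concrete, a rough sketch of what that wrapper could look like (paths, the update flag and the startup order are my assumptions; the real script must match how corefreqd and corefreq-cli actually behave, and it is meant to be run via sudo):

```
#!/bin/bash
set -e
PREFIX=/opt/corefreq
if [ "$1" = "--update" ] || [ ! -x "$PREFIX/corefreq-cli" ]; then
    echo "Downloading and building the latest CoreFreq into $PREFIX ..."
    # git clone / make / copy corefreqk.ko, corefreqd and corefreq-cli here
fi
lsmod | grep -q '^corefreqk' || insmod "$PREFIX/corefreqk.ko"
if ! pgrep -x corefreqd >/dev/null; then
    "$PREFIX/corefreqd" &    # daemon must be running before the client
    sleep 1
fi
exec "$PREFIX/corefreq-cli"
```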
NUC: The VMs use the linux-virtual kernel, and the Cloud Package install creates a random private key and gives you the public key, which is added to the Libvirt XML so one can SSH in with a special command using that VM's name, IIRC from anywhere in the world (maybe that requires an extra step, which is shown when SSH-ing in). Needed because the default user has no password, so you cannot use virsh console vmname and log in there. Otherwise they are managed with virsh as all others.
First time I can see CoreFreq driving a Haswell ULT. Thank you very much
Indeed, what needs to be completed for this architecture:
I'll put your screenshots in the Wiki during the WE.
Usage advice: corefreq-cli -J <num> where <num> is the zero-based index of your selected string.
Can't wait to see the NUC in action ...
Thanks again
make fails on Linux 4.15.0-52-generic, the current latest kernel for Ubuntu 18.04 LTS 64-bit, with Vanilla GNOME (used as developer and main workstation + some server services). intel-microcode was also updated recently.
Intel Core i5-6600K
Linux 4.15.0-52-generic
It fails with the error below. I installed libelf-dev since make suggested doing so, and that warning goes away, but the build still fails. I don't use dkms since make takes like 5 seconds on my system (seems like that was a good decision).
I see you did various SMBIOS changes 6 days ago. That is what is failing, it seems.
I haven't installed/purged or changed anything that should have anything to do with building from source code.
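A quick way to confirm that suspicion, assuming the DMI field enum lives where I think it does in the installed headers (DMI_PRODUCT_SKU only appeared around kernel 4.18, so the 4.15 GA headers should not have it):

```
uname -r
grep -n "DMI_PRODUCT_SKU" \
    "/usr/src/linux-headers-$(uname -r)/include/linux/mod_devicetable.h" \
    || echo "DMI_PRODUCT_SKU not defined in these headers"
```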