Open NicolasDerumigny opened 2 years ago
Rebased onto master, also testes functional on the latest version of Arch (GNOME 43).
So, the main problem is the API with which the extension is interfacing with the cpufreqctl tool:
# cpufreqctl --format json info current
{
"min": 2095908,
"max": 4192231,
"avg": 2916799,
"rnd": 4192206
}
On my AMD Ryzen this creates no problems, but does create problems when
This API has 1 positive thing and 1 negative thing. The positive thing is that it provides 4 different ways to gather the CPU frequency with the rnd
method choosing a random core and not waking all the cores for getting "a" current frequency. The downside is (you might have already guessed) every method is called every time a frequency is fetched. So regardless which frequency-gathering method you choose in the preferences of the extension, all methods are queried every time a frequency is fetched. This is bad.
So, the first thing would be to redesign the API to something like this:
# cpufreqctl --format json info current min
1863003
# cpufreqctl --format json info current max
4192190
# cpufreqctl --format json info current avg
2597208
# cpufreqctl --format json info current rnd
2104293
Thereby, all the frequency method can be queried individually. This is the first step. The next step would be to make min
, max
, avg
and rnd
aware of sleeping cores. I figure this one is really hard, also considering the whole thing with performance and efficiency cores in the Intel world. I do not know if this is even possible. But, it is crucial for this extension to not wake cores which are already sleeping for a stupid frequency display. This is where I think the main problem lies when people report this extension making battery life of their laptops worse.
So why was this API designed like this in the first place?
Because I wanted to be able to easily add new frequency gathering methods as e.g. the min
and max
methods have clear disadvantages for efficiency/performance core architectures as not all cores share the same frequency spectrum. But I think the approach described above would be flexible enough to allow other frequency gathering methods to be added quite easily.
Apart from changing the cpufreqctl
interface, all uses of the old interface in the extension have to be adjusted to only gather the single selected method. I think it is straight forward to integrate the new interface, but maybe there are some gotchas.
I am unsure whether something like:
# cpufreqctl --format json info current list
[
"min",
"max",
"avg",
"rnd"
]
is necessary to query for the available gathering methods. Maybe hard-coding this into the extension is fine as all the methods require extension-specific UI elements like translations and explanations texts anyway.
So, I think the boosting and high CPU load are a side-effect of this utterly flawed implementation for gathering CPU frequencies. If adjusted to only query 1 of the active cores and not waking any core that is sleeping this problem would be solved for not only the pstate driver but for all drivers.
Thanks for your point of view!
I am unsure what you mean by "waking up sleeping core", as sleeping core in the sense of actual sleep state exists on Android, not x86. On Intel / AMD CPU, the "sleep state" is just a throttling of the frequency when the OS has nothing to execute on the CPU. Therefore, there is no way (as far as I know) to avoid this boosting, except (which is what I wackly propose in this PR) putting sleep
all around your program to limit it load.
Optimizing for battery life / not boosting can consist in rewriting the cpufreqctl
script in C so that it does not need all the overhead of being interpreted, and so that is scans only once the frequency for all cores, then uses the info to compute min/max/avg/rnd.
If the only call the extension is doing is cpufreqctl --format json info current
and a few others (I saw some backends
), I can rewrite it. Though, this will cause dependencies with libc
and building hell for new releases, unless we build it statically (but I have no idea of the compliance of that with distro-packaging rules).
Edit: Typo
A rewrite in C is not possible as this would violate the Gnome Extension Review Guidelines.
Extensions MUST NOT include binary executables or libraries
We are already stretching the guidelines as we are not providing the cpufreqctl
tool as a GJS script but as a Bash script:
Scripts must be written in GJS, unless absolutely necessary
Reviewing Python modules, HTML, and web JavaScript dependencies is out of scope for extensions.gnome.org. Unless required functionality is only available in another scripting language, scripts must be written in GJS.
Regarding your technical question:
I am unsure what you mean by "waking up sleeping core", as sleeping core in the sense of actual sleep state exists on Android, not x86.
Yes, that's ofc correct. But I thought this is generally available for ARM devices under Linux. Is this behavior Android specific? Nevertheless, for x86 this means not using a core unless it is absolutely necessary. The cpufreqctl
script includes the cpufreq
backend which is a generic driver, also for ARM CPUs.
On Intel / AMD
GPUCPU (?), the "sleep state" is just a throttling of the frequency when the OS has nothing to execute on the CPU.
Yes, so I would like to avoid this tool from pushing the frequency of a "low-frequency", NOP'ing core by querying its frequency. I am not really sure how to implement this without waking (e.g. pushing the frequency) the core, but the rnd
method is a first step where it will only potentially wake a single core.
But I'm open for any other suggestion solving this problem. However, I am not a big fan of the sleep
approach as we are already using GJS
timeouts for scheduling the frequency gathering and this is already a sleep. Just throwing in more sleeps will just slow down the extension and make it less responsive. I think we should avoid unnecessary frequency lookups in the first place.
Edit: add link to Gnome Extension Review Guidelines
I see the problem. I don't think probing the cores is the main behavior that lead to boosting. However, multiple calls to awk may be. Can we imagine relying on another back-end than cpufreqctl
to read frequency, such as a (separately installed) binary, with a fallback on cpufrectl
if the former is not present?
I will think about it, then come back if I have more ideas.
Hmm, I think our cpufreqctl
which ships with the extension ensures great compatibility regardless of distro and distro-release. I really prefer the "one click and go" approach of this and would not want users to not be able to use this extension after installation because something on their host system must be installed first (also in the correct version, etc.). But yes, the path for gathering the CPU frequency must be as slim as possible. Things like this really need cleanup. But it should be possible by putting each method into its own function. I think awk then is not needed anymore.
Is a GJS implementation out of question (for frequency reading)? This would avoid both the need of a script and the calls to awk
.
Another possibility is to send directly the readings in the JSON stdout return, the compute the aggregation method (min/max/avg/rnd) in the GJS extension, avoiding awk
in the read_current
code path?
Is a GJS implementation out of question (for frequency reading)? This would avoid both the need of a script and the calls to awk.
No, this is mostly for historic reasons as it started out as a small bash script and got bigger and bigger. The GJS requirement for scripts got added to the Guidelines after we already implemented the bash scripts, if I remember correctly.
If I remember correctly, reading and writing files has included a lot of boilerplate in GJS, whereas in bash it's just echo
and cat
for setting and reading config of the CPU. But I don't know if that is still the case. Feel free to investigate further into getting rid of bash (would also be a good thing from a security perspective :sweat_smile:)
Another possibility is to send directly the readings in the JSON stdout return, the compute the aggregation method (min/max/avg/rnd) in the GJS extension
This would still require some interface for reading the frequency of one core, for only performance cores, etc. I think it is simpler to make the cut for API at the min, max, avg, rnd border and not communicate all the "let's select a core for frequency reading" stuff. But if you come up with an elegant solution, I'm up for it. The main downside would be that you could not as easily replicate the readings displayed in the extension from the command line. I would say that's an inconvenience. However, I fear that a lot of details of the individual CPU backends would "bleed" through, so that the extension code would have to deal with pstate, cpufreq, etc.
On another note:
Can you write some test snippets where you infinitely read without a sleep from /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
and then from /sys/devices/system/cpu/cpufreq/policy{1,2}/scaling_cur_freq
and then 4, then 8 etc. and see how this affects your performance? I am unsure whether fast reads for a single core are the problem or concurrent reads to multiple cores. I suspected the latter, but maybe I'm wrong.
A first simple fix is to run renice -n 19 0 >/dev/null
at the very start of the program to decrease its priority. On my machine (Core i7-1065G7), idle frequency is supposed to turn around 1.2 GHz (which is correctly detected with the sleep
method). With the mainline detection, the CPU boost over 2 GHz. With the renice
one, the impact is limited to a 1.6-1.9 boost.
I analyzed with perf
1000 executions of cpufreq --format json info current
, and found that most of the time is spent in:
libc.so
: 46 %ld-linux.so
: 27 %bash
: 20 %[unknown]
: 3 %gawk
: 2 %However, looking at distribution of time across PID shows only 5 % of the time is actually spent in cpufreqctl
, and... 45 % on all the awk
calls. Then, 10 % roughly is spend in sort
. I thing the overhead of the actual call of cmd utilities is the main bottleneck (which is why libc
/ ld
account for most of the time). The only solution for that is to switch to GJS and limits call as much as possible. Do you agree?
For you experiments, I blocked my frequency to 1.4 GHz, no renice
and:
&
here) : ~11.3 sec wall cluck time (180 % CPU usage = 20.4 time.core)I don't really know what to think about that, except that both options look usable.
EDIT: I am printing to /dev/null
on those experiments to avoid shell I/O bottleneck
However, looking at distribution of time across PID shows only 5 % of the time is actually spent in cpufreqctl, and... 45 % on all the awk calls.
I think this is because awk
is reading from a pipe where it has to wait for the frequency to be read by cat (and the kernel). I still don't think this issue is caused by a specific tool, but by the overall design of the API. For min, max, and avg it is necessary to read the frequency of all cores, but not for rnd. But yes, your results show that it might still be beneficial to do a "cooldown" with a short sleep before reading the frequencies. However, if we do this in the pstate backend, we should do this in all backends.
I don't really know what to think about that, except that both options look usable.
Damn it :D Looks like your CPU is not affected by freezing and/or system lock ups. Especially server CPUs like AMD Threadripper and Intel Xeon seem to have difficulty keeping the system responsive when excessively reading CPU frequencies. I expected the parallel approach to have a way higher CPU load and introduce stutter, but this does not seem to be the case. I originally opted for the parallel read approach as especially older AMD CPUs take ages to read a single CPU frequency (1-2 seconds) and a 16 core CPU would then take 30+ seconds for a sequential read of all the frequencies. So although your results clearly show that an API change is not really necessary (your performance numbers look fine), I still think that the API should be changed so that people having CPUs that cannot handle reading the frequencies of all cores can switch to the rnd
method and only read the frequency of a single core. Also, this is good for people having higher power consumption due to this extension always polling the frequency of all cores (currently I tell them to completely disable the frequency display).
Do you agree?
You can try writing a proof-of-concept in GJS for just reading the current frequency with the 4 methods and see how the performance numbers change. For your use case this might be beneficial, for others I think an API change is more beneficial. (e.g. #184, #138, #52)
I rewrote it in pure bash (as I don't know JS well, so I'll see later for a pure GJS version) to remove the need of awk
and sort
on reading. Results:
I tested this (sequential version) on a i9-7980XE (18 cores/36 threads) locked at 1.2 GHz and:
So I cannot reproduce the hang. Maybe it's arch, maybe Intel. I would argue that the nice
should be enough to avoid system hangs, but I cannot be sure (GNOME may still hangs while the extension is waiting for the frequency, right? Then the script call should be async with a timeout).
With the latest changes, perf
reports 45% of the time in bash
, 41 % in libc
(I had a look, it is mainly malloc
/ "cfree") and 0.5 % in cat
; that look better in my opinion.
Let's got with one last try for today. Removing the array did not change the execution time on the 1065G7, but tripled it on the 7980XE (10,5 sec wall time). I'll test as it, then will head for the GJS implementation later.
Insert a
sleep
to avoid artificial high CPU frequency (boosting) when usingcpufreqctl info current
on a i7-1065G7 (Lenovo Yoga C940)