Open sylentprofet opened 5 years ago
As you can see in #4491, the dom0 kernel has no full access to power management, including cpufreq, so some data may be inaccurate. This is also why the acpi-cpufreq driver does not load, or why intel-pstate doesn't provide full info.
The thing you should look at is xenpm, especially this part:
scaling_avail_freq : 2001000 2000000 1900000 1800000 1700000 1500000 1400000 1300000 1200000 1100000 1000000 800000 700000 600000 500000 *400000
scaling frequency : max [2001000] min [400000] cur [400000]
As you can see, right now it's at 400MHz.
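For anyone who wants to read that line programmatically, here is a minimal sketch in Python. It assumes the format shown above, where the entry prefixed with `*` is the current frequency and all values are in kHz; the function name `parse_avail_freqs` is made up for illustration and is not part of xenpm.

```python
def parse_avail_freqs(line):
    """Return (all_frequencies_khz, current_khz) from a scaling_avail_freq line.

    Assumption: the current frequency is the entry prefixed with '*'.
    """
    _, _, values = line.partition(":")
    freqs, current = [], None
    for token in values.split():
        starred = token.startswith("*")
        khz = int(token.lstrip("*"))
        freqs.append(khz)
        if starred:
            current = khz
    return freqs, current


line = ("scaling_avail_freq : 2001000 2000000 1900000 1800000 1700000 "
        "1500000 1400000 1300000 1200000 1100000 1000000 800000 700000 "
        "600000 500000 *400000")
freqs, current = parse_avail_freqs(line)
print(f"current: {current // 1000} MHz, highest listed: {max(freqs) // 1000} MHz")
```

With the line quoted above, the starred entry is 400000 kHz, which matches the 400 MHz reading.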
So what's the solution here? Do I just assume that somehow I'm magically getting full performance, despite what every sensor tells me?
I spent most of yesterday looking at xenpm, and I can see that it isn't scaling correctly.
Should I try cpufreq=dom0-kernel?
Your response doesn't provide much of an explanation. According to these links, pstate should work. Or I should be able to see some attributes for cpufreq.
https://wiki.xenproject.org/wiki/Xen_power_management
https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03048.html
I know the terminology can be quite confusing here: there is acpi-cpufreq in both dom0 Linux and Xen. The same goes for intel_pstate; there are two of them, one in dom0 and one in Xen. By default, only the Xen ones have real impact, even though the dom0 one may report some data. For managing power management in Xen, use the xenpm tool and do not believe anything that the dom0 kernel reports (/sys etc.)...
The documentation you've linked is quite old (for example mentions Xen 4.0 as the latest version, which was released in 2010...). But I'm not sure if anything more up to date exists.
As for the xenpm get-cpufreq-para output - there are two sets of parameters:
xenpm set-scaling-*
There is also scaling_governor and its parameters.
As for the frequency, the xenpm get-cpufreq-states output is IMO less confusing. You may also want to look at xenpm get-cpuidle-states, including the statistics there.
That said, it may be the case that frequency scaling doesn't work properly on these machines. We've seen other power-management-related issues with Intel Core 8th gen CPUs, specifically around system suspend. We have limited access to such machines (one of them), but there should be some progress soon.
I understand that the documentation is quite old - there's not a lot out there regarding this issue. I literally went through ~ 100 links yesterday trying to distill out good information.
It may be the case that xenpm is reporting the correct frequencies in real time (i.e. it is running at 400 MHz, 2 GHz, etc.), but my point is that the P-state settings themselves are incorrect (i.e. it should be able to scale to 4 GHz and it does not).
As for xenpm get-cpufreq-states, it reports much the same information. All cores report output similar to the below; I truncated it for readability. I was reluctant to copy it because it is somewhat redundant, but for posterity:
...
cpu id : 7
total P-states : 16
usable P-states : 13
current frequency : 400 MHz
P0 [2001 MHz]: transition [ 0]
residency [ 12 ms]
P1 [2000 MHz]: transition [ 0]
residency [ 0 ms]
P2 [1900 MHz]: transition [ 0]
residency [ 0 ms]
P3 [1800 MHz]: transition [ 1113]
residency [ 21579 ms]
P4 [1700 MHz]: transition [ 61]
residency [ 351 ms]
P5 [1500 MHz]: transition [ 37]
residency [ 275 ms]
P6 [1400 MHz]: transition [ 29]
residency [ 157 ms]
P7 [1300 MHz]: transition [ 43]
residency [ 206 ms]
P8 [1200 MHz]: transition [ 51]
residency [ 253 ms]
P9 [1100 MHz]: transition [ 44]
residency [ 189 ms]
P10 [1000 MHz]: transition [ 114]
residency [ 572 ms]
P11 [ 800 MHz]: transition [ 64]
residency [ 328 ms]
P12 [ 700 MHz]: transition [ 75]
residency [ 350 ms]
P13 [ 600 MHz]: transition [ 79]
residency [ 254 ms]
P14 [ 500 MHz]: transition [ 100]
residency [ 377 ms]
*P15 [ 400 MHz]: transition [ 881]
residency [ 17952 ms]
xenpm get-cpuidle-states:
cpu id : 7
total C-states : 8
idle time(ms) : 682402
C0 : transition [ 1591933]
residency [ 66243 ms]
C1 : transition [ 1303701]
residency [ 23089 ms]
C2 : transition [ 131054]
residency [ 59419 ms]
C3 : transition [ 14775]
residency [ 16809 ms]
C4 : transition [ 33119]
residency [ 51541 ms]
C5 : transition [ 22137]
residency [ 53372 ms]
C6 : transition [ 61415]
residency [ 283879 ms]
C7 : transition [ 25732]
residency [ 180918 ms]
As mentioned in the OP, xenpm set-scaling-frequency does not allow us to reach the correct frequencies. The governor can be set, but nothing else of value beyond that.
On a side note, I just booted a USB key of Xubuntu 18.04 and experienced flawless performance. The scaling driver was correctly reported as intel_pstate, and I saw frequencies from 400 MHz to 4 GHz. Interestingly, the number of P-states is reported as 37.
Has the scaling_driver been purposefully forced to acpi-cpufreq instead of intel_pstate in Xen?
That's an interesting question. When I look at the Xen version we currently use, with grep and the sources in hand, I can't see an intel_pstate driver included there at all. To be honest, I can't see it in xen-unstable either. It looks like the patches adding the intel_pstate driver to Xen were never committed.
For reference, a few additional links we discovered:
https://lists.gt.net/xen/devel/376881
https://github.com/mirage/xen/commits?author=wei-w-wang-intel
https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03047.html
These highlight the intel_pstate Xen patch set from Wei Wang @ Intel. It sounds like the correct implementation should be adding intel_pstate=enable in the kernel cmd line, once the patches are merged.
Thank you for helping track this down!
I hadn't really looked into this, I just assumed that it was working just fine. Thank you for taking the time to look into this and sharing your findings @sylentprofet
I wonder if @wei-w-wang-intel would have any input here? If it's the same person from the commits..
Perhaps @wei-w-wang ?
The plot thickens.. My colleague ran across the following Citrix documentation:
https://support.citrix.com/article/CTX200390?_ga=1.158728573.690652182.1439902385
Using the command from that article, xenpm start 1|grep "Avg freq" appears to report correct frequencies. With this command, and only this command, we see frequencies above the 2 GHz range under load, in the 3 to 3.5 GHz range.
Strangely, if we simultaneously issue xenpm get-cpufreq-states, it continues to report incorrect information (2 GHz max frequency) while xenpm start... reports frequencies above 3 GHz.
I have been unable to achieve max turbo state (4 GHz) with limited testing, but this is an interesting development nonetheless.
Any thoughts as to what might cause xenpm to behave in such a way? Which readout is correct? And what frequencies do the VMs actually have access to?
Have you tried xenpm enable-turbo-mode (even though it's already reported as enabled)?
BTW I've asked about intel_pstate patches here: https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg01400.html
Matebook X Pro user here; it seems setting anything with xenpm has no effect.
[ddaly@dom0 ~]$ sudo xenpm disable-turbo-mode
[CPU1] failed to disable turbo mode (22 - Invalid argument)
[CPU3] failed to disable turbo mode (22 - Invalid argument)
[ddaly@dom0 ~]$ sudo xenpm set-scaling-minfreq 400000
[CPU1] failed to set scaling min freq (22 - Invalid argument)
[CPU3] failed to set scaling min freq (22 - Invalid argument)
[ddaly@dom0 ~]$ sudo xenpm set-scaling-maxfreq 400000
[CPU1] failed to set scaling max freq (22 - Invalid argument)
[CPU3] failed to set scaling max freq (22 - Invalid argument)
[ddaly@dom0 ~]$ sudo xenpm set-scaling-governor powersave
[CPU1] failed to set governor name (22 - Invalid argument)
[CPU3] failed to set governor name (22 - Invalid argument)
[ddaly@dom0 ~]$ # Forcing 400Mhz, on powersave
[ddaly@dom0 ~]$ xenpm start 1|grep "Avg freq"
Avg freq2421210KHz
Avg freq2421210KHz
Avg freq2381190KHz
Avg freq2381190KHz
[ddaly@dom0 ~]$ # Ignored.
[ddaly@dom0 ~]$ sudo xenpm get-cpufreq-para
cpu id : 0
affected_cpus : 0
cpuinfo frequency : max [2001000] min [400000] cur [2001000]
scaling_driver : acpi-cpufreq
scaling_avail_gov : userspace performance powersave ondemand
current_governor : powersave
scaling_avail_freq : 2001000 2000000 1900000 1800000 1700000 1500000 1400000 1300000 1200000 1100000 1000000 800000 700000 600000 500000 *400000
scaling frequency : max [400000] min [400000] cur [400000]
turbo mode : disabled or n/a
[CPU1] failed to get cpufreq parameter
cpu id : 2
affected_cpus : 2
cpuinfo frequency : max [2001000] min [400000] cur [2001000]
scaling_driver : acpi-cpufreq
scaling_avail_gov : userspace performance powersave ondemand
current_governor : powersave
scaling_avail_freq : 2001000 2000000 1900000 1800000 1700000 1500000 1400000 1300000 1200000 1100000 1000000 800000 700000 600000 500000 *400000
scaling frequency : max [400000] min [400000] cur [400000]
turbo mode : disabled or n/a
I've been able to get close to boost, but never anything close to 4 GHz (Boost).
sudo xenpm get-cpufreq-para doesn't know what it's doing.
Under high load on a Lenovo X1C6:
dom0 $ xenpm start 1 | grep "Avg freq"
Avg Freq 3844830 KHz
Avg Freq 3844830 KHz
Avg Freq 3844830 KHz
Avg Freq 3844830 KHz
These are the largest values that were returned.
At idle, xenpm start 1 | grep "Avg freq" was not observed to be below 1995950 KHz.
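Since the "Avg freq" lines pasted in this thread differ in capitalisation and spacing, a tolerant parser helps when comparing readings across machines. The following is only a sketch in Python: the regular expression is an assumption based on the samples quoted in this thread, not on the xenpm source, and `avg_freqs_khz` is a made-up helper name.

```python
import re

# Matches "Avg freq2421210KHz", "Avg Freq 3844830 KHz", "Avg freq 2141070 KHz".
AVG_FREQ = re.compile(r"avg\s*freq\s*(\d+)\s*khz", re.IGNORECASE)


def avg_freqs_khz(text):
    """Return every 'Avg freq' value found in xenpm output, in kHz."""
    return [int(m.group(1)) for m in AVG_FREQ.finditer(text)]


sample = """\
Avg freq2421210KHz
Avg Freq 3844830 KHz
Avg freq 2141070 KHz
"""
freqs = avg_freqs_khz(sample)
print(f"min {min(freqs) / 1e6:.2f} GHz, max {max(freqs) / 1e6:.2f} GHz")
```

In practice the text would be the captured output of xenpm start 1 run in dom0 under load and at idle.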
Is there any update on this? On my Lenovo T480 the battery seems to last a very short time, which makes me think the scaling could be really buggy. Thank you so much!
Just ask if you need any debugging info.
I would like to get an update on this as well. It would be nice if we were able to undervolt the CPU as well, to prevent overheating.
According to the last response on the mailing list Wei looks reluctant to work on the patch any further. https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg01548.html
Not sure what we should do here. Keep requesting?
I don't know how specific any of this is to certain CPU generations, but on Intel pre-Skylake models scaling seems to work correctly except that xenpm readouts are a little odd.
On a nominal 2.6GHz/3.3GHz turbo CPU, for example, if turbo mode is disabled then the highest speed shown by xenpm is 2600000. But if turbo is enabled it shows up as 2601000. This is reflected in the xenpm get-cpufreq-para output. I always assumed xenpm used that +1000 to denote turbo frequencies without showing the actual frequency, and that the CPU handled frequencies internally when turbo was engaged.
Did anyone find a solution? Thank you for sharing.
If there is anything you need tested, get back to me. Thank you!
Same issue here; if anyone needs something tested, I can try to help!
I'm experiencing the exact same issue on XenServer 7.6 running on an HP DL 380 Gen10. When I boot the server from a live CD for CPU benchmarking, it shows the maximum allowed turbo speed of 3.7 GHz on my Intel Xeon Gold 6132 processors.
I get the same results with xenpm start 1|grep "Avg freq" as other people in this thread: I see a max "boost" of 3.2 GHz instead of 3.7 GHz, I believe due to a lack of C-states.
I tested with 0 VMs on the host and tried pinning vCPUs to a couple of pCPUs so that the other pCPUs would be sure to go to sleep (C-states above 3), but nothing works.
I think it has to do with the driver. If we can find a way of getting Xen to better understand the P-states, we should be able to get to higher C-states and therefore be able to boost.
Our Windows and Linux VMs only show the base frequency of 2.6 GHz; it's just static. Our use case is that we run many single-threaded applications on our hosts, which I therefore want to run at 3.7 GHz whenever possible.
Hmm, xenpm seems to pull four values on my four-core system... but the odd-numbered values are the same as the preceding even-numbered values on R4.01 current-testing. Without the grep, you also see that much of the detail for the odd-numbered cores is missing.
Perhaps it is assuming that hyperthreading is turned on (it is not) and is confused and really only reporting on half of the cores?
EDIT: ah, known to have bugs: https://github.com/QubesOS/qubes-issues/issues/4456
Also...seven(???) (virtual?) cores listed by xenpm get-cpu-topology:
[admin@dom0 ~]$ xenpm get-cpu-topology
CPU core socket node
CPU0 0 0 0
CPU1 0 0 0
CPU2 1 0 0
CPU3 1 0 0
CPU4 2 0 0
CPU5 2 0 0
CPU6 3 0 0
[admin@dom0 ~]$
B
I think I've got it figured out for my environment..
Xen reports CPU speed in an odd way: the base frequency + 1 MHz. If I test the CPU on Xen using xenpm start 1|grep "Avg freq", I can see it boost to 3.2 GHz. I believe it's not going to 3.7 GHz because the CPU is running AVX2 instructions as well.
The CPU speed that Intel uses to advertise with is only for NON (Intel) AVX code. More and more applications are running AVX2 instructions. And AVX2 has a much lower base core speed and much lower turbo speed. See this Intel document for more details: https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
Take a look at page 14 and 15 of the above document. That will show you the graphs.
I have been told that AMD works in a different way and does not downscale the cores. I'm afraid I can't test this, because I don't have the hardware.
Hope this helps.
As for xenpm reporting and hyperthreading being disabled, this is most likely the case. In many cases Xen tools confuse "number of CPUs" with "maximum CPU ID".
There were multiple such issues, but not all are suitable for backporting to Xen 4.8.x.
I have been able to test this issue against an AMD Ryzen 7 Pro 3700U and it suffers from the same problem, unfortunately. It doesn't seem to be only an Intel problem. I'm using R4.0 with kernel-latest (5.4.10-1) and linux-firmware (20200122-102.1).
xenpm get-cpufreq-para reports that the minimum clock speed is 1400 MHz and the max is 2300 MHz, which is the base clock. It should have a max turbo clock of 4000 MHz. https://www.amd.com/en/products/apu/amd-ryzen-7-pro-3700u
Hello,
This is on my Lenovo Thinkpad x390 with an i7 8665u.
/proc/cpuinfo reports a static 2100 (which is the base for this CPU).
dmidecode -t processor reports a base of 2100 and a current speed of 1900, static as well.
xenpm get-cpufreq-states reports the wrong frequency, as stated by many of you before; I see 2100 under load.
I tried xenpm start 1 and it does show higher boost, up to 4.2 GHz (still far from its 4.8 GHz max), but I can live with it.
I am a bit concerned about battery duration... At idle the CPU sits around 800 MHz at 37-40º, but I am still seeing lower battery duration than on Ubuntu (which I tested with a live USB).
I know it's not a big problem, but is there any update or plan to look into this?
I know it is off topic, but I can't find much around. Has any of you been able to undervolt? It would be nice to know.
I tried with undervolt and I am getting ERROR:root:Failed to apply core: set -99.609375, read 0.0.
I've read somewhere that it is an issue with new BIOSes.
Can anyone confirm?
Thank you for QubesOS! :D
Since it seems Wei would need 'an official request' made to patch this, how can we issue one? Or, is it already merged and all we need is 'intel_pstate=enable' within xen.cfg? My Dell laptop is running ~75C for no good reason...
Anyone tested under Qubes 4.1 yet (uses Xen 4.13)?
xen-4.8.5, kernel 5.5.7-1, X1 Carbon 5th gen: xenpm start 1 shows the CPUs always at a constant maximum frequency, so scaling is not working. Now I understand the very low battery life (around 4 hours in practice, versus a supposed maximum of 15.5 hours in Windows).
Same here: my ThinkPad (L390) gets way too hot (over 90°C) at 10% CPU usage. Even at 1% CPU with nothing running I get over 60°C. The BIOS setting was set to balanced. This renders the device useless (the keyboard and case are too hot to touch, and battery performance is really bad) and destroys the CPU.
xen-4.8.5, kernel 5.5.7-1, X1 Carbon 5th gen: xenpm start 1 shows the CPUs always at a constant maximum frequency, so scaling is not working. Now I understand the very low battery life (around 4 hours in practice, versus a supposed maximum of 15.5 hours in Windows).
Interestingly, xenpm get-cpufreq-para says otherwise (scaling working).
Idle temperature is around 60°C too (i7), while the specs say it should be 35°C.
Checking with an Ubuntu Focal live system: idle temperature is below 30°C, and while watching a video it stays below 40°C. Battery life is reported as more than 10 hours; in Qubes it is under 5 hours.
Hello all,
I observed the same issue on a Thinkpad T420 laptop running Lenovo's latest BIOS, but with an Intel Core i7 CPU upgraded from its original Intel Core i5 CPU.
In my case, the laptop's CPU cores were limited to 800 MHz, which makes the system very slow/sluggish.
After debugging this issue in Xen and in Linux quite a bit, I found that passing the option processor.ignore_ppc=1 on the Linux kernel's command line resolves the issue for me. The laptop returned to its full speed and xenpm correctly reports that the processor cores are no longer limited to 800 MHz clock frequency.
In summary, during boot-up the Linux kernel uses ACPI to determine processor-related parameters, and passes these to the Xen hypervisor. The "PPC" parameter's value in the ACPI implementation of my laptop's BIOS caused Xen to disable CPU frequency scaling on my laptop.
If my recollection is correct, I was able to observe the PPC value via the following Xen command line option, which causes extra information to be printed out during boot-up (which can be observed via xl dmesg). The verbose=1 string is the important part. Please adapt the maxfreq and minfreq parameters according to the CPU frequencies reported by xenpm get-cpufreq-para prior to trying this on your laptop.
cpufreq=xen:ondemand,maxfreq=2700000,minfreq=800000,verbose=1
To be clear, this Xen command line option is not necessary to resolve the issue at hand, it is only a debugging aid.
I hope that this helps!
@sylentprofet, out of curiosity, could you confirm whether passing processor.ignore_ppc=1 to the kernel via its command line resolves the issue for you as well?
@m-v-b Thanks for your feedback. I experience this issue on a Latitude 7400 (Core i5-8365U Whiskey Lake). I see it on both R4.0 and R4.1. For me, the simplest way to see it is to just install powertop in dom0 and check the idle power usage: it idles at 15W, when it should be more like 7W at most. Unfortunately, setting processor.ignore_ppc=1 didn't change this for me. I think you might be hitting a slightly different bug. I suspect that pstate is really what we need.
@dmoerner You are welcome, and thank you for the feedback as well. I agree with you that we are most likely talking about/observing different bugs. I think that the behaviour resolved by processor.ignore_ppc=1 on my Thinkpad T420 matches the original reporter's description of the issue where the CPU is stuck at its lowest clock frequency.
I took a look at the old intel_pstate patches. The patches apply fairly easily to Xen 4.13; I can post refreshed patches if anyone is curious. (It's here: https://github.com/dmoerner/qubes-vmm-xen/blob/intel-pstate-4.13/intel-pstate-4.13.patch) But I think it's a moot point because, first, it doesn't build, and this kind of error is way above my paygrade:
intel_pstate.c:565:38: error: 'intel_pstate_cpu_ids' causes a section type conflict
with '__setup_str_load'
565 | static __initconst struct x86_cpu_id intel_pstate_cpu_ids[] __initconst = {
| ^~~~~~~~~~~~~~~~~~~~
In file included from intel_pstate.c:3:
/home/user/rpmbuild/BUILD/xen-4.13.1/xen/include/xen/init.h:113:17: note: '__setup_str_load' was declared here
113 | __setup_str __setup_str_##_var[] = _name; \
| ^~~~~~~~~~~~
intel_pstate.c:845:5: note: in expansion of macro 'boolean_param'
845 | boolean_param("intel_pstate", load);
| ^~~~~~~~~~~~~
Second, and more importantly, those intel_pstate patches are based on five-year-old code in the Linux kernel, which has now been significantly reworked. The code base has also been expanded by more than a factor of 2. Someone with real knowledge of Xen will need to redo the main intel_pstate patch, probably from scratch. It doesn't look like the old patches required much work to work with Xen, but I don't know anywhere near enough about Xen or the newer intel_pstate driver to be confident the same is true today.
@dmoerner static __initconst struct x86_cpu_id intel_pstate_cpu_ids[] __initconst = {
Dropping the 2 __initconst here and on static __initconst struct cpu_defaults core_params = { and static __initconst struct cpu_defaults byt_params = { lets it build.
However, like you say, it's 5 years old. The driver doesn't match my hardware, so it doesn't load :( . A re-sync with linux code will be needed.
My ThinkPad x230 (with only a puny i5) used to get stuck on the lowest CPU frequency, and processor.ignore_ppc=1 didn't help - only a full poweroff would unstick it.
What finally fixed it was to swap the power supply from a third-party 90W unit to an original Lenovo 90W, even though I had only ever come across reports of throttling for i7 models on a 65W PSU before. So I can really recommend trying a better power supply if that's an option at all.
What could we do to help with this issue?
It turned out that on my Qubes 4.1, the xen-acpi-processor kernel module in dom0 was not loaded at all. Apparently the module is responsible for initializing the data for the cpufreq system, so without it the frequency scaling was not working at all.
My processor is a Core i5-10210U. With the module loaded, at least it looks like scaling is working (xenpm reports P-states between 400 and 1600 MHz). I don't think the turbo mode is working properly, though. xenpm get-cpufreq-average returns values up to about 2000 MHz, where the processor should scale up to 4200 MHz. (At least it solves my problem with battery life...)
When I try to use the intel_pstate module (from the patch above), it loads, and after loading the kernel module xenpm is still reporting something, but I don't think the scaling works at all.
OpenXT was missing xen-acpi-processor. With that loaded, xenpm shows processor states.
With an i7-8665U, I also see some scaling and use of higher P-states. Unfortunately, it still runs hot.
With the pstate patch above, the i7-8665U doesn't match, so it falls back to the built in acpi-cpufreq driver. I think the benefit of the pstate patch will be hardware-managed P-states (HWP) which would be used by my processor, as far as I can tell.
If the HWP feature has been enabled, intel_pstate relies on the processor to select P-states by itself, but still it can give hints to the processor’s internal P-state selection logic. What those hints are depends on which P-state selection algorithm has been applied to the given policy (or to the CPU it corresponds to).
https://www.kernel.org/doc/html/v4.12/admin-guide/pm/intel_pstate.html
I confirm that with an i7 8550U, scaling 400 MHz - 2000 MHz seems to work:
dom0: xenpm start 1
CPU2: Residency(ms) Avg Res(ms)
C0 45 ( 4.54%) 0.15
C1 3 ( 0.36%) 1.78
C2 45 ( 4.51%) 0.68
C3 12 ( 1.26%) 0.57
C4 72 ( 7.28%) 2.21
C5 72 ( 7.25%) 2.50
C6 492 (49.25%) 4.48
C7 255 (25.56%) 5.95
P0 0 ( 0.00%)
P1 0 ( 0.00%)
P2 0 ( 0.00%)
P3 7 (17.02%)
P4 0 ( 0.00%)
P5 0 ( 0.00%)
P6 0 ( 0.00%)
P7 0 ( 0.00%)
P8 0 ( 0.00%)
P9 0 ( 0.00%)
P10 3 ( 7.67%)
P11 0 ( 0.00%)
P12 0 ( 0.96%)
P13 0 ( 0.00%)
P14 0 ( 0.00%)
P15 30 (74.35%)
Avg freq 2141070 KHz
But the max frequency seems to be 2000 MHz. This is also noticeable during CPU-intensive operations.
One strange thing I experienced: Windows 7 running in an HVM seems to be able to use higher frequencies, as CPU-intensive operations run faster. (I haven't compared it to a Linux HVM.)
I'm happy to support further investigations in this area. @pwmarcz how to enable xen-acpi-processor in dom0?
Has anyone found at least some kind of workaround for these issues? My Lenovo T480 with i7-8550U basically comes crawling to a stop under load. Worked fine in the spring, but now it's almost useless. Not sure exactly which update caused the regression, as I was distracted with transferring the installation to a new SSD, and upgrades along the way.
xenpm shows conflicting information:
Every 1.0s: xenpm start 1 Fri Dec 4 10:31:03 2020
Timeout set to 1 seconds
Start sampling, waiting for CTRL-C or SIGINT or SIGALARM signal ...
Elapsed time (ms): 1001
CPU0: Residency(ms) Avg Res(ms)
C0 752 (75.10%) 3.05
C1 5 ( 0.57%) 5.72
C2 112 (11.24%) 0.89
C3 21 ( 2.18%) 0.73
C4 58 ( 5.86%) 1.00
C5 41 ( 4.17%) 1.74
C6 8 ( 0.87%) 1.46
C7 0 ( 0.00%) 0.00
P0 0 ( 0.00%)
P1 0 ( 0.00%)
P2 0 ( 0.00%)
P3 173 (23.23%)
P4 0 ( 0.00%)
P5 0 ( 0.00%)
P6 0 ( 0.00%)
P7 0 ( 0.00%)
P8 0 ( 0.00%)
P9 0 ( 0.00%)
P10 0 ( 0.00%)
P11 0 ( 0.00%)
P12 0 ( 0.00%)
P13 84 (11.30%)
P14 128 (17.29%)
P15 358 (48.17%)
Avg freq 400200 KHz
CPU1: Residency(ms) Avg Res(ms)
Avg freq 400200 KHz
This seems reasonable, except that it is with a few AppVMs running and some intensive processes chewing up CPU cycles and slowing everything to a crawl, so it should never be in P15 at 400 MHz, especially since it's never boosting into turbo mode. On closer inspection, it appears that the only thing that makes sense above is the "Avg freq" reported at 400 MHz. The P-states are listed as follows:
cpu id : 0
total P-states : 16
usable P-states : 13
current frequency : 400 MHz
P0 [2001 MHz]
P1 [2000 MHz]
P2 [1900 MHz]
P3 [1800 MHz]
P4 [1700 MHz]
P5 [1500 MHz]
P6 [1400 MHz]
P7 [1300 MHz]
P8 [1200 MHz]
P9 [1100 MHz]
P10 [1000 MHz]
P11 [ 800 MHz]
P12 [ 700 MHz]
P13 [ 600 MHz]
P14 [ 500 MHz]
P15 [ 400 MHz]
_(trimmed for clarity)_
So, ~20% of the time in P3 at 1800 MHz alone contributes about 360 MHz to a residency-weighted average, so the result has to sit well above the 400 MHz "floor", even discounting the time spent at 500 and 600 MHz. Therefore the reported frequency statistics are clearly inconsistent and wrong. I believe that it's probably just sitting continuously at the "Avg freq" of 400 MHz, because heat and fan speed are consistent with an idle CPU, and so is the package temperature reported from dom0: 44C. No wonder the machine is struggling to keep up.
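The inconsistency can be checked with a quick residency-weighted average. This is only a sketch in Python using the CPU0 percentages and P-state frequencies pasted above; it assumes the percentages describe the share of time spent in each P-state, which may not be how xenpm itself derives "Avg freq".

```python
# P-state frequencies (MHz) and residency shares (%) copied from the CPU0
# sample above; states with 0.00% residency are omitted.
pstate_mhz = {"P3": 1800, "P13": 600, "P14": 500, "P15": 400}
residency_pct = {"P3": 23.23, "P13": 11.30, "P14": 17.29, "P15": 48.17}

total_pct = sum(residency_pct.values())
weighted_mhz = sum(
    pstate_mhz[state] * pct for state, pct in residency_pct.items()
) / total_pct

print(f"residency-weighted average: {weighted_mhz:.0f} MHz")
print("reported Avg freq: 400 MHz")
```

Under that assumption the weighted average comes out at roughly 765 MHz, almost twice the 400200 KHz that xenpm printed for the same sample, so the two readouts cannot both be right.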
Changing the governor between "ondemand" and "performance" seems to have an effect on both reported frequency distribution and responsiveness, but nothing dramatic. Either way, videoconferencing on some of the major platforms (zoom,teams,bluejeans,jitsi) can't even handle audio-only. No good given the current state of the world. :(
The laptop runs just fine under both Windows and debian on bare metal, so I think the hardware is OK.
Qubes basically becomes a non-starter under these circumstances, which I sorely regret, since I've really been impressed with how the developers have been able to make it all come together and work fairly well while living up to the high standards the project has set. While I've had to work through some inconveniences now and then, Qubes has actually been very usable as my daily driver, and is incredibly flexible and powerful. Kudos to all who contribute to the project. It would be a shame for me to have to walk away from using Qubes now.
Disabling xen cpufreq control by adding:
cpufreq=none,verbose
to the options line in /boot/efi/EFI/qubes/xen.cfg at least seems to render the machine responsive under load. It's not yet clear how battery life is affected, although it appears to be better than before (which was never outstanding), and I can't figure out what frequency the CPU is running at without Xen controlling it. xenpm only shows C-state information this way.
Any hints or thoughts welcome!
@core-i1: Could you please explain where exactly to add this line in xen.cfg? I couldn't find a spec for this file. I have a global section and one section per kernel listed there. As I still experience the same behavior, I'm happy for every hint to experiment with.
@100111001: Here's my current xen.cfg with the cpufreq setting added to the end of the options line of the default kernel:
[global]
default=4.19.155-1.pvops.qubes.x86_64

[4.19.132-1.pvops.qubes.x86_64]
options=loglvl=all dom0_mem=min:1024M dom0_mem=max:4096M iommu=no-igfx ucode=scan smt=off
kernel=vmlinuz-4.19.132-1.pvops.qubes.x86_64 root=/dev/mapper/qubes_dom0-root rd.luks.uuid=luks-a38d10b9-0944-45ec-9873-1e9d0515be2c rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap i915.alpha_support=1 rhgb quiet rd.qubes.hide_all_usb plymouth.ignore-serial-consoles
ramdisk=initramfs-4.19.132-1.pvops.qubes.x86_64.img

[4.19.152-1.pvops.qubes.x86_64]
options=loglvl=all dom0_mem=min:1024M dom0_mem=max:4096M iommu=no-igfx ucode=scan smt=off
kernel=vmlinuz-4.19.152-1.pvops.qubes.x86_64 root=/dev/mapper/qubes_dom0-root rd.luks.uuid=luks- rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap i915.alpha_support=1 rhgb quiet rd.qubes.hide_all_usb plymouth.ignore-serial-consoles
ramdisk=initramfs-4.19.152-1.pvops.qubes.x86_64.img

[4.19.155-1.pvops.qubes.x86_64]
options=loglvl=all dom0_mem=min:1024M dom0_mem=max:4096M iommu=no-igfx ucode=scan smt=off cpufreq=none,verbose
kernel=vmlinuz-4.19.155-1.pvops.qubes.x86_64 root=/dev/mapper/qubes_dom0-root rd.luks.uuid=luks-a38d10b9-0944-45ec-9873-1e9d0515be2c rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap i915.alpha_support=1 rhgb quiet rd.qubes.hide_all_usb plymouth.ignore-serial-consoles
ramdisk=initramfs-4.19.155-1.pvops.qubes.x86_64.img
Seems to have saved my installation for the moment: the machine is OK under minimal load, and is able to handle light videoconferencing at about 90% AppVM CPU usage (way too high, should normally be <=30%). I'll have a better idea when I stress test it this week under more demanding conditions.
Thanks @core-i1. After 2 days, my observation is that the overall system reacts more slowly than before, especially with a lot of concurrency. But the main point is that there are no hiccups under heavy load. That's why I consider it a workaround for my system.
@100111001: Yes, I would say my experience is consistent with yours - generally sluggish performance, but also no longer standing still under load. Videoconferencing generally usable with both audio & video, but right on the edge most of the time (~90% CPU usage). Temps edging to 70C under load, so CPU must be ramping up frequency to some extent. If I can help with any logs or testing to help diagnose this, just let me know.
I wanted to experiment a bit to solve the issue and my idea was to build vmm-xen-dom0 with the intel_pstate patch refreshed by @dmoerner, but modified adding the CPUIDs of all Intel processors released after 2015. Then, adding intel_pstate=enable on Xen CMDLINE should enable the drivers, or at least I hope so.
Unfortunately, I am having problems in building vmm-xen-dom0 even after the changes @jandryuk suggested, and the errors are not clear even after I had enabled DEBUG=1 and VERBOSE=1 in my builder.conf file. So I tried to download Xen only, apply the patch with those changes, and it does compile. But only Xen, it does fail when I try to compile Xen tools. @jandryuk and @pwmarcz were you able to compile vmm-xen-dom0 with the patch?
@marcogiglio I used Xen with OpenXT patches, not vmm-xen-dom0. I can't help with the Qubes builder stuff.
I didn't attempt your idea of adding CPUIDs since, as I wrote in https://github.com/QubesOS/qubes-issues/issues/4604#issuecomment-714498521, you want HWP which doesn't exist in the old patch. Making a new patch to enable HWP is the way forward, in my opinion.
Even with the old pstate patch, there was something wrong - xenpm didn't report strings properly for the running governor.
I submitted a patchset adding HWP support to Xen here: https://lore.kernel.org/xen-devel/20210308210210.116278-1-jandryuk@gmail.com/T/#m726bf3e6056806d91d1ebe095ad5d6e38a99f9ab
Testing is appreciated.
Thank you! Can you provide installable packages for Q4.0 for testing? I'd guess most people (including me) don't really know how to build them locally.
@jandryuk I built these patches against Xen-4.14.1 for Qubes R4.1. They applied with some fuzz, including a bit of manual work on xen/include/asm-x86/msr-index.h. (I assume they were made against the latest git sources.) I then tested them on a Dell Latitude 7400, Core i5-8365U (8th gen, "Whiskey Lake"). Unfortunately, it doesn't seem to work for me, but perhaps I need to take some extra steps.
sudo dmesg | grep pstate outputs intel_pstate: CPU model not supported. In contrast, on Fedora 33, I get: intel_pstate: HWP enabled.
sudo xenpm get-cpufreq-para outputs [CPU0] failed to get cpufreq parameter, etc.
Qubes OS version:
R4.0
Affected component(s):
intel_pstate acpi-cpufreq xenpm
Steps to reproduce the behavior:
Tested on:
All with Intel i7-8550U.
Latest BIOS revisions for the respective systems as of Dec. 2018
Kernel: 4.19.2-3.pvops.qubes.x86_64.
EFI install.
In dom0,
sudo xenpm get-cpufreq-para
Expected behavior:
The processor is rated at 1.8 GHz (4.0 turbo), so we would expect to see appropriate scaling in that range, available frequencies from 1800000 - 4000000.
Further, we would expect to see scaling_driver = intel_pstate.
Actual behavior:
The CPU frequencies do not scale correctly. Why?
Frequencies are pinned at 2 GHz max, 400 MHz min, across all cores.
Confirmed with watch -n1 "cat /proc/cpuinfo | grep \"[c]pu MHz\""
xenpm set-scaling-maxfreq and -minfreq have no effect.
xenpm get-cpufreq-states shows 16 total/usable P-states.
Changing the governor to performance has no effect. Default is ondemand.
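To track the pinned range over time without reading raw output by eye, here is a small Python sketch that pulls the max/min/current values out of xenpm get-cpufreq-para text. The line layout is copied from the sample quoted earlier in this thread and is an assumption; it may differ between Xen versions. The values are treated as kHz, which matches the 400000 = 400 MHz reading given there. The name scaling_summary is made up for illustration.

```python
import re

# Pattern for the summary line as quoted earlier in this thread, e.g.
#   scaling frequency : max [2001000] min [400000] cur [400000]
# The exact layout is an assumption and may differ between Xen versions.
SCALING_RE = re.compile(
    r"scaling frequency\s*:\s*max \[(\d+)\] min \[(\d+)\] cur \[(\d+)\]"
)


def scaling_summary(xenpm_text):
    """Return a list of (max, min, cur) tuples in MHz, one per matching line."""
    summaries = []
    for match in SCALING_RE.finditer(xenpm_text):
        max_khz, min_khz, cur_khz = (int(g) for g in match.groups())
        summaries.append((max_khz / 1000, min_khz / 1000, cur_khz / 1000))
    return summaries


if __name__ == "__main__":
    sample = "scaling frequency : max [2001000] min [400000] cur [400000]"
    for max_mhz, min_mhz, cur_mhz in scaling_summary(sample):
        print(f"max {max_mhz:.0f} MHz, min {min_mhz:.0f} MHz, cur {cur_mhz:.0f} MHz")
```

In practice the text would come from the output of sudo xenpm get-cpufreq-para captured in dom0, with one tuple per CPU.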
dmidecode reports a max of 2 GHz on the Lenovos, and an apparently erroneous speed on the Huawei (~ 8 GHz).
The scaling_driver is legacy acpi-cpufreq. Interestingly, intel_pstate can be seen initializing during boot, but it does not take over handling anything. Attempting to blacklist acpi-cpufreq in modprobe.d has no effect.
/sys/devices/system/cpu/intel_pstate/ contains the expected attributes, but as mentioned in the "related issue" linked below, no_turbo, num_pstates, and turbo_pct error Resource temporarily unavailable.
/sys/devices/system/cpu/intel_pstate/status always returns off, and does not respond to echo "active" >. This behavior has been tested with various kernel command line parameters, including intel_pstate=force, intel_pstate=disabled, intel_pstate=no_hwp, intel_pstate=enable with no change in performance aside from ../cpu/intel_pstate/ attributes disappearing when no_hwp or disabled were in effect. Also tried processor.ignore_ppc=1.
Strangely, none of the appropriate attributes for cpufreq exist in /sys/devices/system/cpu/cpu*/.
lsmod | grep cpufreq shows no results, trying to modprobe acpi-cpufreq or cpufreq-xen returns errors. xen_acpi_processor is loaded.
cpupower frequency-info is completely unresponsive, with zero information available about the processor.
Though it shouldn't have any effect, testing was attempted with smt=on and off, and Hyperthreading enabled/disabled in the BIOS appropriately.
Testing was also performed while toggling various BIOS settings.
Intel SpeedStep
Maximum Performance vs. Balanced
It does not appear to be a thermal throttling issue, with idle ~ 37C and under load ~60C observed consistently.
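To capture the intel_pstate sysfs state described above in one pass, including which reads fail, a rough Python sketch follows. The attribute names and directory come from this report; the function name read_attrs is made up for illustration, and the sketch only reads files, it does not try to write to status.

```python
import os

# Attribute names taken from the report above; the directory is the usual
# sysfs location and is passed in so the function can be tried elsewhere.
INTEL_PSTATE_DIR = "/sys/devices/system/cpu/intel_pstate"
ATTRS = ("status", "no_turbo", "num_pstates", "turbo_pct")


def read_attrs(base_dir=INTEL_PSTATE_DIR, names=ATTRS):
    """Read each attribute, recording the OS error text when a read fails."""
    results = {}
    for name in names:
        path = os.path.join(base_dir, name)
        try:
            with open(path) as handle:
                results[name] = handle.read().strip()
        except OSError as exc:
            # e.g. "Resource temporarily unavailable" (EAGAIN) or a missing file
            results[name] = f"error: {exc.strerror}"
    return results


if __name__ == "__main__":
    for name, value in read_attrs().items():
        print(f"{name}: {value}")
```

Saving this output once per tested kernel parameter gives a comparable record of which attributes exist and which return errors.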
tlp was tested with no effect on the frequency scaling, regardless of being enabled or disabled.
tlp-stat yields minimal additional info, with what seems to be an outdated recommendation for the Lenovos to install tp-smapi kernel modules, that are in fact deprecated in favor of thinkpad_acpi, which appears to be active on the Thinkpads.
thermald is not loaded.
General notes:
https://www.kernel.org/doc/html/v4.12/admin-guide/pm/intel_pstate.html
This link suggests removing irqbalance but I'm skeptical. https://askubuntu.com/questions/1067866/ubuntu-18-04-steam-games-frame-rate-drop/1073353#1073353?newreg=c7c120f373da4effb7317104571cd573
https://cateee.net/lkddb/web-lkddb/XEN_ACPI_PROCESSOR.html Regarding xen_acpi_processor: "It also registers itself as the SMM so that other drivers (such as ACPI cpufreq scaling driver) will not load."
How could lsmod report xen_acpi_processor as loaded but xenpm shows the scaling driver acpi-cpufreq? This might make sense as to the missing /sys/devices/.../cpufreq entries.
The following exchange is dubious at best, the final post gets down to the point of disabling intel microcode. They also suggest the use of msr-tools, but that really shouldn't be necessary. https://bbs.archlinux.org/viewtopic.php?id=231077
This is good work, but in my opinion, running a script every few seconds in dom0 isn't a legitimate fix. https://github.com/erpalma/lenovo-throttling-fix
Related issues:
https://github.com/QubesOS/qubes-issues/issues/4491
https://github.com/QubesOS/qubes-issues/issues/450