Power Consumption 2-3x after first suspend/resume

AndrewX192 commented 5 years ago

Qubes OS version R4.0.2-rc1

Affected component(s) or functionality Power Management (Xen) Brief summary Power usage 2-3X after first suspend/resume cycle

To Reproduce (Tested on Lenovo T470 and T490)

Turn off laptop
Turn on laptop and login to desktop
Note power usage using powertop, and look at power consumption - you can wait as long as you like for sampling
Suspend computer by closing lid or through system menu
Resume computer
Note that power usage reported in powertop is 2-3x what it was before, and is sustained until the machine is rebooted

Expected behavior Power management is restored after resumption, and not materially different than before putting laptop to sleep.

Actual behavior Laptop consumes 2-3x power consumption in watts before suspend

Screenshots N/A

Additional context Not sure how many systems this impacts. I lived with this for years on the T470, but when I got the T490 the symptoms were way more noticeable and I can't carry around a spare battery anymore to mitigate it.

Solutions you've tried

Setting xempm cpufreq to "ondemand"
Using tlp in dom0 before/after suspend
Testing on more than one device (T470/T490)
Updating to kernel-latest

Relevant documentation you've consulted

Related, non-duplicate issues none

marmarek commented 5 years ago

Try disabling hyperthreading in BIOS. It may be related to a software method used to disable sibling threads and how it interact with suspend/resume. I've seen related patches in newer Xen version - if that helps, we it may be worth considering a backport.

AndrewX192 commented 5 years ago

So, I thought I had disabled Hyperthreading, but it appears that it was still enabled. I turned it on and back off, and now it appears to be disabled. Battery life is significantly better.

alpha-and-omega commented 5 years ago

@marmarek Disabling Hyperthreading solved my #5267 problem too, thanks! Please backport!

marmarek commented 5 years ago

Unfortunately there quite a bit differences between Xen 4.8 and 4.13 and the backport is complex (https://github.com/xen-project/xen/commit/eb91201107, https://github.com/xen-project/xen/commit/8797d20a6ec2dd75195585a107ce345c51c0a59a). Given there is an easy workaround (disabling hyperthreading in BIOS, instead of in software), it's safer to go this way. Note we do disable hyperthreading anyway (by not using secondary threads), so you don't pay any performance cost. And actually the code to "disable it" with software has the issues after resume, so by disabling it in BIOS you avoid the problem.

m-v-b commented 4 years ago

Hello @marmarek! My laptop's BIOS does not offer an option to disable hyperthreading, so I cherry-picked/back-ported 5 commits to xen-4.8, which resolved the higher-power-consumption-after-suspend-to-RAM issue according to powertop's power measurements. (For the record, powertop appears to work fine in an HVM, but not in PV mode.)

Would you be interested in a pull request to qubes-vmm-xen to resolve this issue in software in Qubes OS 4.0 for people whose laptops do not have a hyperthread on/off switch?

Here is a list of the commits I cherry-picked:

8d88eacb3b390984e2c483d75d8f40b3ec4be67c ("xen: add helper for calling notifier_call_chain() to common/cpu.c")
51c79e943fb3f9a746181f8b8415cf2baa5d26bd ("xen: add new cpu notifier action CPU_RESUME_FAILED")
6870ea9d1fad6fbe27cf3ce5fe093be709ad2668 ("xen/cpupool: simplify suspend/resume handling")
eb912011076e76f6c5e4013616a61ba670e7fc15 ("x86/ACPI: re-park previously parked CPUs upon resume from S3")
b0000b128adb07f4107c6e324d32ab025a73a6c8 ("sched: populate cpupool0 only after all cpus are up")

(It is possible that the 5th commit is not truly necessary.)

marmarek commented 4 years ago

Yes, if you managed to get it compile and work, feel free to open a PR :)

m-v-b commented 4 years ago

Hello again,

First of all sorry for the delay, and sorry about the wall of text that follows:

During my tests after having cherry-picked the aforementioned commits, I encountered an unusual issue: systemd-sleep (i.e., systemd's helper program which actually requests from the kernel to suspend the system) would occasionally segfault during the resume from suspend to ram. When inspecting the core dump, the disassemble command in gdb would point to an instruction loading from the %fs register, possibly one that loads data from thread-local storage:

dom0 kernel: systemd-sleep[7102]: segfault at 10 ip 00007fb79ac283f9 sp 00007ffd7c1dd508 error 4 in libc-2.24.so[7fb79abb1000+1bd000]

Core was generated by `/usr/lib/systemd/systemd-sleep suspend'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb79ac283f9 in ferror () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install systemd-udev-231-19.fc25.x86_64
(gdb) bt
#0  0x00007fb79ac283f9 in ferror () from /lib64/libc.so.6
#1  0x00007fb79b4d1f3c in fflush_and_check () from /usr/lib/systemd/libsystemd-shared-231.so
#2  0x000056456b3d63c8 in main ()
(gdb) disassemble 
Dump of assembler code for function ferror:
   0x00007fb79ac283e0 <+0>:     mov    (%rdi),%edx
   0x00007fb79ac283e2 <+2>:     mov    %rdi,%r9
   0x00007fb79ac283e5 <+5>:     mov    %edx,%eax
   0x00007fb79ac283e7 <+7>:     and    $0x8000,%eax
   0x00007fb79ac283ec <+12>:    jne    0x7fb79ac284a0 <ferror+192>
   0x00007fb79ac283f2 <+18>:    mov    0x88(%rdi),%r8
=> 0x00007fb79ac283f9 <+25>:    mov    %fs:0x10,%r10
   0x00007fb79ac28402 <+34>:    cmp    0x8(%r8),%r10

Upon debugging this further, I noticed that this occurs because the %fs register (i.e., the FS segment index register) is clobbered by Xen to the value 0x9b00. Having a non-zero value in %fs appears to cause the kernel to re-load the FS segment when context switching to systemd-sleep running in dom0 userspace, which eventually causes a segfault:

Jul 08 12:54:38 kernel: Enabling non-boot CPUs ...
Jul 08 12:54:38 kernel: installing Xen timer for CPU 1
Jul 08 12:54:38 kernel: xxx: save_fsgs: cpu:0 task:systemd-sleep saved fsindex != 0 (0x9b00)
Jul 08 12:54:38 kernel: xxx: load_seg_legacy: cpu:0 prev_task:systemd-sleep next_task:swapper/0 next_index: 0 <= 3
Jul 08 12:54:38 kernel: xxx: load_seg_legacy: cpu:0 prev_task:systemd-sleep next_task:swapper/0 next_base == 0. next_index:0x0 CPU has X86_BUG_NULL_SEG
Jul 08 12:54:38 kernel: xxx: load_seg_legacy: loading segments. cpu:0. prev_task:swapper/0: next_task:systemd-sleep May cause a segfault... next_index: 39680 > 3 (0x9b00)
...
Jul 08 12:54:38 kernel: PM: suspend exit
Jul 08 12:54:38 kernel: systemd-sleep[7102]: segfault at 10 ip 00007fb79ac283f9 sp 00007ffd7c1dd508 error 4 in libc-2.24.so[7fb79abb1000+1bd000]
Jul 08 12:54:38 kernel: Code: 41 89 48 04 eb e1 66 2e 0f 1f 84 00 00 00 00 00 90 8b 17 49 89 f9 89 d0 25 00 80 00 00 0f 85 ae 00 00 00 4c 8b 87 88 00 00 00 <64> 4c 8b 14 25 10 00 00 00 4d 3b 50 08 0f 84 a4 00 00 00 be 01 00

The fix is thankfully very simple: We can simply stash the original value of the %fs register prior to suspend, and restore it after the resume is complete.

Here is the section of code in Xen that clobbers the %fs register. Quoting from xen/arch/x86/boot/wakeup.S

43          # boot trampoline is under 1M, and shift its start into
44          # %fs to reference symbols in that area
45          mov     wakesym(trampoline_seg), %fs
46          lidt    %fs:bootsym(idt_48)
47          lgdt    %fs:bootsym(gdt_48)

However, I have not been able to explain why the clobbered value of the %fs register intermittently causes issues in dom0 userspace with the cherry-picked patches. After instrumenting the kernel and Xen, I have noticed that the migration/x and swapper/x kernel threads seem to be tainted with the leaked %fs register value as well, even in an unpatched Xen configuration.

All this to say, there appears to be a bug in Xen that interacts negatively with the commits I had proposed to cherry-pick to Qubes OS's version of Xen. I also noticed that recent versions of Xen no longer use the %fs register in the aforementioned part of the same file, so it is possible that recent versions of Xen are not affected by this bug.

@marmarek, would you be open to merging an additional patch that stashes and restores the %fs register as part of a pull request?

andrewdavidwong commented 4 years ago

I'm a bit confused about the status of this issue. Should it be reopened? If so, why?

m-v-b commented 4 years ago

Perhaps the bug could be reopened for the purpose of tracking a Xen-based solution to this bug in Qubes OS? I think that it would be valuable to resolve this issue in the version of Xen shipped by Qubes OS for users whose laptops do not provide a hyper-threading enable/disable switch, and Marek was (at least initially) okay with my preparation of a pull request that would resolve this bug in Qubes OS's version of Xen.

marmarek commented 4 years ago

I also noticed that recent versions of Xen no longer use the %fs register in the aforementioned part of the same file, so it is possible that recent versions of Xen are not affected by this bug.

Can you find the change when it was removed? Perhaps commit message would point out some more context and possibly other related issues.

m-v-b commented 4 years ago

I also noticed that recent versions of Xen no longer use the %fs register in the aforementioned part of the same file, so it is possible that recent versions of Xen are not affected by this bug.

Can you find the change when it was removed? Perhaps commit message would point out some more context and possibly other related issues.

I realized that I missed another important piece of the code. Please see the context around the change that partially removes the dependence of %fs. It looks like %fs is modified not only for loading the descriptor tables, but also for printing progress output to the console/VGA. All this to say, the overwriting of %fs might affect recent versions of Xen as well!

commit c1740dc20ae5ea74d9dfc101cb9d2c2ed437f8a6
Author: David Woodhouse <dwmw@amazon.co.uk>
Date:   Sun Apr 28 17:13:37 2019 +0300

    x86/wakeup: Stop using %fs for lidt/lgdt

    The wakeup code is now relocated alongside the trampoline code, so
    as long as we move idt_48 and gdt_48 up a little bit so that they're
    visible in the real-mode segment that the wakeup code runs in, using
    %ds is just fine here.

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
index f9632eef952c..89df2617ae23 100644
--- a/xen/arch/x86/boot/wakeup.S
+++ b/xen/arch/x86/boot/wakeup.S
@@ -33,25 +33,22 @@ ENTRY(wakeup_start)
         testl   $2, wakesym(video_flags)
         jz      1f
         movl    wakesym(video_mode), %eax
         call    mode_setw

 1:      # Show some progress if VGA is resumed
         movw    $0xb800, %ax
         movw    %ax, %fs
         movw    $0x0e00 + 'L', %fs:(0x10)

-        # boot trampoline is under 1M, and shift its start into
-        # %fs to reference symbols in that area
-        mov     wakesym(trampoline_seg), %fs
-        lidt    %fs:bootsym(idt_48)
-        lgdt    %fs:bootsym(gdt_48)
+        lidt    wakesym(idt_48)
+        lgdt    wakesym(gdt_48)

m-v-b commented 4 years ago

For what it is worth, I have pushed my work to my vmm-xen repository clone on Github. Here is the commit. It is tested and ready for a pull request.

marmarek commented 4 years ago

I would recommend you to send your last commit to xen-devel, including the full context (that it un-breaks suspend with those backports, but based on source code inspection, it may be also an issue on master branch too).

m-v-b commented 4 years ago

Thanks for taking a look at the patches, and sorry for the delay. Sounds good; I will bring this topic up on the xen-devel mailing list, and I will make sure to mention the context. I intend to do this either later today or tomorrow.

m-v-b commented 4 years ago

As promised, I reported this issue on the xen-devel mailing list, and another patch to address the issue in a proper and more elegant manner was published on the xen-devel mailing list by Jan Beulich. Here is a link to the proposed patch. There have been code review comments on the mailing list as well.

I am currently waiting for the patch to be delivered to the master branch of the Xen repository, which I can later backport to the version of Xen used by Qubes OS. Another option would be to preliminarily use (in Qubes OS) Andrew Cooper's suggested version of the patch, so that this issue is resolved in Qubes OS sooner. Afterwards, when an official delivery is made to Xen repository's master branch, the patch carried by Qubes OS could be aligned to the official version.

m-v-b commented 3 years ago

@marmarek, given that the changes to the Xen hypervisor have been merged upstream as well as in Qubes OS, maybe we can close this issue?

na-- commented 1 year ago

FWIW, I appear to still be suffering from this problem or something quite similar to it, with an 11th gen Intel CPU and QubesOS v4.1 (xen-4.14.5-20.fc32.x86_64 and dom0 kernels 6.2.10 or 6.3.9).

Any advice? Were the Xen 4.8 patches maybe dropped in later versions? :confused:

na-- commented 1 year ago

FWIW, this still seems to be a problem with the 6.4.8 kernel, with and without disabled hyperthreading :disappointed: Weirdly, when I disabled hyperthreading from the BIOS, my machine felt perceptibly slower compared to when it was enabled, despite having smt=off in the Xen boot arguments in both cases :confused: :man_shrugging:

na-- commented 1 year ago

Has anyone with this issue updated to the R4.2 release candidates and noticed the power consumption permanently increase from the system start, even without doing a suspend+resume? Because that is what I observed (https://github.com/QubesOS/qubes-issues/issues/8664) and I am wondering if the cause even for my original issue here was that my previously disabled discrete GPU was somehow re-enabled after resuming the suspended system :thinking: That could explain why disabling hyperthreading from the BIOS had no effect in my case...

QubesOS / qubes-issues

Power Consumption 2-3x after first suspend/resume #5210