QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/
535 stars 47 forks source link

Error when shutting down whonix templates and Qube Manager becomes unresponsive (Failed to shutdown domain '18' with libxenlight) #7510

Closed resulin closed 2 years ago

resulin commented 2 years ago

How to file a helpful issue

Qubes OS release

4.1

Brief summary

Steps to reproduce

Shutting down a whonix based VM (either whonix-ws or whonix-gw).

Expected behavior

No error.

Actual behavior

Error when shutting down whonix templates, also if done using the Qube Manager or the Qube Widget they both lag and become unresponsive.

                    [Dom0] Error shutting down qube
The following error occurred while attempting to shut down qube sys-whonix:
Got empty response from qubesd. See journalctl in dom0 for details. 
                    [Dom0] Error shutting down Qube!
ERROR: Got empty response from qubesd. See journalctl in dom0 for details.

journalctl log:

May 15 10:24:06 dom0 audit[86161]: USER_START pid=86161 uid=1000 auid=1000 ses=2 msg='op=AM:session_open grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix acct="root" exe="/usr/bin/pkex>
May 15 10:24:06 dom0 kernel: audit: type=1105 audit(1652621046.225:624): pid=86161 uid=1000 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix acc>
May 15 10:24:06 dom0 pkexec[86161]: user: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/sbin/xfpm-power-backlight-helper --set-brightness 1490]
May 15 10:24:06 dom0 pkexec[86165]: pam_unix(polkit-1:session): session opened for user root by (uid=1000)
May 15 10:24:06 dom0 audit[86165]: USER_START pid=86165 uid=1000 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix acct="root" exe="/usr/bin/pkex>
May 15 10:24:06 dom0 kernel: audit: type=1105 audit(1652621046.325:625): pid=86165 uid=1000 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix acc>
May 15 10:24:06 dom0 pkexec[86165]: user: Executing command [USER=root] [TTY=unknown] [CWD=/] [COMMAND=/usr/sbin/xfpm-power-backlight-helper --set-brightness 1515]
May 15 10:24:55 dom0 libvirtd[1758]: internal error: Failed to shutdown domain '18' with libxenlight
May 15 10:24:55 dom0 qubesd[2051]: unhandled exception while calling src=b'dom0' meth=b'admin.vm.Shutdown' dest=b'sys-whonix' arg=b'' len(untrusted_payload)=0
May 15 10:24:55 dom0 qubesd[2051]: Traceback (most recent call last):
May 15 10:24:55 dom0 qubesd[2051]:   File "/usr/lib/python3.8/site-packages/qubes/api/__init__.py", line 286, in respond
May 15 10:24:55 dom0 qubesd[2051]:     response = await self.mgmt.execute(
May 15 10:24:55 dom0 qubesd[2051]:   File "/usr/lib/python3.8/site-packages/qubes/api/admin.py", line 901, in vm_shutdown
May 15 10:24:55 dom0 qubesd[2051]:     await self.dest.shutdown(force=force, wait=wait)
May 15 10:24:55 dom0 qubesd[2051]:   File "/usr/lib/python3.8/site-packages/qubes/vm/qubesvm.py", line 1339, in shutdown
May 15 10:24:55 dom0 qubesd[2051]:     self.libvirt_domain.shutdown()
May 15 10:24:55 dom0 qubesd[2051]:   File "/usr/lib/python3.8/site-packages/qubes/app.py", line 103, in wrapper
May 15 10:24:55 dom0 qubesd[2051]:     return attr(*args, **kwargs)
May 15 10:24:55 dom0 qubesd[2051]:   File "/usr/lib64/python3.8/site-packages/libvirt.py", line 2672, in shutdown
May 15 10:24:55 dom0 qubesd[2051]:     if ret == -1: raise libvirtError ('virDomainShutdown() failed', dom=self)
May 15 10:24:55 dom0 qubesd[2051]: libvirt.libvirtError: internal error: Failed to shutdown domain '18' with libxenlight
May 15 10:24:55 dom0 systemd-homed[1737]: block device /sys/devices/virtual/block/dm-69 has been removed.
May 15 10:24:55 dom0 systemd-homed[1737]: block device /sys/devices/virtual/block/dm-69 has been removed.
resulin commented 2 years ago

This only started happening recently (~1 week ago).

DemiMarie commented 2 years ago

The unresponsiveness is likely in part due to #4719. Do you have anything interesting in /var/log/libvirt/libxl/sys-whonix.log?

resulin commented 2 years ago

@DemiMarie

I can't tell from looking at it but here's my entire sys-whonix.log. https://privatebin.net/?6b82b3a92c3cb250#6gAZvyZ3TbeFH7gkcHSMojPAHu2HvaGe7JW4L5pbVdWX

Reports in the forums suggest that it's related to https://github.com/QubesOS/qubes-issues/issues/7404 : https://forum.qubes-os.org/t/suspend-swap-and-whonix-gateway-performance-issues/11310/3

I will test later today whether applying the patch in #7404 solves the issue.

resulin commented 2 years ago

I don't know if it's related but I also noticed that systemd-socket-proxyd jumps to 100% CPU usage after a while. I have to keep killing it each time. Should I open a different bug report for this one?

DemiMarie commented 2 years ago

systemd-socket-proxyd spinning at 100% CPU is a systemd bug. Please report it to systemd if it is one of the two most recent versions, or to your distribution maintainer otherwise. Only report it here if it is in dom0.

DemiMarie commented 2 years ago

Would you mind adding:

log_filters="3:*"
log_level="1"
log_outputs="1:syslog:libvirtd 1:journald"

to /etc/libvirtd/libvirtd.conf and then restarting libvirtd with systemctl restart libvirtd? That will cause libvirt to log debug messages from libxl.

resulin commented 2 years ago

@DemiMarie

After adding these lines to /etc/libvirt/libvirtd.conf:

log_filters="3:*"
log_level="1"
log_outputs="1:syslog:libvirtd 1:journald"

And then running systemctl restart libvirtd I run into the following error:

Job for libvirtd.service failed because the control process exited with error code.
See "systemctl status libvirtd.service" and "journalctl -xe" for details.
dom0 libvirtd[207503]: libvirt version: 6.6.0, package: 5.fc32 (Unknown, 2021-10-02-04:34:58, )
dom0 libvirtd[207503]: hostname: dom0
dom0 libvirtd[207503]: internal error: unexpected domain Domain-0 already exists
dom0 libvirtd[207503]: Failed to load config for domain 'Domain-0'
DemiMarie commented 2 years ago

@DemiMarie

After adding these lines to /etc/libvirt/libvirtd.conf:

log_filters="3:*"
log_level="1"
log_outputs="1:syslog:libvirtd 1:journald"

And then running systemctl restart libvirtd I run into the following error:

Job for libvirtd.service failed because the control process exited with error code.
See "systemctl status libvirtd.service" and "journalctl -xe" for details.

Can you post the full log output of libvirtd?

dom0 libvirtd[207503]: libvirt version: 6.6.0, package: 5.fc32 (Unknown, 2021-10-02-04:34:58, )
dom0 libvirtd[207503]: hostname: dom0
dom0 libvirtd[207503]: internal error: unexpected domain Domain-0 already exists
dom0 libvirtd[207503]: Failed to load config for domain 'Domain-0'

This is a libvirtd bug, but should not be causing libvirtd start to fail.

resulin commented 2 years ago

Manually applying the patch from https://github.com/QubesOS/qubes-core-admin/pull/473 and then restarting seems to have solved the issue, suspend works without problem now.

@DemiMarie feel free to re-open this issue if you want to investigate that specific libvirtd problem if you think it's worth it, I can definitely go back and undo the patch above to provide the full logs for libvirtd if needed.

qtpies commented 2 years ago

Manually applying this patch seems to fix this issue for me as well.