Open tortugaverde98 opened 6 years ago
I ran: sudo lvm
The template has three items listed: templatename-private, templatename-root, templatename-root-152682xxxx-back
Most app VMs have two items: vmname-private vmname-15282xxxxx-private
But sys-usb, sys-net, and sys-firewall have five: -private -private-15282xxxxx-back -private-snap -root-snap -volatile
sys-whonix has four: -private -private-15282xxxxx-back -private-snap -private-tmp
I'm at risk of losing a couple of weeks' worth of intense work. Please advise.
You can remove the -root-snap volumes of VMs that are not currently running (sys-usb, sys-net, etc.). This volume is created during VM startup (as a snapshot of the template), but apparently it failed to be removed after the previous shutdown (a crash?).
But first check one more thing: the lvs command output has Data% and Meta% columns - check whether any volume is full or nearly full, especially in the Meta% column.
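That check can be scripted. A minimal sketch (qubes_dom0/pool00 are the Qubes defaults; the sample output is hard-coded here so the filter can be demonstrated without a live LVM setup) that flags any volume over 90% in either column:

```shell
# Real data would come from:
#   sudo lvs --noheadings -o lv_name,data_percent,metadata_percent qubes_dom0
sample='pool00 62.11 34.22
vm-sys-net-private 98.01
vm-work-private 41.50'

# Print any volume whose Data% or Meta% exceeds 90 (missing columns read as 0)
echo "$sample" | awk '$2+0 > 90 || $3+0 > 90 { print $1 " is nearly full" }'
```

Against the numbers reported later in this thread (Data% 62.11, Meta% 34.22 for pool00) nothing would be flagged, consistent with the pool itself not being full.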
Data% for pool00 is 62.11; Meta% for pool00 is 34.22.
The Data% column is populated for all items, but Meta% has a value only for pool00.
I ran: sudo lvremove qubes_dom0 vm-sys-usb-root-snap
After a brief heart attack, I said 'n' to removing the pool(!). Then I said no to every other prompt, watching for vm-sys-usb-root-snap. However, no -root-snap volumes were ever presented in the list of items to which I had to answer y/n.
Yet they continue to exist in sudo lvm.
Another note: when answering no to the -private-snap volumes for sys-usb and sys-net, instead of the usual response 'logical volume xxx not removed', additional errors were displayed: Can't create snapshot vm-sys-whonix-private as origin vm-sys-whonix-private-snap is not suspended. Failed to suspend qubes_dom0/pool00 with queued messages. Failed to update pool qubes_dom0/pool00.
Is there a way I can attempt to suspend vm-sys-whonix-private-snap? Is that what is holding everything up?
The Origin column is blank except for sys-usb, sys-net, sys-firewall, and sys-whonix. For sys-usb, sys-net, and sys-firewall, Origin shows -private and the template's -root.
For sys-whonix, Origin shows -snap and two -tmp entries.
I attempted qvm-shutdown on all four of these VMs, but it gave: Shutdown error: domain is powered off. No VMs are running.
I ran: sudo lvremove vm-sys-whonix-private-tmp
But it is still there, referenced twice in the Origin column: once for -private-15268xxxx-back and once for -snap.
However, when I ran it again, it did not appear in the list of items for y/n, yet it is still listed in sudo lvm.
Attempted: qvm-remove sys-whonix
qvm-remove: error: Domain is in use: 'sys-whonix'; details in system log
systemctl | grep sys-whonix
qubes-vm@sys-whonix.service loaded failed failed Start Qubes VM sys-whonix
Rebooted, no change.
sudo lvremove qubes_dom0 vm-sys-usb-root-snap
Should be lvremove qubes_dom0/vm-sys-usb-root-snap
Can't create snapshot vm-sys-whonix-private as origin vm-sys-whonix-private-snap is not suspended. Failed to suspend qubes_dom0/pool00 with queued messages. Failed to update pool qubes_dom0/pool00.
Is sys-whonix running?
qvm-remove sys-whonix qvm-remove: error: Domain is in use: 'sys-whonix'; details in system log
See the journalctl output for details. Probably it is set as the netvm for anon-whonix and whonix-ws-dvm. Anyway, removing sys-whonix shouldn't be needed to fix your problem.
I ran lvremove qubes_dom0/vm-sys-usb-root-snap, which results in this error: Can't create snapshot vm-sys-whonix-private as origin vm-sys-whonix-private-snap is not suspended. Failed to suspend qubes_dom0/pool00 with queued messages. Failed to update pool qubes_dom0/pool00.
sys-whonix is not running. I also set it to autostart false and rebooted; no change.
journalctl provides a mountain of info. I cannot scroll up to find a root cause.
Success. I ran: sudo lvremove qubes_dom0/vm-sys-whonix-private-snap
It successfully deleted. From there all VMs can start. I rebooted qubes to be sure. Everything comes up cleanly.
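The fix above generalizes: with all VMs halted, leftover start-time snapshots (names ending in -snap) are the removal candidates. A sketch of how one might pick them out of an lvs listing (the volume names below are illustrative, not from a real system):

```shell
# Real names would come from: sudo lvs --noheadings -o lv_name qubes_dom0
lv_names='vm-sys-whonix-private
vm-sys-whonix-private-snap
vm-sys-usb-root-snap
vm-work-private'

# Each match would then be removed (only while its VM is halted) with:
#   sudo lvremove qubes_dom0/<name>
echo "$lv_names" | grep -- '-snap$'
```

The `--` keeps grep from treating the leading `-` of the pattern as an option.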
Hello guys, I've hit the same problem. Just one hour ago I manually shut down my Qubes 4 machine because it froze during work. After rebooting, I try to run any AppVM and get:
ERROR: b" Can't create snapshot vm-sys-firewall-private-15288xxxxx-back as origin vm-sys-firewall-private is not suspended.\n Failed to suspend qubes_dom0/pool00 with queued messages.\n"
Then I try to start sys-net and see the following:
ERROR: b' Logical Volume "vm-sys-net-volatile" already exists in volume group "qubes_dom0"\n'
I don't know what to do to fix it. I tried your sudo lvremove qubes_dom0/vm-sys-whonix-private-snap, but, as I think (I'm a noob), it is not connected with sys-net and sys-firewall in my case.
Then I tried:
sudo lvremove qubes_dom0/vm-sys-firewall-private-snap
and sudo lvremove qubes_dom0/vm-sys-firewall-private-15288xxxxx-back
with no success.
Please help me. I can't start any of my AppVMs and it's critical for me.
When I try:
sudo lvremove qubes_dom0/vm-sys-net-volatile
I receive:
Can't create snapshot vm-sys-firewall-private-15288xxxxx-back as origin vm-sys-firewall-private-snap is not suspended. Failed to suspend qubes_dom0/pool00 with queued messages. Failed to update pool qubes_dom0/pool00.
Handling of unclean shutdown is already fixed with https://github.com/QubesOS/qubes-core-admin/commit/11c7b4bb512023b76cc9a70c987abceea8e0e785, which is part of qubes-core-dom0 >= 4.0.28.
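To check whether an installation already contains that fix, the installed qubes-core-dom0 version can be compared against 4.0.28 with sort -V (the installed version below is a hard-coded example; on a real system it would come from rpm):

```shell
installed="4.0.27"   # e.g. from: rpm -q --qf '%{VERSION}' qubes-core-dom0
fixed="4.0.28"

# sort -V orders version strings numerically; if the installed version sorts
# first and differs from the fix version, the fix is not yet present
if [ "$(printf '%s\n' "$installed" "$fixed" | sort -V | head -n1)" = "$installed" ] \
   && [ "$installed" != "$fixed" ]; then
    echo "older than $fixed - update dom0"
else
    echo "fix present"
fi
```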
It seems I hit a similar problem on Thursday. I installed the latest dom0 updates (including e.g. qubes-manager-4.0.26-1), rebooted, and tried to update all relevant templates using the new, now (after this last update) functioning qubes-update-gui. The update went fine (except for whonix-14 - a problem with a fix already described elsewhere), but after finishing it, while I was trying to restart all AppVMs, the system (dom0) stopped responding on startup of the (Debian 9 based) network VM (sys-net-deb). This used to happen in the past on every start with a NIC assigned without no-strict-reset AND permissive mode specified - a problem already solved, but, as I assumed, maybe somehow reintroduced by a setting change resulting from the last update.
I forced a system restart (hardware button) and, seeing that sys-net-deb startup on Qubes boot failed, reselected the devices assigned to sys-net-deb and re-enabled no-strict-reset for those devices using Qube Manager. On the next restart I finally noticed it's not just sys-net-deb that isn't starting.
Now, even after a few subsequent restarts, no VM (except dom0) is starting.
sys-net-deb through sys-firewall fail with:
Domain sys-net-deb failed to start:
Logical Volume "vm-sys-net-deb-volatile" already exists in volume group "qubes_dom0"
or its vm-sys-net-deb-root-snap or vm-sys-net-deb-private-snap variant.
sys-usb (no network, USB controller assigned) throws similar errors, but referencing itself instead of sys-net-deb.
Domain <JustTryingToStartDomainName> has failed to start: device-mapper:message ioctl on (253:3) failed: File exists
Failed to process thin pool message "create_snap 1666 1649".
Failed to suspend qubes_dom0/pool00 with queued messages.
Any attempt to sudo lvremove qubes_dom0/vm-<SomeName>-volatile, -root-snap or -private-snap has so far ended unsuccessfully with:
device-mapper:message ioctl on (253:3) failed: File exists
Failed to process thin pool message "create_snap 1666 1649".
Failed to suspend qubes_dom0/pool00 with queued messages.
Failed to update pool qubes_dom0/pool00.
Relevant lvs output:
pool00: Data% 74.56, Meta% 50.37
Most VMs have a vm-<Name>-private entry and a corresponding -*-back entry; one -private and -private-*-back pair has Data% at around 98% (I don't assume it means anything - it hasn't been run for ages and it's just an AppVM).
For the failing VMs (sys-net-deb, sys-usb) there are -volatile, -private-snap and -root-snap entries as well; the same applies to whonix-gw-14 (although it has no device attached (and never had), it is a template and is set to standard PVH virtualization mode) and to two (random?) AppVMs (one based on Debian 9, one on Fedora 28) which I don't remember running recently, nor ever having any devices assigned.
Templates have -private, -root and root-*-back entries.
@whatevrr @subvertising @marmarek The problem is apparently caused by queued metadata that LVM can't process (maybe power was lost while a snapshot was in progress).
I encountered a solution here that uses vgcfgbackup and vgcfgrestore like so:
# vgcfgbackup -f /tmp/pool0-current vg01
# cp /tmp/pool0-current /tmp/pool0-current-orig # backup the file before making changes
# vim /tmp/pool0-current # remove the message1 section in vg01 -> logical_volumes -> pool0
# vgcfgrestore --force -f /tmp/pool0-current vg01
I'm also experiencing this bug with qubes-core-dom0-4.0.39-1.fc25.
In my case, it happens without any unexpected shutdown or power loss.
Here's the journalctl log from the latest occurrence. In my case, the problem is resolved by restarting Qubes.
Apr 23 02:28:50 dom0 qubesd[10730]: Starting sys-whonix Apr 23 02:28:51 dom0 qubesd[10730]: Starting sys-firewall Apr 23 02:28:51 dom0 qubesd[10730]: Starting sys-net Apr 23 02:28:51 dom0 qmemman.systemstate[10723]: do_balloon(memsize=577765376) Apr 23 02:28:51 dom0 qmemman.systemstate[10723]: xenfree=657469362 Apr 23 02:29:06 dom0 qmemman.daemon.reqhandler[10723]: EOF Apr 23 02:29:06 dom0 qubesd[10730]: Starting sys-whonix Apr 23 02:29:06 dom0 qubesd[10730]: Starting sys-firewall Apr 23 02:29:06 dom0 qubesd[10730]: Starting sys-net Apr 23 02:29:06 dom0 qmemman.systemstate[10723]: do_balloon(memsize=577765376) Apr 23 02:29:06 dom0 qmemman.systemstate[10723]: xenfree=657469362 Apr 23 02:29:21 dom0 qmemman.daemon.reqhandler[10723]: EOF Apr 23 02:29:31 dom0 sudo[1379]: user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/bin/qubes-dom0-update --clean --check-only Apr 23 02:29:31 dom0 audit[1379]: USER_CMD pid=1379 uid=1000 auid=1000 ses=2 msg='cwd="/home/user" cmd=71756265732D646F6D302D757064617465202D2D636C65616E202D2D636865636B2D6F6E6C79 terminal=pts/0 res=success' Apr 23 02:29:31 dom0 kernel: audit: type=1123 audit(1556004571.296:12655): pid=1379 uid=1000 auid=1000 ses=2 msg='cwd="/home/user" cmd=71756265732D646F6D302D757064617465202D2D636C65616E202D2D636865636B2D6F6E6C79 terminal=pts/0 res=success' Apr 23 02:29:31 dom0 audit[1379]: CRED_REFR pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success' Apr 23 02:29:31 dom0 kernel: audit: type=1110 audit(1556004571.297:12656): pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? 
terminal=/dev/pts/0 res=success' Apr 23 02:29:31 dom0 sudo[1379]: pam_systemd(sudo:session): Cannot create session: Already running in a session Apr 23 02:29:31 dom0 audit[1379]: USER_START pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success' Apr 23 02:29:31 dom0 sudo[1379]: pam_unix(sudo:session): session opened for user root by (uid=0) Apr 23 02:29:31 dom0 kernel: audit: type=1105 audit(1556004571.299:12657): pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success' Apr 23 02:29:32 dom0 sudo[1379]: pam_unix(sudo:session): session closed for user root Apr 23 02:29:32 dom0 audit[1379]: USER_END pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success' Apr 23 02:29:32 dom0 audit[1379]: CRED_DISP pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success' Apr 23 02:29:32 dom0 kernel: audit: type=1106 audit(1556004572.244:12658): pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success' Apr 23 02:29:32 dom0 kernel: audit: type=1104 audit(1556004572.244:12659): pid=1379 uid=0 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? 
terminal=/dev/pts/0 res=success' Apr 23 02:29:33 dom0 qubesd[10730]: Starting fedora-29 Apr 23 02:29:33 dom0 qmemman.systemstate[10723]: do_balloon(memsize=426770432) Apr 23 02:29:33 dom0 qmemman.systemstate[10723]: xenfree=657469362 Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future:exception=StoragePoolException(' Logical volume qubes_dom0/vm-sys-net-root-snap in use.',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 681, in stop Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical volume qubes_dom0/vm-sys-net-root-snap in use. 
Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future: exception=StoragePoolException(' Logical volume qubes_dom0/vm-sys-net-volatile in use.',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 684, in stop Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical volume qubes_dom0/vm-sys-net-volatile in use. 
Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future: exception=StoragePoolException(' Logical Volume "vm-sys-net-root-snap" already exists in volume group "qubes_dom0"',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 666, in start Apr 23 02:29:33 dom0 qubesd[10730]: yield from self._snapshot() Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 657, in _snapshot Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical Volume "vm-sys-net-root-snap" already exists in volume group "qubes_dom0" Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future: exception=StoragePoolException(' Logical Volume "vm-sys-net-volatile" already exists in volume group "qubes_dom0"',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File 
"/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 668, in start Apr 23 02:29:33 dom0 qubesd[10730]: yield from self._reset() Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 367, in _reset Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical Volume "vm-sys-net-volatile" already exists in volume group "qubes_dom0" Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future: exception=StoragePoolException(' Logical Volume "vm-sys-net-private-snap" already exists in volume group "qubes_dom0"',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File 
"/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 666, in start Apr 23 02:29:33 dom0 qubesd[10730]: yield from self._snapshot() Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 657, in _snapshot Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical Volume "vm-sys-net-private-snap" already exists in volume group "qubes_dom0" Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future: exception=StoragePoolException(' Logical Volume "vm-sys-net-root-snap" already exists in volume group "qubes_dom0"',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 666, in start Apr 23 02:29:33 dom0 qubesd[10730]: yield from self._snapshot() Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 657, in _snapshot Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 
qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical Volume "vm-sys-net-root-snap" already exists in volume group "qubes_dom0" Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future: exception=StoragePoolException(' Logical Volume "vm-sys-net-volatile" already exists in volume group "qubes_dom0"',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 668, in start Apr 23 02:29:33 dom0 qubesd[10730]: yield from self._reset() Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 367, in _reset Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise 
qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical Volume "vm-sys-net-volatile" already exists in volume group "qubes_dom0" Apr 23 02:29:33 dom0 qubesd[10730]: Task exception was never retrieved Apr 23 02:29:33 dom0 qubesd[10730]: future: exception=StoragePoolException(' Logical Volume "vm-sys-net-private-snap" already exists in volume group "qubes_dom0"',)> Apr 23 02:29:33 dom0 qubesd[10730]: Traceback (most recent call last): Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step Apr 23 02:29:33 dom0 qubesd[10730]: result = coro.send(None) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 283, in wrapper Apr 23 02:29:33 dom0 qubesd[10730]: return (yield from method(self, *args, **kwargs)) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 666, in start Apr 23 02:29:33 dom0 qubesd[10730]: yield from self._snapshot() Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 657, in _snapshot Apr 23 02:29:33 dom0 qubesd[10730]: yield from qubes_lvm_coro(cmd, self.log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 814, in qubes_lvm_coro Apr 23 02:29:33 dom0 qubesd[10730]: return _process_lvm_output(p.returncode, out, err, log) Apr 23 02:29:33 dom0 qubesd[10730]: File "/usr/lib/python3.5/site-packages/qubes/storage/lvm.py", line 787, in _process_lvm_output Apr 23 02:29:33 dom0 qubesd[10730]: raise qubes.storage.StoragePoolException(err) Apr 23 02:29:33 dom0 qubesd[10730]: qubes.storage.StoragePoolException: Logical Volume "vm-sys-net-private-snap" already exists in volume group "qubes_dom0" Apr 23 02:29:33 dom0 dmeventd[2171]: No longer monitoring thin pool qubes_dom0-pool00-tpool. 
Apr 23 02:29:33 dom0 lvm[2171]: Monitoring thin pool qubes_dom0-pool00-tpool. Apr 23 02:29:34 dom0 dmeventd[2171]: No longer monitoring thin pool qubes_dom0-pool00-tpool. Apr 23 02:29:34 dom0 lvm[2171]: Monitoring thin pool qubes_dom0-pool00-tpool.
This seems to be happening more often now. My laptop can't go more than a few days without a reboot before hitting this. Here's what happens when I try lvremove:
$ sudo lvremove -vf qubes_dom0/vm-sys-net-root-snap
Logical volume qubes_dom0/vm-sys-net-root-snap in use.
$ sudo lvremove -vf qubes_dom0/vm-sys-net-private-snap
Logical volume qubes_dom0/vm-sys-net-private-snap in use.
The only workaround I've found is to reboot.
On Tue, Jun 11, 2019 at 09:05:28PM -0700, Andrew David Wong wrote:
This seems to be happening more often now. My laptop can't go more than a few days without a reboot before hitting this. Here's what happens when I try lvremove:
$ sudo lvremove -vf qubes_dom0/vm-sys-net-root-snap
Logical volume qubes_dom0/vm-sys-net-root-snap in use.
$ sudo lvremove -vf qubes_dom0/vm-sys-net-private-snap
Logical volume qubes_dom0/vm-sys-net-private-snap in use.
The only workaround I've found is to reboot.
Is that full uptime, or are you sleeping during this time? Are you running "testing"?
Let's try to find out what is using those volumes. Is there any related VM still visible in xl list? Is there any VM with a (null) name? Is any process in dom0 related to this VM still running (ps aux | grep VMNAME)?
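The first of those checks can be automated. A sketch that parses xl list output for domains in a paused state (sample output is embedded so it runs without Xen; on a real system you would pipe sudo xl list into the filter instead):

```shell
xl_output='Name        ID   Mem VCPUs  State  Time(s)
Domain-0     0  4080     2  r----- 73330.0
sys-net   1981   400     1  --p---     0.0'

# In xl list, field 5 is the state string; "p" means the domain is paused
echo "$xl_output" | awk 'NR > 1 && $5 ~ /p/ { print $1 }'
```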
Is that full uptime, or are you sleeping during this time?
Not full uptime; frequent sleeping periods throughout.
Are you running "testing"?
Just security-testing.
Let's try to find out what is using those volumes. Is there any related VM still visible in xl list? Is there any VM with a (null) name? Is any process in dom0 related to this VM still running (ps aux | grep VMNAME)?
I'll check this the next time it happens. Thanks!
Let's try to find out what is using those volumes.
It just happened again, so I was able to check this.
Is there any related VM still visible in xl list?
Yes:
Name ID Mem VCPUs State Time(s)
Domain-0 0 4080 2 r----- 73330.0
sys-net 1981 400 1 --p--- 0.0
sys-net-dm 1982 144 1 -b---- 3.1
Is there any VM with a (null) name?
No.
Is any process in dom0 related to this VM still running (ps aux | grep VMNAME)?
Yes:
root 12574 0.0 0.1 160576 6184 ? SLsl 02:33 0:02 /usr/bin/qubes-guid -N sys-net -c 0xcc0000 -i /usr/share/icons/hicolor/128x128/devices/appvm-red.png -l 1 -q -d 1644 -n -f
I was able to use xl destroy to destroy sys-net. This allowed me to start sys-net normally with the Qubes tools. It connected to my Wi-Fi network and worked normally. Other VMs that depend on sys-net were able to start normally after that.
However, after using xl destroy on sys-net, I was not able to shut down any VMs normally with the Qubes tools. (They would go down to 0 MB in the Qubes Domains widget but not fully halt. The "Shutdown" button was grayed out in Qube Manager. Starting a VM again would halt it, then restart it.) So I had to do a full reboot of the whole machine anyway.
sys-net 1981 400 1 --p--- 0.0
sys-net-dm 1982 144 1 -b---- 3.1
Ok, so sys-net is paused, not powered off. It isn't surprising that starting it again fails - even if it weren't LVM, it would fail later because the domain is already started. There are two situations where VMs are paused in Qubes: sys-net should be suspended, but if that fails (which is a problem in itself), it falls back to pause. You can easily spot this situation, as suspend takes significantly longer in this case - it involves a 30-60s timeout.
But neither of those situations should allow the same VM to be started again. I've tried to reproduce a similar thing manually, and in all cases I got a proper "domain ... is already running" error when trying to start such a paused VM. Can you provide more details on what actions lead to this error? Check also whether you see any errors related to libvirtd in journalctl (sudo journalctl -u libvirtd) between the last successful VM start and the failed one.
So, I see two (probably related) issues here: sys-net remains paused, and it can nevertheless be started again.
As for xl destroy, use virsh -c xen destroy instead; details here.
Can you provide more details on what actions lead to this error?
It pretty reliably starts happening after ~4 days of uptime. I just use the system normally: starting and stopping VMs, suspending and resuming a few times a day, updating TemplateVMs, shutting all VMs down and restarting them, etc. One non-default thing about my setup is that I use "named"/"static" DisposableVMs for sys-net and sys-firewall.
Check also whether you see any errors related to libvirtd in journalctl (sudo journalctl -u libvirtd) between the last successful VM start and the failed one.
Ok, I'll check this the next time it happens.
As for xl destroy, use virsh -c xen destroy
Will do, thanks. Should I expect to be able to use this to destroy sys-net when it gets stuck, as a workaround to avoid having to reboot the whole system?
It pretty reliably starts happening after ~4 days of uptime.
Does it happen right after suspend? Or maybe while restarting all the VMs (including sys-net)?
Should I expect to be able to use this to destroy sys-net when it gets stuck, as a workaround to avoid having to reboot the whole system?
Yes.
Does it happen right after suspend?
It doesn't seem related to suspending. Perhaps it sometimes happens right after resuming from suspend, but certainly not only after resuming from suspend.
Or maybe while restarting all the VMs (including sys-net)?
Yes, it usually seems to happen when I shut down all VMs including sys-net, then try to start some back up. (Well, I usually shut down all other VMs when I want to restart sys-net anyway.)
As for xl destroy, use virsh -c xen destroy
Unfortunately, this did not work for me either:
[user@dom0 ~]$ xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 4080 2 r----- 101097.1
sys-net 1953 400 1 --p--- 0.0
sys-net-dm 1954 144 1 -b---- 314.5
[user@dom0 ~]$ virsh -c xen destroy sys-net
error: Failed to destroy domain sys-net
error: Requested operation is not valid: Domain is not running
[user@dom0 ~]$ virsh -c xen destroy sys-net-dm
error: failed to get domain 'sys-net-dm'
error: Domain not found
[user@dom0 ~]$ xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 4080 2 r----- 101098.5
sys-net 1953 400 1 --p--- 0.0
sys-net-dm 1954 144 1 -b---- 314.7
Check /var/log/libvirt/libxl/libxl-driver.log around the time of the sys-net start that led to this state (so, the failed one, before the message in this issue's title). Do you see anything similar to https://github.com/QubesOS/qubes-issues/issues/3322 ?
Check /var/log/libvirt/libxl/libxl-driver.log around the time of the sys-net start that led to this state (so, the failed one, before the message in this issue's title). Do you see anything similar to #3322 ?
The log is no longer available when it last occurred (probably because I had to reboot), but I'll check this the next time it happens.
Not even rotated one (with date suffix)?
FWIW, the problem has not returned for me since February. I don't use named dispvms, however, and my suspend activity is lighter than Andrew's. My uptime can vary a lot, but rarely exceeds 5 days.
I have a few troubleshooting recommendations for Andrew:

- If you revert back to normal sys-net and sys-firewall, does the error persist?
- Does using `vgcfgrestore` help?
- Try hardware/RAM diagnostics. Then follow that up with something like `rpm -Va` to verify all package contents in the system (maybe restrict this to qubes* packages at first).
I also meant to include this: in `lvs` output, is the Meta% for pool00 high? Also try booting from a rescue disk, unlock your system partition, then run `thin_check /dev/mapper/qubes_dom0-pool00_tmeta`.

> Not even rotated one (with date suffix)?

Ah, indeed. I don't remember exactly when it happened, but here are all the unusual lines:
2019-06-21 03:28:16.834+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/1535/backend/vif/1540/0
2019-06-21 03:28:16.845+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 1540
2019-06-21 03:29:02.142+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
2019-06-21 03:29:02.345+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
2019-06-21 14:30:29.833+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
2019-06-21 14:30:30.063+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
2019-06-21 14:46:41.539+0000: libxl: libxl.c:1853:libxl_console_get_tty: unable to read console tty path `/local/domain/1939/console/tty': Resource temporarily unavailable
2019-06-21 23:33:08.859+0000: libxl: libxl.c:1853:libxl_console_get_tty: unable to read console tty path `/local/domain/1946/console/tty': Resource temporarily unavailable
2019-06-22 09:55:51.455+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/1938/backend/vif/1942/0
2019-06-22 09:55:51.461+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 1942
2019-06-22 09:55:52.903+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/1938/backend/vif/1944/0
2019-06-22 09:55:52.911+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 1944
2019-06-22 09:56:40.245+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
2019-06-22 09:56:40.472+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
2019-06-22 20:39:16.401+0000: libxl: libxl_pci.c:1176:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
2019-06-22 20:40:33.078+0000: libxl: libxl_pci.c:1176:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.0
2019-06-22 20:40:33.200+0000: libxl: libxl_pci.c:1494:do_pci_remove: xc_physdev_unmap_pirq irq=16: Invalid argument
2019-06-23 06:59:17.419+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/5/backend/vif/36/0
2019-06-23 06:59:17.455+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 36
2019-06-23 06:59:19.200+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/5/backend/vif/32/0
2019-06-23 06:59:19.204+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 32
2019-06-23 06:59:20.475+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/5/backend/vif/41/0
2019-06-23 06:59:20.485+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 41
2019-06-23 06:59:20.758+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/5/backend/vif/38/0
2019-06-23 06:59:20.766+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 38
2019-06-23 07:00:13.674+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
2019-06-23 07:00:13.756+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device or address
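When scanning a long libxl-driver.log, the device-teardown failures like those above can be filtered out with a quick one-liner (a sketch; `teardown_errors` is a hypothetical helper name):

```shell
# Hypothetical filter for libxl device-teardown failures in the log.
teardown_errors() {
  grep -E 'devices_destroy_cb|device_backend_callback|loopdev_cleanup'
}
# In dom0 this would be fed the real log, e.g.:
#   teardown_errors < /var/log/libvirt/libxl/libxl-driver.log
```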
> I have a few troubleshooting recommendations for Andrew: [...]

Thanks for the tips, Chris! I'll try these next if the logs above don't reveal anything.
> Thin pool health: In `lvs` output, is the Meta% for pool00 high?

It's around 47%, so it appears to be fine.
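In dom0, the relevant numbers can be pulled directly with `lvs` report fields (`data_percent` and `metadata_percent` are standard LVM report columns). Below is a sketch that parses a captured sample line instead of running `lvs`, with an arbitrary 90% warning threshold:

```shell
# The real dom0 command would be something like:
#   sudo lvs --noheadings -o lv_name,data_percent,metadata_percent qubes_dom0/pool00
# Here we parse a captured sample of that output instead.
sample='  pool00 62.11 34.22'
meta_pct=$(echo "$sample" | awk '{print $3}')
# Flag the pool when metadata usage crosses 90% (threshold is arbitrary).
if awk -v m="$meta_pct" 'BEGIN { exit !(m >= 90) }'; then
  echo "WARNING: thin pool metadata nearly full ($meta_pct%)"
else
  echo "metadata usage OK ($meta_pct%)"
fi
```

A thin pool whose metadata fills up is a known way for snapshot operations (like the `-snap` volumes created at VM startup) to start failing, which is why Meta% is worth checking first.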
> Check `/var/log/libvirt/libxl/libxl-driver.log` around the time of the sys-net start that led to this state (so, the failed one, before the message from this issue title). Do you see anything similar to #3322 ?

I've been monitoring `/var/log/libvirt/libxl/libxl-driver.log` when this problem occurs. I do not see anything similar to #3322. At the time when `sys-net` fails to start, I just see lines like this:
2019-06-28 09:40:27.689+0000: libxl: libxl.c:1853:libxl_console_get_tty: unable to read console tty path `/local/domain/1960/console/tty': Resource temporarily unavailable
2019-06-28 09:42:30.240+0000: libxl: libxl.c:1853:libxl_console_get_tty: unable to read console tty path `/local/domain/1961/console/tty': Resource temporarily unavailable
2019-06-28 09:43:57.060+0000: libxl: libxl.c:1853:libxl_console_get_tty: unable to read console tty path `/local/domain/1962/console/tty': Resource temporarily unavailable
One odd thing about this bug is that it consistently occurs at ~5 days of uptime. If I just always make sure to reboot dom0 before reaching ~5 days of uptime, I can consistently avoid the bug.
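Since the failure tracks uptime so consistently, a small reminder check could be scripted (a sketch; `warn_uptime` is a hypothetical helper, and the 4-day threshold simply mirrors the pattern described above):

```shell
# Hypothetical helper: print a reminder once uptime reaches 4 days.
warn_uptime() {
  local days=$(( $1 / 86400 ))
  if [ "$days" -ge 4 ]; then
    echo "uptime ${days}d: consider rebooting dom0 soon"
  fi
}
# Against the live system (first field of /proc/uptime is seconds since boot):
#   warn_uptime "$(cut -d' ' -f1 /proc/uptime | cut -d. -f1)"
warn_uptime 432000   # sample: 5 days in seconds
```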
Occurred again after 4d22h of uptime. This time I caught the initial error message about being unable to start `sys-net`:
Cannot recv data: Connection reset by peer
This led me to #990, in which @marmarek said to check `/var/log/messages`. Unfortunately, mine was empty. Still nothing different in `/var/log/libvirt/libxl/libxl-driver.log` (same as posted above).
I don't think it's a hardware problem due to the 5d uptime consistency. I'm going to try reverting back to normal `sys-net` and `sys-firewall` now.
Check libvirt related errors in `journalctl`.

> Check libvirt related errors in `journalctl`

Yeah, I'm seeing a lot of libvirt errors in here:
Something gone really bad there:
Jul 13 17:19:54 dom0 kernel: libvirtd[3341]: segfault at b3 ip 00000000000000b3 sp 00007ffea31f99a0 error 14 in libvirtd[5605bb8ea000+98000]
Jul 13 17:19:54 dom0 kernel: Code: Bad RIP value.
Jul 13 17:19:54 dom0 systemd[1]: Started Process Core Dump (PID 25968/UID 0).
Jul 13 17:19:55 dom0 systemd[1]: libvirtd.service: Main process exited, code=dumped, status=11/SEGV
Jul 13 17:19:55 dom0 systemd[1]: libvirtd.service: Unit entered failed state.
Jul 13 17:19:55 dom0 systemd[1]: libvirtd.service: Failed with result 'core-dump'.
Jul 13 17:19:56 dom0 systemd-coredump[25969]: Process 3341 (libvirtd) of user 0 dumped core.
Stack trace of thread 3341:
#0 0x00000000000000b3 n/a (n/a)
Looks like some memory corruption bug within libvirt. I don't have high hopes, but could you try to extract a full backtrace using the `coredumpctl info` command?
> Looks like some memory corruption bug within libvirt. I don't have high hopes, but could you try to extract a full backtrace using the `coredumpctl info` command?

There is indeed a `coredumpctl info` entry at the same timestamp:
PID: 3341 (libvirtd)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Sat 2019-07-13 17:19:54 CDT (10h ago)
Command Line: /usr/sbin/libvirtd
Executable: /usr/sbin/libvirtd
Control Group: /system.slice/libvirtd.service
Unit: libvirtd.service
Slice: system.slice
Boot ID: e0e01a54ec7a41c4a893029e6e7b5864
Machine ID: c23c70a11dcc4184b8ea44ab38623af3
Hostname: dom0
Coredump: /var/lib/systemd/coredump/core.libvirtd.0.e0e01a54ec7a41c4a893029e6e7b5864.3341.1563056394000000000000.lz4
Message: Process 3341 (libvirtd) of user 0 dumped core.
Stack trace of thread 3341:
#0 0x00000000000000b3 n/a (n/a)
But I guess this doesn't look like it includes a full backtrace.
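Getting a real backtrace requires opening the core file in gdb with libvirt debug symbols installed; since the stored core is lz4-compressed, `coredumpctl gdb 3341` is the convenient way to load it, and `thread apply all bt full` inside gdb prints the full trace. The dump path itself can be pulled out of the `coredumpctl info` text (a sketch; `core_path` is a hypothetical helper):

```shell
# Hypothetical helper: extract the Coredump path from `coredumpctl info`.
core_path() { awk -F': ' '/^ *Coredump:/ { print $2 }'; }
# In dom0, something like:
#   coredumpctl info 3341 | core_path
# For a symbolized backtrace, install debug symbols first, e.g.:
#   sudo dnf debuginfo-install libvirt-daemon
#   coredumpctl gdb 3341
```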
> I'm going to try reverting back to normal `sys-net` and `sys-firewall` now.

This made no noticeable difference. Still experiencing the problem the same as before.
> Then follow that up with something like `rpm -Va` to verify all package contents in the system (maybe restrict this to qubes* packages at first).

Should I be concerned about the `missing` or `L` (mismatch occurs in a link) entries?
$ sudo rpm -Va
.M....GT. /var/log/salt/minion
..5....T. /usr/share/themes/Ops/xfwm4/menu-active.xpm
..5....T. /usr/share/themes/Ops/xfwm4/menu-inactive.xpm
S.5....T. /usr/share/themes/Ops/xfwm4/themerc
S.5....T. c /etc/yum.conf
....L.... /usr/bin/X
......G.. c /etc/qubes-rpc/policy/whonix.GatewayCommand
......G.. c /etc/qubes-rpc/policy/whonix.NewStatus
......G.. c /etc/qubes-rpc/policy/whonix.SdwdateStatus
S.5....T. c /etc/qubes-rpc/policy/qubes.InputMouse
S.5....T. /etc/xdg/xfce4/xfconf/xfce-perchannel-xml/xfce4-session.xml
.......T. /lib/modules/4.19.43-1.pvops.qubes.x86_64/modules.builtin.bin
.......T. /lib/modules/4.19.43-1.pvops.qubes.x86_64/modules.devname
.......T. /lib/modules/4.19.43-1.pvops.qubes.x86_64/modules.softdep
SM5....T. c /etc/qubes-rpc/policy/qubes.Filecopy
.......T. c /etc/qubes-rpc/policy/qubes.GetDate
S.5....T. c /etc/qubes-rpc/policy/qubes.OpenInVM
S.5....T. c /etc/qubes-rpc/policy/qubes.OpenURL
S.5....T. c /etc/qubes-rpc/policy/qubes.StartApp
S.5....T. c /etc/qubes-rpc/policy/qubes.UpdatesProxy
.M.....T. c /etc/qubes-rpc/policy/qubes.VMRootShell
SM5....T. c /etc/qubes-rpc/policy/qubes.VMShell
S.5....T. c /etc/qubes/qmemman.conf
.M....... /var/run/qubes
.M....G.. /etc/xen
S.5....T. c /etc/xen/xl.conf
.M....G.. /var/lib/xen
.M....G.. /var/log/xen
.M....G.. /var/log/xen/console
S.5....T. /etc/xdg/xfce4/xfconf/xfce-perchannel-xml/xfce4-power-manager.xml
.M....... c /etc/grub.d/10_linux
S.5....T. c /etc/plymouth/plymouthd.conf
....L.... c /etc/pam.d/fingerprint-auth
....L.... c /etc/pam.d/password-auth
....L.... c /etc/pam.d/postlogin
....L.... c /etc/pam.d/smartcard-auth
....L.... c /etc/pam.d/system-auth
S.5....T. /usr/lib/python3.5/site-packages/urllib3/packages/__pycache__/six.cpython-35.pyc
.......T. c /etc/PackageKit/PackageKit.conf
.......T. /lib/modules/4.14.103-1.pvops.qubes.x86_64/modules.builtin.bin
.......T. /lib/modules/4.14.103-1.pvops.qubes.x86_64/modules.devname
.......T. /lib/modules/4.14.103-1.pvops.qubes.x86_64/modules.softdep
S.5....T. c /etc/xdg/xfce4/xfconf/xfce-perchannel-xml/xfce4-keyboard-shortcuts.xml
S.5....T. c /etc/xdg/xfce4/panel/default.xml
S.5....T. c /etc/qubes-rpc/policy/qubes.Gpg
S.5....T. c /etc/qubes-rpc/policy/qubes.GpgImportKey
..5....T. c /etc/yum.repos.d/qubes-dom0.repo
S.5....T. c /etc/xdg/xfce4/xfconf/xfce-perchannel-xml/xsettings.xml
..5....T. /usr/share/themes/Xfce-dusk/gtk-2.0/gtkrc
S.5....T. c /etc/dnf/dnf.conf
.......T. /lib/modules/4.14.116-1.pvops.qubes.x86_64/modules.builtin.bin
.......T. /lib/modules/4.14.116-1.pvops.qubes.x86_64/modules.devname
.......T. /lib/modules/4.14.116-1.pvops.qubes.x86_64/modules.softdep
S.5....T. c /etc/xscreensaver/XScreenSaver.ad.tail
missing /var/run/pulse
S.5....T. c /etc/qubes-rpc/policy/qubes.ClipboardPaste
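For reference, each column of `rpm -Va` output is a verification test: `S` size, `M` mode, `5` digest, `L` readlink path, `U`/`G` owner/group, `T` mtime, with a separate `c` marker for config files. Config-file changes and absent paths under `/var/run` (a tmpfs) are usually benign; digest mismatches on non-config files are the ones worth a closer look. A quick filter for those (a sketch; `rpm_va_suspicious` is a hypothetical helper):

```shell
# Hypothetical filter: digest ('5') mismatches on non-config files are the
# most interesting; config files (marked ' c ') change legitimately.
rpm_va_suspicious() {
  grep '^..5' | grep -v ' c /'
}
# e.g. in dom0:  sudo rpm -Va | rpm_va_suspicious
```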
It now appears that this problem occurs as a result of activity, not time. This week, my pattern of Qubes usage has been different from usual. I'm at over 7 days of uptime, yet the problem has not occurred. My usual pattern of activity includes a lot of suspending and resuming, as well as a lot of shutting down and restarting of VMs, especially at night during scripted backups and template updates. Since I haven't been doing any of this as much over the past week, and the problem has not occurred by now, I suspect that there is something about my usual pattern of activity that triggers the problem.
I wonder if this has been fixed in a newer version of libvirtd.
> I wonder if this has been fixed in a newer version of libvirtd.

I can check and report back. Which version should I have for a good test?
No idea, tbh.
Ok, I'll just keep an eye on it in general.
Can you reproduce this in R4.1 with a Fedora 32 dom0?
> Can you reproduce this in R4.1 with a Fedora 32 dom0?

Sorry, I don't have a machine to test 4.1, but perhaps someone else who has experienced the issue can. Or I can report back once 4.1 is stable and I've migrated to it, but that might not be for a while.
I can at least confirm that this is still happening on R4.0 with libvirtd 3.3.0.
The original problem should have been fixed by QubesOS/qubes-core-admin#397. @andrewdavidwong if you still experience libvirtd crashes, please open a separate issue for them.
Still happens to me on 4.1; can't use sys-usb.
Qubes OS version: Qubes 4.0
Affected component(s):
All VMs
Steps to reproduce the behavior:
Boot computer, decrypt, and log in.
Expected behavior:
VMs start as usual, sys-usb, sys-net, sys-firewall, etc.
Actual behavior:
No VMs start. No USB items function. When attempting to start a VM, an error is provided:
ERROR: b' Logical Volume "vm-sys-usb-root-snap" already exists in volume group "qubes_dom0" \n'
Everything appeared normal when I shut it down last night, with the exception that I had to kill the vault VM, since the shutdown prompt for it kept reappearing. That was the first time vault didn't shut down cleanly on its own. Typically that only happens with the VPN VMs, and it never caused a problem.
I rebooted repeatedly today in hopes it would resolve itself but to no avail.
Any VM which has internet access, when attempting to start, references vm-sys-net-root-snap in the error. Presumably because that is the first VM in the chain that must start.
Oddly, VMs which do not have internet access, such as vault (except sys-usb, which references itself), provide a different, puzzling error when attempting to start:
ERROR: b" Can't create snapshot vm-sys-whonix-private as origin vm-sys-whonix-private-snap is not suspended.\n Failed to suspend qubes_dom0/pool00 with queued messages.\n"
Why offline VMs should refer to sys-whonix is beyond me.
I lack the skills to troubleshoot this. Any and all help would be extremely appreciated.
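As noted earlier in the thread, leftover `-snap` volumes of VMs that are not currently running can be removed. Rather than the interactive `lvremove qubes_dom0` form (which walks through every LV in the volume group), a specific volume can be targeted by its `VG/LV` path. A sketch, with a hypothetical `list_snaps` filter demonstrated against sample names (only remove snapshots of VMs that are shut down):

```shell
# Hypothetical filter: pick out leftover snapshot volumes by name.
list_snaps() { grep -- '-snap$'; }
# In dom0, something like:
#   sudo lvs --noheadings -o lv_name qubes_dom0 | tr -d ' ' | list_snaps
#   sudo lvremove qubes_dom0/vm-sys-usb-root-snap   # targets just that LV
printf '%s\n' vm-sys-usb-root-snap vm-work-private | list_snaps
```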