Open Eric678 opened 12 months ago
Same here with a qube moved from fedora-37 (installed in 4.1) to fedora-38 template while on 4.2.
Solution, in dom0:
sudo mount /dev/qubes_dom0/vm-MYQUBE-private /mnt
sudo touch /mnt/.autorelabel
sudo umount /mnt
qvm-start MYQUBE
Not sure what's going on with SELinux here, but:
sudo mount /dev/qubes_dom0/vm-MYQUBE-private /mnt
Never mount a VM volume in dom0. Do it in a DisposableVM instead (ideally based on a disposable template that uses the same TemplateVM as the VM): https://www.qubes-os.org/doc/mount-lvm-image/
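For anyone taking the DisposableVM route, here is a minimal sketch of the steps that run inside the temporary VM. It assumes the private volume has already been attached to that VM as described in the linked documentation, and that it shows up there as /dev/xvdi (an assumption, check with lsblk first). The function name add_autorelabel_flag is made up for illustration.

```shell
#!/bin/sh
# Run inside the temporary (disposable) VM, NOT in dom0.
# Assumption: the private volume is already attached to this VM and
# appears as a block device such as /dev/xvdi.

# add_autorelabel_flag DEVICE [MOUNTPOINT]
# Mounts DEVICE, creates the .autorelabel flag file, and unmounts again.
add_autorelabel_flag() {
    dev=$1
    mnt=${2:-/mnt}

    # Refuse to continue if the argument is not a block device.
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }

    sudo mount "$dev" "$mnt" || return 1
    sudo touch "$mnt/.autorelabel"
    status=$?

    # Always unmount, even if touch failed, then report touch's result.
    sudo umount "$mnt" || return 1
    return "$status"
}
```

Usage inside the temporary VM would be something like: add_autorelabel_flag /dev/xvdi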
I just experienced this issue when migrating to 4.2.0.
The qube is pretty big, about 350 GB. It was created on 4.1 with a derivative of the fedora-38 template 0:4.0.6-202305200036. By derivative I mean: I leave the out-of-the-box templates alone, clone them, and install additional software in the clone to make it fit for purpose. There's nothing special about the qube.
Since I backed up and restored everything, the template and the qube are both present in the new install. The qube worked with no issues until I switched the template to a new template deriving from fedora-38-xfce 0:4.2.0-202312171103. Then the qube wouldn't start and I saw the relabel messages in the console log.
In my case I ran this to get around the issue:
qvm-prefs --set MYQUBE qrexec_timeout 1200
The qube did eventually start. I arrived at 1200 after some trial and error; 5 minutes wasn't enough.
After it started, I restarted it and it subsequently started in a reasonable amount of time. I will reset the qrexec_timeout to the default and carry on.
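A small sketch of the raise-then-reset sequence described above, to be run in dom0. The helper names are hypothetical. The reset relies on the --default option of qvm-prefs, which as far as I know restores a property to its default value; check qvm-prefs --help on your install before relying on it.

```shell
#!/bin/sh
# Run in dom0. MYQUBE is a placeholder qube name.

# raise_qrexec_timeout QUBE SECONDS
# Gives the qube more time to connect to the qrexec agent on startup.
raise_qrexec_timeout() {
    qvm-prefs --set "$1" qrexec_timeout "$2"
}

# reset_qrexec_timeout QUBE
# Puts qrexec_timeout back to its default once the qube starts normally.
reset_qrexec_timeout() {
    qvm-prefs --default "$1" qrexec_timeout
}
```

Typical use: raise_qrexec_timeout MYQUBE 1200, then qvm-start MYQUBE, and once a restart comes up in normal time, reset_qrexec_timeout MYQUBE.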
In my case I was only about 3 seconds short of making it. SELinux was set to permissive until I got around to sorting out printing. There were just under 500K items in the private volume. I guess it is just a bit slow when every item needs labeling.
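To get a rough idea of how many items a relabel has to walk, something like the following can be run inside the qube. This is only a sketch: count_entries is a made-up helper name, and /rw is assumed to be where the private volume is mounted.

```shell
#!/bin/sh
# count_entries [DIR]
# Prints the number of files and directories under DIR (default /rw),
# staying on the same filesystem (-xdev).
count_entries() {
    find "${1:-/rw}" -xdev | wc -l | tr -d ' '
}
```

Running it as root (for example via sudo sh) avoids undercounting directories the normal user cannot read.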
I restored VMs from a 4.1.2 install to a fresh 4.2.2 install and they all started correctly except 2 that failed with the same "Cannot connect to qrexec agent for 60 seconds" error. One of them succeeded on the second try, while the other did not even succeed after 10 minutes of timeout. Mounting the volume into a temporary VM and creating the missing .autorelabel file fixed this issue. I wonder how it happened to go missing? Might be worth noting that after restoring the VMs, I started most of them nearly at the same time, so perhaps it timed out due to processing other ones before, and it could not add the .autorelabel file before the first timeout? Not sure if that makes any sense, regardless thanks for the solution.
The .autorelabel file on the private volume signals that SELinux relabeling was completed. If it's missing, relabeling wasn't completed. For a really big private storage (in number of files, not necessarily bytes) it may take a while, possibly over 10 minutes. If you created the file manually without actually completing relabeling, some labels will be missing and you will run into SELinux issues sooner or later. You can do the relabeling manually:
/usr/sbin/restorecon -RF /rw /home /usr/local
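Putting the two pieces together, a sketch of a manual relabel that only creates the flag file when restorecon actually finished. The flag location /rw/.autorelabel is an assumption based on the private volume being mounted at /rw inside the qube. The helper name relabel_private and the RESTORECON override variable are hypothetical and exist only to keep the example self-contained.

```shell
#!/bin/sh
# Run as root inside the affected qube.
# RESTORECON can be overridden; by default it is the path quoted above.
RESTORECON=${RESTORECON:-/usr/sbin/restorecon}

# relabel_private [FLAG_DIR]
# Relabels /rw, /home and /usr/local, then records completion by creating
# FLAG_DIR/.autorelabel (assumed default: /rw). If relabeling fails, the
# flag is not created.
relabel_private() {
    flag_dir=${1:-/rw}
    "$RESTORECON" -RF /rw /home /usr/local || return 1
    touch "$flag_dir/.autorelabel"
}
```

With several hundred thousand files this can run for many minutes, so it is better started from a terminal in an already running qube than waited on during boot.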
Qubes OS release
4.2.0-rc3 6.1.43
Brief summary
Transferring a specific app qube from R4.1 to R4.2 results in a qube that will not start: dom0: Cannot connect to qrexec agent for 60 seconds, see ...log: Job qubes-relabel-rw.service/start clocked up 57 seconds before the qube was killed.
Steps to reproduce
Cannot give specific instructions here, as it only happened to 1 of my qubes when migrating across to 4.2. It is a largish qube, ~30GB (my biggest), so trial and error was quite slow: change something in /rw, back up, restore, and so on. I did not stumble on the cause. Source template fedora-37, destination template fedora-38.
On the first restore there were a few persistent block device attachments that generated warnings during restore on 4.2 and would have prevented a start in 4.1. For the first retry I deleted those from dom0 before the backup. Perhaps there was something lingering on the destination after the original restored qube was deleted? This was the only qube that I did a full backup restore of with persistent attachments.
Expected behavior
Restored qube would start normally.
Actual behavior
As above. While relabel-rw was running, the qube appeared to be in some sort of infinite loop with dom0. The qube was chalking up 75-95% CPU while dom0 had a couple of short sessions running kcryptd daemons.
My workaround was to tar up all of /rw on the source on 4.1, create a new qube on 4.2 and untar --overwrite on /rw at dest and immediately restart the qube. Much quicker than a backup restore! Worked a treat. Not had any problems with that qube since. Still have the broken qube, if anything useful can be done.
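The tar workaround could look roughly like this. It is a sketch only: pack_rw and unpack_rw are made-up helper names, the archive still has to be moved from the 4.1 machine to the 4.2 machine by whatever means you have, and --overwrite assumes GNU tar.

```shell
#!/bin/sh
# pack_rw SRC_DIR ARCHIVE
# On the source qube (R4.1): archive everything under SRC_DIR (e.g. /rw),
# preserving permissions.
pack_rw() {
    tar -C "$1" -cpf "$2" .
}

# unpack_rw ARCHIVE DEST_DIR
# On the freshly created qube (R4.2): extract into DEST_DIR (e.g. /rw),
# overwriting files that already exist there.
unpack_rw() {
    tar -C "$2" -xpf "$1" --overwrite
}
```

Both steps need root when run against a real /rw so that ownership is preserved, and as in the description above the new qube should be restarted right after unpacking. Keep the archive outside the directory being packed.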