scallyob opened this issue 2 years ago
By the way, I have been working through this over the last couple of months before posting here: https://forum.qubes-os.org/t/backup-fails-since-upgrade-to-4-1-unless-qubes-are-all-shut-down/8660/15
I originally discovered this on a 4.1rc3 install, but I just did a fresh 4.1 install and found that it has not been fixed.
In https://github.com/QubesOS/qubes-issues/issues/7198#issuecomment-1086674714 you wrote that you have an ext4 installation. It's intentional that the Qubes backup system will refuse to back up a running VM stored on a deprecated legacy 'file' driver pool (which was never a safe thing to do).
The unintentional part is that when the backup destination is a VM (instead of dom0) the process will indeed just hang for some reason, instead of aborting with the message "Backup error: file pool cannot export running volumes" as it should.
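The behavior described as intended can be pictured as a pre-flight check: validate every volume before any data is streamed, so a running VM on the deprecated file pool produces an error instead of a truncated archive and a hang. This is a toy model, not the qubes-core-admin backup code; `Volume`, `check_exportable`, `run_backup` and `BackupError` are hypothetical names, and the export step is a placeholder.

```python
from dataclasses import dataclass


class BackupError(Exception):
    """Hypothetical stand-in for the error the backup tool should raise."""


@dataclass
class Volume:
    vm_name: str
    pool_driver: str  # e.g. "lvm_thin", "file-reflink", or the deprecated "file"
    vm_running: bool


def check_exportable(volumes):
    """Reject the whole backup up front if any volume cannot be exported."""
    for vol in volumes:
        # The legacy 'file' driver cannot safely export a running VM's volume.
        if vol.pool_driver == "file" and vol.vm_running:
            raise BackupError(
                f"file pool cannot export running volumes ({vol.vm_name})")


def run_backup(volumes):
    # Validate every volume before any data is sent to the destination VM,
    # so the caller gets an error instead of a partial file and a hang.
    check_exportable(volumes)
    # Placeholder for the real export/streaming step.
    return [vol.vm_name for vol in volumes]


if __name__ == "__main__":
    try:
        run_backup([Volume("backuptest", "file", vm_running=True)])
    except BackupError as err:
        print(f"Backup error: {err}")
```

Run as a script, it prints `Backup error: file pool cannot export running volumes (backuptest)`. Checking all volumes first, rather than failing partway through the stream, is what keeps the destination VM from waiting on data that never arrives.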
You may want to reinstall R4.1 with one of the three supported automatic partitioning schemes:
- LVM Thin Provisioning (the default)
- Btrfs
- Standard Partition, keeping the preset File System choice of xfs for the partition at /var/lib/qubes
I believe I did use the default. I just went through the installer and here is what I remember doing.
When I got to the page entitled "Device Selection", I selected /dev/sda and /dev/sdb, which I intended to use as a RAID1, and then chose "Custom" in order to set up the RAID.
The next screen is entitled "New Qubes OS R4.1.0 Installation". I can't think of any reason why I would have changed the partitioning scheme there from the default of "LVM Thin Provisioning". Then I had to delete the old root partition and recreate it: set the device type to RAID, check "Encrypt", select RAID1, and then select ext4 as the format.
When I first attempted to install the release candidate, I tried setting up the RAID manually with fdisk and mdadm but couldn't figure it out. The procedure above seemed to work, and it is what I repeated the other day for the fresh install. From what you've said, is it likely that my inexperience with RAID installs is the source of my problem? Does what I describe above seem like the wrong way to do it?
I suspect that recreating the root partition as an encrypted RAID1 formatted as ext4 is what the installer didn’t handle properly. You were probably hoping that the installer would create a RAID (using either LVM RAID or mdadm) and then do the normal provisioning on top of it. But the manual partitioning probably confused it and caused it to leave out the LVM layer. That left you with an ext4 filesystem without LVM, so Qubes OS had to resort to the old, crummy file pool. As a result, lots of things don’t work properly, including backing up running qubes.
(I don’t think I have ever seen a good explanation for why the file pool is no good, so here is mine: The file pool uses Linux’s dm-snapshot driver to provide snapshots. However, dm-snapshot only supports a single origin and a single snapshot, so if one wants to support N revisions, one needs N dm-snapshot layers. Reflinks and LVM thin provisioning, on the other hand, don’t distinguish between origins and snapshots. The origin and the snapshot are independent of each other, and one can have huge numbers of snapshots with only O(log N) overhead. So it is much more practical to implement nice features this way.)
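The contrast between stacked dm-snapshot layers and independent snapshots can be illustrated with a toy model. It is a deliberate simplification, not how the kernel or the Qubes file pool is actually implemented: `StackedSnapshot` and `ThinPool` are hypothetical classes, plain dicts stand in for the on-disk B-trees (so lookups here are O(1) rather than O(log N)), and freed extents are never reclaimed.

```python
class StackedSnapshot:
    """One dm-snapshot-style layer: a copy-on-write store on top of a lower device."""

    def __init__(self, lower):
        self.lower = lower  # another StackedSnapshot, or a dict acting as the base origin
        self.cow = {}       # blocks written since this layer was stacked

    def write(self, block, data):
        self.cow[block] = data

    def read(self, block):
        """Return (data, layers_visited); the cost grows with the stack depth."""
        layer, visited = self, 0
        while isinstance(layer, StackedSnapshot):
            visited += 1
            if block in layer.cow:
                return layer.cow[block], visited
            layer = layer.lower
        return layer.get(block), visited


class ThinPool:
    """Thin-provisioning/reflink-style pool: each volume is an independent block map."""

    def __init__(self):
        self.extents = []   # physical data extents
        self.refcount = []  # number of block maps pointing at each extent

    def write(self, vol, block, data):
        old = vol.get(block)
        if old is not None:
            self.refcount[old] -= 1  # an extent at 0 could be reclaimed (not done here)
        self.extents.append(data)
        self.refcount.append(1)
        vol[block] = len(self.extents) - 1

    def snapshot(self, vol):
        # Only the mapping is copied; the data extents are shared. Real
        # implementations also share the mapping tree itself, copy-on-write.
        for ext in vol.values():
            self.refcount[ext] += 1
        return dict(vol)

    def read(self, vol, block):
        # A single map lookup, however many snapshots exist.
        return self.extents[vol[block]]


if __name__ == "__main__":
    # Ten revisions as stacked layers: an unmodified block is found only
    # after walking through all ten layers.
    top = {0: "A", 1: "B"}
    for _ in range(10):
        top = StackedSnapshot(top)
    top.write(0, "A2")
    print(top.read(0), top.read(1))

    # Ten snapshots in a thin pool: origin and snapshots stay independent.
    pool = ThinPool()
    vol = {}
    pool.write(vol, 0, "A")
    pool.write(vol, 1, "B")
    snaps = [pool.snapshot(vol) for _ in range(10)]
    pool.write(vol, 0, "A2")
    print(pool.read(vol, 0), pool.read(snaps[0], 0))
```

Running it prints `('A2', 1) ('B', 10)` and then `A2 A`: in the stacked model an unmodified block costs a visit to every layer, while in the pool model each snapshot is a flat map that later writes to the origin do not touch.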
This is an installer bug. At the very least, the installer should emit a giant warning if it cannot create a non-deprecated pool, with the default action being to not continue.
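A default-to-abort check of the kind suggested here could look roughly like the following. It is a hypothetical sketch, not Anaconda or Qubes installer code: `choose_pool_driver`, `DeprecatedPoolError` and the `on_lvm_thin` flag are invented names. The driver names `lvm_thin`, `file-reflink` and `file` are the Qubes pool drivers, and the mapping assumes, as discussed in this thread, that XFS (created with reflink support) or Btrfs yields a usable reflink pool while ext4 without LVM thin provisioning leaves only the deprecated file driver.

```python
class DeprecatedPoolError(Exception):
    """Raised when only the deprecated 'file' pool driver would be usable."""


# Filesystems with reflink support usable by the file-reflink driver
# (XFS assumed to have been created with reflink enabled).
REFLINK_FILESYSTEMS = {"xfs", "btrfs"}


def choose_pool_driver(fs_type, on_lvm_thin, allow_deprecated=False):
    """Hypothetical installer check for the storage behind /var/lib/qubes."""
    if on_lvm_thin:
        return "lvm_thin"
    if fs_type.lower() in REFLINK_FILESYSTEMS:
        return "file-reflink"
    # Anything else (e.g. ext4 with no LVM thin pool) leaves only the legacy driver.
    if not allow_deprecated:
        raise DeprecatedPoolError(
            f"{fs_type} without LVM thin provisioning only supports the "
            "deprecated 'file' pool; refusing to continue")
    print("WARNING: using the deprecated 'file' pool driver; "
          "backups of running qubes will fail")
    return "file"


if __name__ == "__main__":
    for fs, thin in [("ext4", True), ("xfs", False), ("btrfs", False), ("ext4", False)]:
        try:
            print(fs, thin, "->", choose_pool_driver(fs, thin))
        except DeprecatedPoolError as err:
            print(fs, thin, "-> error:", err)
```

Making the deprecated driver an explicit opt-in (`allow_deprecated=True`) is what turns the "giant warning" into a default of not continuing.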
OK, thanks for the info, @DemiMarie. So I will have to wait for a new release of the installer?
Or is there a way to create a RAID setup with the current installer that won't cause this problem?
Btrfs’s built-in RAID should work (subject to the usual Btrfs caveats: avoid RAID5 and RAID6, I/O performance can be unstable, etc.), and selecting XFS or Btrfs instead of ext4 will result in a usable varlibqubes pool. If you want to use dm-raid + LVM thin provisioning, I believe you will need to do it manually.
Reinstalling with Btrfs fixed the backup issue, but it greatly increased the load and made the system lag. Maybe that is the I/O performance issue you mentioned? I described it in more detail in the forum for reference: https://forum.qubes-os.org/t/unable-to-get-fully-functional-system-when-installing-qubes-4-1-on-a-raid1/10682
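The I/O performance is probably the cause. Btrfs isn’t known for being fast, and VM disk images, which are large files receiving many small random writes, are one of the worst-case workloads for a copy-on-write filesystem like it. You could also try XFS, which might be faster.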
I will take a look at that forum post in a bit.
Qubes OS release
4.1rc3 and 4.1 final release
Brief summary
Qubes Backup hangs after creating a 30K file if the qube being backed up is still running. If multiple qubes are being backed up and some are running, it creates a bigger file (presumably containing the non-running ones) before hanging.
Steps to reproduce
Expected behavior
If I shut down backuptest and run the steps above, the progress bar in the Backup tool moves until it reaches 100%. This creates a 3.7GB file that I can verify with the Restore Backup tool. I would expect the same result when backuptest is running.
Actual behavior
- The progress bar in the Backup tool stays at 0%.
- In a terminal, the file never exceeds 30K and its timestamp doesn't change after that.
- The backup never completes; I have to cancel the Backup tool.