VMs not working/starting right from a fresh install.
Brief summary
Right after a fresh install, all VMs fail to mount root and therefore fail to start beyond the point where they expect /dev/xvda3 to be available. This happens on a device that has 4kB logical and physical block sizes (NVMe drive). This was not a problem in R3.2 (as it used files by default for VM storage).
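To check whether a given drive falls into this case, the block sizes the kernel reports can be read from sysfs. The following is a minimal sketch, assuming a Linux system that exposes /sys/block/<device>/queue/logical_block_size and physical_block_size; the function names and the sysfs_root parameter are made up for illustration (the parameter only exists so the code can be pointed at a test directory).

```python
from pathlib import Path


def block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) block size in bytes for a whole-disk device name."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def all_block_sizes(sysfs_root="/sys/block"):
    """Map every block device found under sysfs_root to its (logical, physical) sizes."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    sizes = {}
    for entry in sorted(root.iterdir()):
        try:
            sizes[entry.name] = block_sizes(entry.name, sysfs_root)
        except (OSError, ValueError):
            # Skip entries without readable queue attributes.
            continue
    return sizes


if __name__ == "__main__":
    for name, (logical, physical) in all_block_sizes().items():
        print(f"{name}: logical={logical} physical={physical}")
```

A drive affected by this issue would be one where the logical size is reported as 4096 rather than 512.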
To Reproduce
Steps to reproduce the behavior:
1. Install Qubes to a drive with a 4kB sector size (both logical and physical). I put /boot on a SATA drive with 512B sectors to avoid BIOS/NVMe boot challenges; the rest of the system is on the NVMe drive with 4kB sectors.
2. Firstboot stuff fails.
3. After clicking "finish" for firstboot, find out that no VM will start successfully (which explains the firstboot failures, I guess).
4. Look at the VM logs and find this:
[ 0.887548] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 0.902355] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 0.924386] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 0.940325] blkfront: xvdd: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
Waiting for /dev/xvda* devices...
Qubes: Doing R/W setup for TemplateVM...
[ 1.049451] random: sfdisk: uninitialized urandom read (4 bytes read)
[ 1.052481] xvdc: xvdc1
[ 1.060250] random: mkswap: uninitialized urandom read (16 bytes read)
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=...
Qubes: done.
mount: wrong fs type, bad option, bad superblock on /dev/xvda,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
Waiting for /dev/xvdd device...
mount: /dev/xvdd is write-protected, mounting read-only
[ 1.099814] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[ 1.106796] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
mount: /sysroot not mounted or bad option
In some cases useful info is found in syslog - try
dmesg | tail or so.
[ 1.119049] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1e335a008d5, max_idle_ns: 440795216613 ns
mount: /sysroot not mounted or bad option
In some cases useful info is found in syslog - try
dmesg | tail or so.
switch_root: failed to mount moving /sysroot to /: Invalid argument
switch_root: failed. Sorry.
[ 1.217841] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
...
Expected behavior
VMs would start. Firstboot stuff would work. Drives with 4kB sector size would work.
Additional context
I've tracked this down to the handling of the partition table. With 512B sectors, the GPT sits at a different location than with 4kB sectors, so VMs fail to find the correct partition table on xvda. Obviously, the partition start/end values will also be off by a factor of 8, because the templates are built(?) with an assumption of a 512B sector size.
I'm not sure if there are other assumptions based on 512B sectors with the other /dev/xvd* drives.
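The location mismatch can be illustrated with a small check. The primary GPT header lives in LBA 1 and starts with the 8-byte signature "EFI PART", so it sits at byte offset 512 on a table written for 512B sectors and at byte offset 4096 on one written for 4kB sectors. The sketch below only looks for that signature in a raw image; the function names are hypothetical, and it does not validate the header CRC or the backup table.

```python
GPT_SIGNATURE = b"EFI PART"
CANDIDATE_SECTOR_SIZES = (512, 4096)


def gpt_sector_size(image):
    """Return the sector size the primary GPT header was written for, or None.

    `image` is the first 8 KiB (or more) of the disk as bytes.
    """
    for size in CANDIDATE_SECTOR_SIZES:
        # The header occupies LBA 1, i.e. byte offset 1 * sector_size.
        if image[size:size + len(GPT_SIGNATURE)] == GPT_SIGNATURE:
            return size
    return None


def table_matches_device(image, device_sector_size):
    """True if the GPT in `image` was written for the device's logical sector size."""
    return gpt_sector_size(image) == device_sector_size


if __name__ == "__main__":
    # Fake image: a GPT header signature placed where a 512B-sector tool would put it.
    image = bytearray(8192)
    image[512:520] = GPT_SIGNATURE
    print(gpt_sector_size(bytes(image)))
    print(table_matches_device(bytes(image), 4096))
```

With a template image built for 512B sectors, a kernel that sees xvda as a 4kB-sector device looks at offset 4096, finds no header there, and therefore exposes no xvda3.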
Solutions you've tried
I cloned a template and tried to manually fix the partition table of the clone (in dom0 through /dev/qubes_dom0/...). There was plenty of space before the first partition. At the end, however, the drive is so tight on space that the secondary GPT won't fit, so the tail of the xvda3 partition was truncated slightly, and I didn't try to resize its filesystem first (this probably causes some problems, potentially corruption?). With the partition table fixed this way, I could start VMs (but there are then some other problems/oddities that might be due to the incomplete firstboot or the non-fixed Fedora template; I only fixed the Debian one, which is the one I mainly use). I could possibly enlarge the relevant LV slightly to avoid truncating the tail of xvda3, but I've not tried that yet.
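The space problem at the end of the drive follows from how much room a GPT needs for each sector size. Here is a rough calculation, assuming the usual layout of 128 partition entries of 128 bytes each (16384 bytes, the minimum entry array size in the UEFI specification). Whether the Qubes templates use exactly this layout is an assumption here, and the function name is made up.

```python
ENTRY_ARRAY_BYTES = 128 * 128  # 128 entries of 128 bytes each


def gpt_overhead(sector_size):
    """Return (primary_bytes, backup_bytes) reserved by a GPT for this sector size."""
    # Sectors needed for the partition entry array, rounded up.
    entry_sectors = -(-ENTRY_ARRAY_BYTES // sector_size)
    # Start of disk: protective MBR (LBA 0) + GPT header (LBA 1) + entry array.
    primary = (2 + entry_sectors) * sector_size
    # End of disk: backup entry array + backup GPT header.
    backup = (1 + entry_sectors) * sector_size
    return primary, backup


if __name__ == "__main__":
    for size in (512, 4096):
        primary, backup = gpt_overhead(size)
        print(f"{size}B sectors: primary={primary} bytes, backup={backup} bytes")
```

Under these assumptions, the backup structures take 16896 bytes with 512B sectors and 20480 bytes with 4kB sectors, so a last partition that runs right up to the old backup table is 3584 bytes too long for the converted layout.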
I tried to find out whether I could somehow force the PV/VG/LV chain to fake the logical sector size, but couldn't find anything in the manpages.
Libvirt might be able to fake the logical_block_size, but I've not yet tried that.
Decided solution
Add a partition table conversion step to the initramfs. Specifically, write a tool that checks whether the partition table matches the current block size. If it matches, do nothing. If not, convert it to the format for the right block size before mounting anything, and destroy the wrong partition table (if it isn't directly overwritten by the converted one) to prevent confusion about which one is current.
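The core arithmetic of such a conversion can be sketched as follows. This is not the tool that was actually written for Qubes; it is a minimal illustration of rescaling one partition's first/last LBA from one sector size to another, with a hypothetical function name. A real converter would also have to rewrite both GPT headers, their CRCs, the entry arrays, and the protective MBR.

```python
def convert_lba_range(first_lba, last_lba, old_size=512, new_size=4096):
    """Convert an inclusive LBA range between sector sizes.

    Raises ValueError if the partition does not start and end on a
    boundary of the new sector size.
    """
    start_byte = first_lba * old_size
    end_byte = (last_lba + 1) * old_size  # exclusive end, in bytes
    if start_byte % new_size or end_byte % new_size:
        raise ValueError(
            f"range {first_lba}-{last_lba} is not aligned to {new_size}-byte sectors"
        )
    return start_byte // new_size, end_byte // new_size - 1


if __name__ == "__main__":
    # A 1 MiB partition starting at 1 MiB, expressed in 512B sectors.
    print(convert_lba_range(2048, 4095))
```

For a 512B to 4kB conversion, this amounts to dividing by 8, which only works when the start and the end of every partition fall on 4kB boundaries; misaligned partitions would need to be moved or shrunk first.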
That’s really interesting, but it actually makes sense: since BTRFS is copy-on-write, it can (at the expense of performance) make arbitrarily small writes atomic.
Qubes OS version
R4.0
Affected component(s) or functionality
VMs not working/starting right from a fresh install.
Relevant documentation you've consulted
During install, I used the custom install steps to create manual partitioning (but I think it is irrelevant).
Related, non-duplicate issues
None that I could find. Some other issues involve a failure to mount root, but their causes are different.
References: https://github.com/QubesOS/qubes-issues/issues/4974#issuecomment-482897265 https://github.com/QubesOS/qubes-issues/issues/4974#issuecomment-1677356693