QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Support 4k storage #4974

Open ij1 opened 5 years ago

ij1 commented 5 years ago

Qubes OS version

R4.0

Affected component(s) or functionality

VMs not working/starting right from a fresh install.

Brief summary

Right after a fresh install, all VMs fail to mount root and therefore fail to start beyond the point where they expect /dev/xvda3 to be available. This happens on a device that has 4kB logical and physical block sizes (NVMe drive). This was not a problem in R3.2 (as it used files by default for VM storage).

To Reproduce

Steps to reproduce the behavior:

  1. Install Qubes to a drive with 4kB sector size (both logical / physical); (I put /boot to a SATA drive with 512B sectors to avoid BIOS/NVMe boot challenges, rest of the system is on the NVMe with 4kB sectors).
  2. Firstboot stuff fails
  3. After clicking "finish" for firstboot, find out that no VM will start successfully (which explains firstboot failures I guess)
  4. Look at the VM logs and find this:
[    0.887548] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[    0.902355] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[    0.924386] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[    0.940325] blkfront: xvdd: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
Waiting for /dev/xvda* devices...
Qubes: Doing R/W setup for TemplateVM...
[    1.049451] random: sfdisk: uninitialized urandom read (4 bytes read)
[    1.052481]  xvdc: xvdc1
[    1.060250] random: mkswap: uninitialized urandom read (16 bytes read)
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=...
Qubes: done.
mount: wrong fs type, bad option, bad superblock on /dev/xvda,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
Waiting for /dev/xvdd device...
mount: /dev/xvdd is write-protected, mounting read-only
[    1.099814] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[    1.106796] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
mount: /sysroot not mounted or bad option

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
[    1.119049] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1e335a008d5, max_idle_ns: 440795216613 ns
mount: /sysroot not mounted or bad option

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
switch_root: failed to mount moving /sysroot to /: Invalid argument
switch_root: failed. Sorry.
[    1.217841] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
...

Expected behavior

VMs would start. Firstboot stuff would work. Drives with 4kB sector size would work.

Additional context

I've tracked this down to the handling of the partition table. With 512B sectors, the GPT sits at a different location than it does with 4kB sectors, so VMs fail to find the correct partition table on xvda. The partition start/end values will obviously also be off by a factor of 8, because the templates are built(?) with an assumption of 512B sector size.
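To make the mismatch concrete, here is a minimal arithmetic sketch. It assumes the standard GPT layout, where the primary header is stored at LBA 1 and partition entries record inclusive first/last LBAs. The function names are hypothetical and only for illustration; this is not code from Qubes.

```python
from typing import Tuple

OLD_SECTOR = 512    # sector size the template image was presumably built with
NEW_SECTOR = 4096   # logical sector size of the 4kB device


def gpt_header_offset(sector_size: int) -> int:
    """Byte offset of the primary GPT header, which is stored at LBA 1."""
    return 1 * sector_size


def convert_partition(first_lba: int, last_lba: int,
                      old: int = OLD_SECTOR,
                      new: int = NEW_SECTOR) -> Tuple[int, int]:
    """Re-express an inclusive [first_lba, last_lba] range in new-sized sectors.

    Raises ValueError if the partition does not start and end on a
    new-sector boundary, because it then cannot be described exactly.
    """
    start_bytes = first_lba * old
    end_bytes = (last_lba + 1) * old  # exclusive end, in bytes
    if start_bytes % new or end_bytes % new:
        raise ValueError("partition is not aligned to the new sector size")
    return start_bytes // new, end_bytes // new - 1
```

For example, a partition covering 512B sectors 2048 to 4095 would be described as 4kB sectors 256 to 511. A reader that assumes 4kB sectors looks for the header at byte 4096 rather than byte 512, which is why a table written for 512B sectors is not found at all.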

I'm not sure if there are other assumptions based on 512B sectors with the other /dev/xvd* drives.

Solutions you've tried

I cloned a template and tried to manually fix the partition table of the clone (in dom0 through /dev/qubes_dom0/...). There was plenty of space before the first partition. At the end, however, the drive is so tight on space that the secondary GPT won't fit, so the tail of the xvda3 partition was truncated slightly, and I didn't resize its filesystem first (this probably causes some problems, potentially corruption?). With the partition table fixed this way, I could start VMs, but there are some other problems/oddities that might be due to the incomplete firstboot or the unfixed fedora template (I only fixed the debian one, which is the one I mainly use). I could possibly enlarge the relevant LV slightly to avoid truncating the tail of xvda3, but I've not tried that yet.

I looked for a way to force the pv/vg/lv chain to fake the logical sector size, but couldn't find anything in the manpages.

Libvirt might be able to fake the logical_block_size but I've not yet tried that.

Relevant documentation you've consulted

During install, I used the custom install steps to create manual partitioning (but I think it is irrelevant).

Related, non-duplicate issues

None that I could find. Some other issues involve a failure to mount root, but the causes are different.

Decided solution

Add a partition table conversion to initramfs. Specifically, write a tool that checks whether the partition table matches the current block size. If it matches, do nothing. If not, convert it to the right block size format before mounting anything, and destroy the wrong partition table (if it isn't directly overwritten by the converted one) to prevent confusion about which one is current.
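The "does the table match the block size" check could look roughly like the sketch below. It is not the actual tool. It assumes only 512B and 4kB sector sizes matter, relies on the GPT header starting with the `EFI PART` signature at LBA 1, and uses hypothetical function names. It does not validate the header CRC and does not perform any conversion.

```python
from typing import BinaryIO, List

GPT_SIGNATURE = b"EFI PART"
CANDIDATE_SECTOR_SIZES = (512, 4096)  # assumption: only these two sizes occur


def find_gpt_sector_sizes(disk: BinaryIO) -> List[int]:
    """Return every candidate sector size for which a GPT header sits at LBA 1."""
    found = []
    for size in CANDIDATE_SECTOR_SIZES:
        disk.seek(size)  # LBA 1 is exactly one sector into the disk
        if disk.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE:
            found.append(size)
    return found


def table_matches(disk: BinaryIO, device_sector_size: int) -> bool:
    """True if a GPT header exists where this device's sector size puts it."""
    return device_sector_size in find_gpt_sector_sizes(disk)
```

On Linux, the device's logical sector size can be read from `/sys/block/<dev>/queue/logical_block_size`. If `find_gpt_sector_sizes` returns more than one size, a stale table is still present, which is the ambiguity the decided solution avoids by destroying the wrong one.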

References: https://github.com/QubesOS/qubes-issues/issues/4974#issuecomment-482897265 https://github.com/QubesOS/qubes-issues/issues/4974#issuecomment-1677356693

DemiMarie commented 10 months ago

That’s really interesting, but it actually makes sense: since BTRFS is copy-on-write, it can (at the expense of performance) make arbitrarily small writes atomic.