This is a separate partition on the same NVMe drive as my Qubes BTRFS root.
Using the same setup, I haven't been able to reproduce this problem.
Steps to reproduce
Create partition. Format as XFS.
Can you post the exact dom0 commands? The qvm-block attach command too.
[ 1837.176949] I/O error, dev xvdi, sector 0 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Odd. Are you by any chance involving dm-integrity somehow, especially cryptsetup luksFormat --integrity-no-wipe?
I apologize for the glibness of the steps to reproduce. I partitioned this drive quite a while ago, so I don't have the exact commands I ran. It was something along the lines of using gdisk to partition in dom0 and:
$ sudo mkfs.xfs -f -m bigtime=1 -m rmapbt=1 -m reflink=1 /dev/xvdi
to format the drive inside the VM. This is the same SSD that holds dom0's root and boot (root is btrfs, encrypted of course; in this case LUKS2 xchacha12,aes-adiantum-plain64, no integrity), but the badly-behaved partition is just that: a partition. It's not a btrfs subvolume.
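From memory, the gdisk session would have looked something like this (not the exact invocation):
$ sudo gdisk /dev/nvme0n1
Command (? for help): n   # new partition, mostly default answers; became nvme0n1p4
Command (? for help): w   # write the table and exit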
As for the commands I use to attempt to mount the drive:
dom0
$ qvm-block attach ethereum dom0:nvme0n1p4
$ echo $?
0
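For what it's worth, plain qvm-block in dom0 confirms the attachment; its listing shows something roughly like:
$ qvm-block
dom0:nvme0n1p4  <description>  ethereum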
ethereum
$ sudo mount -m /dev/xvdi /mnt/ethereum
mount: /mnt/ethereum: can't read superblock on /dev/xvdi.
dmesg(1) may have more information after failed mount system call.
$ sudo dmesg
<snip>
[ 44.598154] SGI XFS with ACLs, security attributes, realtime, scrub, quota, no debug enabled
[ 44.602523] I/O error, dev xvdi, sector 0 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
[ 44.602582] XFS (xvdi): SB validate failed with error -5.
Then I can detach the volume from the ethereum VM, shut it down, and back in dom0 run:
$ sudo mount -m /dev/nvme0n1p4 /mnt/ethereum
$ echo $?
0
$ ls /mnt/ethereum
<some files>
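(The detach step is presumably just:
$ qvm-block detach ethereum dom0:nvme0n1p4
before shutting the VM down.)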
Odd. Are you by any chance involving dm-integrity somehow, especially cryptsetup luksFormat --integrity-no-wipe?
As implied by the commands above, this is just a bare partition that has been formatted as XFS. No RAID. No LUKS. No integrity.
And for the record, I get the same behavior on Fedora 38, Fedora 39, and Debian 12.
Are there any dom0 kernel messages during the period where you are attaching the partition and attempting to mount it inside the VM?
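For example, leaving something like
$ sudo dmesg --follow
running in a dom0 terminal during the attach and mount attempt should catch anything that shows up.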
I also wonder if it's possible to take XFS out of the equation: if you attach the partition and run something like sudo head -c 100M /dev/xvdi | sha256sum, does it result in the same read error?
$ sudo dd if=/dev/xvdi of=/dev/null bs=1 count=1M status=progress
1048576+0 records in
1048576+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.440201 s, 2.4 MB/s
$ echo $?
0
dmesg shows nothing of note
$ sudo mount -m /dev/xvdi /mnt/ethereum
mount: /mnt/ethereum: can't read superblock on /dev/xvdi.
dmesg(1) may have more information after failed mount system call.
$ sudo dmesg
<snip>
[ 68.055045] SGI XFS with ACLs, security attributes, realtime, scrub, quota, no debug enabled
[ 68.057308] I/O error, dev xvdi, sector 0 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
[ 68.057333] XFS (xvdi): SB validate failed with error -5.
$ sudo head -c 100M /dev/xvdi | sha256sum
REDACTED -
$ echo $?
0
So it appears to be XFS-specific. And for what it's worth, sha256sum-ing the first 100M of the partition in dom0 returns the same hash.
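That is, presumably:
$ sudo head -c 100M /dev/nvme0n1p4 | sha256sum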
The only log line that appears in dom0 dmesg during the process is (sorry, hand-copied):
[206838.262156] xen-blkback: backend/vbd/15/51840: using 4 queues, protocol 1 (x86_64-abi) persistent grants
which seems uninteresting
This is so intriguing! I'm out of ideas though :(
Maybe try the linux-xfs mailing list?
Apparently I'm now experiencing the issue myself: a read error at sector 0 happening only when I attempt to mount the attached block device, but not otherwise. However, my block device contains an ext4 filesystem instead of XFS.
This has occurred almost all the time with kernel-latest-qubes-vm (6.9.4 and 6.9.2). I haven't been able to reproduce it with kernel-qubes-vm (6.6.33) so far.
My source device in dom0 is a loop device with 4096 byte logical+physical block size, which in the failing case is attached to the VM with 512 byte logical+physical blocks. Can you try this in your setup (substituting your NVMe device for my loop12 device) @duncancmt?
[user@dom0 ~]$ head /sys/block/loop12/queue/*_block_size
==> /sys/block/loop12/queue/logical_block_size <==
4096
==> /sys/block/loop12/queue/physical_block_size <==
4096
[user@dom0 bin]$ qvm-run -p the-vm 'head /sys/block/xvdi/queue/*_block_size'
==> /sys/block/xvdi/queue/logical_block_size <==
512
==> /sys/block/xvdi/queue/physical_block_size <==
512
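In case anyone wants to replicate my source device without a native 4K drive, a loop device with 4096-byte sectors can be set up along these lines (backing file path made up):
[user@dom0 ~]$ truncate -s 1G /var/tmp/blkfront-test.img
[user@dom0 ~]$ sudo losetup --sector-size 4096 --find --show /var/tmp/blkfront-test.img
/dev/loop12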
read error at sector 0 happening only when I attempt to mount the attached block device, but not otherwise
The discrepancy is due to the read during the mount attempt happening with direct I/O turned on. I also get it for dd if=/dev/xvdi of=/dev/null count=1 with vs. without iflag=direct.
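That's just the O_DIRECT alignment rule: direct reads have to be a multiple of the device's logical block size. It can be seen in isolation against the 4096-sector loop device from above:
[user@dom0 ~]$ sudo dd if=/dev/loop12 of=/dev/null bs=512 count=1 iflag=direct    # fails with 'Invalid argument'
[user@dom0 ~]$ sudo dd if=/dev/loop12 of=/dev/null bs=4096 count=1 iflag=direct   # succeeds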
Kernel regression then?
Yeah, last good one appears to be 6.8.8-1. Unfortunately kernel-latest 6.9.2-1 is already in stable :(
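(If anyone needs a workaround in the meantime: assuming 6.8.8-1 is still installed in dom0, an affected VM can be pinned to it with qvm-prefs:
[user@dom0 ~]$ qvm-prefs the-vm kernel 6.8.8-1
)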
Can you try this in your setup?
[user@dom0 ~]$ head /sys/block/nvme0n1/queue/*_block_size
==> /sys/block/nvme0n1/queue/logical_block_size <==
4096
==> /sys/block/nvme0n1/queue/physical_block_size <==
4096
[user@dom0 ~]$ qvm-run -p ethereum 'head /sys/block/xvdi/queue/*_block_size'
==> /sys/block/xvdi/queue/logical_block_size <==
4096
==> /sys/block/xvdi/queue/physical_block_size <==
4096
so that's different :thinking:
I also get it for dd if=/dev/xvdi of=/dev/null count=1 with vs. without iflag=direct
[user@ethereum ~]$ sudo dd if=/dev/xvdi of=/dev/null bs=1 count=1 status=progress
1+0 records in
1+0 records out
1 byte copied, 0.0119412 s, 0.1 kB/s
[user@ethereum ~]$ sudo dd if=/dev/xvdi of=/dev/null bs=1 count=1 status=progress iflag=direct
/usr/bin/dd: error reading '/dev/xvdi': Invalid argument
0+0 records in
0+0 records out
0 bytes copied, 2.7803e-05 s, 0.0 kB/s
so that's different as well
EDIT: I get the same error in dom0 with iflag=direct; setting bs=4096 makes dd happy, but bs=512 does not.
Oh wait. I got ahead of myself and downgraded the troublesome VM to 6.8.8-1. With the VM on 6.9.2-1, I get:
[user@dom0 ~]$ qvm-run -p ethereum 'head /sys/block/xvdi/queue/*_block_size'
==> /sys/block/xvdi/queue/logical_block_size <==
512
==> /sys/block/xvdi/queue/physical_block_size <==
512
[user@ethereum ~]$ sudo dd if=/dev/xvdi of=/dev/null count=1 status=progress
1+0 records in
1+0 records out
512 bytes copied, 0.00148875 s, 344 kB/s
[user@ethereum ~]$ sudo dd if=/dev/xvdi of=/dev/null count=1 status=progress iflag=direct
/usr/bin/dd: error reading '/dev/xvdi': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 0.000170858 s, 0.0 kB/s
[user@ethereum ~]$ sudo dd if=/dev/xvdi of=/dev/null bs=4096 count=1 status=progress iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.011916 s, 344 kB/s
What about adding bs=4096? Does it change anything?
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Oh wait. I got ahead of myself and downgraded the troublesome VM to 6.8.8-1. With the VM on 6.9.2-1, I get:
Thank God :laughing:
There were 5 commits to xen-blkfront.c in February that all landed in kernel 6.9. The last one has logical/physical block size stuff in the diff, although the first one is already related to queue limits.
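They should be easy to spot in a Linux checkout with something like:
$ git log --oneline v6.8..v6.9 -- drivers/block/xen-blkfront.c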
Thanks @rustybird ! I've forwarded the info to relevant maintainers: https://lore.kernel.org/xen-devel/Znl5FYI9CC37jJLX@mail-itl/T/#u
@rustybird @duncancmt the above linked thread has a proposed fix from Christoph Hellwig already, care to try (and preferably report back in the email thread)?
Welp, I can't even get builderv2 to download the repo (there's a different number of missing bytes every time):
15:35:15,176 [executor:local:/home/user/tmp/138233767774496c9ed3ed0/builder] output: args: (128, ['git', 'clone', '-n', '-q', '-b', 'main', 'https://github.com/QubesOS/qubes-linux-kernel', '/home/user/tmp/138233767774496c9ed3ed0/builder/linux-kernel-latest'])
15:35:15,177 [executor:local:/home/user/tmp/138233767774496c9ed3ed0/builder] output: stdout: b''
15:35:15,177 [executor:local:/home/user/tmp/138233767774496c9ed3ed0/builder] output: stderr: b'error: 3213 bytes of body are still expected\nfetch-pack: unexpected disconnect while reading sideband packet\nfatal: early EOF\nfatal: fetch-pack: invalid index-pack output\n'
Maybe I can just build the xen_blkfront module manually somehow.
Edit: Managed to download the repo and everything, it's building
Edit 2: The patch works: https://lore.kernel.org/xen-devel/Znndj9W_bCsFTxkz@mutt/
Automated announcement from builder-github
The component linux-kernel-latest (including package kernel-latest-6.9.7-1.qubes.fc32) has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing
Automated announcement from builder-github
The component linux-kernel-latest (including package kernel-latest-6.9.7-1.qubes.fc32) has been pushed to the r4.1 stable repository for dom0.
To install this update, please use the standard update command:
sudo qubes-dom0-update
Or update dom0 via Qubes Manager.
Qubes OS release
4.2.1
Brief summary
Prior to upgrading to 4.2, I was able to mount a partition inside a VM. Now I am not. This is a separate partition on the same NVMe drive as my Qubes BTRFS root. Surprisingly, the partition mounts just fine in dom0.
https://forum.qubes-os.org/t/cant-mount-xfs-filesystem-from-a-partition-after-upgrade-to-4-2/26809
Steps to reproduce
Create partition. Format as XFS. Attempt to mount in a VM.
Expected behavior
It should mount.
Actual behavior
An error is thrown.