Closed by jlebon 2 years ago
@openstacker, does this happen every time? Is it possible to narrow down when it started happening?
Thanks @jlebon, I found something new. It works in my test environment, which is based on OpenStack Cinder with the LVM driver, but it fails in our production environment (Cinder + Ceph).
Just a sanity check: is it only FCOS that's affected? Do you see any corruption when booting any other OSes? Have you validated that attaching a new Cinder blockdev to an existing guest, running e.g. mkfs.(xfs|ext4|btrfs|etc) on it, and loading it with some data works fine?
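For the "load it with some data" part of that check, a minimal Python sketch follows. It assumes the new volume has already been formatted and mounted by hand; the function names and the file layout are hypothetical and are not part of any OpenStack or FCOS tooling. It writes random files, records their SHA-256 digests, and later reports any file whose contents changed.

```python
import hashlib
import os
import tempfile


def write_test_files(directory, count=4, size=1 << 20):
    """Write `count` files of random bytes; return {path: sha256 hex digest}."""
    digests = {}
    for i in range(count):
        path = os.path.join(directory, f"blob-{i}.bin")
        data = os.urandom(size)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        digests[path] = hashlib.sha256(data).hexdigest()
    return digests


def verify_test_files(digests):
    """Re-read every file and return the paths whose digest no longer matches."""
    mismatched = []
    for path, expected in digests.items():
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            mismatched.append(path)
    return mismatched


if __name__ == "__main__":
    # A temporary directory stands in for the real mount point here
    # (assumption: on a real guest you would pass the volume's mount path).
    with tempfile.TemporaryDirectory() as mountpoint:
        digests = write_test_files(mountpoint)
        print("mismatched files:", verify_test_files(digests))
```

Reading the files back immediately will mostly be served from the page cache, so for a meaningful result the digests should be saved and the verification run again after unmounting and remounting (or detaching and reattaching) the volume.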
Is this still able to be reproduced?
Just found another XFS corruption issue in multipath.day1 on QEMU:
[ 4.738880] systemd[1]: Starting File System Check on /dev/disk/by-label/dm-mpath-root...
[ 4.758132] systemd-fsck[621]: /usr/sbin/fsck.xfs: XFS file system.
[ OK ] Finished File System Check…v/disk/by-label/dm-mpath-root.
[ 4.767015] systemd[1]: Finished File System Check on /dev/disk/by-label/dm-mpath-root.
Mounting /sysroot...
[ 4.771354] systemd[1]: Mounting /sysroot...
[ 4.937470] SGI XFS with ACLs, security attributes, scrub, quota, no debug enabled
[ 4.981550] XFS (dm-4): Mounting V5 Filesystem
[ 5.190848] XFS (dm-4): totally zeroed log
[ 5.191944] XFS (dm-4): Corruption warning: Metadata has LSN (5:12452) ahead of current LSN (1:0). Please unmount and run xfs_repair (>= v4.3) to resolve.
[ 5.194403] XFS (dm-4): log mount/recovery failed: error -22
[ 5.197860] XFS (dm-4): log mount failed
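To check other console logs for the same symptom, the corruption warning above can be matched mechanically. The sketch below is only an illustration: the regular expression is modeled on the single message shown in this log, the function name is hypothetical, and the two numbers in each LSN are read as a cycle:block pair, which is my understanding of how XFS prints log sequence numbers.

```python
import re

# Pattern modeled on the kernel message seen in the log above.
LSN_WARNING = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Corruption warning: Metadata has LSN "
    r"\((?P<meta_cycle>\d+):(?P<meta_block>\d+)\) ahead of current LSN "
    r"\((?P<cur_cycle>\d+):(?P<cur_block>\d+)\)"
)


def find_lsn_warnings(log_text):
    """Return one dict per 'Metadata has LSN ... ahead of current LSN' line."""
    results = []
    for m in LSN_WARNING.finditer(log_text):
        results.append({
            "device": m["dev"],
            # (cycle, block) of the metadata LSN found on disk
            "metadata_lsn": (int(m["meta_cycle"]), int(m["meta_block"])),
            # (cycle, block) of the log's current LSN at mount time
            "current_lsn": (int(m["cur_cycle"]), int(m["cur_block"])),
        })
    return results
```

In the log above, the metadata LSN (5:12452) sits ahead of a current LSN of (1:0) right after "totally zeroed log", which fits a filesystem whose log was reset while its metadata still carried stamps from the old log.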
While chasing down problems seen on a hotfixed RHCOS 4.9 build, we observed some races between randomizing the rootfs UUID and mounting the rootfs. The fix for this is believed to be in https://github.com/coreos/fedora-coreos-config/pull/1357
When I spoke with @sandeen about the race, he pointed to this issue as one that could be traced to the same root cause.
See also downstream BZ https://bugzilla.redhat.com/show_bug.cgi?id=2055258
(Originally posted by @openstacker in https://github.com/coreos/fedora-coreos-tracker/issues/735#issuecomment-813961330)
I'm getting the same issue with FCOS 33.20210301.3.1, see the console log below.