coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

kola: `ext.config.kdump.crash` fails with XFS kernel error race condition #1195

Open dustymabe opened 2 years ago

dustymabe commented 2 years ago

The ext.config.kdump.crash test started failing with an XFS corruption/kernel issue. Here is a snippet of output from 5.17.6-300.fc36.x86_64.

[    1.920053] systemd[1]: Mounting kdumproot-sysroot.mount - /kdumproot/sysroot...
[    2.194643] SGI XFS with ACLs, security attributes, scrub, quota, no debug enabled
[    2.199585] XFS: attr2 mount option is deprecated.
[    2.200537] XFS (vda4): Mounting V5 Filesystem
[    2.241396] XFS (vda4): Starting recovery (logdev: internal)
[    2.264029] XFS (vda4): xfs_buf_find: daddr 0x3b6001 out of range, EOFS 0x3b6000
[    2.264921] ------------[ cut here ]------------
[    2.265443] WARNING: CPU: 0 PID: 487 at fs/xfs/xfs_buf.c:556 xfs_buf_find+0x518/0x5f0 [xfs]
[    2.266645] Modules linked in: xfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_net net_failover virtio_console failover virtio_blk ata_generic pata_acpi qemu_fw_cfg ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse overlay squashfs loop
[    2.269172] CPU: 0 PID: 487 Comm: mount Not tainted 5.17.6-300.fc36.x86_64 #1
[    2.269920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[    2.270835] RIP: 0010:xfs_buf_find+0x518/0x5f0 [xfs]
[    2.271448] Code: e8 2d 4f f1 c0 e9 e5 fe ff ff 0f 1f 44 00 00 e9 b5 fe ff ff 48 89 c1 48 c7 c2 b0 bb 22 c0 48 c7 c6 e0 c6 23 c0 e8 65 28 05 00 <0f> 0b b8 8b ff ff ff e9 cd fe ff ff 0f 1f 44 00 00 e9 1a fd ff ff
[    2.273331] RSP: 0018:ffffc9000073f980 EFLAGS: 00010282
[    2.273888] RAX: 00000000ffffffea RBX: 0000000000000001 RCX: 000000007fffffff
[    2.274642] RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffffc02378fd
[    2.275423] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000a
[    2.276300] R10: 000000000000000a R11: 0fffffffffffffff R12: ffff8880bb1c5100
[    2.277205] R13: ffffc9000073f9e8 R14: ffffc9000073fa98 R15: 0000000000000001
[    2.278144] FS:  00007f815240f800(0000) GS:ffff8880b8a00000(0000) knlGS:0000000000000000
[    2.279101] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.279807] CR2: 00007f4b81ba8c08 CR3: 00000000bb248000 CR4: 0000000000350eb0
[    2.280829] Call Trace:
[    2.281176]  <TASK>
[    2.281461]  xfs_buf_get_map+0x35/0x390 [xfs]
[    2.282289]  xfs_buf_read_map+0x39/0x270 [xfs]
[    2.282932]  xfs_buf_readahead_map+0x40/0x50 [xfs]
[    2.283648]  ? xfs_buf_readahead_map+0x22/0x50 [xfs]
[    2.284421]  xlog_buf_readahead+0x52/0x60 [xfs]
[    2.285113]  xlog_recover_commit_trans+0xcf/0x300 [xfs]
[    2.285879]  xlog_recovery_process_trans+0xc0/0xf0 [xfs]
[    2.286685]  xlog_recover_process_data+0xa0/0x120 [xfs]
[    2.287395]  xlog_do_recovery_pass+0x3ae/0x5b0 [xfs]
[    2.288004]  ? preempt_count_add+0x44/0x90
[    2.288474]  xlog_do_log_recovery+0x87/0xb0 [xfs]
[    2.289046]  xlog_do_recover+0x34/0x1b0 [xfs]
[    2.289597]  xlog_recover+0xb6/0x150 [xfs]
[    2.290113]  xfs_log_mount+0x14a/0x280 [xfs]
[    2.290651]  xfs_mountfs+0x3ce/0x840 [xfs]
[    2.291175]  ? xfs_filestream_get_parent+0x70/0x70 [xfs]
[    2.291807]  ? xfs_mru_cache_create+0x136/0x180 [xfs]
[    2.292469]  xfs_fs_fill_super+0x466/0x810 [xfs]
[    2.293032]  ? xfs_init_fs_context+0x170/0x170 [xfs]
[    2.293637]  get_tree_bdev+0x16d/0x260
[    2.294064]  vfs_get_tree+0x25/0xb0
[    2.294465]  path_mount+0x431/0xa70
[    2.294857]  __x64_sys_mount+0xe2/0x120
[    2.295290]  do_syscall_64+0x3a/0x80
[    2.295688]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    2.296241] RIP: 0033:0x7f81526002ae
[    2.296641] Code: 48 8b 0d 6d fb 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3a fb 0d 00 f7 d8 64 89 01 48
[    2.298518] RSP: 002b:00007ffdf1b0e548 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[    2.299312] RAX: ffffffffffffffda RBX: 0000564ccec18bb0 RCX: 00007f81526002ae
[    2.300051] RDX: 0000564ccec18f20 RSI: 0000564ccec18f80 RDI: 0000564ccec19c50
[    2.300801] RBP: 0000000000000000 R08: 0000564ccec18e80 R09: 0000000000000073
[    2.301560] R10: 0000000000200000 R11: 0000000000000246 R12: 0000564ccec19c50
[    2.302314] R13: 0000564ccec18f20 R14: 00000000ffffffff R15: 00007f815274e076
[    2.303056]  </TASK>
[    2.303329] ---[ end trace 0000000000000000 ]---

This is failing in CI (see https://github.com/coreos/fedora-coreos-config/pull/1740#issuecomment-1126412870) and also locally.

I'm really not sure how this issue made it into our testing stream, but it did.

ext.config.kdump.crash.console.txt

dustymabe commented 2 years ago

It turns out this might be a race condition. I originally thought it was related to a kernel update, but further testing indicates it's a race: if I add a sleep 5 to the test, it starts passing reliably. I'm not sure what changed recently to make this race start happening.

cc @sandeen since the trace is in the XFS stack.

dustymabe commented 2 years ago

New workaround in https://github.com/coreos/fedora-coreos-config/pull/1742 - I'm not thrilled about it but it gets us unblocked for now.

sandeen commented 2 years ago

Something is up with a growfs followed quickly by a crash. Sleep 5 pushed the gap between the two operations out past 30s, which solved the problem due to internal timers triggering log actions. We (mostly dchinner) are taking a look now. Essentially, we are replaying transactions that modify blocks past the original end of filesystem, but internally xfs still thinks it has the smaller size, and the "blocks past EOFS" check fails.
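For reference, the hex values in the trace above line up with this explanation: the buffer being replayed sits just past the EOFS recorded by the pre-growfs superblock. A quick check of the numbers from the log (variable names here are mine, not from the kernel source):

```shell
# Values taken from the kernel log above (both in 512-byte sectors).
daddr=$((16#3b6001))   # disk address of the buffer log recovery wants
eofs=$((16#3b6000))    # end-of-filesystem per the not-yet-grown superblock
# xfs_buf_find() rejects any buffer at or beyond EOFS, hence the warning:
if [ "$daddr" -ge "$eofs" ]; then
    printf 'daddr 0x%x out of range, EOFS 0x%x\n' "$daddr" "$eofs"
fi
```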
sandeen commented 2 years ago

This will reproduce it for me:

#!/bin/bash

rm -f fsfile
mkfs.xfs -b size=4096 -dfile,name=fsfile,size=486400b
truncate --size=10199478272 fsfile
mkdir -p mnt
mount -o loop fsfile mnt
xfs_growfs mnt
for I in `seq 1 32`; do
    mkdir mnt/dir$I
    touch mnt/dir$I/file
done
sync -f mnt
xfs_io -x -c "shutdown" mnt
umount mnt
mount -o loop fsfile mnt
umount mnt
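(A side note on the sizes in this script: 486400 4KiB blocks appears to match the EOFS from the trace above, 0x3b6000 512-byte sectors, so the reproducer recreates the failing disk's pre-growfs size. A quick check of that arithmetic:)

```shell
# The reproducer's initial image size vs. the EOFS from the kernel log:
blocks=486400; bsize=4096               # mkfs.xfs -b size=4096 ... size=486400b
eofs_bytes=$(( $((16#3b6000)) * 512 ))  # EOFS is in 512-byte sectors
echo "$((blocks * bsize)) == $eofs_bytes"
```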

Thank you for finding this bug. :)

dustymabe commented 2 years ago

Nice. Thanks @sandeen for taking a look. Is there a more proper issue that can be filed somewhere that we can track? Do we know when this problem was introduced (what kernel version)?

dustymabe commented 2 years ago

Also... wow. I must have gotten really lucky with the sleep 5; it was the first value for sleep that I chose.

sandeen commented 2 years ago

Dave thinks it's a zero-day bug. The best place to file is a good question, thanks for asking; I don't necessarily scale very well here. Ideally, logging it in bugzilla.kernel.org would be best; to save time, a brief overview of the issue pointing back to this GitHub issue is probably fine. Putting bugs on bugzilla.kernel.org sends them to the XFS developer list, which gets more eyes on them.

It's probably going to be a bit before we get this one fixed (not complicated, I think, just lots of things competing for time right now), but your workaround should keep you in decent shape, yes? The trick is to wait > 30s between the growfs and the crash, I think.
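If the 30-second figure seems arbitrary: the XFS periodic log worker interval is controlled by the fs.xfs.xfssyncd_centisecs sysctl, which defaults to 3000 centiseconds, i.e. 30 seconds. (Connecting that timer to this bug is my reading of the comments above, not something confirmed upstream.)

```shell
# Default XFS log-worker interval; waiting longer than this between the
# growfs and the crash gives the periodic log work a chance to run.
default_centisecs=3000   # upstream default for fs.xfs.xfssyncd_centisecs
echo "log worker interval: $((default_centisecs / 100))s"
```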

dustymabe commented 2 years ago

Thanks @sandeen. I opened https://bugzilla.kernel.org/show_bug.cgi?id=216031

Sorry for the delay here.