Open dustymabe opened 2 years ago
It turns out this might be a race condition. I originally thought it was related to a kernel update, but further tests indicate it's a race. If I add a sleep 5 to the test, it starts to pass reliably. I'm not sure what changed recently to cause this race to start happening.
cc @sandeen since the trace is in the XFS stack.
New workaround in https://github.com/coreos/fedora-coreos-config/pull/1742 - I'm not thrilled about it but it gets us unblocked for now.
This will reproduce it for me:
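#!/bin/bash
# Create a small XFS filesystem image, then enlarge the backing file.
rm -f fsfile
mkfs.xfs -b size=4096 -dfile,name=fsfile,size=486400b
truncate --size=10199478272 fsfile
mkdir -p mnt
mount -o loop fsfile mnt
# Grow the filesystem to fill the enlarged file.
xfs_growfs mnt
for I in $(seq 1 32); do
    mkdir mnt/dir$I
    touch mnt/dir$I/file
done
sync -f mnt
# Force a filesystem shutdown to simulate a crash, then remount.
xfs_io -x -c "shutdown" mnt
umount mnt
mount -o loop fsfile mnt
umount mnt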
Thank you for finding this bug. :)
Nice. Thanks @sandeen for taking a look. Is there a more proper issue that can be filed somewhere that we can track? Do we know when this problem was introduced (what kernel version)?
Also... wow. I must have gotten really lucky with the sleep 5. It was the first value for sleep that I chose.
Dave thinks it's a zero-day bug. Best place to file is a good question, thanks for asking - I don't necessarily scale very well here. I think ideally, logging it in bugzilla.kernel.org would be best; to save time, a super brief overview of the issue and pointing back to the github issue is probably fine. But putting bugs on bugzilla.kernel.org will send them to the XFS developer list, and get more eyes on them.
It's probably going to be a bit before we get this one fixed (not complicated, I think, just lots of things competing for time right now), but your workaround should keep you in decent shape, yes? The trick is to wait > 30s between the growfs and the crash, I think.
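For anyone wanting to try that wait in their own test, here is a minimal sketch of it. The helper name `settle_after_growfs` and the 31-second default are assumptions made for illustration (the default comes only from the "> 30s" remark above); neither is taken from the actual workaround PR.

```shell
#!/bin/bash
# Hypothetical helper: pause between xfs_growfs and the crash/shutdown step.
# The 31-second default is an assumption based on the "> 30s" remark above.
settle_after_growfs() {
    local seconds="${1:-31}"
    # Reject anything that is not a non-negative integer.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "usage: settle_after_growfs [seconds]" >&2
            return 2
            ;;
    esac
    sleep "$seconds"
    echo "waited ${seconds}s after growfs"
}

# Example placement inside the reproducer (requires root; not run here):
#   xfs_growfs mnt
#   settle_after_growfs          # wait before forcing the shutdown
#   xfs_io -x -c "shutdown" mnt
```

Keeping the delay in one small function makes it easy to shorten or drop once a kernel fix lands.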
Thanks @sandeen. I opened https://bugzilla.kernel.org/show_bug.cgi?id=216031
Sorry for the delay here.
The ext.config.kdump.crash test started failing with an XFS corruption/kernel issue. Here is a snippet of output from 5.17.6-300.fc36.x86_64. This is failing in CI (see https://github.com/coreos/fedora-coreos-config/pull/1740#issuecomment-1126412870) and also locally.
I'm really not sure how this issue made it into our testing stream, but it did.
ext.config.kdump.crash.console.txt