Open tormath1 opened 5 months ago
Hello @tormath1, I will try to reproduce the issue in my env too, to take a better look.
Hello,
I have reproduced the behaviour in my environment using the Flatcar SDK to build a Flatcar image: it shows up in roughly 50% of the builds after running build_image and image_to_vm.sh. However, I cannot reproduce the issue manually. I have tried with this simple script:
#!/bin/bash
set -xe
umount /mnt || true
losetup -d /dev/loop6 || true
# create a loopback file of ~2GB
dd of=test.loop if=/dev/zero bs=1MB count=2048
losetup /dev/loop6 test.loop
# use the exact values from Flatcar layout
mkfs.btrfs --mixed -m single -d single --byte-count 1065345024 --label USR-A /dev/loop6
# mount the btrfs partition
mount -o relatime,seclabel,space_cache=v2,subvolid=5,subvol=/ /dev/loop6 /mnt
btrfs fi usage /mnt
# set the zstd compression
btrfs property set /mnt compression zstd
# write a ~690MB file
dd if=/dev/zero of=/mnt/test_file bs=1KB count=682490 && sync
# replace the ~690MB file with a ~459MB file
dd if=/dev/zero of=/mnt/test_file bs=1KB count=459490 && sync
# df / usage shows correctly
btrfs fi usage /mnt
# try to rebalance and remove the unused btrfs space
btrfs balance start -v -dusage=5 -musage=5 /mnt
# df / usage shows correctly again, no disparity between Free (estimated) and Free (statfs, df)
btrfs fi usage /mnt
I think this is practically a non-issue: from what I understand, in the case of btrfs the syscalls used by df/statfs do not report the correct values under some conditions.
I will try to reproduce the disparity, but wanted to share this starting point if anyone else is also investigating.
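To make the disparity easier to spot across builds, the two figures can be pulled out of the `btrfs fi usage` output and compared. This is only a minimal Python sketch, assuming the output format that btrfs-progs prints in this thread; `parse_size` and `free_space_disparity` are hypothetical helper names, not part of any existing tool.

```python
import re

# Binary unit suffixes as printed by btrfs-progs (e.g. "542.93MiB", "0.00B").
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)(B|KiB|MiB|GiB|TiB)")


def parse_size(text):
    """Convert a size string such as '542.93MiB' to a number of bytes."""
    match = _SIZE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def free_space_disparity(usage_output):
    """Return (estimated, statfs, difference) in bytes from `btrfs fi usage` text."""
    estimated = statfs = None
    for line in usage_output.splitlines():
        label, _, rest = line.partition(":")
        if not rest.strip():
            continue  # section headers such as "Overall:" carry no value
        first_field = rest.split()[0]
        if label.strip() == "Free (estimated)":
            estimated = parse_size(first_field)
        elif label.strip() == "Free (statfs, df)":
            statfs = parse_size(first_field)
    if estimated is None or statfs is None:
        raise ValueError("free-space lines not found in output")
    return estimated, statfs, estimated - statfs


# Two lines copied from one of the reports in this thread.
SAMPLE = """\
Free (estimated): 542.93MiB (min: 542.93MiB)
Free (statfs, df): 0.00B
"""

estimated, statfs, difference = free_space_disparity(SAMPLE)
print(f"disparity: {difference / 1024**2:.2f} MiB")  # disparity: 542.93 MiB
```

In practice the text would come from running `btrfs fi usage <mountpoint>` as root; the sketch works on a string so it can be tried without a btrfs filesystem.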
I have tried a few times to create the image using this small fix and the sizes are converging:
diff --git a/build_library/disk_util b/build_library/disk_util
index f94317e3c1..32893c87c4 100755
--- a/build_library/disk_util
+++ b/build_library/disk_util
@@ -660,6 +660,7 @@ def ReadWriteSubvol(options, partition, disable_rw):
with PartitionLoop(options, partition) as loop_dev:
btrfs_mount = tempfile.mkdtemp()
Sudo(['mount', '-t', 'btrfs', loop_dev, btrfs_mount])
+ Sudo(['btrfs', 'balance', 'start', '-dusage=0', '-musage=0', btrfs_mount])
try:
Sudo(['btrfs', 'property', 'set', '-ts', btrfs_mount, 'ro', 'true' if disable_rw else 'false'])
finally:
@tormath1 I could not find the actual cause of this issue or reproduce it in isolation yet, but this patch should not do any harm, as the balance is done right before the partition is made read-only and verity-signed.
Adding the commit https://github.com/flatcar/scripts/commit/95d8361fe9594a807ab2e76ab6c3830c6024f204 notes here for visibility:
Note that /usr is also a zstd compressed btrfs partition, so the output
of `df` free size and the actual free size after a file write for
example, will be very different, because the data in that file write has
a compression rate only definable after the file sync.
Unfortunately, there is no determinism in the btrfs file system case, because even if
you could in theory pre-compress with zstd the file before, and have an
idea about the size to be used, you still cannot really predict also the metadata
size for that file write.
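To illustrate why the on-disk size cannot be known before the write, here is a small Python sketch comparing how two inputs of the same length compress. It uses `zlib` from the standard library purely as a stand-in for zstd (an assumption made so the example runs without extra packages), and `compressed_ratio` is a hypothetical helper name. The exact ratios differ with zstd, but the point is the same: the result depends on the data, not on its length.

```python
import os
import zlib


def compressed_ratio(data):
    """Return compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)


# 1 MB of zeros, similar to what `dd if=/dev/zero` produces.
zeros = bytes(1_000_000)
# 1 MB of random bytes, which is essentially incompressible.
random_data = os.urandom(1_000_000)

print(f"zeros:  {compressed_ratio(zeros):.4f}")
print(f"random: {compressed_ratio(random_data):.4f}")
```

Zero-filled input like the one used in the script above shrinks to a tiny fraction of its size, while random input does not shrink at all, so a test file's content matters when trying to reproduce allocation behaviour on a compressed filesystem.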
While checking the journalctl output on the latest main, I noticed this warning: 'nologreplay' is deprecated, use 'rescue=nologreplay' instead. But as far as I know, no such mount option is used in the flatcar/scripts repo; the deprecated values were recently removed by https://github.com/flatcar/scripts/commit/18265de9d86dfe72532fd8d519d5897df9e7eead.
@jepio do you have any idea where the warning might come from? I checked the flatcar init and bootengine repos, but those also look fine.
/usr mount log:
Jul 01 16:38:11 localhost systemd[1]: Found device dev-mapper-usr.device - /dev/mapper/usr.
Jul 01 16:38:11 localhost systemd[1]: Mounting sysusr-usr.mount - /sysusr/usr...
Jul 01 16:38:11 localhost systemd[1]: Finished verity-setup.service - Verity Setup for /dev/mapper/usr.
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): first mount of filesystem 60877fc8-37bb-4e8a-ae4f-aaea0a123cfa
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm
Jul 01 16:38:11 localhost kernel: BTRFS warning (device dm-0): 'nologreplay' is deprecated, use 'rescue=nologreplay' instead
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): disabling log replay at mount time
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): using free space tree
Jul 01 16:38:11 localhost systemd[1]: Mounted sysusr-usr.mount - /sysusr/usr.
I could actually obtain some really weird results during my experiments:
root@localhost ~ # btrfs fi usage /usr
Overall:
Device size: 1015.99MiB
Device allocated: 1014.94MiB
Device unallocated: 1.05MiB
Device missing: 0.00B
Device slack: 0.00B
Used: 465.39MiB
Free (estimated): 542.93MiB (min: 542.93MiB)
Free (statfs, df): 0.00B
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 2.63MiB (used: 0.00B)
Multiple profiles: no
Data+Metadata,single: Size:1010.94MiB, Used:465.38MiB (46.03%)
/dev/mapper/usr 1010.94MiB
System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
/dev/mapper/usr 4.00MiB
Unallocated:
/dev/mapper/usr 1.05MiB
root@localhost ~ # df -h /usr
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/usr 1016M 469M 0 100% /usr
root@localhost ~ # uname -a
Linux localhost 6.6.43-flatcar #1 SMP PREEMPT_DYNAMIC Wed Aug 7 13:29:34 -00 2024 x86_64 Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz GenuineIntel GNU/Linux
root@localhost ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=4054.0.0+nightly-20240806-2100
VERSION_ID=4054.0.0
BUILD_ID=nightly-20240806-2100
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 4054.0.0+nightly-20240806-2100 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:4054.0.0+nightly-20240806-2100:*:*:*:*:*:*:*"
How I managed to obtain those results: I added a btrfs fi defrag to the workflow. I am still puzzled about what is happening and whether it is an issue in the Linux kernel or in btrfs-progs.
Actual command used in disk_util: Sudo(['btrfs', 'fi', 'defrag', '-r', '-v', options.disk_image]).
Made some progress and there might be a way to solve the problem; I will make a PR with it. It seems that the only way to deallocate the unused space is to shrink the filesystem and then grow it back:
btrfs filesystem resize -500m /tmp/btrfs-mount
btrfs filesystem resize +500m /tmp/btrfs-mount
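Since the build tooling patched earlier in this thread is Python, the shrink-and-grow cycle could be wrapped roughly as follows. This is a hedged sketch, not the actual PR: `resize_cycle_commands` and `run_resize_cycle` are hypothetical names, the `500m` delta is simply the value from the two commands above, and running it for real requires root and a mounted btrfs filesystem.

```python
import subprocess


def resize_cycle_commands(mount_point, delta="500m"):
    """Build the shrink-then-grow command pair for a mounted btrfs filesystem."""
    return [
        ["btrfs", "filesystem", "resize", f"-{delta}", mount_point],
        ["btrfs", "filesystem", "resize", f"+{delta}", mount_point],
    ]


def run_resize_cycle(mount_point, delta="500m", runner=subprocess.run):
    """Run both commands in order and return their exit codes.

    check=False is deliberate: a failing step is reported through its
    exit code instead of raising, so the caller can decide how to react.
    """
    return [
        runner(command, check=False).returncode
        for command in resize_cycle_commands(mount_point, delta)
    ]


for command in resize_cycle_commands("/tmp/btrfs-mount"):
    print(" ".join(command))
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything touches the filesystem.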
Flatcar Results:
root@localhost ~ # df -h /usr
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/usr 1016M 468M 443M 52% /usr
root@localhost ~ # btrfs fi usage /usr
Overall:
Device size: 1015.99MiB
Device allocated: 572.00MiB
Device unallocated: 443.99MiB
Device missing: 0.00B
Device slack: 0.00B
Used: 465.46MiB
Free (estimated): 544.02MiB (min: 544.02MiB)
Free (statfs, df): 442.94MiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 2.52MiB (used: 0.00B)
Multiple profiles: no
Data+Metadata,single: Size:568.00MiB, Used:465.46MiB (81.95%)
/dev/mapper/usr 568.00MiB
System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
/dev/mapper/usr 4.00MiB
Unallocated:
/dev/mapper/usr 443.99MiB
Came up with a script to get the closest reproduction:
#!/bin/bash
set -xe
LOOP=/dev/loop15
mkdir /tmp/btrfs-mount || true
umount /tmp/btrfs-mount || true
losetup -d $LOOP || true
# create a loopback file of ~2GB
dd of=test.loop if=/dev/zero bs=1MB count=2048
losetup $LOOP test.loop
# use the exact values from Flatcar layout
mkfs.btrfs --mixed -m single -d single --byte-count 1065345024 --label USR-A $LOOP
# mount the btrfs partition
mount -o relatime,seclabel,space_cache=v2,subvolid=5,subvol=/ $LOOP /tmp/btrfs-mount
btrfs fi usage /tmp/btrfs-mount
# set the zstd compression
btrfs property set /tmp/btrfs-mount compression zstd
# write a ~690MB file
dd if=/dev/random of=/tmp/btrfs-mount/test_file bs=1KB count=682490 && sync
# replace the ~690MB file with a ~459MB file
dd if=/dev/random of=/tmp/btrfs-mount/test_file bs=1KB count=459490 && sync
# Allocated value is really high
btrfs fi usage /tmp/btrfs-mount
# try to shrink the filesystem by more than is actually possible (expected to fail)
btrfs filesystem resize -500m /tmp/btrfs-mount || true
# Allocated value has been reset to a low value
btrfs fi usage /tmp/btrfs-mount
Output:
# Initial clean fs
Overall:
Device size: 1015.99MiB
Device allocated: 12.00MiB
Device unallocated: 1003.99MiB
Device missing: 0.00B
Used: 36.00KiB
Free (estimated): 1010.59MiB (min: 1010.59MiB)
Free (statfs, df): 1010.91MiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 1.38MiB (used: 0.00B)
Multiple profiles: no
# Before resize
Overall:
Device size: 1015.99MiB
Device allocated: 1014.94MiB
Device unallocated: 1.05MiB
Device missing: 0.00B
Used: 438.72MiB
Free (estimated): 570.84MiB (min: 570.84MiB)
Free (statfs, df): 572.22MiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 1.38MiB (used: 0.00B)
Multiple profiles: no
# After failed resize
ERROR: unable to resize '/tmp/btrfs-mount': No space left on device
Overall:
Device size: 1015.99MiB
Device allocated: 572.00MiB
Device unallocated: 443.99MiB
Device missing: 0.00B
Used: 438.73MiB
Free (estimated): 571.89MiB (min: 571.89MiB)
Free (statfs, df): 572.21MiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 1.38MiB (used: 0.00B)
Multiple profiles: no
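The effect of the failed resize is easier to see as "allocated but unused" space, i.e. Device allocated minus Used. A small worked calculation in Python, using only the MiB figures from the output above (`allocated_but_unused` is a hypothetical helper name):

```python
from decimal import Decimal


def allocated_but_unused(allocated_mib, used_mib):
    """Return allocated minus used space, in MiB, using exact decimal arithmetic."""
    return Decimal(allocated_mib) - Decimal(used_mib)


# Figures taken from the "Before resize" and "After failed resize" outputs.
before = allocated_but_unused("1014.94", "438.72")
after = allocated_but_unused("572.00", "438.73")

print(f"before resize: {before} MiB")          # before resize: 576.22 MiB
print(f"after failed resize: {after} MiB")     # after failed resize: 133.27 MiB
print(f"released: {before - after} MiB")       # released: 442.95 MiB
```

Note that in this reproduction Free (statfs, df) stays at about 572 MiB both before and after, so the script reproduces the inflated allocation but not the 0.00B value seen on the Flatcar image.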
Opened an issue upstream: https://bugzilla.kernel.org/show_bug.cgi?id=219167
Description
I noticed this recently, and I'm not sure how long it has been around, but BTRFS allocation seems to vary from one build to the next (at least on current Alpha and Beta):
Example on Beta-3941.1.0 (good behavior):
While on a main build:
Allocated space is different.
Impact
The impact is that the filesystem appears to be fuller than it really is:
Random behavior example with the last alpha (4012.0.0) release:
A similar thing can be observed after rerunning a Beta build.