flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

btrfs allocation issue #1473

Open tormath1 opened 1 week ago

tormath1 commented 1 week ago

Description

I recently noticed (I'm not sure since when this has been happening) that BTRFS allocation varies from one build to the next, at least on the current Alpha and Beta channels:

Example on Beta-3941.1.0 (good behavior):

$ sudo btrfs fi usage /usr
Overall:
    Device size:        1015.99MiB
    Device allocated:        572.00MiB
    Device unallocated:      443.99MiB
    Device missing:          0.00B
    Device slack:            0.00B
    Used:            462.76MiB
    Free (estimated):        546.67MiB  (min: 546.67MiB)
    Free (statfs, df):       442.94MiB
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:        2.57MiB  (used: 0.00B)
    Multiple profiles:              no

Data+Metadata,single: Size:568.00MiB, Used:462.75MiB (81.47%)
   /dev/dm-0     568.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/dm-0       4.00MiB

Unallocated:
   /dev/dm-0     443.99MiB

While on a main build:

$  sudo btrfs fi usage /usr
Overall:
    Device size:        1015.99MiB
    Device allocated:        684.00MiB
    Device unallocated:      331.99MiB
    Device missing:          0.00B
    Device slack:            0.00B
    Used:            462.88MiB
    Free (estimated):        546.61MiB  (min: 546.61MiB)
    Free (statfs, df):       330.94MiB
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:        2.51MiB  (used: 0.00B)
    Multiple profiles:              no

Data+Metadata,single: Size:680.00MiB, Used:462.88MiB (68.07%)
   /dev/dm-0     680.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/dm-0       4.00MiB

Unallocated:
   /dev/dm-0     331.99MiB

The allocated space differs (572.00MiB vs. 684.00MiB) even though the used space is nearly identical.
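To make the difference explicit, here is a minimal sketch that parses the relevant "Overall" lines from the two outputs above and prints the deltas. The helper name `parse_overall` and the trimmed `BETA`/`MAIN` strings are illustrative assumptions; the numbers are copied from the outputs in this issue.

```python
import re

# Scale factors to MiB for the units that appear in `btrfs fi usage` output.
TO_MIB = {"B": 1 / 1024**2, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_overall(text):
    """Return {field name: size in MiB} for 'name:   <number><unit>' lines."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([A-Za-z (),]+?):\s+([\d.]+)(B|KiB|MiB|GiB)\b", line)
        if m:
            fields[m.group(1).strip()] = float(m.group(2)) * TO_MIB[m.group(3)]
    return fields


# Trimmed copies of the two outputs above (hypothetical variable names).
BETA = """\
    Device allocated:        572.00MiB
    Device unallocated:      443.99MiB
    Used:            462.76MiB
    Free (statfs, df):       442.94MiB
"""
MAIN = """\
    Device allocated:        684.00MiB
    Device unallocated:      331.99MiB
    Used:            462.88MiB
    Free (statfs, df):       330.94MiB
"""

beta, main = parse_overall(BETA), parse_overall(MAIN)
delta = {name: round(main[name] - beta[name], 2) for name in beta}
for name, value in delta.items():
    print(f"{name:20s} {value:+.2f} MiB")
```

The extra 112 MiB of allocated space on the main build is exactly the amount missing from both "Device unallocated" and "Free (statfs, df)", while "Used" changes by only 0.12 MiB.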

Impact

The impact is that the filesystem appears fuller than it really is:

14:15:07   File    Size  Used Avail Use% Type
14:15:07  -/usr   1016M  465M  443M  52% btrfs
14:15:07  +/usr   1016M  465M  331M  59% btrfs
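The Use% column follows directly from the Avail column. A minimal sketch, assuming GNU df's usual behavior of computing the percentage as used / (used + avail) and rounding up; `df_use_percent` is a hypothetical helper, not part of any Flatcar tooling:

```python
def df_use_percent(used, avail):
    """Use% as (to my knowledge) GNU df reports it: used / (used + avail), rounded up."""
    total = used + avail
    return -(-used * 100 // total)  # integer ceiling division


# Values in MiB, taken from the df output above.
good = df_use_percent(465, 443)
bad = df_use_percent(465, 331)
print(good, bad)
```

With the same 465M used, dropping Avail from 443M to 331M moves the reported usage from 52% to 59%, which matches the two lines above.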

Random behavior example with the last alpha (4012.0.0) release:

 --- a/tmp/4011.0.0+nightly-20240624-2100-o4CAju
 +++ b/tmp/4012.0.0-0AUpWi
 @@ -1,5 +1,5 @@
  File    Size  Used Avail Use% Type
  /boot   127M   61M   66M  48% vfat
 -/usr   1016M  468M  331M  59% btrfs
 +/usr   1016M  468M  443M  52% btrfs

A similar thing can be observed after rerunning a Beta build.

ader1990 commented 5 days ago

Hello @tormath1, I will try to reproduce the issue in my environment as well, so I can take a closer look.

ader1990 commented 3 days ago

Hello,

I have reproduced the behaviour in my environment by building a Flatcar image with the Flatcar SDK: it occurs roughly 50% of the time after running build_image and image_to_vm.sh. However, I cannot reproduce the issue manually. I tried with this simple script:

#!/bin/bash

set -xe

umount /mnt || true
losetup -d /dev/loop6 || true

# create a loopback file of ~2GB
dd of=test.loop if=/dev/zero bs=1MB count=2048
losetup /dev/loop6 test.loop
# use the exact values from Flatcar layout
mkfs.btrfs --mixed -m single -d single --byte-count 1065345024 --label USR-A /dev/loop6

# mount the btrfs partition
mount -o relatime,seclabel,space_cache=v2,subvolid=5,subvol=/ /dev/loop6 /mnt
btrfs fi usage /mnt

# set the zstd compression
btrfs property set /mnt compression zstd

# write a ~682MB file
dd if=/dev/zero of=/mnt/test_file bs=1KB count=682490 && sync
# replace the ~682MB file with a ~459MB file
dd if=/dev/zero of=/mnt/test_file bs=1KB count=459490 && sync

# df / usage shows correctly
btrfs fi usage /mnt

# try to rebalance and remove the unused btrfs space
btrfs balance start -v -dusage=5 -musage=5 /mnt

# df / usage shows correctly again, no disparity between Free (estimated) and Free (statfs, df)
btrfs fi usage /mnt

I think this is practically a non-issue: from what I understand, on btrfs the statfs syscall that df relies on does not report the actual free space correctly under some conditions.
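For reference, the "Free (statfs, df)" figure can be read directly from the same syscall. A minimal sketch using Python's `os.statvfs` (Unix only); the helper name `statfs_free_mib` is an assumption for illustration:

```python
import os


def statfs_free_mib(path):
    """Free space available to unprivileged users, in MiB, as seen via statvfs."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / (1024 * 1024)


# On a Flatcar machine one would pass "/usr"; "/" is used here so the sketch runs anywhere.
print(f"{statfs_free_mib('/'):.2f} MiB available")
```

Comparing this value with "Free (estimated)" from `btrfs fi usage` on the same mount point shows the disparity discussed here.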

I will keep trying to reproduce the disparity, but I wanted to share this starting point in case anyone else is also investigating.

ader1990 commented 3 days ago

I have built the image a few times with this small fix, and the sizes are converging:

diff --git a/build_library/disk_util b/build_library/disk_util
index f94317e3c1..32893c87c4 100755
--- a/build_library/disk_util
+++ b/build_library/disk_util
@@ -660,6 +660,7 @@ def ReadWriteSubvol(options, partition, disable_rw):
   with PartitionLoop(options, partition) as loop_dev:
     btrfs_mount = tempfile.mkdtemp()
     Sudo(['mount', '-t', 'btrfs', loop_dev, btrfs_mount])
+    Sudo(['btrfs', 'balance', 'start', '-dusage=0', '-musage=0', btrfs_mount])
     try:
       Sudo(['btrfs', 'property', 'set', '-ts', btrfs_mount, 'ro', 'true' if disable_rw else 'false'])
     finally:

@tormath1 I have not yet found the actual cause of this issue or reproduced it in isolation, but this patch should not do any harm, as the balance runs right before the partition is made read-only and verity-signed.

ader1990 commented 2 hours ago

Adding the notes from commit https://github.com/flatcar/scripts/commit/95d8361fe9594a807ab2e76ab6c3830c6024f204 here for visibility:

Note that /usr is also a zstd compressed btrfs partition, so the output
of `df` free size and the actual free size after a file write for
example, will be very different, because the data in that file write has
a compression rate only definable after the file sync.

Unfortunately, there is no determinism in the btrfs file system case, because even if
you could in theory pre-compress with zstd the file before, and have an
idea about the size to be used, you still cannot really predict also the metadata
size for that file write.
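The point about compression can be illustrated with a small sketch. It uses zlib from the Python standard library purely as a stand-in for zstd (an assumption made so the example is self-contained), and it says nothing about btrfs metadata overhead:

```python
import os
import zlib

SIZE = 1024 * 1024  # 1 MiB of logical data in both cases

zeros = bytes(SIZE)          # highly compressible, like data read from /dev/zero
random_data = os.urandom(SIZE)  # effectively incompressible

zeros_compressed = len(zlib.compress(zeros))
random_compressed = len(zlib.compress(random_data))

print(f"zeros:  {SIZE} -> {zeros_compressed} bytes")
print(f"random: {SIZE} -> {random_compressed} bytes")
```

Two writes of identical logical size can occupy very different amounts of space once compressed, so the free space after a write cannot be derived from the file size alone.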