flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

btrfs allocation issue #1473

Open tormath1 opened 5 months ago

tormath1 commented 5 months ago

Description

I noticed recently (I'm not sure when it started) that the BTRFS allocation varies from one build to the next, at least on current Alpha and Beta:

Example on Beta-3941.1.0 (good behavior):

$ sudo btrfs fi usage /usr
Overall:
    Device size:        1015.99MiB
    Device allocated:        572.00MiB
    Device unallocated:      443.99MiB
    Device missing:          0.00B
    Device slack:            0.00B
    Used:            462.76MiB
    Free (estimated):        546.67MiB  (min: 546.67MiB)
    Free (statfs, df):       442.94MiB
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:        2.57MiB  (used: 0.00B)
    Multiple profiles:              no

Data+Metadata,single: Size:568.00MiB, Used:462.75MiB (81.47%)
   /dev/dm-0     568.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/dm-0       4.00MiB

Unallocated:
   /dev/dm-0     443.99MiB

While on a main build:

$ sudo btrfs fi usage /usr
Overall:
    Device size:        1015.99MiB
    Device allocated:        684.00MiB
    Device unallocated:      331.99MiB
    Device missing:          0.00B
    Device slack:            0.00B
    Used:            462.88MiB
    Free (estimated):        546.61MiB  (min: 546.61MiB)
    Free (statfs, df):       330.94MiB
    Data ratio:               1.00
    Metadata ratio:           1.00
    Global reserve:        2.51MiB  (used: 0.00B)
    Multiple profiles:              no

Data+Metadata,single: Size:680.00MiB, Used:462.88MiB (68.07%)
   /dev/dm-0     680.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/dm-0       4.00MiB

Unallocated:
   /dev/dm-0     331.99MiB

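The allocated space differs: 572.00MiB on the Beta build versus 684.00MiB on the main build, for almost the same amount of used data (about 462.8MiB).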
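To compare two builds at a glance, the gap between allocated and used space can be computed from the Overall section of the output. The following is a minimal Python sketch, not part of Flatcar's tooling: the helper names `parse_overall` and `slack_mib` are made up for illustration, and the sample text is a shortened copy of the Beta output above.

```python
import re

# Multipliers for the binary units printed by `btrfs fi usage`.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Matches lines such as "    Device allocated:        572.00MiB".
FIELD = re.compile(r"^\s*([^:]+):\s+([0-9.]+)(B|KiB|MiB|GiB|TiB)\b")


def parse_overall(text):
    """Return {field name: size in bytes} for the size fields in the output."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            name, value, unit = match.groups()
            fields[name.strip()] = float(value) * UNITS[unit]
    return fields


def slack_mib(text):
    """Allocated-but-unused space in MiB (hypothetical helper)."""
    fields = parse_overall(text)
    return (fields["Device allocated"] - fields["Used"]) / UNITS["MiB"]


# Shortened copy of the Beta-3941.1.0 output quoted in this issue.
BETA_SAMPLE = """\
Overall:
    Device size:        1015.99MiB
    Device allocated:        572.00MiB
    Device unallocated:      443.99MiB
    Used:            462.76MiB
"""

if __name__ == "__main__":
    print(f"{slack_mib(BETA_SAMPLE):.2f} MiB allocated but unused")
```

With the figures quoted above, the Beta build has 572.00 - 462.76 = 109.24MiB allocated but unused, while the main build has 684.00 - 462.88 = 221.12MiB.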

Impact

The impact is that the filesystem appears fuller than it really is:

14:15:07   File    Size  Used Avail Use% Type
14:15:07  -/usr   1016M  465M  443M  52% btrfs
14:15:07  +/usr   1016M  465M  331M  59% btrfs
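The jump from 52% to 59% with an unchanged Used value is consistent with df deriving Use% from used / (used + available), rounded up, rather than from the total size. A small Python sketch of that arithmetic, assuming this is how df computes the column and using the rounded MiB values from the table above (the function name `use_percent` is made up for illustration):

```python
def use_percent(used, avail):
    """Use% as used / (used + avail), rounded up to a whole percent.

    Assumes used + avail is greater than zero.
    """
    total = used + avail
    return -(-100 * used // total)  # integer ceiling division


if __name__ == "__main__":
    print(use_percent(465, 443))  # build with 443M available
    print(use_percent(465, 331))  # build with 331M available
```

Under that assumption, the space that is allocated but unused lowers Avail and therefore raises Use%, even though the same amount of data is stored.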

Random behavior example with the last alpha (4012.0.0) release:

 --- a/tmp/4011.0.0+nightly-20240624-2100-o4CAju
 +++ b/tmp/4012.0.0-0AUpWi
 @@ -1,5 +1,5 @@
  File    Size  Used Avail Use% Type
  /boot   127M   61M   66M  48% vfat
 -/usr   1016M  468M  331M  59% btrfs
 +/usr   1016M  468M  443M  52% btrfs

A similar thing can be observed after rerunning a Beta build.

ader1990 commented 5 months ago

Hello @tormath1, I will try to reproduce the issue in my env too, to take a better look.

ader1990 commented 5 months ago

Hello,

I have reproduced the behaviour in my environment using the Flatcar SDK to build a Flatcar image: it shows up with roughly a 50% chance after running build_image and image_to_vm.sh. However, I cannot reproduce the issue manually. I tried with this simple script:

#!/bin/bash

set -xe

umount /mnt || true
losetup -d /dev/loop6 || true

# create a loopback file of ~2GB
dd of=test.loop if=/dev/zero bs=1MB count=2048
losetup /dev/loop6 test.loop
# use the exact values from Flatcar layout
mkfs.btrfs --mixed -m single -d single --byte-count 1065345024 --label USR-A /dev/loop6

# mount the btrfs partition
mount -o relatime,seclabel,space_cache=v2,subvolid=5,subvol=/ /dev/loop6 /mnt
btrfs fi usage /mnt

# set the zstd compression
btrfs property set /mnt compression zstd

# write a ~690MB file
dd if=/dev/zero of=/mnt/test_file bs=1KB count=682490 && sync
# replace the ~690MB file with a ~459MB file
dd if=/dev/zero of=/mnt/test_file bs=1KB count=459490 && sync

# df / usage shows correctly
btrfs fi usage /mnt

# try to rebalance and remove the unused btrfs space
btrfs balance start -v -dusage=5 -musage=5 /mnt

# df / usage shows correctly again, no disparity between Free (estimated) and Free (statfs, df)
btrfs fi usage /mnt

I think this is practically a non-issue: from what I understand, in the case of btrfs the statfs syscall that df relies on does not report the actual free space correctly under some conditions.
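For reference, the Avail column of df comes from the statfs/statvfs call, which can also be read directly and compared with the Free (estimated) line of btrfs fi usage on the same mount. A minimal Python sketch (the function name `statfs_avail_mib` is made up for illustration, and `/` is used only so the example runs anywhere; on a Flatcar machine the path of interest would be /usr):

```python
import os


def statfs_avail_mib(path):
    """Space available to unprivileged users according to statvfs(), in MiB."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / 1024**2


if __name__ == "__main__":
    print(f"{statfs_avail_mib('/'):.2f} MiB available on /")
```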

I will try to reproduce the disparity, but wanted to share this starting point if anyone else is also investigating.

ader1990 commented 5 months ago

I have tried a few times to create the image using this small fix and the sizes are converging:

diff --git a/build_library/disk_util b/build_library/disk_util
index f94317e3c1..32893c87c4 100755
--- a/build_library/disk_util
+++ b/build_library/disk_util
@@ -660,6 +660,7 @@ def ReadWriteSubvol(options, partition, disable_rw):
   with PartitionLoop(options, partition) as loop_dev:
     btrfs_mount = tempfile.mkdtemp()
     Sudo(['mount', '-t', 'btrfs', loop_dev, btrfs_mount])
+    Sudo(['btrfs', 'balance', 'start', '-dusage=0', '-musage=0', btrfs_mount])
     try:
       Sudo(['btrfs', 'property', 'set', '-ts', btrfs_mount, 'ro', 'true' if disable_rw else 'false'])
     finally:

@tormath1 I could not find the actual cause of this issue or reproduce it in isolation yet, but this patch should not do any harm, as the balance gets done right before making the partition readonly and the verity signing.

ader1990 commented 5 months ago

Adding the commit https://github.com/flatcar/scripts/commit/95d8361fe9594a807ab2e76ab6c3830c6024f204 notes here for visibility:

Note that /usr is also a zstd compressed btrfs partition, so the output
of `df` free size and the actual free size after a file write for
example, will be very different, because the data in that file write has
a compression rate only definable after the file sync.

Unfortunately, there is no determinism in the btrfs file system case, because even if
you could in theory pre-compress with zstd the file before, and have an
idea about the size to be used, you still cannot really predict also the metadata
size for that file write.

ader1990 commented 5 months ago

While checking the journalctl output on the latest main, I noticed this warning: 'nologreplay' is deprecated, use 'rescue=nologreplay' instead. As far as I know, no such mount option is used in the flatcar/scripts repo; the deprecated values were recently removed by https://github.com/flatcar/scripts/commit/18265de9d86dfe72532fd8d519d5897df9e7eead.

@jepio do you have an idea where the warning might come from? I checked the flatcar init / bootengine repos, but those also look fine.

/usr mount log:

Jul 01 16:38:11 localhost systemd[1]: Found device dev-mapper-usr.device - /dev/mapper/usr.
Jul 01 16:38:11 localhost systemd[1]: Mounting sysusr-usr.mount - /sysusr/usr...
Jul 01 16:38:11 localhost systemd[1]: Finished verity-setup.service - Verity Setup for /dev/mapper/usr.
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): first mount of filesystem 60877fc8-37bb-4e8a-ae4f-aaea0a123cfa
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm
Jul 01 16:38:11 localhost kernel: BTRFS warning (device dm-0): 'nologreplay' is deprecated, use 'rescue=nologreplay' instead
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): disabling log replay at mount time
Jul 01 16:38:11 localhost kernel: BTRFS info (device dm-0): using free space tree
Jul 01 16:38:11 localhost systemd[1]: Mounted sysusr-usr.mount - /sysusr/usr.

jepio commented 5 months ago

probably here: https://github.com/flatcar/bootengine/blob/flatcar-master/dracut/10usr-generator/usr-generator?

ader1990 commented 3 months ago

I could actually obtain some really weird results during my experiments:

root@localhost ~ # btrfs fi usage /usr
Overall:
    Device size:                1015.99MiB
    Device allocated:           1014.94MiB
    Device unallocated:            1.05MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        465.39MiB
    Free (estimated):            542.93MiB      (min: 542.93MiB)
    Free (statfs, df):               0.00B
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                2.63MiB      (used: 0.00B)
    Multiple profiles:                  no

Data+Metadata,single: Size:1010.94MiB, Used:465.38MiB (46.03%)
   /dev/mapper/usr      1010.94MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/mapper/usr         4.00MiB

Unallocated:
   /dev/mapper/usr         1.05MiB
root@localhost ~ # df -h /usr
Filesystem       Size  Used Avail Use% Mounted on
/dev/mapper/usr 1016M  469M     0 100% /usr

root@localhost ~ # uname -a
Linux localhost 6.6.43-flatcar #1 SMP PREEMPT_DYNAMIC Wed Aug  7 13:29:34 -00 2024 x86_64 Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz GenuineIntel GNU/Linux
root@localhost ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=4054.0.0+nightly-20240806-2100
VERSION_ID=4054.0.0
BUILD_ID=nightly-20240806-2100
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 4054.0.0+nightly-20240806-2100 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:4054.0.0+nightly-20240806-2100:*:*:*:*:*:*:*"

I obtained those results by adding a btrfs fi defrag to the workflow. I am still puzzled about what is happening and whether it is an issue in the Linux kernel or in btrfs-progs.

Actual command used in the disk_util: Sudo(['btrfs', 'fi', 'defrag', '-r', '-v', options.disk_image]).

ader1990 commented 3 months ago

I made some progress and there might be a way to solve the problem; I will make a PR with it. It seems that the only way to release the over-allocated space is to shrink the filesystem and then grow it back to its original size.

btrfs filesystem resize -500m /tmp/btrfs-mount
btrfs filesystem resize +500m /tmp/btrfs-mount
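The two commands above could be wrapped roughly as follows. This is only a sketch and not the code from the planned PR: the function name `reclaim_by_resize` and the injectable `run` parameter are made up for illustration, and the real commands need root and a mounted btrfs filesystem.

```python
import subprocess


def reclaim_by_resize(mount_point, delta="500m", run=subprocess.run):
    """Shrink a mounted btrfs filesystem by `delta`, then grow it back.

    Returns True if the shrink succeeded and the size was restored,
    False if the shrink failed (no grow is attempted in that case).
    """
    shrink = run(["btrfs", "filesystem", "resize", f"-{delta}", mount_point])
    if shrink.returncode != 0:
        # Nothing was shrunk, so growing would try to enlarge the
        # filesystem past its original size.
        return False
    run(["btrfs", "filesystem", "resize", f"+{delta}", mount_point], check=True)
    return True


if __name__ == "__main__":
    # Requires root; /tmp/btrfs-mount is the mount point used in this thread.
    print(reclaim_by_resize("/tmp/btrfs-mount"))
```

The grow step is skipped when the shrink fails so that the filesystem never ends up larger than it started.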

Flatcar Results:

root@localhost ~ # df -h /usr
Filesystem       Size  Used Avail Use% Mounted on
/dev/mapper/usr 1016M  468M  443M  52% /usr
root@localhost ~ # btrfs fi usage /usr
Overall:
    Device size:                1015.99MiB
    Device allocated:            572.00MiB
    Device unallocated:          443.99MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        465.46MiB
    Free (estimated):            544.02MiB      (min: 544.02MiB)
    Free (statfs, df):           442.94MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                2.52MiB      (used: 0.00B)
    Multiple profiles:                  no

Data+Metadata,single: Size:568.00MiB, Used:465.46MiB (81.95%)
   /dev/mapper/usr       568.00MiB

System,single: Size:4.00MiB, Used:4.00KiB (0.10%)
   /dev/mapper/usr         4.00MiB

Unallocated:
   /dev/mapper/usr       443.99MiB

ader1990 commented 3 months ago

I came up with a script that gives the closest reproduction so far:

#!/bin/bash

set -xe

LOOP=/dev/loop15
mkdir /tmp/btrfs-mount || true
umount /tmp/btrfs-mount  || true
losetup -d $LOOP || true

# create a loopback file of ~2GB
dd of=test.loop if=/dev/zero bs=1MB count=2048
losetup $LOOP test.loop
# use the exact values from Flatcar layout
mkfs.btrfs --mixed -m single -d single --byte-count 1065345024 --label USR-A $LOOP

# mount the btrfs partition
mount -o relatime,seclabel,space_cache=v2,subvolid=5,subvol=/ $LOOP /tmp/btrfs-mount

btrfs fi usage /tmp/btrfs-mount

# set the zstd compression
btrfs property set /tmp/btrfs-mount compression zstd

# write a ~690MB file
dd if=/dev/random of=/tmp/btrfs-mount/test_file bs=1KB count=682490 && sync
# replace the ~690MB file with a ~459MB file
dd if=/dev/random of=/tmp/btrfs-mount/test_file bs=1KB count=459490 && sync

# Allocated value is really high
btrfs fi usage /tmp/btrfs-mount

# try to shrink the filesystem by more than it can actually accommodate
btrfs filesystem resize -500m /tmp/btrfs-mount || true

# Allocated value got reset to a low value
btrfs fi usage /tmp/btrfs-mount

Output:

# Initial clean fs

Overall:
    Device size:                1015.99MiB
    Device allocated:             12.00MiB
    Device unallocated:         1003.99MiB
    Device missing:                  0.00B
    Used:                         36.00KiB
    Free (estimated):           1010.59MiB      (min: 1010.59MiB)
    Free (statfs, df):          1010.91MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                1.38MiB      (used: 0.00B)
    Multiple profiles:                  no

# Before resize

Overall:
    Device size:                1015.99MiB
    Device allocated:           1014.94MiB
    Device unallocated:            1.05MiB
    Device missing:                  0.00B
    Used:                        438.72MiB
    Free (estimated):            570.84MiB      (min: 570.84MiB)
    Free (statfs, df):           572.22MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                1.38MiB      (used: 0.00B)
    Multiple profiles:                  no

# After failed resize
ERROR: unable to resize '/tmp/btrfs-mount': No space left on device
Overall:
    Device size:                1015.99MiB
    Device allocated:            572.00MiB
    Device unallocated:          443.99MiB
    Device missing:                  0.00B
    Used:                        438.73MiB
    Free (estimated):            571.89MiB      (min: 571.89MiB)
    Free (statfs, df):           572.21MiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                1.38MiB      (used: 0.00B)
    Multiple profiles:                  no

ader1990 commented 3 months ago

Opened an issue upstream: https://bugzilla.kernel.org/show_bug.cgi?id=219167