Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

btrfs send size #284

Open stroucki opened 1 week ago

stroucki commented 1 week ago

I don't know if this is related to #270, but I was measuring the size of the btrfs send streams that I ship to backups. It appears that btrfs send includes only one copy of the data for a reflinked duplicate, but sends a full copy for every file once bees has processed the data.

# base state
root@heike:/tmp/btrfs/test# df -m .
Filesystem     1M-blocks   Used Available Use% Mounted on
-                 342016 236280    104111  70% /tmp/btrfs/test

# create snapshot
root@heike:/tmp/btrfs/test# btrfs subvol snap -r . ../test.snap
Create a readonly snapshot of '.' in '../test.snap'

# measure size of btrfs send (proto 2 doesn't seem to have any benefit)
root@heike:/tmp/btrfs/test# btrfs send --proto=1 ../test.snap |wc -c
At subvol ../test.snap
208
root@heike:/tmp/btrfs/test# btrfs send --proto=2 ../test.snap |wc -c
At subvol ../test.snap
224

# create a large file that compresses well and is not nulls
root@heike:/tmp/btrfs/test# perl -e 'print "1" x (1024*1024*1024)' >ones

# create new snapshot and measure
root@heike:/tmp/btrfs/test# btrfs subvol del ../test.snap
Delete subvolume (no-commit): '/tmp/btrfs/test.snap'
root@heike:/tmp/btrfs/test# btrfs subvol snap -r . ../test.snap
Create a readonly snapshot of '.' in '../test.snap'
root@heike:/tmp/btrfs/test# btrfs send --proto=1 ../test.snap |wc -c
At subvol ../test.snap
1074577892
root@heike:/tmp/btrfs/test# btrfs send --proto=2 ../test.snap|wc -c
At subvol ../test.snap
1074004500

# ok, that is expected since btrfs send decompresses data

# create reflinked copy of that file
root@heike:/tmp/btrfs/test# cp --reflink=always -a ones ones.copy

# create new snapshot and measure
root@heike:/tmp/btrfs/test# btrfs subvol del ../test.snap
Delete subvolume (no-commit): '/tmp/btrfs/test.snap'
root@heike:/tmp/btrfs/test# btrfs subvol snap -r . ../test.snap
Create a readonly snapshot of '.' in '../test.snap'
root@heike:/tmp/btrfs/test# btrfs send --proto=1 ../test.snap |wc -c
At subvol ../test.snap
1075389196
root@heike:/tmp/btrfs/test# btrfs send --proto=2 ../test.snap|wc -c
At subvol ../test.snap
1074815836

# ok, the second file obviously shares data in the snapshot

# actual disk space used now
root@heike:/tmp/btrfs/test# df -m .
Filesystem     1M-blocks   Used Available Use% Mounted on
-                 342016 236315    104079  70% /tmp/btrfs/test

# start bees run in background
# get lots of messages like
# 2024-06-28 14:43:32 32671.32673<7> crawl_1145_258: exception (ignored): exception type std::runtime_error: FIXME: too many duplicate candidates, bailing out here
# 2024-06-28 14:44:01 32671.32673<6> crawl_1145_258: addr 0x6f8e273000 refs 20534 beats previous record 20533

# to not get bees distracted with the existing snapshot
root@heike:/tmp/btrfs/test# btrfs subvol del ../test.snap
Delete subvolume (no-commit): '/tmp/btrfs/test.snap'

# bees has finished now, so some data is freed
Filesystem     1M-blocks   Used Available Use% Mounted on
-                 342016 236288    104105  70% /tmp/btrfs/test

# create new snapshot and measure
root@heike:/tmp/btrfs/test# btrfs subvol snap -r . ../test.snap
Create a readonly snapshot of '.' in '../test.snap'
root@heike:/tmp/btrfs/test# btrfs send --proto=1 ../test.snap |wc -c
At subvol ../test.snap
2149278476
root@heike:/tmp/btrfs/test# btrfs send --proto=2 ../test.snap|wc -c
At subvol ../test.snap
2148049756

# why is the send stream size now doubled?

kakra commented 6 days ago

This is probably because bees breaks metadata sharing while combining extents. So instead of the same metadata in a snapshot, btrfs now has individual metadata pointers to the same extent.
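For reference, the jump in the reported proto=1 stream sizes can be checked with plain shell arithmetic. This is only a sketch using the byte counts copied from the measurements above; the variable names are made up for illustration.

```shell
# Byte counts copied from the proto=1 measurements in the report.
before_bees=1075389196   # send stream with ones + reflinked ones.copy
after_bees=2149278476    # the same two files after the bees run
file_size=$((1024 * 1024 * 1024))   # size of the "ones" test file (1 GiB)

# How much the stream grew, and how much of that is not file data.
growth=$((after_bees - before_bees))
overhead=$((growth - file_size))

echo "growth:   $growth bytes"
echo "file:     $file_size bytes"
echo "overhead: $overhead bytes"
```

The growth is one full 1 GiB copy of the file plus 144 KiB, which is consistent with the second file being sent as written data rather than as a clone of the first.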

Zygo commented 6 days ago

btrfs send does not attempt to make reflinks to any extent with more than 64 references:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/send.c#n1553

    /*
     * Backreference walking (iterate_extent_inodes() below) is currently
     * too expensive when an extent has a large number of references, both
     * in time spent and used memory. So for now just fallback to write
     * operations instead of clone operations when an extent has more than
     * a certain amount of references.
     */

Since that comment was written, backref walking performance has been improved and some bugs have been fixed. In kernel 6.2, the limit was raised from 64 to 1024, but this is still too low for your data set. You have extents with about 20 times as many references:

2024-06-28 14:44:01 32671.32673<6> crawl_1145_258: addr 0x6f8e273000 refs 20534 beats previous record 20533

As it says in the kernel commit, the workaround is to run dedupe on the destination of the receive until backref performance improves enough for send to handle extents with 20,000 references.
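
A rough way to check whether a filesystem is affected is to compare the reference counts that bees logs against the send clone limit. The sketch below assumes log lines in the "refs N beats previous record" format quoted above; the `max_refs` helper and the variable names are hypothetical, and the limit of 1024 is the post-6.2 value mentioned in this thread (use 64 for older kernels).

```shell
# Clone limit quoted in this thread: 64 before kernel 6.2, 1024 since.
limit=1024

# Hypothetical helper: print the largest "refs N" value found in
# bees log lines read from stdin (prints 0 if none are found).
max_refs() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "refs" && $(i + 1) + 0 > max) max = $(i + 1) + 0
    } END { print max + 0 }'
}

# Sample input: the log line quoted in this thread.
log='2024-06-28 14:44:01 32671.32673<6> crawl_1145_258: addr 0x6f8e273000 refs 20534 beats previous record 20533'

refs=$(printf '%s\n' "$log" | max_refs)
if [ "$refs" -gt "$limit" ]; then
    echo "extent has $refs refs, over the limit of $limit: send will likely write instead of clone"
else
    echo "extent has $refs refs, within the limit of $limit"
fi
```

In practice the sample line would be replaced by the real bees log, piped into `max_refs`. This only reports the largest count bees has logged so far, not how many extents exceed the limit.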