Closed stroucki closed 3 months ago
This is probably because bees breaks metadata sharing while combining extents: instead of a snapshot sharing the same metadata pages with its origin subvolume, btrfs ends up with individual metadata items that all point to the same data extent.
btrfs send does not attempt to make reflinks to any extent with more than 64 references:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/send.c#n1553
/*
* Backreference walking (iterate_extent_inodes() below) is currently
* too expensive when an extent has a large number of references, both
* in time spent and used memory. So for now just fallback to write
* operations instead of clone operations when an extent has more than
* a certain amount of references.
*/
Since the time that comment was written, backref walking performance has been improved and some bugs fixed. In kernel 6.2, the limit was raised from 64 to 1024 but this is still too low for your data set. You have extents with 20x more references:
2024-06-28 14:44:01 32671.32673<6> crawl_1145_258: addr 0x6f8e273000 refs 20534 beats previous record 20533
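For anyone who wants to check their own data, the reference count for a specific extent can be estimated by resolving the logical address from the bees log. A rough sketch, assuming a hypothetical mount point of /mnt/pool (the -s buffer size is raised because the default buffer truncates long result lists):

```shell
# Count file references to the extent from the bees log line above.
# /mnt/pool is a hypothetical mount point; adjust to your filesystem.
addr=$((0x6f8e273000))   # shell arithmetic converts the hex address to decimal

# logical-resolve prints one path per referencing file; counting the
# lines approximates the reference count reported by bees.
btrfs inspect-internal logical-resolve -s 1048576 "$addr" /mnt/pool | wc -l
```

If the count exceeds the kernel's clone threshold (64 before 6.2, 1024 since), send will fall back to plain writes for that extent.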
As it says in the kernel commit, the workaround is to run dedupe on the destination of the receive until backref performance improves enough for send to handle extents with 20,000 references.
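The receive-side workaround can be sketched as follows. The paths are hypothetical, and duperemove is shown as one example deduplicator (bees itself can also be pointed at the destination filesystem):

```shell
# Transfer the snapshot. Extents over the kernel's reference limit
# arrive as plain writes, so the stream and destination lose sharing.
btrfs send /mnt/src/snap/home.2024-06-28 | btrfs receive /mnt/backup/snap/

# Re-deduplicate on the destination to recover the lost sharing.
# -r recurses into directories, -d actually submits the dedupe
# requests, and the hashfile avoids re-hashing on later runs.
duperemove -rd --hashfile=/var/tmp/backup.hash /mnt/backup/snap/
```

The extra dedupe pass costs I/O on the backup host, but it restores the space savings that the send stream could not carry.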
Thanks for your quick replies. For me, this means that using btrfs send serialized data to manage backups is inefficient, and may only be worthwhile when incremental sends are used frequently.
I don't know if this is related to #270, but I was measuring the size of btrfs send outputs that I send to backups. It appears that btrfs send includes only one copy of the data from a reflinked duplicate, but sends all copies of the files after bees has processed the data.
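One way to reproduce this measurement is to compare the size of a full send stream for a snapshot taken before bees processed the data against one taken after. A minimal sketch, assuming hypothetical read-only snapshot paths:

```shell
# Stream size for a snapshot taken before bees deduplicated the data:
btrfs send /mnt/pool/snap/data.pre-bees | wc -c

# Stream size for a snapshot of the same data after bees ran; if send
# falls back to plain writes for heavily-referenced extents, this
# stream can be larger even though the filesystem uses less space.
btrfs send /mnt/pool/snap/data.post-bees | wc -c
```

Counting bytes with wc -c avoids writing the stream to disk just to measure it.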