Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent

Bees is duping already deduped files instead of deduping #225

Closed DirtYiCE closed 2 years ago

DirtYiCE commented 2 years ago

I started bees on a disk with a few snapshots (6) and 69GB of free space out of 500GB. After running it for a few hours I was down to 38GB of free space (at which point I killed bees). The manual mentions that with snapshots I can expect about a 1% size increase, but this is much more. Also, I had previously deduped with duperemove, so I didn't really expect a size increase at all. I ran btrfs fi du -s * in the root when I noticed that something was not right, and again a bit later:

     Total   Exclusive  Set shared
  42.91GiB    39.57GiB     2.91GiB
 388.00KiB   276.00KiB    48.00KiB
 379.54MiB   379.54MiB     8.00KiB
 368.17GiB     4.92GiB   311.64GiB x
 397.81GiB    17.43GiB   322.76GiB *
 388.85GiB     2.42GiB   327.83GiB *
 388.65GiB    16.08GiB   315.51GiB *
  24.60GiB   184.09MiB    16.52GiB x
  24.61GiB   208.36MiB    16.53GiB *
  24.83GiB   507.92MiB    16.46GiB *
  24.61GiB   171.37MiB    16.53GiB *

     Total   Exclusive  Set shared
  42.91GiB    39.57GiB     2.91GiB
 388.00KiB   276.00KiB    48.00KiB
 379.54MiB   379.54MiB     8.00KiB
 367.85GiB     5.43GiB   305.71GiB x
 397.48GiB    17.82GiB   316.81GiB *
 388.52GiB     2.83GiB   321.71GiB *
 388.32GiB    19.04GiB   307.18GiB *
  24.60GiB   181.78MiB    16.52GiB x
  24.61GiB   205.79MiB    16.53GiB *
  24.83GiB   505.07MiB    16.46GiB *
  24.61GiB   168.77MiB    16.53GiB *

I removed the filenames, but both tables are the same set of files/dirs. Lines marked with * are read-only snapshots of the dir marked with x above them. I only have a partial log since I just started bees from a terminal, and it's full of filenames and hashes which I would rather not post, but here are the stats from the shutdown: stats.txt

Is this normal? If it continues like this I'll run out of space. I'm using version 0.7. Btrfs mount options: rw,relatime,compress=lzo,ssd,discard=async,space_cache,subvolid=5,subvol=/

kakra commented 2 years ago

https://github.com/Zygo/bees/blob/master/docs/gotchas.md#snapshots

It may help to use btrfs property on the snapshots to remove the read-only flag, let bees dedupe them, then make them read-only again. I'm not sure how duperemove handles read-only snapshots, but space may only be freed after bees has fully processed all snapshots. From then on it should be okay to create new snapshots, as bees will only scan new data and leave old snapshot data alone.
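A sketch with invented snapshot paths (note that flipping the flag breaks incremental btrfs send for that snapshot):

btrfs property set -ts /mnt/pool/home_20220601 ro false
# ... let bees finish deduping the snapshot ...
btrfs property set -ts /mnt/pool/home_20220601 ro true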

Zygo commented 2 years ago

If you omit the -a (--workaround-btrfs-send) flag, the read-only snapshots will be deduped too.

bees can identify extents that contain both blocks which are duplicate and blocks that are unique; however, btrfs can only deduplicate complete extents, so a workaround is needed to dedupe mixed extents. bees will make a copy of the unique parts of the extent, thus making the entire extent duplicate data, then eliminate the entire original extent by dedupe. This requires temporary space for the copy until every reference to the original extent is scanned and deduped.
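A worked illustration with invented sizes: suppose a 1 MiB extent E holds 768 KiB of blocks that are duplicated elsewhere and 256 KiB of unique blocks.

extent E (1 MiB): [ 256 KiB unique | 768 KiB duplicate ]
step 1: copy the 256 KiB of unique blocks to a new extent (+256 KiB temporary)
step 2: every block of E now exists elsewhere, so each reference to E can be deduped against those copies
step 3: when the last reference to E (including snapshot references) is gone, E is freed (-1 MiB)

Until step 3 completes, both E and the temporary copy occupy space on disk.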

If the files have large extents and lots of scattered duplicate blocks, but the data in each extent is mostly unique and most extents are shared, the worst case can result in a near-doubling of the data size, followed by rapid shrinking to the fully deduplicated size. It is possible that this will require more temporary space than is available in the filesystem. This doesn't happen very often (5-10% expansion is most common) but duperemove may set up conditions that result in larger expansion before reduction starts.

The expansion of data size is temporary, while the expansion of metadata size mentioned in the manual for snapshots is permanent (for the lifetime of the snapshot).

Duperemove makes no attempt to deduplicate extents that contain a mix of duplicate and unique data blocks, so it doesn't use temporary space. Duperemove will also ultimately free much less space: if a large extent (up to 128M) contains a single unique byte, duperemove cannot deduplicate it at all on btrfs.

The stats show 86GB of temporary copies were created (tmp_bytes=86232501126).

DirtYiCE commented 2 years ago

@kakra Actually I didn't dedupe the read-only snapshots with duperemove; I deduped the read-write subvolume before making the snapshots. But of course that was slow and inefficient, which is why I started to look at bees. @Zygo I didn't use that flag (my only flag was --thread-count 1, so it wouldn't use all my CPUs), so it should dedupe those snapshots (and the exclusive size of the last 3 snapshots did decrease a little, so it did something). I don't know what duperemove did; I know that I have a few big VM images with small changes between snapshots, and I saw bees spending a lot of time deduping them. I'll try temporarily unsetting the read-only flags manually and see what that gives.

DirtYiCE commented 2 years ago

I'm not sure setting the snapshots read-write helped much, but in the end I deleted an old snapshot, and now bees has finished. I now have 59GB of free space, which is still 10GB less than what I started with. I guess my only real option to return my partition to a sane state right now is to delete all snapshots?

kakra commented 2 years ago

This is probably due to duplicated metadata... I'd suggest simply rotating those snapshots out of existence as you create new ones.

To limit CPU usage, you can also use the loadavg limiter instead of limiting the threads. That works very well for me.
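For example (the flag name is from the bees options docs; the target value is illustrative):

beesd --loadavg-target 5 <filesystem-UUID>

With that, bees stops starting new work whenever the 1-minute load average is above 5, instead of being permanently capped at one thread.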

DirtYiCE commented 2 years ago

Okay, I think something is definitely not alright. This is from a different partition (and computer) that has had bees running from the beginning:

btrfs fi du -s *
     Total   Exclusive  Set shared  Filename
  10.27GiB    10.14GiB   128.68MiB  distfiles
 561.15GiB     9.48GiB   410.28GiB  home
 603.11GiB    26.92GiB   409.17GiB  home_20220525
 561.15GiB     9.48GiB   410.28GiB  home_20220601

I made that last snapshot in the morning. I definitely didn't overwrite 9.48GB of data; I just moved a few files there and then removed all of them, so the exclusive size should be in the kilobyte range. (It has a bad name; it's not actually my home dir.) Here it is with kilobyte precision:

10767808.00KiB  10635472.00KiB  131772.00KiB  distfiles
588413632.00KiB  9943312.00KiB  430208256.00KiB  home
632408576.00KiB  28225716.00KiB  429048832.00KiB  home_20220525
588413632.00KiB  9943304.00KiB  430208256.00KiB  home_20220601

I set the log level to 5 so as not to flood my disk with logs; bees only produced a single log entry today:

2022-06-01 13:48:10 3422.3856<3> crawl_257: scan_forward 1.629M [0x35587000..0x35728000] fid = 257:1144695 fd = 13 '/run/bees/[...]'

DirtYiCE commented 2 years ago

Looking at the non-summarized output from btrfs, it looks like it duplicated a lot of data in disk images that I haven't touched since I copied them to this partition.

DirtYiCE commented 2 years ago

I guess btrfs fi du is not exactly reliable. If I stop bees, then

# btrfs subvolume snapshot home tmp
Create a snapshot of 'home' in './tmp'
# btrfs fi du -s home tmp
     Total   Exclusive  Set shared  Filename
 561.15GiB     9.48GiB   410.28GiB  home
 561.15GiB     9.48GiB   410.28GiB  tmp

while the actual disk usage reported by df only increased by 32 kbytes. (And it decreases by 32kB if I delete the snapshot.) I went back and checked the partition where I only used duperemove, and even there, after a snapshot, I ended up with about 100MB of exclusive size, so it's not reliable there either; it was just too small to notice before. I guess I should stop looking at btrfs fi du output, as it's inaccurate (or I just don't know how to interpret it).

Back to the original issue: @Zygo, would it be possible to add a warning to that gotchas page that if you have snapshots, in the worst case you might need a not-insignificant amount of temporary disk space? (It only mentions needing to read the data twice and the extra metadata size.)

Zygo commented 2 years ago

I'd check space usage with compsize or btdu. They can account for things like compression and unreachable blocks.

btrfs fi du is not very good at estimating physical sizes because it only looks at logical reachable blocks. It may have other problems, but ignoring physical sizes is bad enough that the other problems don't matter.
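For example (paths invented):

compsize /mnt/pool/home    # disk usage vs. uncompressed vs. referenced bytes
btdu /mnt/pool             # interactive breakdown, including unreachable extents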

There are some warnings about temporary disk space scattered around in the bees docs (and issues list), but they could certainly be placed closer together.

DirtYiCE commented 2 years ago

I've looked at the FS with btdu, and it looks like I have a lot of unreachable extents even on the new FS, but on the old partition it's much worse. I have a 22GB disk image with 23GB of unreachable extents! Can I do anything about this? btdu suggests rewriting or defrag, but I guess if I do that, bees will just find it and dedupe it back. Compsize also shows the problem:

Processed 1 file, 138201 regular extents (302630 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       90%       41G          46G          21G       
none       100%       37G          37G          11G       
lzo         45%      3.6G         8.1G          10G

kakra commented 2 years ago

I have a 22GB disk image with 23GB of unreachable extents! Can I do anything about this?

Sounds like what I saw with my VM images... If you're using qemu, running chattr +m on the images directory and using the qcow2 format is the best way to go: it avoids btrfs compression; instead qcow2 itself can compress (though only existing data). The image files have to be re-created from scratch after changing the containing directory to +m. One way to do this is the qemu image converter, which can compress the data on its way; another is to move the images to a subdirectory, run rsync -av subdirectory/* ., then let bees do its job. If you enable qcow2 compression on the images, the images won't dedupe with the previously existing snapshots.
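A sketch of the converter route (paths invented; qemu-img's -c writes a compressed qcow2):

chattr +m /var/lib/libvirt/images
# re-create the image so it picks up the new attribute, compressing on the way:
qemu-img convert -O qcow2 -c disk.raw /var/lib/libvirt/images/disk.qcow2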

btdu suggests rewriting or defrag, but I guess if I do this, bees will just find it and dedupe back.

Another idea: do you use autodefrag? If yes, don't. It breaks extents out of snapshots, leaving unreachable extents behind. This can easily double the space usage, especially with VM images. That's why I suggested qcow2 above: it grows the image file in chunks and doesn't rewrite tiny blocks.

If you're using mysql/mariadb/sqlite, I also suggest using chattr +m for it, and also disabling double-write for the innodb/sqlite file. For mysql/mariadb that's innodb_doublewrite = 0 (which is safe because btrfs uses cow); for sqlite you switch to WAL mode (software using the file doesn't need to be aware of it, so you can almost blindly enable it).
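A minimal sketch of both settings (file names invented):

# my.cnf / mariadb.cnf, in the [mysqld] section:
#     innodb_doublewrite = 0
sqlite3 app.db 'PRAGMA journal_mode=WAL;'    # WAL mode persists in the database file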

The problem with double writes is that the old (to-be-replaced) data is first appended or written to a log and flushed; then the new data is written in place to the data file (breaking extents and leaving unreachable cow data) and flushed again; then the log is truncated/invalidated, or, on transaction abort, used to undo the previous writes. This is an exceptionally bad write pattern for btrfs unless you enable nocow (and even then it's still bad).

Zygo commented 2 years ago

If you have enough space: stop bees, defrag all the copies, and start bees again.

VM images generally accumulate a lot of unreachable blocks, even without dedupe. Any long write followed by several short writes over the same region of the file will create unreachable space for the original long write. Periodic defrag helps a little, but there isn't a good solution for this on btrfs right now.
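A sketch of the defrag step (path invented; -clzo recompresses to match the compress=lzo mount option, and files inside read-only snapshots need the ro property cleared first, as described earlier):

btrfs filesystem defragment -r -v -clzo /mnt/pool/images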

DirtYiCE commented 2 years ago

@kakra This is a virtualbox image, so that wouldn't help much. And with qemu I use lvm thin when performance becomes a problem; it handles this kind of workload a bit better. I don't really store DBs on this FS (maybe some program stores an sqlite db, but they aren't big enough to cause a problem).

@Zygo I'm starting to lose my mind. I've found a smaller file that's only 6.2GB and only had about 2.2GB of unreachable extents. compsize for the 1 file:

Processed 1 file, 74236 regular extents (122994 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       91%       21G          23G         7.0G
none       100%       19G          19G         4.0G
lzo         36%      1.1G         3.2G         2.9G

Compsize with snapshots:

Processed 3 files, 80928 regular extents (377142 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       91%       22G          25G          21G
none       100%       21G          21G          12G
lzo         36%      1.2G         3.5G         9.0G

They are the same files (they have the same sha256sum), so bees should dedupe them perfectly. I have 65GB free. I defrag the files; afterwards I end up with:

Processed 1 file, 49141 regular extents (49141 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       68%      4.8G         7.0G         7.0G       
none       100%      4.0G         4.0G         4.0G       
lzo         25%      775M         3.0G         3.0G 
Processed 3 files, 148323 regular extents (148323 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       67%       14G          21G          21G       
none       100%       12G          12G          12G       
lzo         25%      2.3G         9.1G         9.1G 

So far so good, I have 54GB free. I restart bees and wait for it to stop deduping.

It reports deduping 12.9GB:

2022-06-02 03:21:13 6246.6282<6> progress_report:       dedup_bytes=13897651745 dedup_copy=5690704736 dedup_hit=197294 dedup_ms=222849 dedup_prealloc_bytes=4194304 dedup_prealloc_hit=1 dedup_try=197294

So I should have about 67GB free? Wrong. I only have 58GB free. compsize:

Processed 1 file, 67959 regular extents (112039 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       92%       22G          24G         6.1G       
none       100%       21G          21G         3.5G       
lzo         36%      1.0G         2.8G         2.5G
Processed 3 files, 76976 regular extents (337806 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       92%       26G          28G          18G       
none       100%       25G          25G          10G       
lzo         35%      1.1G         3.3G         7.7G

The disk usage according to compsize is higher than what I had before the dedupe. btdu shows 2.2GB unreachable for this file again: 3.9GB representative size for the non-snapshot, 2.5GB and 2.3GB for the two snapshots. That's 8.7GB if I sum them up, not the 26GB that compsize reports. It looks like it managed to deduplicate some dll files from the VM image against the normal copies of those files I had lying around. Should I defragment those too? But they should be a few hundred MB max. In the log I have 5083 instances of WORKAROUND: abandoned toxic match for hash and 3758 instances of exception (ignored): exception type std::runtime_error: FIXME: bailing out here, need to fix this further up the call stack. I can also find lines like

scan: 23.086M 0xcc8a000 [dddddddddddddd++++++++++++++++++++++++++++++++dddddddddddddddd++++++++ [...] ++++++++dddddddddddddddd] 0xe3a0000

(d/+ map abridged; the full line runs to several thousand characters). I guess + means that it didn't dedupe that block, and since the extent ended up mixed, it had to copy it. But here it writes "In other words, bees only needs to find one block in common between two extents in order to be able to dedupe the entire extents," so I don't know what is going on. Maybe I should have stayed with ext4 many years ago ☹️

Zygo commented 2 years ago

If you dedupe a 100M file with a 4K file, then point compsize at the 4K file only, compsize will report that the 4K file occupies 100M of space (because it does--it's referencing 4K out of a 100M extent, so it technically owns all 100M of the space even though it's using only 4K). That can result in some wild size values if you're running it on only a few files from a large duplicate set. It can also become hard to delete things, as these references get embedded in files anywhere there's a duplicate block, regardless of the size of the files. bees can make thousands of references to a single block of an extent, making it nearly impossible to remove any other blocks in that extent from the filesystem.

It looks like bees is finding all the other larger extents on the filesystem and replacing the defragmented files with surviving parts of the older fragmented ones. That's more or less expected--the current algorithm will replace duplicates immediately as it finds them with the first copy block stored in the hash table, which might be in an arbitrary and suboptimal order.

If you can defrag every file referencing duplicate blocks at the same time, then that should get rid of all the references to the old extents, but anything less will allow bees to find the old extents and attach them to new files. Unfortunately there isn't an automated tool to do this on btrfs yet. You can get close by running btdu, looking at its list of files sharing extents, and defragging all of those.

On VM image files, 50% unreachable seems to be about average for btrfs, and it looks like you're getting about that (4.8G file after compression, 2.2G unreachable).

Toxic matches and bailouts are fairly normal. bees monitors its own CPU usage (and that of btrfs underneath) and will avoid cases where the cost of adding new references to an extent exceeds any reasonable benefit. When it starts taking a significant fraction of a second to walk the extent ref tree for a single extent, it's time to stop adding new reflinks to that extent. This happens for very common block values, where there would otherwise be millions of refs to a single block. Such blocks can be a few percent of the total filesystem size; common block values are extremely common.

bees doesn't try to dedupe 100% of all possible duplicates, it dedupes a sample of duplicate blocks and tries to average 95% of possible duplicates. So identical big files don't get entirely deduped, but small files embedded in VM images do. The line of +'s and d's is a 23M extent being carved up into pieces to be deduped with other files separately.

kakra commented 2 years ago

How big is the hash file anyways?

DirtYiCE commented 2 years ago

128MB
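For scale: with the beesd wrapper, the hash table size comes from DB_SIZE in the per-filesystem config (a sketch assuming the sample config shipped with bees; filename and UUID are placeholders):

# /etc/bees/myfs.conf
UUID=<filesystem-UUID>
DB_SIZE=$((128*1024*1024))    # 128MB hash table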

DirtYiCE commented 2 years ago

I guess what worked for me in the end (commands sketched below the list) is:

  1. get rid of every but one snapshot to speed things up
  2. stop bees
  3. defrag a bunch of space hog big files with many unreachable extents
  4. dedupe them with duperemove -rAhd --lookup-extents=no
  5. start bees
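Roughly, as commands (paths invented; beesd@<UUID> assumes the systemd unit shipped with bees):

systemctl stop beesd@<UUID>.service
btrfs filesystem defragment -v /mnt/pool/images/*.vdi
duperemove -rAhd --lookup-extents=no /mnt/pool/images
systemctl start beesd@<UUID>.service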

Without duperemove, bees only deduped a fraction of these files. After duperemove I had 110GB of free space; after bees finished I still have 110GB, even though during the dedupe free space temporarily increased to 112GB. I guess something in bees is still too aggressive; maybe this mixed-extent dedupe workaround sometimes does more harm than good. After the defrag, the unreachable blocks from my disk images completely disappeared, but after bees they're back, though now I "only" have 9GB and not 30-something (of 39GB of total disk images). (I guess this is because btdu arbitrarily assigns shared extents to the file with the shortest strlen(path), when in reality these unreachable blocks are a combination of the 3 metric fucktons of tiny files it deduped the disk image with.) Would having a bigger hash file help? I don't mind doing workarounds like this once in a blue moon, but if this crops up every time I touch a disk image, I'll have to switch back.

kakra commented 2 years ago

Well, I'm working with a 1 GB hash file on a 32 GB system. But my images also show unreachable space:

/var/lib/libvirt/images # compsize *
Processed 6 files, 698807 regular extents (1472021 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       95%      241G         252G         214G
none       100%      236G         236G         196G
lzo         69%      496K         716K         420K
zstd        31%      5.1G          16G          17G

This is probably unavoidable, as @Zygo points out (and it also happens during normal use, not just through bees); it doesn't matter which deduper you used. Of course, you could use the images as nocow files, but this won't stop the problem from creeping up again after snapshots; at least bees won't touch them in that case (because it currently has no way to compare nocow extents to nocow extents only, and thus just avoids them).
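For reference, nocow only applies to data written after the attribute is set, so existing images would have to be re-created (a sketch; paths invented):

chattr +C /var/lib/libvirt/images
cp --reflink=never /backup/disk.img /var/lib/libvirt/images/disk.img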

On another set of files it looks different:

# compsize /home/kakra/SteamApps/
Processed 739229 files, 3123630 regular extents (4784280 refs), 205542 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       95%      2.4T         2.6T         2.5T
none       100%      2.4T         2.4T         2.3T
lzo         75%       96K         128K         120K
zstd        33%       63G         190G         211G

It still has about 100GB of slack, but compression more than makes up for it in the totals, it seems.

Here's an example from one of our web servers which share a lot of files due to identical frameworks used:

server /mnt/btrfs-pool/webs # compsize *
Processed 3255724 files, 1817451 regular extents (3916652 refs), 1543073 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       85%      196G         228G         247G
none       100%      185G         185G         190G
zstd        24%       10G          42G          57G

And the containers used to run the websites (which are clones of each other with additional packages installed):

server /var/lib/machines # compsize *
Processed 5241181 files, 1290910 regular extents (3719564 refs), 3549461 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       55%       53G          96G         195G
none       100%       38G          38G          89G
zstd        26%       15G          58G         106G

I used * to not include the hidden snapshot directories. All systems use hourly snapshots through snapper with retention thinning.

DirtYiCE commented 2 years ago

I understand that, but I haven't modified those disk images since the snapshot was made, and I defragged them, so I didn't expect this much unreachable space (it was zero after defrag+duperemove; it only started to increase as bees found some random files to deduplicate the images with).

Zygo commented 2 years ago

When bees overwrites a partially matching extent, it will overwrite all references to the non-matching blocks at the same time; however, this only works if all of the referenced blocks are visible in the file where they were overwritten, because bees is only looking at references to specific blocks that are referenced from the file being scanned. If there are other blocks in the extent referenced by other files, they may escape the overwrite function, and may not be detected later unless there is another partially duplicate extent. I don't have an example of this off the top of my head (it needs several extents with specific contents in each one), but there's nothing in current bees versions that explicitly prevents it, so it could be happening randomly in some data sets.

If that is happening, it's probably not fixable without moving to the extent-based dedupe engine. The extent-based dedupe engine disposes of all references to an extent at the same time, regardless of which files they belong to, so it cannot be confused in this way. This issue will probably disappear once that's up and running.

DirtYiCE commented 2 years ago

Alright, thanks for the clarification. I guess that's what I'll have to live with for the time being.