Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

Deduplicating read-only snapshots with -a? #223

Closed by mischaelschill 2 years ago

mischaelschill commented 2 years ago

I started running bees on a filesystem with read-only snapshots and passed -a so that it wouldn't touch them. However, I still see lines like the following in the status file:

tid 84252: crawl_256: Extending matching range: BeesRangePair: 228K src[0xacb0000..0xace9000] dst[0x20ed496000..0x20ed4cf000] src = 33 /run/bees/mnt/a3eb1c85-3ee3-4555-ba72-73e5e51a59e4/#426 (deleted) dst = 19 /run/bees/mnt/a3eb1c85-3ee3-4555-ba72-73e5e51a59e4/.snapshots/164/snapshot/system.img

Why?

kakra commented 2 years ago

I think this only means that bees is redirecting a shareable extent in your read/write subvolumes to an extent that is also used by your read-only subvolumes. It doesn't actually touch the snapshots, i.e. it doesn't write to them; a dedupe candidate extent just happens to also be part of a snapshot. This is probably a cosmetic issue: bees walks extents rather than files, and it uses a reverse lookup to turn an extent number into a user-friendly filename. That lookup matched a file in a snapshot, which is probably identical to the same file in your read/write subvolume.

mischaelschill commented 2 years ago

Thank you for the explanation!

Zygo commented 2 years ago

Something is wrong there. There's a check in BeesContext::dedup which should silently discard any attempt to use a read-only subvol as the dst argument. A read-only subvol is still allowed as src, because we can add as many references to an existing extent as we like without modifying any of its original refs.
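To illustrate the asymmetry between src and dst described above, here is a minimal Python sketch. It is not bees code (bees is written in C++); the `Subvol` class and the `dedupe_allowed` function are hypothetical names made up for this example, and the only rule modelled is "discard when dst is read-only".

```python
from dataclasses import dataclass


@dataclass
class Subvol:
    """Hypothetical stand-in for a btrfs subvolume (not a bees type)."""
    name: str
    read_only: bool


def dedupe_allowed(src: Subvol, dst: Subvol) -> bool:
    """Return False when the dedupe request should be silently discarded.

    Only dst is checked: deduping rewrites dst's reference to point at
    src's extent, while src merely gains one more reference.
    """
    if dst.read_only:
        return False
    return True


snapshot = Subvol("snapshot", read_only=True)
live = Subvol("live", read_only=False)

# A read-only snapshot may serve as the source of shared data...
snapshot_as_src = dedupe_allowed(src=snapshot, dst=live)
# ...but must never be the destination that gets modified.
snapshot_as_dst = dedupe_allowed(src=live, dst=snapshot)
```

Under this model, the status line in the original report (dst pointing into a snapshot) is the case that should have been rejected, which is why it looks wrong.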

The read-only check is performed only when the subvol is opened, and bees caches FDs for several minutes. So if a subvol is initially read-write and later made read-only, bees may continue to modify it until the next cache flush, whose timing depends on the transaction rate of the filesystem.
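The stale-cache window described above can be modelled with a small Python sketch. Everything here is an assumption for illustration: `OpenCache`, its methods, and the dictionary standing in for the filesystem are hypothetical and do not reflect how bees actually stores its cached FDs.

```python
class OpenCache:
    """Hypothetical cache that samples the read-only flag once, at open time."""

    def __init__(self, filesystem):
        # filesystem: dict mapping subvol id -> current read-only flag
        self._filesystem = filesystem
        # subvol id -> read-only flag as seen when the subvol was opened
        self._opened = {}

    def is_read_only(self, subvol_id):
        # The flag is looked up only on the first access; later calls
        # reuse the cached answer even if the subvol has changed since.
        if subvol_id not in self._opened:
            self._opened[subvol_id] = self._filesystem[subvol_id]
        return self._opened[subvol_id]

    def flush(self):
        # Dropping the cache forces the flag to be re-read on next access.
        self._opened.clear()


filesystem = {164: False}          # subvol 164 starts out read-write
cache = OpenCache(filesystem)

seen_at_open = cache.is_read_only(164)     # sampled while still read-write
filesystem[164] = True                     # subvol is now made read-only
seen_while_stale = cache.is_read_only(164) # cache still reports read-write
cache.flush()
seen_after_flush = cache.is_read_only(164) # fresh lookup sees read-only
```

Between the flag change and the flush, a dst check based on the cached value would still let writes through, which matches the scenario of a snapshot created read-write and switched to read-only afterwards.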

Was the snapshot received by btrfs receive or similar (i.e. created read-write and then later made read-only)? Was the snapshot recently created?

kakra commented 2 years ago

The path name looks like a snapper snapshot at least...