Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0
665 stars 55 forks source link

btrfs send fails after non-simultaneous use of bees #115

Open automorphism88 opened 5 years ago

automorphism88 commented 5 years ago

After running bees on a filesystem containing a parent snapshot, and then trying to do an incremental send from that snapshot, after bees has been run in between, but not simultaneously (bees fully stopped before starting the send), the send fails with the following in dmesg:

[57807.246706] BTRFS error (device dm-2): Send: inconsistent snapshot, found updated extent for inode 8924138 without updated inode item, send root is 4457, parent root is 4425

This is with kernel 5.1.8 and bees 0.6.1.

Zygo commented 5 years ago

OK, so send had two bugs after all. Are you willing to report this to the linux-btrfs mailing list?

automorphism88 commented 5 years ago

OK, so send had two bugs after all. Are you willing to report this to the linux-btrfs mailing list?

I've never posted to the mailing list before but I'm sure I can figure out how. What about the kernel bugzilla? I've posted there before.

Zygo commented 5 years ago

Kernel bugzilla is worth a shot. Best case, it's a simple fix. Worst case, it'll get ignored, and I'll forward it to the mailing list when I get a few spare cycles (possibly after a bit more analysis).

automorphism88 commented 5 years ago

https://bugzilla.kernel.org/show_bug.cgi?id=203933

ghost commented 5 years ago

Does this happen with the other duperemove tool too, or just bees?

Zygo commented 5 years ago

Does this happen with the other duperemove tool too, or just bees?

It's a kernel bug, so it should affect all deduplicators on btrfs.

fdmanana commented 5 years ago

So I tried the following:

$ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt

$ xfs_io -f -c "pwrite -S 0xab 0 1M" /mnt/foo

$ btrfs subvolume snapshot -r /mnt /mnt/snap1

$ xfs_io -f -c "pwrite -S 0xab 0 1M" /mnt/bar

$ btrfs subvolume snapshot -r /mnt /mnt/snap2

# deduplicate foo into bar, so that both point to the same extent(s) $ xfs_io -c "dedupe /mnt/snap2/foo 0 0 1M" /mnt/snap2/bar

# do the incremental send, see if it fails $ btrfs send -p /mnt/snap1 -f /dev/null /mnt/snap2 $ echo $? 0

dmesg/syslog is also clean. Applying send streams to a filesystem also shows both files are there and with correct content.

Can you provide more details on how the deduplication is being done exactly? Full, just same extents, order, etc. Also, any special mount options?

Thanks.

kakra commented 5 years ago

I'm not sure how xfs_io works. But bees does not dedupe extents directly, it instead rewrites new extents based on the hash table into a temporary file, then dedupes all discovered extents to this new temporary file (which is then deleted again). This may be quite different from what xfs_io does and could probably very well affect what btrfs-send sees as the result.

ghost commented 5 years ago

Where is this temporary file created? It does seem plausible that this could be the cause for the send error. Or rather, is send correct that the ro snapshots did in fact change?

kakra commented 5 years ago

@Gatak More or less "nowhere"... I think bees creates it in the root subvolume, acquires an open file descriptor of it, then immediately deletes it, only then it's writing file data to the FD. So it's writing to an anonymous, btrfs-backed file. If your reasoning behind the question is if the file is created in the RO snapshot: No, it isn't.

@Zygo may know more but probably deduping extents from the RO snapshot to this newly created file removes the original extents and thus "modifies" the snapshot (not the file contents but the extent structure). But I think exactly that should've been fixed in bees already by ignoring RO snapshots.

ghost commented 5 years ago

But I think exactly that should've been fixed in bees already by ignoring RO snapshots.

Then space/duplicates taken by RO snapshots is also not considered or reduced, which was the point to start with, wasn't it?

Zygo commented 5 years ago

@Gatak, @kakra: Temporary files are created in the root subvol with O_TMPFILE. They never have names, they are created by the kernel with a zero link count. The root subvol is used for temporaries because it's a part of the filesystem that necessarily exists and is writable, and dedupe doesn't care where the temporary extents live. If bees can dedupe a dst extent out of existence using extents that already exist, bees just does that. If no suitable extent is found (e.g. existing extents contain a mix of duplicate and unique data), bees creates a new extent with the right size and content to use as a src extent.

To the snapshots, the bees temporary files are just files in another subvolume with a higher transid. They should have the same effect as the second pwrite.

Zygo commented 5 years ago

@fdmanana That script looks right, in the sense that bees does something similar, but I haven't reproduced this myself, and I don't think it's quite that simple. We probably need to set up a larger, more realistic test (e.g. copy /usr into a subvol instead of just one extent), run bees and send until it fails, then try to figure out what happened to the filesystem when the error is detected.

fdmanana commented 5 years ago

So I managed to find out how it happens exactly, it's not that trivial to reproduce and happens sort of randomly, no wonder why I have not ever hit it or had other user reports before. I'll send a fix soon (this week) to the btrfs mailing list.

No need to use bees for triggering this. Thanks.

fdmanana commented 5 years ago

Sent:

https://lore.kernel.org/linux-btrfs/20190717122339.4926-1-fdmanana@kernel.org/T/#u

https://lore.kernel.org/linux-btrfs/20190717122439.28327-1-fdmanana@kernel.org/T/#u

Zygo commented 5 years ago

This issue started happening back in 2015 when deduplication was updated to not update the inode's ctime and mtime and update only the iversion.

That sounds familiar: https://www.spinics.net/lists/linux-btrfs/msg45113.html Oops, my bad? ;)

kdave commented 5 years ago

Kernel fix queued for 5.3 and will appear in the stable trees.

HaleTom commented 5 years ago

@kdave could you let us know when it should have appeared in stable trees?

I'm waiting on this before trying bees to ensure btrbk backups continue to function.

Zygo commented 5 years ago

According to

for x in $(git log --date-order --all --grep=b4f9a1a8 --format=%h); do echo "$x: $(git tag --contains "$x")"; done

it is in 5.2.7, 4.19.65, 4.14.137, and 4.9.188.

DiagonalArg commented 3 years ago

@HaleTom - I'm just diving into btrbk and bees. I'm seeing this issue is still open, so I'm wondering if it worked for you?