Btrfs <-> Btrfs transfer optimization and multiple sources

AmesCornish / buttersink

Buttersink is like rsync for btrfs snapshots

GNU General Public License v3.0


Btrfs <-> Btrfs transfer optimization and multiple sources #58

Closed eugene-bright closed 6 years ago

eugene-bright commented 6 years ago

I'm personally interested in the btrfs <-> btrfs transfer scenario. btrfs-send has an option, -c <clone-src>, that allows an unlimited number of snapshots to be used as data sources for CoW. As I see it, this option provides many more opportunities to optimize diff size and simplifies the algorithms, because the FS does the job itself. Buttersink does not support the concept of multiple sources for now and probably never will, but I'm writing it down here for future consideration.
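
To make the multiple-source idea concrete, here is a minimal sketch of how a `btrfs send` invocation with several clone sources could be assembled. The helper name `build_send_command` and the snapshot paths are hypothetical and are not part of buttersink; only the `-p` and `-c` options of `btrfs send` are taken as given.

```python
from typing import List, Optional, Sequence


def build_send_command(
    snapshot: str,
    parent: Optional[str] = None,
    clone_sources: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for `btrfs send` (hypothetical helper, not buttersink code)."""
    argv = ["btrfs", "send"]
    if parent is not None:
        # -p names the parent snapshot for an incremental send.
        argv += ["-p", parent]
    for source in clone_sources:
        # -c may be repeated: each clone source is another snapshot
        # whose extents the stream is allowed to reference.
        argv += ["-c", source]
    argv.append(snapshot)
    return argv


if __name__ == "__main__":
    # Example paths are made up for illustration.
    command = build_send_command(
        "/snapshots/home-3",
        parent="/snapshots/home-2",
        clone_sources=["/snapshots/home-1", "/snapshots/home-2"],
    )
    print(" ".join(command))
```

The sketch only builds the command line; actually running it would need the listed snapshots to be read-only and already present on the receiving side.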

AmesCornish commented 6 years ago

I agree. Multiple sources would make btrfs -> s3 too complicated, but could be a useful optimization for btrfs -> btrfs.

eugene-bright commented 6 years ago

@AmesCornish, is S3 support crucial for you? For now buttersink is an indispensable tool, as it works purely by direct ioctl manipulation (and thus can do things right). Because of buttersink's design limitations and growing technical debt, I'm thinking about starting my own project from scratch. I don't currently have time for such a drastic step, but I would like to know what your core motivation and list of crucial features were when you started working on buttersink.

AmesCornish commented 6 years ago

For me S3 is #1, ssh syncs are #2, and the "original UUID" fix I wish were handled in btrfs itself. What are the key "design limitations" for you?

eugene-bright commented 6 years ago

I'm not smart enough to dig into messy code, so clearly defined interfaces, inversion of control, and type annotations are must-haves for me. The weakest part of buttersink is serialization: it took me a day to pass one new attribute over SSH. It's also hard to debug a server started over SSH; currently I can debug only the client side under ipdb. SSH is not a target for me at all, as I have full control over my backup server installation, so I would like to use state-of-the-art, well-defined RPC protocols. Btrfs <-> btrfs is number one for me, so I do not need 75% of the current code base, especially the snapshot base optimizations.

eugene-bright commented 6 years ago

Could you tell me more about the UUID fixing? I've read the note in Butter.py but still can't fully grasp it. What happens if the patching is not performed?

AmesCornish commented 6 years ago

When you use btrfs "send", it can avoid sending duplicate data only if the data is already present, in both the source and the destination, in a snapshot with the exact same original UUID. If the UUIDs are different in the source and destination referenced snapshots then the data chunks are resent. Specifically, if you try to receive a btrfs send and you don't have the exact required UUIDs present in the destination it will fail.
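
The matching rule described above can be illustrated with a small sketch. It assumes a simplified model in which a snapshot's "original UUID" is its received UUID when it has one and its own UUID otherwise. The `Snapshot` class and both function names are hypothetical, and this is not buttersink's or btrfs's actual logic.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Simplified, hypothetical model of a btrfs snapshot."""
    path: str
    uuid: str
    received_uuid: Optional[str] = None


def original_uuid(snapshot: Snapshot) -> str:
    # Assumption: a received snapshot keeps the sender's UUID in
    # received_uuid; a locally created snapshot only has its own uuid.
    return snapshot.received_uuid or snapshot.uuid


def usable_parents(
    source: Iterable[Snapshot], destination: Iterable[Snapshot]
) -> List[Snapshot]:
    """Return source snapshots whose original UUID also exists at the destination."""
    present = {original_uuid(s) for s in destination}
    return [s for s in source if original_uuid(s) in present]


if __name__ == "__main__":
    # UUID strings are shortened placeholders for illustration.
    source = [Snapshot("/src/a", "u1"), Snapshot("/src/b", "u2")]
    destination = [Snapshot("/dst/a", "x1", received_uuid="u1")]
    for snap in usable_parents(source, destination):
        print(snap.path)
```

Under this model, only snapshots returned by `usable_parents` would be safe to reference in an incremental send; any other snapshot would have to be sent in full.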

AmesCornish commented 6 years ago

btrfs <--> btrfs should be improved with d25e71e