elemental-lf / benji

Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
https://benji-backup.me

LVM Backups with hints #59

Open adambmedent opened 4 years ago

adambmedent commented 4 years ago

Our Ceph RBD backups have been going really well and we would love to start using Benji for LVM backups.

However, some of our LVM volumes are quite large (6-15 TB), so Benji has to read the entire LVM snapshot on every backup. At about 200 MB/s per Benji backup, a 6 TB volume takes around 10 hours and a 15 TB volume more than 20 hours.
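As a rough sanity check on those numbers, raw read time is simply volume size divided by throughput. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s) and treats the result as a lower bound: the ~10 hours reported for 6 TB presumably also includes hashing, deduplication and storage overhead. The `changed_fraction` parameter is a hypothetical knob illustrating why hints help: with hints, only the changed ranges need to be read.

```python
def estimated_read_hours(volume_bytes: float, bytes_per_second: float,
                         changed_fraction: float = 1.0) -> float:
    """Lower-bound read time in hours (hypothetical helper, decimal units assumed).

    changed_fraction=1.0 models a full read without hints; a smaller value
    models a hinted backup that only reads the changed part of the volume.
    """
    return volume_bytes * changed_fraction / bytes_per_second / 3600


print(round(estimated_read_hours(6e12, 200e6), 2))   # full read of 6 TB
print(round(estimated_read_hours(15e12, 200e6), 2))  # full read of 15 TB
print(round(estimated_read_hours(6e12, 200e6, 0.01), 2))  # 1% changed, with hints
```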

Does anyone know of a way to generate a "hints" file between two LVM snapshots, similar to the hints file for Ceph RBD? I noticed in the documentation that it's "possible", but there wasn't much detail about it.

elemental-lf commented 4 years ago

It is possible to specify hints for any type of backup, but there currently is no program to generate them for LVM. Code to do something similar in Ruby is found here, in variants for both classic and thin snapshots. I would be delighted to include a script to generate hints for LVM snapshots, preferably in Python of course.
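Whatever tool computes the differences, the last step is the same: serialize the changed byte ranges as a JSON list of `offset`/`length`/`exists` objects. The sketch below assumes the layout of `rbd diff --format=json`, where `exists` is the string `"true"` or `"false"`; the helper names `merge_ranges` and `hints_to_json` are hypothetical. Merging contiguous ranges with the same `exists` flag is an optional step that just keeps the hints file small.

```python
import json
from typing import Iterable, List, Tuple

# (offset in bytes, length in bytes, exists) -- hypothetical internal representation
Range = Tuple[int, int, bool]


def merge_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Sort ranges by offset and coalesce contiguous ones with the same exists flag."""
    merged: List[Range] = []
    for offset, length, exists in sorted(ranges):
        if merged:
            last_offset, last_length, last_exists = merged[-1]
            if last_exists == exists and last_offset + last_length == offset:
                merged[-1] = (last_offset, last_length + length, exists)
                continue
        merged.append((offset, length, exists))
    return merged


def hints_to_json(ranges: Iterable[Range]) -> str:
    """Render ranges in an rbd-diff-like JSON layout (assumed hints format)."""
    return json.dumps([
        {"offset": offset, "length": length, "exists": "true" if exists else "false"}
        for offset, length, exists in merge_ranges(ranges)
    ])


print(hints_to_json([(0, 4096, True), (4096, 4096, True), (16384, 8192, False)]))
```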

adambmedent commented 4 years ago

Appreciate the input, I had a feeling that was the case.

I don't know Ruby or Python so I won't be much help.

elemental-lf commented 4 years ago

I found out that for thin LVM volumes there is a CLI tool, thin_delta, which does the metadata parsing and difference calculation. I must have missed it somehow. See https://github.com/tasket/wyng-backup/blob/94d410fc415094c6cf3f526ae52e7abf1eada2fd/wyng#L930 and https://github.com/LINBIT/thin-send-recv/blob/c48c568055c14d1fa893ac110cadb8c1ded32aea/thin_send_recv.c#L156 for how other programs use it. I haven't looked at the output yet, but it is XML-formatted, so it shouldn't be too hard to translate it into Benji's hints format.
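A possible starting point for that translation, as a sketch only. It assumes thin_delta emits a `<superblock>` root with a `data_block_size` attribute in 512-byte sectors, and `<different>`, `<right_only>`, `<left_only>` and `<same>` elements whose `begin`/`length` attributes count thin-pool data blocks. Please check this against real output before relying on it. The mapping is also an assumption: ranges that are different or only present in the newer snapshot become `exists: "true"`, ranges only present in the older snapshot (discarded since) become `exists: "false"`, and unchanged ranges are skipped. The function name `thin_delta_to_hints` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SECTOR_SIZE = 512  # assumption: data_block_size is given in 512-byte sectors


def thin_delta_to_hints(xml_text: str) -> List[Dict]:
    """Translate (assumed) thin_delta XML into rbd-diff-like hint dicts."""
    root = ET.fromstring(xml_text)
    if root.tag != "superblock":
        raise ValueError("expected a <superblock> root element")
    chunk_bytes = int(root.get("data_block_size")) * SECTOR_SIZE

    hints = []
    for elem in root.iter():
        if elem.tag in ("different", "right_only"):
            exists = "true"   # range holds (new or changed) data in the newer snapshot
        elif elem.tag == "left_only":
            exists = "false"  # range was mapped before but is unmapped now
        else:
            continue          # <same>, <diff> and the root itself are skipped
        hints.append({
            "offset": int(elem.get("begin")) * chunk_bytes,
            "length": int(elem.get("length")) * chunk_bytes,
            "exists": exists,
        })
    return hints


SAMPLE = """<superblock uuid="" time="1" transaction="2" data_block_size="128" nr_data_blocks="1024">
  <diff left="1" right="2">
    <same begin="0" length="10"/>
    <different begin="10" length="2"/>
    <left_only begin="12" length="3"/>
    <right_only begin="15" length="1"/>
  </diff>
</superblock>"""

print(thin_delta_to_hints(SAMPLE))
```

The sample XML is illustrative, not captured from a real pool. With a 128-sector (64 KiB) data block size, every range is a multiple of 64 KiB, which lines up with Benji's default 4 MiB block size as long as the pool's chunk size divides it.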

refi64 commented 3 years ago

I'd like to look into this at some point: I'm planning to set up Benji Backup for personal use and I also use LVM+XFS. However, I do have a question about the required diff format. From what I can tell, it's basically:

{
  "offset": "the start offset of the data changed",
  "length": "the length of the data changed in bytes",
  "exists": "not sure?"
}

Would anyone be able to indicate what "exists" means, and whether the starting offsets are relative to the snapshot or current version? If we, say, add data that results in brand-new blocks, or remove entire blocks, what would the JSON objects look like then?

elemental-lf commented 3 years ago

Thank you for looking into this! I think it would be a valuable addition. exists means that the range in question contains data and is not sparse. offset is relative to the snapshot, but I don't understand what you mean by current version. Benji assumes that the backed-up block devices can only grow or shrink at the end. There is a special case for the last block, which Benji will always read if it is affected by the size change, i.e. if the old or new end of the device wasn't/isn't on a block boundary (according to the block size used by Benji).
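One way to read these rules, sketched in Python: every hint range is mapped onto the fixed-size Benji blocks it overlaps, and when the device size changes, the block containing an unaligned old or new end is added as well. The function `blocks_to_read` and the exact handling of the two ends are a hypothetical interpretation of the description above, not Benji's actual code. Under this reading, newly added data and removed (now sparse) ranges both simply appear as hint ranges, with `exists` set to `"true"` or `"false"` respectively.

```python
from typing import Dict, Iterable, Set


def blocks_to_read(hints: Iterable[Dict], block_size: int,
                   old_size: int, new_size: int) -> Set[int]:
    """Indices of Benji blocks touched by the hints (hypothetical interpretation)."""
    new_blocks = -(-new_size // block_size)  # ceiling division: blocks in the new device

    blocks: Set[int] = set()
    for hint in hints:
        offset, length = int(hint["offset"]), int(hint["length"])
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        blocks.update(range(first, last + 1))

    # Last-block special case: if the size changed and the old or new end is
    # not on a block boundary, the block containing that end is re-read.
    if old_size != new_size:
        for end in (old_size, new_size):
            if end % block_size:
                blocks.add(end // block_size)

    # Only blocks that still exist after a shrink are relevant.
    return {b for b in blocks if b < new_blocks}


print(sorted(blocks_to_read(
    [{"offset": 0, "length": 4096, "exists": "true"},
     {"offset": 6000, "length": 5000, "exists": "false"}],
    block_size=4096, old_size=40960, new_size=40960)))
```

The filter at the end drops hint ranges and old-end blocks beyond the new end, which matches the assumption that devices only grow or shrink at the end.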
