enzingerm / snapborg

Synchronize snapper snapshots to a borg repository
GNU General Public License v3.0

[Feature request] Improve performance by using `snapper diff` #20

Closed: profiluefter closed this issue 1 year ago

profiluefter commented 1 year ago

As I understand it, this program does not take advantage of the fact that btrfs can easily compute the difference between snapshots. Instead, it backs up the whole filesystem each time and relies on borg's deduplication.

Could this be improved using the snapper diff command?

# snapper -c arch-os diff -x "--brief" 80..81
Files /.snapshots/80/snapshot/home/fabian/.cache/dolphin/qmlcache/03481a8285faafcfe50ff21d8e06663c441d51ad.qmlc and /.snapshots/81/snapshot/home/fabian/.cache/dolphin/qmlcache/03481a8285faafcfe50ff21d8e06663c441d51ad.qmlc differ
Files /.snapshots/80/snapshot/home/fabian/.cache/dolphin/qmlcache/072c13ee78fdf882f488b8a9517314853e936f35.qmlc and /.snapshots/81/snapshot/home/fabian/.cache/dolphin/qmlcache/072c13ee78fdf882f488b8a9517314853e936f35.qmlc differ
...
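
The `--brief` output above has a regular shape, so the list of changed files could be extracted from it. The following is a minimal sketch, assuming every relevant line looks exactly like the two shown; any other line (such as the `...` placeholder) is ignored. `changed_paths` is a hypothetical helper, not part of snapper or snapborg.

```python
import re

# Matches lines of the form shown above:
#   Files <root>/<N>/snapshot<path> and <root>/<M>/snapshot<path> differ
# The backreferences (?P=root) and (?P=rel) require both sides to share the
# same snapshot root and the same relative path, so a path that itself
# contains " and " is still split at the right place.
_DIFFER = re.compile(
    r"^Files (?P<root>.+?)/(?P<old>\d+)/snapshot(?P<rel>/.*)"
    r" and (?P=root)/(?P<new>\d+)/snapshot(?P=rel) differ$"
)


def changed_paths(diff_output: str) -> list:
    """Return the snapshot-relative paths reported as differing."""
    paths = []
    for line in diff_output.splitlines():
        m = _DIFFER.match(line)
        if m:  # lines of any other shape are skipped in this sketch
            paths.append(m.group("rel"))
    return paths


sample = (
    "Files /.snapshots/80/snapshot/home/fabian/a.qmlc and "
    "/.snapshots/81/snapshot/home/fabian/a.qmlc differ\n"
    "...\n"
)
print(changed_paths(sample))  # ['/home/fabian/a.qmlc']
```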

It seems like an easy improvement in theory. I guess the hard part would be telling borg to also reference the unchanged files from an older snapshot. This also requires knowing which snapshots have already been backed up, but I think that information is already stored in the snapshot metadata.

Are there other difficulties with this approach that I haven't considered, or would this be possible to implement?

Also, thanks for creating this tool! I've been using it for a while now and it works great!

enzingerm commented 1 year ago

Thanks, it's nice to see that other people have similar use cases :)

You are right: the whole directory structure is put into the repo each time, even if most files haven't changed. At first glance, what you suggest seems like a good improvement, but I see a few problems with it.

I think it's important that each borg backup represents the whole directory structure at a given point in time. This is also what people who use standalone borg, without snapper, would expect.

profiluefter commented 1 year ago

Hi, thanks for the reply! I was thinking more along the lines of telling borg to duplicate the last snapshot and then applying the diffs on top of it. However, I only now checked whether such an API is available, and it seems it isn't. While it's probably possible to add this to borg, I think it would be a lot of work.

I definitely agree that the backup should represent the whole directory, as anything else would add unnecessary complexity and risk data loss if used incorrectly.

Would there be a large performance benefit anyway? borg by itself already skips reprocessing unmodified files that have been backed up before.
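
borg decides whether a file is unmodified using its files cache. The sketch below is a simplified Python model of that check, not borg's code. It assumes, as borg's documentation describes, that cache entries are keyed by the file's full absolute path and compared on ctime, size and inode (borg's default `--files-cache` mode), and that a file's metadata is identical across snapshots. Under those assumptions, backing up each snapshot from its own directory (`/.snapshots/80/snapshot`, `/.snapshots/81/snapshot`, ...) never hits the cache, even for identical files, while a stable path would. `FileStat`, `FilesCacheModel`, `count_rechunked` and the `/mnt/snapshot` path are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileStat:
    # Hypothetical stand-in for the metadata compared by default.
    size: int
    ctime_ns: int
    inode: int


class FilesCacheModel:
    """Simplified files cache keyed by absolute path (assumption).

    Real borg also stores chunk IDs and ages out old entries; none of
    that is modelled here.
    """

    def __init__(self) -> None:
        self._entries: dict[str, FileStat] = {}

    def is_unchanged(self, path: str, st: FileStat) -> bool:
        # A miss on the path key counts as "changed", just like a
        # metadata mismatch does.
        return self._entries.get(path) == st

    def remember(self, path: str, st: FileStat) -> None:
        self._entries[path] = st


def count_rechunked(cache: FilesCacheModel, files: dict[str, FileStat]) -> int:
    """Return how many files would have to be read and chunked again."""
    rechunked = 0
    for path, st in files.items():
        if not cache.is_unchanged(path, st):
            rechunked += 1
        cache.remember(path, st)
    return rechunked


st = FileStat(size=4096, ctime_ns=1_700_000_000_000_000_000, inode=257)
cache = FilesCacheModel()
# Each snapshot backed up under its own path: every run misses the cache.
print(count_rechunked(cache, {"/.snapshots/80/snapshot/etc/fstab": st}))  # 1
print(count_rechunked(cache, {"/.snapshots/81/snapshot/etc/fstab": st}))  # 1
# The same unchanged file seen at a stable (hypothetical) path:
print(count_rechunked(cache, {"/mnt/snapshot/etc/fstab": st}))  # 1
print(count_rechunked(cache, {"/mnt/snapshot/etc/fstab": st}))  # 0
```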

Interesting! Well, that's not happening for me. Backing up each snapshot takes about half an hour for a 100 GB partition on an SSD, even though each one is only a few MB in size after deduplication (just estimates, but somewhere in that range).

I'll look into fixing this, as that seems to be the root cause of my issue and of my search for better solutions!