collective / collective.recipe.backup

bin/backup script: sensible defaults around bin/repozo
https://pypi.org/project/collective.recipe.backup/

Use hard-links to create the first blobstorage backup #57

Closed ale-rt closed 2 years ago

ale-rt commented 3 years ago

After the first copy we save tons of space by creating hard links for the subsequent copies: https://github.com/collective/collective.recipe.backup/blob/3ea54caf61dc718bb88bc681fa57ef5053883f97/src/collective/recipe/backup/copyblobs.py#L882

It would be nice to add an option to also use hard links for the first copy.

ale-rt commented 3 years ago

I am temporarily testing this branch https://github.com/collective/collective.recipe.backup/tree/57.feature. If I find time I will try to also switch to GHA and to add a test.

mauritsvanrees commented 3 years ago

@ale-rt Do you want to pursue this further?

ale-rt commented 2 years ago

Hi, and sorry for the delayed answer. I forgot about this one until today, when it became interesting to me again.

As of today I actually "bootstrap" the first copy with:

cp -rfl $SOURCE/blobstorage/* $TARGET/blobstorage/

This is something I would like to finalize at a certain point.

If you do not mind I would keep it open.
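For readers who want the same bootstrap step without relying on `cp -l`, here is a minimal Python sketch. It is not code from collective.recipe.backup; the function name `hardlink_tree` is hypothetical, and it assumes source and target live on the same filesystem and that the tree contains regular files only.

```python
import os


def hardlink_tree(source, target):
    """Mirror the directory layout of source under target and hard-link
    every file instead of copying it. Hypothetical helper, roughly what
    `cp -rfl source/* target/` does for a tree of regular files."""
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            dest = os.path.join(dest_dir, name)
            if os.path.lexists(dest):
                os.remove(dest)  # replace an existing entry, like cp -f
            # Raises OSError if the two paths are on different filesystems.
            os.link(os.path.join(dirpath, name), dest)
            linked += 1
    return linked
```

Because every target file shares its inode with the source file, the first "copy" only costs the space needed for the directory entries.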

mauritsvanrees commented 2 years ago

Are you using the pre_command option for that?

One thing to watch out for if we put this in the package as default behavior, is that the backup might be on a different volume, maybe some NFS or other network drive. I am not sure if hard links are then silently ignored and you get standard copies, which would be good, or if the rsync then fails, which would be bad.
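One way to guard against the different-volume case is to check up front and to fall back to a normal copy when linking fails. The sketch below is only an illustration: `same_filesystem` and `link_or_copy` are hypothetical names, not part of the package, and comparing device ids is a heuristic (hard links can still fail on the same device, for example across separate mount points).

```python
import os
import shutil


def same_filesystem(path_a, path_b):
    """Return True when both existing paths report the same device id."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def link_or_copy(src, dest):
    """Try a hard link first and fall back to a regular copy.

    Assumes dest does not exist yet. Returns "linked" or "copied" so the
    caller can log which path was taken.
    """
    try:
        os.link(src, dest)
        return "linked"
    except OSError:
        # Covers cross-device links and filesystems without hard-link support.
        shutil.copy2(src, dest)
        return "copied"
```

With a fallback like this, a backup on another volume would silently degrade to standard copies instead of failing.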

ale-rt commented 2 years ago

Nope, I am doing that by hand.

Which of course is not ideal, but it gets the job done. And of course the blobs and the backup are on the same volume. I agree, this should not be the default behavior.

ale-rt commented 2 years ago

I found out that I can achieve something like this with:

rsync_options = --link-dest=${:blob-storage}

but I need a small patch.

In fact, running it with current master, I still get duplicated data:

[ale@flo plone.backup]$ ./bin/plonebackup
...
INFO: rsync -a --link-dest=$PLONE/var/blobstorage $PLONE/var/blobstorage var/blobs/blobstorage.2022-10-03-13-09-21
INFO: Creating symlink from latest to blobstorage.2022-10-03-13-09-21

[ale@flo plone.backup]$ du -sh plone/var/blobstorage/ var/blobs/blobstorage.2022-10-03-13-09-21/
393M    plone/var/blobstorage/
393M    var/blobs/blobstorage.2022-10-03-13-09-21/

Changing the command to:

rsync -a --link-dest=$PLONE/var/blobstorage $PLONE/var/blobstorage/ var/blobs/blobstorage.2022-10-03-13-09-21/

(notice the trailing slash on both paths), I get:

[ale@flo plone.backup]$ du -sh plone/var/blobstorage/ var/blobs/blobstorage.2022-10-03-13-09-21/
393M    plone/var/blobstorage/
15M     var/blobs/blobstorage.2022-10-03-13-09-21/

which is what I expect.
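A likely explanation for the trailing-slash difference: without the slash, rsync recreates the `blobstorage` directory itself inside the destination, so the relative paths it looks up under `--link-dest` start with `blobstorage/` and nothing matches. With the slash, only the directory contents are transferred and the relative paths line up. A minimal sketch of building the command that way follows; `rsync_link_dest_cmd` is a hypothetical helper, not the patch that went into the package.

```python
import os


def rsync_link_dest_cmd(source, dest, link_dest):
    """Build an rsync argument list that hard-links unchanged files
    against link_dest. Hypothetical helper for illustration only."""
    return [
        "rsync",
        "-a",
        # rsync resolves a relative --link-dest against the destination
        # directory, so an absolute path is passed here.
        "--link-dest=" + os.path.abspath(link_dest),
        # os.path.join(path, "") appends a trailing slash if it is missing,
        # so rsync transfers the directory contents, not the directory.
        os.path.join(source, ""),
        os.path.join(dest, ""),
    ]
```

The resulting list can be passed to `subprocess.run` as is, which avoids any shell quoting issues with the paths.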

Unfortunately, the second copy gets the --link-dest parameter twice, so disk usage blows up again:

[ale@flo plone.backup]$ du -sh plone/var/blobstorage/ var/blobs/blobstorage.*
393M    plone/var/blobstorage/
15M     var/blobs/blobstorage.2022-10-03-15-31-14
393M    var/blobs/blobstorage.2022-10-03-15-33-16

mauritsvanrees commented 2 years ago

Released in 5.0.0a1.