ignore-existing with hardlinks

RsyncProject / rsync

An open source utility that provides fast incremental file transfer. It also has useful features for backup and restore operations among many other use cases.

https://rsync.samba.org


ignore-existing with hardlinks #357

Open ynikitenko opened 2 years ago

ynikitenko commented 2 years ago

I store several versions of my directory in snapshots (called commits); these are just hard links to files in the original directory. I don't want to pull actual changes to existing files (if there were any), so I add --ignore-existing to the rsync options:

rsync -avH -P --delete-after --ignore-existing --include=/.ys/commits --include=/.ys/logs --exclude=/.ys/* src/ dest/

Unfortunately, in this case all hard links for newer snapshots are broken (when I drop this option, they are preserved). As I understand it, this is because --ignore-existing is a transfer rule, so all other (existing) hard links are ignored by rsync, and the snapshot is created anew.

Do I understand this right? Can I preserve all hard links and simultaneously ignore any changes in existing files?
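One way to confirm whether hard links survived a transfer is to check that two paths still refer to the same inode. The following is a minimal sketch, not part of rsync: `same_file` is a hypothetical helper name, and the demo uses a scratch directory rather than the src/ and dest/ trees from the command above.

```shell
#!/bin/sh
# Sketch: report whether two paths are hard links to the same file.
# same_file is a hypothetical helper, not an rsync feature.
same_file() {
    # -ef is true when both paths resolve to the same device and inode.
    [ "$1" -ef "$2" ]
}

# Self-contained demo in a scratch directory (no rsync involved).
demo_dir=$(mktemp -d)
echo "data" > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/linked"   # hard link: shares the inode
cp "$demo_dir/original" "$demo_dir/copied"   # copy: gets its own inode

if same_file "$demo_dir/original" "$demo_dir/linked"; then
    linked_result="linked"
else
    linked_result="separate"
fi

if same_file "$demo_dir/original" "$demo_dir/copied"; then
    copied_result="linked"
else
    copied_result="separate"
fi

echo "original vs linked: $linked_result"
echo "original vs copied: $copied_result"
rm -rf "$demo_dir"
```

Running the same check on a file in dest/ and its counterpart under dest/.ys/commits would show whether a given snapshot entry is still a hard link or has become an independent copy.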

georgalis commented 1 year ago

@ynikitenko while I do not presume to know your objective, I have implemented systems of timestamp-based hardlink snapshots (commits?). Although it requires more storage, I have found that duplicating the source data before creating hardlink snapshots is essential, as is using the --link-dest switch. In https://github.com/georgalis/pub/blob/master/sub/backlink.sh, for each 'commit' I create a new timestamp directory and link to the last timestamp directory when preserving my src directory tree. I am not following your need for --ignore-existing. Maybe just don't use --delete, and your target will be cumulative, even when files are deleted from the source?
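The timestamp-directory approach described above could look roughly like the sketch below. It is not the linked backlink.sh script. `latest_snapshot` and `make_snapshot` are hypothetical helper names, the snapshot root is assumed to be an absolute path, and at most one snapshot per second is assumed. Only the rsync options -a, -H and --link-dest are used.

```shell
#!/bin/sh
# Sketch of timestamped snapshots with --link-dest (hypothetical helpers).

# Print the lexicographically last subdirectory of $1 (empty line if none).
latest_snapshot() {
    last=""
    for d in "$1"/*/; do
        # When nothing matches, the pattern stays literal and -d fails.
        [ -d "$d" ] && last="${d%/}"
    done
    printf '%s\n' "$last"
}

# Copy the source tree ($1) into a new timestamped directory under the
# absolute snapshot root ($2). Files unchanged since the previous snapshot
# are hard-linked to it instead of being stored again.
make_snapshot() {
    src=$1
    root=$2
    prev=$(latest_snapshot "$root")
    new="$root/$(date -u +%Y%m%dT%H%M%SZ)"
    if [ -n "$prev" ]; then
        rsync -aH --link-dest="$prev" "$src"/ "$new"/
    else
        # First snapshot: nothing to link against yet.
        rsync -aH "$src"/ "$new"/
    fi
}
```

A call such as `make_snapshot /home/user/src /backup/snapshots` (both paths are placeholders) would create one new directory per run. Because the timestamp format sorts lexicographically, the most recent directory is the one picked as the --link-dest reference.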

ynikitenko commented 1 year ago

@georgalis sorry for the late reply. In my program, --ignore-existing is important for safety. From the yarsync README:

If a file gets corrupt, it will not be transferred by default, but when the user chooses to pull --backup, any diverged files will be visible (with their different versions preserved).

For me it is important that I use rsync and don't do these things manually. I think it is more reliable if these details are handled in one place, at the level of rsync. I want to synchronise directories, which is why I both add new files and remove deleted ones.

I don't want my repositories to take more space than needed, which is why I would not duplicate data (unless some files really diverged between commits, but that is impossible to know for hard links, except when these hard links are in different systems/repositories).