Open tridge opened 7 months ago
Note that --fuzzy partially handles this issue; the key problem is that it only looks in the same directory. Extending it to look across the whole destination tree, perhaps sorting by file size for faster matching, would make it more useful.
I would like to propose a --fuzzy2 option in rsync, which also considers the entire tree.
I have a prototype written in awk, which I currently run before the actual rsync run.
It's not perfect and could probably be done better, but it works and saves me a lot when doing file/folder renames. It only moves files, not folders. The awk delta calculation for 10K files in the source against 10K files in the target takes about 0.3 s.
The required folder tree must be created in the target first, e.g. with rsync :-)
rsync -a --include='*/' --exclude='*' "${sourcepath}" "${targetpath}"
AWK: the source and target info must each be put into an array:
array format:
Files_last_modification_time _ filesize filepath
e.g.
1718541359.8524070000t-147 /home/claus/.bashrc
1717861293.8939940000t-57 /home/claus/.bash_profile
The first column is the key-id, here date + size; the second column is the file path. When using --checksum, the date+size key must be replaced by a hash.
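One way to produce such listings is sketched below. It assumes GNU find (for -printf) and sha256sum from GNU coreutils. The function names list_tree and list_tree_checksum are made up for illustration, and the "_" between time and size follows the format description above rather than the exact sample lines. Paths are printed relative to the tree root so that an unchanged file has the same path on both sides; file names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch only: assumes GNU find (-printf) and sha256sum from GNU coreutils.

# Print one line per regular file: "<mtime>_<size> <relative path>".
# %T@ = modification time in seconds since the epoch, %s = size in bytes,
# %P = path relative to the starting directory.
list_tree() {
    find "$1" -type f -printf '%T@_%s %P\n'
}

# Variant for --checksum style matching: "<sha256> <relative path>".
# sha256sum prints "<hash>  ./path"; sed rewrites that to "<hash> path".
list_tree_checksum() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) |
        sed -e 's/^\([0-9a-f]*\)  \.\//\1 /'
}
```

Possible usage, with hypothetical output file names: list_tree "$sourcepath" > source.list and list_tree "$targetpath" > target.list. The date+size key only matches if modification times were preserved on the target, as rsync -a does.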
populate the array:
aa source array
bb target array
awk main loop
(x == key-id)
( aa[x] == file path source)
( bb[x] == file path target)
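# targetpath is assumed to be an awk variable, e.g. awk -v targetpath="$targetpath"
for (x in aa) {
    if (x in bb) {
        # same key but different path: the file was renamed or moved
        if (aa[x] != bb[x]) {
            print "mv --no-clobber \"" targetpath bb[x] "\" \"" targetpath aa[x] "\""
        }
        delete bb[x]
    }
}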
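For reference, the loop above could be wrapped into a complete script roughly like this. It is only a sketch: propose_moves is a hypothetical function name, and it assumes two listing files in the "key path" format described earlier, with paths relative to the tree root. The commands are only printed, never executed.

```shell
#!/bin/sh
# Sketch only: propose_moves is a hypothetical wrapper, not part of rsync.
# Arguments: source listing, target listing, target root directory.
propose_moves() {
    awk -v targetpath="$3" '
        {
            # Split "key path" at the first space so paths may contain spaces.
            i = index($0, " ")
            if (i == 0) next                 # skip malformed or empty lines
            key = substr($0, 1, i - 1)
            path = substr($0, i + 1)
        }
        NR == FNR { aa[key] = path; next }   # first file: source listing
                  { bb[key] = path }         # second file: target listing
        END {
            for (x in aa) {
                if (x in bb) {
                    # same key but different path: propose a move in the target
                    if (aa[x] != bb[x])
                        printf "mv --no-clobber \"%s/%s\" \"%s/%s\"\n", targetpath, bb[x], targetpath, aa[x]
                    delete bb[x]
                }
            }
        }
    ' "$1" "$2"
}
```

Possible usage: propose_moves source.list target.list "$targetpath" > moves.sh, then review moves.sh before running it. Two caveats of this sketch: files that share the same key (for example identical date and size) overwrite each other in the arrays, so only one of them is considered, and paths containing double quotes, backslashes, $ or backticks would break the generated mv lines.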
After reviewing and executing the proposed mv commands, I run the real rsync, which cleans up remaining things.
This is a copy of the old Bugzilla issue: https://bugzilla.samba.org/show_bug.cgi?id=2294
This would certainly be a big win in many cases. It is complicated by the incremental method of calculating the hashes (we don't hash the full file list before starting transfers).