Open eharris opened 11 months ago
Is it possible you are quoting the directory path used in --compare-dest=? Just today I resolved an issue similar to OP's by removing the quotes around the explicit path (which was a variable, but it also didn't work as a literal string), which IMO is a bug.
--compare-dest="$compPath"
--compare-dest=$compPath
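For anyone comparing the two forms above, here is a minimal sketch that asks a POSIX sh which words it would hand to rsync in each case. The helper name shell_words and the sample paths are made up for illustration; nothing here runs rsync. As far as I know, the two forms should only differ when the path contains whitespace or glob characters.

```python
import os
import subprocess


def shell_words(snippet, comp_path):
    """Return the words a POSIX sh produces for `snippet` with compPath set."""
    # Let the shell expand the snippet, then print one resulting word per line.
    script = 'for w in ' + snippet + '; do printf "%s\\n" "$w"; done'
    env = dict(os.environ, compPath=comp_path)
    result = subprocess.run(
        ["sh", "-c", script],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


QUOTED = '--compare-dest="$compPath"'
UNQUOTED = '--compare-dest=$compPath'

if __name__ == "__main__":
    for path in ("/tmp/comp", "/tmp/my comp"):  # hypothetical paths
        print(path, shell_words(QUOTED, path), shell_words(UNQUOTED, path))
```

If the quoted form misbehaves for a path without spaces, it may be worth checking how the command line is being assembled (for example, quotes stored inside the variable itself) before treating it as an rsync bug.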
It might be helpful if you shared the rsync command that produces the outcome you're reporting.
Here's a super simplified case to reproduce:
> mkdir test
> mkdir test/nochange
> cp -a test comp
> rsync -av --delete --compare-dest=`pwd`/comp/ test/ dest/
sending incremental file list
created directory dest
sent 91 bytes received 40 bytes 262.00 bytes/sec
total size is 0 speedup is 0.00
> ls dest/
nochange
As you can see, the nochange directory was created/copied, even though it is identical (due to the cp -a) in the comp/ directory.
Interestingly, rsync does not report that it is creating it even though -v is active.
(The pwd was necessary to make the --compare-dest parameter absolute.)
Try it with --omit-dir-times
@tmknight no change, as would be expected from the example I gave previously.
I made some assumptions about your test. Try this:
mkdir -p src/test dst cmp
echo this is a test > src/test0.txt
echo this is a test > src/test/test.txt
cp -a src/test cmp/
rsync -a --compare-dest=`pwd`/cmp/ src/ dst/
It is known that the directories are created whilst rsync traverses
@tmknight I don't understand the point of your test case, as it doesn't test the problem I'm trying to get addressed: directories that have not changed (and have no descendants that have changed) are being copied/created in the destination even though they already exist with the exact same metadata in the --compare-dest target.
Your assertion that "it is known" to behave this way of course makes sense when --compare-dest is not in effect, since rsync should then be making the destination identical to the source. The point of this ticket is that it does NOT make sense to copy/create empty directories in the destination when nothing below them (files OR directories) has changes that are not already present in the --compare-dest target(s). Yes, this may make the traversal a bit more complicated, since ancestor-directory creation will need to be delayed until a descendant difference is found, but that seems like it should be a solvable problem.
It also makes no sense that the creation of those empty and unnecessary directories is not reflected in the output when -v is in effect.
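To make the "delay ancestor creation until a descendant differs" idea concrete, here is a rough sketch in Python. It is not rsync's algorithm: the names same_meta and plan are hypothetical, and the "unchanged" test (same type, same size for regular files, same whole-second mtime) is a simplifying assumption that ignores permissions, ownership and the other attributes rsync can compare.

```python
import os
import stat


def same_meta(a, b):
    """Assumed 'unchanged' test: same type, size (regular files) and whole-second mtime."""
    try:
        sa, sb = os.lstat(a), os.lstat(b)
    except FileNotFoundError:
        return False
    if stat.S_IFMT(sa.st_mode) != stat.S_IFMT(sb.st_mode):
        return False
    if stat.S_ISREG(sa.st_mode) and sa.st_size != sb.st_size:
        return False
    return int(sa.st_mtime) == int(sb.st_mtime)


def plan(src, cmp_root, rel=""):
    """Return relative paths to copy from src, given a compare tree cmp_root.

    A directory is listed only if it differs from its counterpart in
    cmp_root or if something below it is listed (post-order decision).
    """
    wanted = []
    for name in sorted(os.listdir(os.path.join(src, rel))):
        child = os.path.join(rel, name)
        s = os.path.join(src, child)
        c = os.path.join(cmp_root, child)
        if os.path.isdir(s) and not os.path.islink(s):
            below = plan(src, cmp_root, child)  # decide about descendants first
            if below or not same_meta(s, c):
                wanted.append(child + "/")
                wanted.extend(below)
        elif not same_meta(s, c):
            wanted.append(child)
    return wanted
```

With the test/nochange layout from the reproduction above, plan() returns an empty list, so nothing (including nochange/) would be created in the destination.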
I'm trying to use --compare-dest to create an incremental backup that only contains changed files and directories, suitable for rsync'ing back on top of an older full backup to bring it up to date.
The problem I'm experiencing is that rsync with the --compare-dest option appears to copy ALL directories, even ones that have no changed contents.
For example, on a source directory of about 1 million items (129k dirs and 880k files):
Using rsync with --compare-dest results in a destination with 129k dirs and 135k files (including 106k dirs that have no changes).
Using rsync with --compare-dest -m reduces the copied dirs by about 9k, but still has 97k dirs that contain no changes.
The actual number of directories that contain changes is only 23k. I have verified this using --itemize-changes (and filling in the extra dirs that are not reported as changed even though they do contain changes on a deeper branch).
To me, this behavior of --compare-dest seems wrong. Why is it preserving all the directories, including ones that contain no changes? And why does the use of -m (which is undesirable since it may "lose" directories that actually have changed, such as a different mtime) still preserve so many unchanged directories that should be empty (and would be if other unchanged sub-directories had not also been improperly copied)?
(Side note: the cleanup that -m should do but doesn't can be performed by a subsequent find dir/ -depth -type d -print0 | xargs -0 -- rmdir --ignore-fail-on-non-empty, however this leaves any directory whose empty sub-directories were cleaned up with the wrong mtime.)
In an attempt to work around this, I have written a python script to try to get rid of all these unnecessary and unchanged directories by processing the output of --itemize-changes, and then giving that list of files/dirs to rsync as an explicit --include-from filter list.
The problem with this approach is that rsync gets massively slower and becomes cpu-bound when given a full filter list of items to include. In my testing with the same sources above, using rsync --compare-dest onto an already populated destination (fully cached and quiescent system) results in a run that takes less than 90 seconds. With the same conditions but using a --include-from filter list that includes 158k rules/items, the same run takes over 30 minutes, over 20 times slower, even though the destination contains over 100k fewer items (all directories).
I think that --compare-dest needs to be fixed to NOT copy directories that do not contain any changes at the current or any deeper level.
This is using rsync version 3.2.7 on Debian 11.
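For reference, the workaround described above could look roughly like the sketch below. This is not the actual script: the function name include_patterns is hypothetical, and it assumes the usual --itemize-changes layout of an 11-character change field, one space, then the path. It does not escape filter wildcard characters (*, ?, [) that appear in file names.

```python
import re
from pathlib import PurePosixPath

# 11-character itemize field ("YXcstpoguax"), one space, then the path.
ITEMIZE_RE = re.compile(r"^([<>ch.])([fdLDS])(.{9}) (.+)$")


def include_patterns(itemize_lines):
    """Return anchored patterns for each reported item plus its ancestor dirs."""
    patterns = set()
    for line in itemize_lines:
        m = ITEMIZE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # banner, statistics, "*deleting" lines and similar
        kind, name = m.group(2), m.group(4)
        if kind == "L":
            name = name.split(" -> ", 1)[0]  # drop the symlink target
        path = PurePosixPath(name.rstrip("/"))
        if str(path) == ".":
            continue  # the transfer root itself needs no rule
        for parent in path.parents:
            if str(parent) != ".":
                patterns.add(f"/{parent}/")  # ancestors must be included too
        patterns.add(f"/{path}/" if kind == "d" else f"/{path}")
    return sorted(patterns)
```

The resulting lines would be written to a file and passed via --include-from together with a final --exclude='*'. It only illustrates how the list is built and does nothing about the slowdown reported above.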