Open Haravikk opened 4 years ago
Hmm… Let's have a big file (e.g. 4 GB) on the server and fetch it:
rsync server:bigfile .
sent 43 bytes received 4,296,015,958 bytes 8,635,208.04 bytes/sec
total size is 4,294,967,296 speedup is 1.00
dd if=/dev/urandom of=bigfile bs=1k count=1 seek=1000000 conv=notrunc
Run rsync again:
rsync server:bigfile .
Only changes (and checksums) are transferred.
sent 524,363 bytes received 327,794 bytes 28,886.68 bytes/sec
total size is 4,294,967,296 speedup is 5,040.11
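The effect can be reproduced at a smaller scale without a server. The sketch below is only an illustration: the function name `patch_one_block`, the 1 MiB file size and the block offset are assumptions made for the example, not part of the experiment above. It overwrites a single 1 KiB block in place and counts how many bytes actually differ, which is why the delta transfer has so little to send.

```shell
# Hypothetical helper: build a small file, overwrite one 1 KiB block in place,
# and print the number of bytes that differ from the untouched copy.
patch_one_block() {
    pb_file=$1
    # 1 MiB of zeros stands in for the 4 GB file (size chosen for illustration).
    dd if=/dev/zero of="$pb_file" bs=1k count=1024 2>/dev/null || return 1
    cp "$pb_file" "$pb_file.orig" || return 1
    # conv=notrunc keeps the rest of the file intact, as in the dd command above.
    dd if=/dev/urandom of="$pb_file" bs=1k count=1 seek=100 conv=notrunc 2>/dev/null || return 1
    # cmp -l prints one line per differing byte and exits 1 when files differ.
    { cmp -l "$pb_file.orig" "$pb_file" || true; } | wc -l | tr -d ' '
}
```

Calling `patch_one_block /tmp/bigfile` prints a count of at most 1024, while the file size stays unchanged.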
IMO such rsync behavior is incompatible with your proposal.
I'm not sure what you're trying to demonstrate here?
One of my opening lines is:
When provided, instead of copying a file using the default method (delta transfer), rsync will instead use the provided custom command.
The trade-off for using the custom command is that rsync can't do checksum-based transfers, but you're still gaining the full benefits of rsync's various features for finding changed files (by timestamp, and optionally size if you know it should be the same), plus filtering of transfer lists etc.
You would (and should) only use a custom copy command in cases where the benefits of doing so are well known to be better than relying on rsync's checksummed transfer algorithm, i.e. you know that changed files will need to be copied (or cloned) in their entirety.
But there are plenty of cases where the copy command itself will lack a lot of rsync's flexibility; cp, as I've used in these examples, doesn't have any of the finding/filtering/comparison options that rsync does, and neither do compression tools.
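Until something like this exists, the split between "rsync finds the files" and "another tool copies them" can be roughly approximated with existing options. The sketch below is an assumption-laden illustration: `copy_listed` is a hypothetical helper, it handles only names without embedded newlines, and it treats every listed entry as a regular file.

```shell
# Hypothetical helper (not part of rsync): for each relative path read from
# standard input, run the given copy command with source and destination appended.
copy_listed() {
    src_root=$1 dest_root=$2
    shift 2
    while IFS= read -r rel; do
        case $rel in
            ''|*/) continue ;;   # skip blank lines and directory entries
        esac
        mkdir -p "$(dirname "$dest_root/$rel")" || return 1
        # "$@" is the copy command, e.g. cp -p; stdin is detached so the
        # command cannot consume the remaining file list.
        "$@" "$src_root/$rel" "$dest_root/$rel" </dev/null || return 1
    done
}
```

The list can come from a dry run, for example `rsync -a --dry-run --out-format='%n' /path/to/source/ /path/to/destination/ | copy_listed /path/to/source /path/to/destination cp -p`. Note that a dry run also lists entries whose only change is metadata, so this copies more than a built-in option would need to.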
There could be an argument for an intermediate transfer option, e.g. for compression, where the command is used to generate a file locally for transfer, so that the per-block comparison can still occur against a previously compressed version of the file. This would need to be an additional option, as it won't be suitable for every command.
Just so you know, this is fairly unlikely to be implemented. It would likely be limited to local copies only, so if I ever get around to doing some big changes to rsync's local copying workflow then I will be considering this idea as an additional feature.
I appreciate any consideration of this; I should stress, not all options are required, I have a tendency to overthink stuff like this, and the ability to do this with remote transfers is always something that can be done later.
In terms of local transfers the only changes that are required are:
--copy-cmd to take the command; once a file is identified for transfer this is executed in place of normal behaviour. It disables comparison of files by size by default.
--copy-cmd-size to re-enable comparison by size when the copy command should preserve it.
Warnings for incompatible options (--append and --inplace) might be appropriate, but they're not required as the use of the copy command would bypass them anyway.
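To illustrate why size comparison has to be off by default, here is a rough model of a "has this file changed?" check. It is a sketch under stated assumptions: `needs_transfer` is a hypothetical function, its optional third argument stands in for the proposed size option, and the real comparison logic inside rsync is more involved.

```shell
# Hypothetical model of a quick check. Returns 0 when the file should be
# transferred, 1 when it looks unchanged. Pass 1 as the third argument to
# also compare sizes (standing in for the proposed size option).
needs_transfer() {
    nt_src=$1 nt_dest=$2 nt_check_size=${3:-0}
    [ -e "$nt_dest" ] || return 0          # nothing at the destination yet
    if [ "$nt_src" -nt "$nt_dest" ] || [ "$nt_src" -ot "$nt_dest" ]; then
        return 0                           # modification times differ
    fi
    if [ "$nt_check_size" = 1 ]; then
        nt_a=$(wc -c < "$nt_src" | tr -d ' ')
        nt_b=$(wc -c < "$nt_dest" | tr -d ' ')
        [ "$nt_a" = "$nt_b" ] || return 0  # sizes differ
    fi
    return 1
}
```

With a gzip-compressed destination whose timestamp has been set from the source, `needs_transfer src dest` reports "unchanged", whereas `needs_transfer src dest 1` would ask for a re-copy on every run because the sizes never match.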
This proposal is for the addition of a new option in the following form (name debatable):
--copy-cmd=COMMAND
When provided, instead of copying a file using the default method (delta transfer), rsync will instead use the provided custom command. The command is given as a string, similar to the --rsh option. This command only applies to file copies; other actions (metadata updates, linking etc.) occur as normal.
How it actually behaves differs depending upon the nature of the source and destination:
local -> local: {src} and {dest} in the command are swapped for the (quoted and escaped) paths of the source and destination file respectively.
local -> remote: only the {src} placeholder is used, and the command should output the file data to standard output for transfer to the remote host (where the data is written to the destination).
remote -> local: only the {dest} placeholder is used, and the command should take file data from standard input and write it out to the destination location.
Regardless of mode of operation, if the copy command returns a non-zero status, rsync will treat the transfer as failed and produce an error or warning. It will also produce a warning in local -> local and remote -> local modes if no file is produced at the expected destination (i.e. the command did not include the {dest} placeholder, didn't use it properly, or failed with a status of zero, which can happen if the command is several commands piped together).
This option would be incompatible with --append (also --inplace?). Use of this option will also disable the use of file size for comparisons by default, as a custom command may produce a differently sized output file. An option will be needed to tell rsync to explicitly retain size comparison, e.g. --copy-size, for when the sizes should match (when --copy-cmd is used for transparent compression, cloning etc.). Comparisons by modification time, however, should work as normal no matter what the copy command produces, as rsync should still be setting the time(s) on the file afterwards.
Examples
There are a few useful examples of how you could take advantage of this command:
Local File Cloning
Some filesystems support clone/shadow-copy/reflink based zero-cost copying of files, which functions similarly to hard linking, except that each clone can be written to independently without affecting the other(s), i.e. at time of cloning they share the same data blocks on disk, but when written to they diverge, usually thanks to copy-on-write. To take advantage of this you might use --copy-cmd like so (cp -c on macOS, cp --reflink with GNU coreutils):
rsync -a --copy-cmd='cp -c {src} {dest}' /path/to/source /path/to/destination
rsync -a --copy-cmd='cp --reflink {src} {dest}' /path/to/source /path/to/destination
This is useful when you know you want to copy something for editing and want it to be as quick and lightweight as possible, but where the plain command (cp -c or cp --reflink) doesn't offer the same flexibility that rsync does. This is also useful when you want to snapshot just a single directory, even though the full volume might support snapshots (as systems that support these commands usually do).
Invisible Compression
Some filesystems support on-demand per-file compression; for example on macOS, HFS+ and APFS both support invisible file compression. While there are patches that allow rsync to preserve this where a file is already compressed, there may be cases where you'd like to add/remove compression while copying, e.g. ensuring backups use up as little space as possible. You could do this using --copy-cmd like so:
rsync -a --copy-cmd='ditto --hfsCompression {src} {dest}' /path/to/source /path/to/destination
In this case rsync will ensure that all copied files have compression enabled where possible in the destination, even if the files were not compressed in the source.
Explicit Compression
On other filesystems, you may still wish to compress file contents when rsync'ing for backup, copying to a mobile drive etc. You could do this using --copy-cmd and gzip (or xz or similar) to compress destination files, like so:
rsync -a --copy-cmd='gzip -9c {src} > {dest}' /path/to/source /path/to/dest
rsync -a --copy-cmd='gzip -9c {src}' /path/to/source host:/path/to/dest
rsync -a --copy-cmd='gzip -9 > {dest}' host:/path/to/source /path/to/dest
To decompress:
rsync -a --copy-cmd='gzip -dc {src} > {dest}' /path/to/source /path/to/dest
rsync -a --copy-cmd='gzip -dc {src}' /path/to/source host:/path/to/dest
rsync -a --copy-cmd='gzip -dc > {dest}' host:/path/to/source /path/to/dest
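The local -> local behaviour described earlier (placeholder substitution, then the exit status and missing-file checks) can be sketched in plain shell. Everything here is hypothetical: `sh_quote`, `replace_all`, `expand_copy_cmd` and `run_copy_cmd` are illustrative names, not rsync functions, and the remote modes would instead substitute a single placeholder and connect the command's standard input or output to the transfer.

```shell
# Quote a path for sh: wrap in single quotes, turning each ' into '\''.
# (Assumption: names ending in a newline are not handled.)
sh_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# replace_all STRING SEARCH REPLACEMENT, using only POSIX parameter expansion.
replace_all() {
    ra_rest=$1 ra_out=
    while :; do
        case $ra_rest in
            *"$2"*)
                ra_out=$ra_out${ra_rest%%"$2"*}$3
                ra_rest=${ra_rest#*"$2"}
                ;;
            *) break ;;
        esac
    done
    printf '%s' "$ra_out$ra_rest"
}

# Expand {src} and {dest} in a command template. Limitation of this sketch:
# a source path that itself contains "{dest}" is rewritten by the second pass.
expand_copy_cmd() {
    ec_cmd=$(replace_all "$1" '{src}' "$(sh_quote "$2")")
    replace_all "$ec_cmd" '{dest}' "$(sh_quote "$3")"
}

# Run the expanded command. Returns 1 on a non-zero exit status and 2 when the
# command "succeeded" but left nothing at the destination; a pipeline such as
# "false | cat" exits 0, which is the case the second check is meant to catch.
run_copy_cmd() {
    rc_cmd=$(expand_copy_cmd "$1" "$2" "$3")
    if ! sh -c "$rc_cmd"; then
        echo "copy command failed: $rc_cmd" >&2
        return 1
    fi
    if [ ! -e "$3" ]; then
        echo "warning: copy command produced no file at $3" >&2
        return 2
    fi
}
```

For example, `run_copy_cmd 'gzip -9c {src} > {dest}' "/path/to/source/file" "/path/to/dest/file"` runs the first compression command from this section for a single file.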
Implementation Considerations
For examples such as explicit compression, it may be useful to provide a supporting option --copy-cmd-out-ext or similar, so that files compressed using --copy-cmd can have a customised extension, for example --copy-cmd-out-ext=.gz, such that rsync remains aware of the change in name, i.e. for a file with path foo/bar/baz, rsync would treat it on the destination side as foo/bar/baz.gz, but look for both versions (in case the file was previously transferred without this extension). This would also benefit from a --copy-cmd-in-ext option when reversing the direction of a copy; this would instead inform rsync to remove the extension if found on an incoming file (foo/bar/baz.gz becomes foo/bar/baz).
As a --copy-cmd may not be able to place the same guarantee on the correctness of attributes, these should be set after the copy command has been executed (does rsync already set attributes after transfer?).
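The extension handling proposed above amounts to a small name mapping in each direction. The following is only a sketch of that mapping; `dest_candidates`, `find_existing` and `strip_in_ext` are hypothetical names chosen for illustration, and how rsync would integrate such a mapping into its file list is left open.

```shell
# Outgoing direction: print the preferred destination name (with the
# extension) followed by the fallback (without it), one per line.
dest_candidates() {
    printf '%s\n' "$1$2" "$1"
}

# Print the first candidate that already exists under DEST_ROOT, if any.
# Usage: find_existing DEST_ROOT REL_PATH EXT
find_existing() {
    dest_candidates "$2" "$3" | while IFS= read -r fe_rel; do
        if [ -e "$1/$fe_rel" ]; then
            printf '%s\n' "$fe_rel"
            break
        fi
    done
}

# Incoming direction: remove the extension if the name ends with it.
strip_in_ext() {
    se_ext=$2
    case $1 in
        *"$se_ext") se_out=${1%"$se_ext"} ;;
        *) se_out=$1 ;;
    esac
    printf '%s\n' "$se_out"
}
```

With an extension of `.gz`, `dest_candidates foo/bar/baz .gz` lists `foo/bar/baz.gz` and then `foo/bar/baz`, and `strip_in_ext foo/bar/baz.gz .gz` prints `foo/bar/baz`.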