Closed · baodrate closed this issue 1 year ago
> would be very useful for using `cpz` as a backup tool.
Just a fair warning: cpz uses the `copy_file_range` syscall, which means the physical bytes aren't actually copied on file systems that support it. You'd need to do a cross-fs copy to get a true copy. Though if you're using an FS like bcachefs, you can tune the number of physical copies you want on a per-file basis.
> file permission
cpz copies these by default. I'd consider it a bug if this is not the case (on Linux; on other platforms this doesn't happen).
> user ID and group ID
This I'm not willing to do because it would introduce an extra syscall for every file. You should sudo into the uid/gid you want the files to be copied as.
> The time of last data modification and time of last access.
Same reasoning here.
In theory a flag could be added for both, but I'm really resisting adding flags because then cpz ends up trying to be the fastest while also being a coreutils clone, which is not the goal.
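To make the cost concrete, here's a hedged sketch (not cpz code) of the per-file work such a flag would imply: one extra `stat` on the source plus two extra syscalls on the destination. The function name is made up for this example:

```python
import os

def preserve_owner_and_times(src: str, dst: str) -> None:
    st = os.stat(src)
    # Extra syscall #1: chown. Changing ownership to *another* user's
    # uid/gid additionally requires privilege (CAP_CHOWN).
    os.chown(dst, st.st_uid, st.st_gid)
    # Extra syscall #2: set atime/mtime with nanosecond precision.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Multiplied across millions of small files, those extra syscalls are exactly the overhead a speed-focused tool wants to avoid.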
> preserves hard links between source files in the copies
This one is wild! Unless I'm missing something obvious, I think you'd need to keep track of every hard-linked file you've seen so far and then match newly encountered hard links against what you've already copied. This definitely won't happen, flag or not, because it requires synchronization between the copy threads.
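The bookkeeping being described can be sketched like this: a single-threaded toy (assumed names, symlinks and metadata ignored for brevity) where a `seen` map keyed on `(st_dev, st_ino)` records the first copy of each multiply-linked file. In a multi-threaded copier, that map is precisely the shared state that would need synchronization:

```python
import os

def copy_tree_preserving_hardlinks(src_root: str, dst_root: str) -> None:
    seen = {}  # (st_dev, st_ino) -> destination path of the first copy
    os.makedirs(dst_root, exist_ok=True)
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        for name in dirnames:
            os.makedirs(os.path.normpath(os.path.join(dst_root, rel, name)),
                        exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            d = os.path.normpath(os.path.join(dst_root, rel, name))
            st = os.stat(s, follow_symlinks=False)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                os.link(seen[key], d)  # re-create the hard link in the copy
                continue
            with open(s, "rb") as fsrc, open(d, "wb") as fdst:
                fdst.write(fsrc.read())  # naive byte copy for the sketch
            if st.st_nlink > 1:
                seen[key] = d
```

Note that files with `st_nlink > 1` whose other links lie *outside* the copied tree still end up as plain copies, which matches GNU `cp -a` behavior.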
If your use case is backups, it might actually be faster to first do a cpz pass to pound the NVMe queues, sync, and then run a second metadata + checksum rsync pass. For serious backups, I wouldn't trust the physical media to have written stuff correctly, so you need to use rsync either way.
> cpz uses the copy_file_range syscall which means the physical bytes aren't actually copied on file systems that support it
Ah, I see. I think `cp --reflink` uses `FICLONE`, which is effectively (as far as the filesystem is concerned) the same, right?
> This I'm not willing to do because it would introduce an extra syscall for every file. You should sudo into the uid/gid you want the files to be copied as.
Fair enough. Though note that this isn't an option if there's a mix of owners among the files, or if the GID doesn't match the user's group (common on network file systems and other multi-user directories like git repositories).
> trying to be the fastest while also being a coreutils clone which is not the goal
Yeah, totally fair
> This one is wild!
I had pretty much the same reaction! I didn't know about this one until I was referencing the docs to write this ticket. I had to test the behavior to believe it.
> it might actually be faster to first do a cpz pass to pound the NVMe queues, sync, and then run a second metadata + checksum rsync pass. For serious backups, I wouldn't trust the physical media to have written stuff correctly, so you need to use rsync either way
That's smart, good tip, thanks. And I did mean "backup" pretty loosely, but you're right, that's probably a good idea in any case.
> Ah, I see. I think `cp --reflink` uses `FICLONE`, which is effectively (as far as the filesystem is concerned) the same, right?
Yup!
Thanks for the suggestion though!
Basically, the `-p` flag from (POSIX) `cp`, which preserves:

- file permission
- user ID and group ID
- the time of last data modification and time of last access

Note that most implementations of `cp` also specify `-a` (`-a`/`--archive` in GNU), which implies:

- `-p`
- `-R`: recursive (cpz already does this by default)
- `-P`: symlinks in the sources are copied as symlinks, rather than followed (cpz also does this by default)

Also, for reference, GNU's `cp -a` additionally:

- preserves all attributes, not just the POSIX set (`cp --preserve=all`)
- never follows symlinks in the sources (`cp -d`)
- preserves hard links between source files in the copies (i.e. if you `cp -a` two files that are hard-linked, you'll create a separate pair of hard-linked files)

Providing `-a` (ideally, but perhaps not necessarily, GNU's version) would be very useful for using `cpz` as a backup tool.