SUPERCILEX / fuc

Modern, performance focused unix commands
Apache License 2.0

Feature request: preserve permissions option #22

Closed baodrate closed 1 year ago

baodrate commented 1 year ago

Basically, the -p flag from (POSIX) cp:

-p Duplicate the following characteristics of each source file in the corresponding destination file:

  1. The time of last data modification and time of last access. If this duplication fails for any reason, cp shall write a diagnostic message to standard error.
  2. The user ID and group ID. If this duplication fails for any reason, it is unspecified whether cp writes a diagnostic message to standard error.
  3. The file permission bits and the S_ISUID and S_ISGID bits. Other, implementation-defined, bits may be duplicated as well. If this duplication fails for any reason, cp shall write a diagnostic message to standard error.

Note that most implementations of cp also provide -a (-a/--archive in GNU), which implies:

Also, for reference, GNU's cp -a additionally:

Providing -a (ideally, but perhaps not necessarily, GNU's version) would be very useful for using cpz as a backup tool.

SUPERCILEX commented 1 year ago

would be very useful for using cpz as a backup tool.

Just a fair warning: cpz uses the copy_file_range syscall which means the physical bytes aren't actually copied on file systems that support it. You'd need to do a cross-fs copy to get a true copy. Though if you're using an FS like bcachefs, you can tune the number of physical copies you want on a per-file basis.
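
For readers unfamiliar with the syscall, here is a minimal Python sketch of a copy built on copy_file_range. It is not cpz's actual code (cpz is written in Rust); the function name `copy_file` and the fallback to a plain userspace copy are assumptions made for illustration. On Linux the kernel may satisfy the request with a reflink on file systems that support one, which is why the destination can end up sharing physical blocks with the source.

```python
import os
import shutil


def copy_file(src, dst, chunk=1 << 30):
    """Copy src to dst, preferring copy_file_range where it is available.

    Hypothetical helper for illustration only, not cpz's implementation.
    """
    # buffering=0 gives raw file objects, so the fd offsets and the
    # Python-level file positions never disagree.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        if hasattr(os, "copy_file_range"):  # Linux, Python 3.8+
            try:
                while True:
                    # The kernel may copy fewer bytes than requested, so loop
                    # until it reports 0 bytes (end of file).
                    n = os.copy_file_range(fin.fileno(), fout.fileno(), chunk)
                    if n == 0:
                        return
            except OSError:
                # Unsupported here (e.g. some cross-fs cases): fall through
                # and finish the copy from the current offsets.
                pass
        shutil.copyfileobj(fin, fout)
```

Whether the data is physically duplicated depends on the file system, not on this code.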

file permission

cpz copies these by default. I'd consider it a bug if this is not the case (on Linux; on other platforms this doesn't happen).


user ID and group ID

This I'm not willing to do because it would introduce an extra syscall for every file. You should sudo into the uid/gid you want the files to be copied as.

The time of last data modification and time of last access.

Same reasoning here.
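
To make the per-file cost concrete, here is a rough Python sketch of what preserving ownership and timestamps involves after the data has been copied. The helper name `copy_owner_and_times` is hypothetical and this is not something cpz does; it only shows that each attribute needs at least one additional call per file.

```python
import os


def copy_owner_and_times(src, dst):
    """Apply src's timestamps and ownership to dst (illustrative only)."""
    st = os.stat(src)
    # Extra call 1: set access and modification times with nanosecond precision.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
    # Extra call 2: set uid/gid. Changing the owner generally requires
    # privileges, so an unprivileged run may fail here.
    try:
        os.chown(dst, st.st_uid, st.st_gid)
    except PermissionError:
        # POSIX leaves it unspecified whether cp -p reports this failure;
        # this sketch silently ignores it.
        pass
```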

In theory a flag could be added for both, but I'm really resisting adding flags because:


preserves hard links between source files in the copies

This one is wild! Unless I'm missing something obvious, I think you need to keep track of every hard linked file you've seen so far and then match newly encountered hard links with what you've copied so far. This definitely won't happen, flag or not, because it requires synchronization between the copy threads.
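
The bookkeeping described above can be sketched in a few lines of single-threaded Python. This is an illustration under assumptions, not how GNU cp or cpz is implemented: the function name is hypothetical, and the `seen` dictionary keyed by (device, inode) is the shared state that parallel copy threads would have to synchronize on.

```python
import os
import shutil


def copy_tree_preserving_hardlinks(src_root, dst_root):
    """Copy a tree, recreating hard links among the copied files."""
    seen = {}  # (st_dev, st_ino) -> destination path of the first copy
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Same inode already copied: link to that copy instead of
                # copying the data a second time.
                os.link(seen[key], dst)
            else:
                shutil.copy2(src, dst, follow_symlinks=False)
                if st.st_nlink > 1:
                    seen[key] = dst
```

Only files with a link count above one need to be remembered, which keeps the table small for typical trees.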


If your use case is backups, it might actually be faster to first do a cpz pass to pound the NVMe queues, sync, and then run a second metadata + checksum rsync pass. For serious backups, I wouldn't trust the physical media to have written stuff correctly, so you need to use rsync either way.
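
As a rough idea of what the checksum half of that second pass checks, here is a small Python sketch that compares SHA-256 digests of every file in the source tree against the destination. It is a stand-in for illustration, not a replacement for rsync, and the names `sha256_of` and `verify_copy` are hypothetical. Note that reading a file back shortly after writing it may be served from the page cache rather than from the physical media.

```python
import hashlib
import os


def sha256_of(path, bufsize=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def verify_copy(src_root, dst_root):
    """Return relative paths that are missing or differ in dst_root."""
    if hasattr(os, "sync"):
        os.sync()  # flush pending writes first, mirroring the "sync" step
    bad = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                bad.append(rel)
    return bad
```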

baodrate commented 1 year ago

cpz uses the copy_file_range syscall which means the physical bytes aren't actually copied on file systems that support it

Ah, I see. I think cp --reflink uses FICLONE, which is effectively (as far as the filesystem is concerned) the same, right?

This I'm not willing to do because it would introduce an extra syscall for every file. You should sudo into the uid/gid you want the files to be copied as.

Fair enough. Though note that this isn't an option if there is a mix of owners for the files or if the GID doesn't match the group of the user (common on network file systems or other multi-user directories like git repositories).

trying to be the fastest while also being a coreutils clone which is not the goal

Yeah, totally fair

This one is wild!

I had pretty much the same reaction! I didn't know about this one until I was referencing the docs to write this ticket. I had to test the behavior to believe it.

it might actually be faster to first do a cpz pass to pound the NVMe queues, sync, and then run a second metadata + checksum rsync pass. For serious backups, I wouldn't trust the physical media to have written stuff correctly, so you need to use rsync either way

That's smart, good tip, thanks. And I did mean "backup" pretty loosely, but you're right, that's probably a good idea in any case.

SUPERCILEX commented 1 year ago

Ah, I see. I think cp --reflink uses FICLONE, which is effectively (as far as the filesystem is concerned) the same, right?

Yup!


Thanks for the suggestion though!