SUPERCILEX / fuc

Modern, performance focused unix commands
Apache License 2.0

Feature request: support `cpz -t`/`cpz -T` #21

Closed baodrate closed 1 year ago

baodrate commented 1 year ago

This is a GNU-ism rather than standard POSIX, but it'd be nice if cpz supported GNU cp's -t (--target-directory) and -T (--no-target-directory) options, so that the user can ensure consistent behavior re: treating the target as a directory or a name.

SUPERCILEX commented 1 year ago

I actually prefer the way we're currently handling it:

This way, there's no implicit behavior based on the contents of the file system (e.g. cp will copy into a directory if it already exists on the file system). Instead, behavior is resolved statically based on the arguments.

Let me know if there's something I missed.

baodrate commented 1 year ago

The general difficulty I have with this scheme is that it makes it trickier to construct the cpz command. (tl;dr at the end)

If you copy one file, the destination is treated as a directory if the path looks like one (i.e. it has a slash appended to the name or is ..).

This is similar to rsync's behavior, so although uncommon, it is not without precedent. However, it presents an ergonomic issue: you have to be more careful about what the target looks like or you'll get implicit behavior:

source_dir=./ptrn_*/                          # glob for a directory we want to copy
cp -RT "$source_dir" "/backups/$source_dir"   # safe, fails if /backups/$source_dir exists
cpz "$source_dir" "/backups/$source_dir"      # not safe, need to remove trailing slashes first
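The "remove trailing slashes first" step mentioned above could look something like the sketch below. `strip_trailing_slashes` is a hypothetical helper name, not part of cpz or fuc, and it only assumes the rule described in this thread (a trailing slash makes a path look like a directory).

```shell
# Hypothetical helper (not part of cpz): remove every trailing slash so the
# path no longer "looks like" a directory. A path made only of slashes
# collapses to "/". The body runs in a subshell so `path` does not leak.
strip_trailing_slashes() (
    path=$1
    while [ "${path%/}" != "$path" ] && [ "$path" != "/" ]; do
        path=${path%/}
    done
    printf '%s\n' "$path"
)

strip_trailing_slashes "./ptrn_a//"   # prints ./ptrn_a

# Possible use with the example above (assumes cpz is installed):
#   src=$(strip_trailing_slashes "$source_dir")
#   cpz "$src" "/backups/$src"
```

Having to remember this step on every call is the ergonomic cost being described here.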

If you copy multiple files... If you copy one file...

cpz is implicit on the combination of the file system contents and the number of arguments. This means that you can't construct a command that will behave the same for 1-N arguments. This leads me to the quality of -t I forgot to mention in the original post, which is that it lets you move the target argument to the beginning of the command:

find . -type d -name 'ptrn_*' -exec cp -Rvt /some/targetdir {} +
# or
find . -type d -name 'ptrn_*' | xargs cp -Rvt /some/targetdir

The results are appended to the command line specified with -exec. And no matter how many files/directories are found, you can be sure that it will either copy them into the existing /some/targetdir or fail. No implicit behavior. That's not currently possible with cpz because the target has to go at the end.
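For tools that only accept the target last, one common workaround is to wrap the call in `sh -c` so that the arguments appended by find can be reordered. The sketch below uses plain cp in a scratch directory; the directory names are made up for the demo, and whether the same wrapper behaves identically with cpz is an assumption that has not been checked here.

```shell
# Scratch area so the demo does not touch real data (names are illustrative).
demo=$(mktemp -d)
mkdir -p "$demo/src/ptrn_a" "$demo/src/ptrn_b" "$demo/targetdir"
touch "$demo/src/ptrn_a/one" "$demo/src/ptrn_b/two"
cd "$demo/src" || exit 1

# The target is passed as the first argument of the inline script; find
# appends the matched paths after it. The script shifts the target off,
# checks that it is an existing directory, then puts it last for cp.
find . -type d -name 'ptrn_*' -exec sh -c '
    dest=$1; shift
    [ -d "$dest" ] || { echo "not a directory: $dest" >&2; exit 1; }
    cp -R "$@" "$dest/"
' sh "$demo/targetdir" {} +

ls "$demo/targetdir"   # lists ptrn_a and ptrn_b
```

Unlike `cp -t`, the `[ -d ... ]` check and the copy are two separate steps, so this wrapper does not close the race window; it only restores the "target first" argument order.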

In some situations we can perform the task in a way where the (static) target can be specified at the end of the command, e.g. by using fd:

# fd lets you specify the arguments anywhere in the exec cmd line
fd -t d '^ptrn_.*' -X cpz {} /some/targetdir
# or more simply, w/ plain shell globbing:
cpz ptrn_*/ /some/targetdir

Whether /some/targetdir is treated as a directory or a name is implicit if the number of arguments cpz will be called with is variable, as in this case. You can guarantee the former case by changing the target to /some/targetdir/, but you can't guarantee the second case.


tl;dr:

GNU cp's -t/-T options (also common across the GNU coreutils) give it two properties over POSIX cp:

Of course, this is all GNU-specific, so I wouldn't say that it's strictly required for cpz to be an ergonomic replacement for cp. But it can still be nice to have since it can be implemented without any breaking changes to cpz's current UI.

SUPERCILEX commented 1 year ago

cpz is implicit on the combination of the file system contents and the number of arguments.

Not quite sure we're on the same page. By "copy multiple files", I mean "more than 2 arguments were passed into the command." Edit: actually we're probably on the same page, you just mean that if the arguments are passed in from another command that'll add an implicit dependency on the state of the file system.

it lets you move the target argument to the beginning of the command

Ah, that's interesting. I'm not sure clap can even model that behavior, so this might be annoying to implement if we go through with it.

you can't guarantee [treating a path as a file]

That's true, but what do you do when multiple files are passed in to be copied to a single file? The idea was that you would always want a directory in that case. I guess we could make that fail, but I don't like the ergonomics hit.


It sounds to me like this could be fixed by flipping the argument order. So it would be cpz DEST FROM.... Are there examples of cases where someone would receive a directory-looking destination path and want to treat it as a file?

baodrate commented 1 year ago

you just mean that if the arguments are passed in from another command that'll add an implicit dependency on the state of the file system

Yeah. Sorry, I was having difficulty wording this properly.

I'm not sure clap can even model that behavior

Ah, that's disappointing to hear. I haven't written too much rust and specifically haven't used clap, so I'm not familiar with its limitations.

what do you do when multiple files are passed in to be copied to a single file? The idea was that you would always want a directory in that case. I guess we could make that fail

So referencing the last example:

cp -RT ptrn_*/ /some/targetdir

this globs for a directory with ptrn_*/ and creates a copy of it at the filepath /some/targetdir. However, since ptrn_*/ can potentially match multiple directories, it provides a guard against that (yes, failing if multiple sources are provided).

I guess we could make that fail, but I don't like the ergonomics hit.

Yeah, my idea with implementing -T is that it provides a completely optional guard; it doesn't require a change in the behavior of plain-old cpz. It would just imply (and enforce) that the target is the final pathname, and prevent you from unintentionally supplying more than one source. -t similarly wouldn't change the default behavior.

For clarity, GNU's cp actually documents these as three separate forms:

SYNOPSIS
       cp [OPTION]... [-T] SOURCE DEST
       cp [OPTION]... SOURCE... DIRECTORY
       cp [OPTION]... -t DIRECTORY SOURCE...

this could be fixed by flipping the argument order. So it would be cpz DEST FROM....

Are you suggesting a change to cpz's default argument structure? I don't think that'd be comfortable for most users, since most people are used to cp SOURCE DEST (although it probably would be a better interface, if only we could go back in time)

Are there examples of cases where someone would receive a directory-looking destination path and want to treat it as a file?

If I understand this question, this is demonstrated in the first example I wrote:

source_dir=./ptrn_*/
cp -RT "$source_dir" "/backups/$source_dir"

you could also imagine something like:

backup() {
    name=$(basename "$1")
    date=$(date -u +%Y-%m-%d)
    # handles $1 appropriately whether it is a regular file or a directory, whether it ends with a slash or not
    cp -vRT "$1" "/backups/${date}/${name}"
}
backup ./ptrn_*/

SUPERCILEX commented 1 year ago

Are you suggesting a change to cpz's default argument structure?

Yes and no. I agree that it would be confusing without a time machine, so I was thinking of adding a flag that said "flip the argument order" and then people could create an alias for cpz that always used that flag if they want that order by default.

cp -RT "$source_dir" "/backups/$source_dir"

I guess I'm a little confused here. -T means "treat DEST as a normal file" so why are you passing in a directory?

baodrate commented 1 year ago

so I was thinking of adding a flag that said "flip the argument order"

Yeah, that's the primary behavior I'm looking for from -t

-T means "treat DEST as a normal file" so why are you passing in a directory?

That's interesting, there's a bit of a discrepancy between the man pages and the docs. For clarification, the GNU docs describe -T as:

Do not treat the last operand specially when it is a directory or a symbolic link to a directory. See Target directory.

Which expands to:

Do not treat the last operand specially when it is a directory or a symbolic link to a directory. This can help avoid race conditions in programs that operate in a shared area. For example, when the command ‘mv /tmp/source /tmp/dest’ succeeds, there is no guarantee that /tmp/source was renamed to /tmp/dest: it could have been renamed to /tmp/dest/source instead, if some other process created /tmp/dest as a directory. However, if mv -T /tmp/source /tmp/dest succeeds, there is no question that /tmp/source was renamed to /tmp/dest.

In the opposite situation, where you want the last operand to be treated as a directory and want a diagnostic otherwise, you can use the --target-directory (-t) option.

personally I usually use it as a guard against poorly formatted commands rather than race conditions, but the same scenario underlies both

I would have linked that page to begin with, if I knew about it. it does a much better job of explaining this than I have
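The difference the quoted docs describe can be reproduced in a scratch directory. This is a small sketch that assumes GNU cp (`-T` is not POSIX); the directory names are made up for the demo.

```shell
# Scratch area so the demo does not touch real data (names are illustrative).
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir source dest_default dest_T
touch source/file

# Default: an existing directory target is treated as a container,
# so the copy lands at dest_default/source.
cp -R source dest_default

# -T (GNU extension): the target is the final pathname, so the contents of
# source are copied into dest_T itself and no dest_T/source is created.
cp -RT source dest_T

find dest_default dest_T | sort
```

With the default form, the result depends on whether the target already existed as a directory when cp ran; with `-T` it does not.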

SUPERCILEX commented 1 year ago

Gotcha. So I think -T is irrelevant for cpz because we don't actually look at DEST until we try to do the copy. So if DEST exists and is a file but you said you wanted it to be a directory, opening will fail. Vice versa for DEST being a dir but you said you wanted it to be a file.

For -t, it sounds like the important part is argument flipping. So how about adding a -t, --reverse-args flag that flips SRC and DEST?

The other question is how to ensure you're creating a file or dir. We could say that if a path doesn't look directory-like, then SRC must only contain a single argument, but again, not a fan of the ergonomics hit and I don't want another flag for this. I think I'd argue this isn't a big deal. If you want a directory, you can always get that behavior. If you want a file and got a directory instead, then you have a bug anyway so should just fix the root cause. The worst thing that will happen is some later command tries to open the directory as a file and dies.

baodrate commented 1 year ago

I think I'd argue this isn't a big deal.

Yeah, I can agree with that. The race is the tricky bit and it's already being avoided, the other issues can be safely delegated to the user.

For -t, it sounds like the important part is argument flipping. So how about adding a -t, --reverse-args flag that flips SRC and DEST?

Yeah, that would handle pretty much 99% of my issues. There's little difference between cpz --reverse-args DEST SOURCE and cpz [--target-directory DEST] SOURCE.... I assume the preference for the former is so you don't have to change any of the logic about interpreting the source and dest args?

SUPERCILEX commented 1 year ago

I assume the preference for the former is so you don't have to change any of the logic about interpreting the source and dest args?

It's less about changing the logic and more about avoiding two different ways of doing things that aren't consistent. If the only thing that changes is argument order, that's easy to reason about. Will implement.