Should `delta` follow symlinks?

rrbutani commented 2 years ago

(filing this as a feature request instead of a bug report because I'm not sure if this is the intended behavior or not)

Currently, when using delta as a diff tool with paths that are symlinks, the literal contents of the symlink (as in, the path of the file that's being pointed to) is used in the diff instead of the contents of the file that's being pointed to.

For example:

echo "hello" > foo
ln -s foo bar
delta -s foo bar

yields:

❯ delta foo bar -s

removed: foo
─────────────────────────────────────────────────────────────────────────────────

───┐
0: │
───┘
│ 1  │hello                              │    │

added: bar
─────────────────────────────────────────────────────────────────────────────────

───┐
1: │
───┘
│    │                                   │ 1  │foo
\ No newline at end of file

instead of no output (as diff, for example, does).

This is because of git diff --no-index's behavior with respect to symlinks. As far as I know there isn't any user-facing way to change this behavior (1, 2; there was a patch that changed the default behavior to be to follow symlinks but afaict it was never merged).

Is this intended behavior for delta?

Adding calls to fs::canonicalize to these two paths would change delta's behavior w.r.t. symlinks to match diff and other tools but would also mean there isn't a way to actually diff the literal contents of symlinks (diff has --no-dereference for example). I'm not sure if this matters.

(Also right now delta just lets git diff --no-index error when a file that's specified doesn't exist; if we call canonicalize in delta we'd need to handle the possibility of broken symlinks. I assume just printing an error message matching that of git diff (i.e. error: Could not access: ...) for symmetry is acceptable)

If this is deemed a reasonable change, I'm very happy to make a PR with the above change and some tests.

dandavison commented 2 years ago

Hi @rrbutani, thanks for writing this up. IMO this is a bug and it would be fantastic if you could fix this. I don't think we need to support diffing the contents of the symlink itself -- my intention is that the semantics of delta a b are the same as diff a b.

While you're at it (and maybe more interesting challenge?) is there any way to workaround the related process substitution problem (see https://github.com/dandavison/delta/issues/666 and duplicate reports linked therein):

~ delta <(echo aaa) <(echo bbb)
error: /dev/fd/11: unsupported file type
fatal: cannot hash /dev/fd/11

(That's MacOS but I think there's a similar error on Linux and *BSDs)

rrbutani commented 2 years ago

Unfortunately I don't think calling canonicalize fixes the process substitution case; i.e.

git diff --no-index $(realpath <(echo foo)) $(realpath <(echo bar))

does not work.

(realpath <(echo yo) yields a pipe which I don't think you can treat as a file anyways (?); diff and other tools are similarly confused by this kind of thing: cat $(realpath <(echo yo)) for example)

I do have a workaround for when there's only 1 process substitution in the args though: git diff will happily read from stdin which means we can rewrite delta <(echo yo) /some/file as git diff-index - /some/file < <(echo yo) (or the Rust Command equivalent of that anyways; i.e. passing the /dev/fd/<whatever> that delta gets as arg1 into git diff-index as stdin).

Unfortunately this doesn't help the case where there are two process substitutions.

So far git diff seems unhappy with any kind of special file (mkfifo also doesn't work with it). The best I can come up with is to write one of the arguments to a temporary file if necessary but that's obviously suboptimal for a lot of reasons.

I'll take a peek at the git diff --no-index source code next to see if I missed anything.

rrbutani commented 2 years ago

@dandavison If I don't find a better way though, is ^ worth implementing/merging? Is the temp file solution worth writing (seems like the kind of thing that may be better being opt in, as a flag)?

Alternatively we could choose to just not use git diff --index as the diff tool when we have two process substitutions. I'm not sure what utility it has over regular diff tools when we know for certain that the things being compared aren't tracked by source control.

Edit: whoops, just saw #543. ~~I don't think the specific example in that issue (diff.context) applies to git diff with --no-index though.~~

dandavison commented 2 years ago

Yes #543 made me realize that there is benefit in using git diff for this -- I do think it makes sense to stick with it.

The best I can think come up with is to write one of the arguments to a temporary file if necessary but that's obviously suboptimal for a lot of reasons.

I know it's not pretty, but I actually think that would in practice solve the vast majority of people's pain with this. What do you think? If you're up for implementing it I think that would be great -- quite a few people have stumbled across this problem.

And the symlink solution also of course!

I know there are these two git patches but I haven't looked into how they relate yet

dandavison commented 2 years ago

(seems like the kind of thing that may be better being opt in, as a flag)?

As long as the temp file only happens when the user uses shell process substitution syntax then I think we should make it default, so people get the benefit automatically without having to google the problem. I'm assuming we can avoid copying normal files in "normal" diff invocations.

rrbutani commented 2 years ago

Yes #543 made me realize that there is benefit in using git diff for this -- I do think it makes sense to stick with it.

Do you know if there are git diff settings/differences in it's output that apply to untracked files?

Most of git help --config | grep ^diff seems like it doesn't apply to the process substitution args case; it feels very tempting to try to extract the options that do apply like diff.context and pass them along, just in this specific case.

I know it's not pretty, but I actually think that would in practice solve the vast majority of people's pain with this. What do you think? If you're up for implementing it I think that would be great -- quite a few people have stumbled across this problem.

I'm definitely happy to make this change but I'm not sure I know how to catch and handle all the edge cases (I think we'd want to use mktemp which handles cleanup and is cross platform; is it fine to pull in another dep? Should we bound the size of the temp file we create, somehow? It doesn't seem inconceivable to me that we could run out of disk space writing a temp file (I think wrangling lengthy streams is definitely a primary motivating use case for process substitution) and while I'm pretty sure mktemp would do the right thing in this case -- its cleanup is in a Drop impl which will still get run if we panic and the process starts to unwind -- I have not personally tried this).

Mostly though I think the expectation with process substitution is that stuff isn't being written to disk and I'm a little uneasy subverting that expectation.

rrbutani commented 2 years ago

I think it's ultimately a question of which behavior is more surprising when asked to diff two process substitutions: slightly different diff output or one of the streams being written to disk, temporarily.

(I feel the latter would be more surprising but my usage is also maybe a bit pathological: diffing lengthy command outputs on machines with slow disk drives where the added latency of writing to a temp file first would be noticeable, etc)

Regardless, I'm happy to make a PR implementing whichever approach you think is better.

dandavison commented 2 years ago

slightly different diff output

Here you're implying that we could fall back to diff in the two-process-substitution case, right?

Hm. I agree it's an option, but I think I'd prefer not to have the inconsistency of having both the diff and git diff versions implemented in delta. People who are diffing two process substitutions with lengthy output can always do diff <(echo a) <(echo b) | delta after all (and they probably have the sophistication to switch to that without much thinking).

dandavison commented 2 years ago

But your efficient one-process subst trick is cool -- happy to have that as a code path in addition to the last-resort copying.

dandavison / delta

Should `delta` follow symlinks? #929