Open rrbutani opened 2 years ago
Hi @rrbutani, thanks for writing this up. IMO this is a bug and it would be fantastic if you could fix this. I don't think we need to support diffing the contents of the symlink itself -- my intention is that the semantics of delta a b
are the same as diff a b
.
While you're at it (and maybe more interesting challenge?) is there any way to workaround the related process substitution problem (see https://github.com/dandavison/delta/issues/666 and duplicate reports linked therein):
~ delta <(echo aaa) <(echo bbb)
error: /dev/fd/11: unsupported file type
fatal: cannot hash /dev/fd/11
(That's MacOS but I think there's a similar error on Linux and *BSDs)
Unfortunately I don't think calling canonicalize
fixes the process substitution case; i.e.
git diff --no-index $(realpath <(echo foo)) $(realpath <(echo bar))
does not work.
(realpath <(echo yo)
yields a pipe which I don't think you can treat as a file anyways (?); diff
and other tools are similarly confused by this kind of thing: cat $(realpath <(echo yo))
for example)
I do have a workaround for when there's only 1 process substitution in the args though: git diff
will happily read from stdin
which means we can rewrite delta <(echo yo) /some/file
as git diff-index - /some/file < <(echo yo)
(or the Rust Command
equivalent of that anyways; i.e. passing the /dev/fd/<whatever>
that delta
gets as arg1 into git diff-index
as stdin).
Unfortunately this doesn't help the case where there are two process substitutions.
So far git diff
seems unhappy with any kind of special file (mkfifo
also doesn't work with it). The best I can come up with is to write one of the arguments to a temporary file if necessary but that's obviously suboptimal for a lot of reasons.
I'll take a peek at the git diff --no-index
source code next to see if I missed anything.
@dandavison If I don't find a better way though, is ^ worth implementing/merging? Is the temp file solution worth writing (seems like the kind of thing that may be better being opt in, as a flag)?
Alternatively we could choose to just not use git diff --index
as the diff tool when we have two process substitutions. I'm not sure what utility it has over regular diff tools when we know for certain that the things being compared aren't tracked by source control.
Edit: whoops, just saw #543. I don't think the specific example in that issue (diff.context
) applies to git diff
with --no-index
though.
Yes #543 made me realize that there is benefit in using git diff
for this -- I do think it makes sense to stick with it.
The best I can think come up with is to write one of the arguments to a temporary file if necessary but that's obviously suboptimal for a lot of reasons.
I know it's not pretty, but I actually think that would in practice solve the vast majority of people's pain with this. What do you think? If you're up for implementing it I think that would be great -- quite a few people have stumbled across this problem.
And the symlink solution also of course!
I know there are these two git patches but I haven't looked into how they relate yet
(seems like the kind of thing that may be better being opt in, as a flag)?
As long as the temp file only happens when the user uses shell process substitution syntax then I think we should make it default, so people get the benefit automatically without having to google the problem. I'm assuming we can avoid copying normal files in "normal" diff invocations.
Yes #543 made me realize that there is benefit in using
git diff
for this -- I do think it makes sense to stick with it.
Do you know if there are git diff
settings/differences in it's output that apply to untracked files?
Most of git help --config | grep ^diff
seems like it doesn't apply to the process substitution args case; it feels very tempting to try to extract the options that do apply like diff.context
and pass them along, just in this specific case.
I know it's not pretty, but I actually think that would in practice solve the vast majority of people's pain with this. What do you think? If you're up for implementing it I think that would be great -- quite a few people have stumbled across this problem.
I'm definitely happy to make this change but I'm not sure I know how to catch and handle all the edge cases (I think we'd want to use mktemp
which handles cleanup and is cross platform; is it fine to pull in another dep? Should we bound the size of the temp file we create, somehow? It doesn't seem inconceivable to me that we could run out of disk space writing a temp file (I think wrangling lengthy streams is definitely a primary motivating use case for process substitution) and while I'm pretty sure mktemp
would do the right thing in this case -- its cleanup is in a Drop
impl which will still get run if we panic
and the process starts to unwind -- I have not personally tried this).
Mostly though I think the expectation with process substitution is that stuff isn't being written to disk and I'm a little uneasy subverting that expectation.
I think it's ultimately a question of which behavior is more surprising when asked to diff two process substitutions: slightly different diff output or one of the streams being written to disk, temporarily.
(I feel the latter would be more surprising but my usage is also maybe a bit pathological: diffing lengthy command outputs on machines with slow disk drives where the added latency of writing to a temp file first would be noticeable, etc)
Regardless, I'm happy to make a PR implementing whichever approach you think is better.
slightly different diff output
Here you're implying that we could fall back to diff
in the two-process-substitution case, right?
Hm. I agree it's an option, but I think I'd prefer not to have the inconsistency of having both the diff
and git diff
versions implemented in delta. People who are diffing two process substitutions with lengthy output can always do diff <(echo a) <(echo b) | delta
after all (and they probably have the sophistication to switch to that without much thinking).
But your efficient one-process subst trick is cool -- happy to have that as a code path in addition to the last-resort copying.
(filing this as a feature request instead of a bug report because I'm not sure if this is the intended behavior or not)
Currently, when using
delta
as a diff tool with paths that are symlinks, the literal contents of the symlink (as in, the path of the file that's being pointed to) is used in the diff instead of the contents of the file that's being pointed to.For example:
yields:
instead of no output (as
diff
, for example, does).This is because of
git diff --no-index
's behavior with respect to symlinks. As far as I know there isn't any user-facing way to change this behavior (1, 2; there was a patch that changed the default behavior to be to follow symlinks but afaict it was never merged).Is this intended behavior for
delta
?Adding calls to
fs::canonicalize
to these two paths would changedelta
's behavior w.r.t. symlinks to matchdiff
and other tools but would also mean there isn't a way to actually diff the literal contents of symlinks (diff
has--no-dereference
for example). I'm not sure if this matters.(Also right now
delta
just letsgit diff --no-index
error when a file that's specified doesn't exist; if we callcanonicalize
indelta
we'd need to handle the possibility of broken symlinks. I assume just printing an error message matching that ofgit diff
(i.e.error: Could not access: ...
) for symmetry is acceptable)If this is deemed a reasonable change, I'm very happy to make a PR with the above change and some tests.