dandavison / delta

A syntax-highlighting pager for git, diff, grep, and blame output
https://dandavison.github.io/delta/
MIT License
21.34k stars 359 forks source link

🐛 Paths not updated for a diff of modified binary file #1621

Closed imbrish closed 4 months ago

imbrish commented 4 months ago

I have a commit in which a binary file is modified, and a new text file is added.

If I diff only the binary file, everything is good:

diff --git a/meta/archive.zip b/meta/archive.zip
index 00de669..d47cd84 100644
Binary files a/meta/archive.zip and b/meta/archive.zip differ

image

But if I diff it together with the text file, the label for the modified binary file becomes incorrect:

diff --git a/meta/added.txt b/meta/added.txt
new file mode 100644
index 0000000..b5eab54
--- /dev/null
+++ b/meta/added.txt
@@ -0,0 +1 @@
+This file was not here before.
diff --git a/meta/archive.zip b/meta/archive.zip
index 00de669..d47cd84 100644
Binary files a/meta/archive.zip and b/meta/archive.zip differ

image

I have built delta from the source, commit e208f4e, but tested also for tag 0.16.5 (03f1569) with the same outcome.

May be related to #96 and #1502.

imbrish commented 4 months ago

Apparently, I have used the wrong executable to test 0.16.5, here is the actual result for b.diff (still not quite right):

image

See also a combined example c.diff (the foo.dll deleted, bar.dll added and abc.dll modified):

diff --git a/exts/foo.dll b/exts/foo.dll
deleted file mode 100644
index 4aa3244..0000000
Binary files a/exts/foo.dll and /dev/null differ
diff --git a/exts/bar.dll b/exts/bar.dll
new file mode 100644
index 0000000..cc5b40c
Binary files /dev/null and b/exts/bar.dll differ
diff --git a/exts/abc.dll b/exts/abc.dll
index 00de669..d47cd84 100644
Binary files a/exts/abc.dll and b/exts/abc.dll differ

0.16.5 03f1569:

image

e208f4e:

image

imbrish commented 4 months ago

This was my first time looking at rust code, but it seems that the minus_file and plus_file are updated from diff --git line at the deleted file mode or new file mode lines, neither of which appear for the modified library diff. Thus, the previous path values remain, and are re-used for the modified diff.

For the current HEAD this means that (binary file) is appended to the filename twice. I think this information should better be stored at something like self.minus_file_binary and used when writing a new header. All of such diff data should probably be reset at each new diff header.

Anyway, why not update the file names already at diff --git line? Could be later overwritten at ---, +++ etc.