dandavison / delta

A syntax-highlighting pager for git, diff, grep, and blame output
https://dandavison.github.io/delta/
MIT License
23.04k stars 382 forks source link

🐛 CRLF line endings are stripped from diff #754

Open plgruener opened 2 years ago

plgruener commented 2 years ago

I'm trying to show CRLF line endings in my diffs (on Linux, to quickly spot if someone committed a file with CRLF instead of LF). Less has the option -u/--underline-special to do just that, so I could just use that as pager for delta. However, delta always seems to strip the carriage return byte from diffs, see:

delta-CRLF

Here is a hexdump of both diffs. You can see the CR-bytes (0x0d) are missing. delta-CRLF-hexdump2

Even if delta strips the ANSI color escape codes (which is turned off here with --raw), it should never just delete any other character from the output. Let the pager decide to show those control chars or not.

dandavison commented 2 years ago

Thanks @plgruener. Do you have time to look at https://github.com/dandavison/delta/pull/664 and help me decide what the correct course of action is? In that PR, code was added that deliberately strips trailing CR. cc @norisio in case you have time to advise also.

yoichi commented 2 years ago

https://github.com/dandavison/delta/pull/664 is not the only cause. byte_lines may also strip CR. The correct course will be

yoichi commented 2 years ago

strip CR only if delta writes to console

I've tried this by applying sample code in https://rosettacode.org/wiki/Check_output_device_is_a_terminal#Rust But it didn't work with git add -p.

To find a solution, I think it would be better if I understand the cause of https://github.com/dandavison/delta/pull/664 a little more.

CR will be emitted directly into a terminal (at least when delta used through git add -p). Thus the rendered result will be blank (CR erases the line).

This (=CR erases the line) does not happen with the default pager. Why does this happen with delta?

dandavison commented 2 years ago

This (=CR erases the line) does not happen with the default pager. Why does this happen with delta?

I think this is because:

Suppose we have a line ending with CR LF.

Git changes that to CR ANSI LF.

With delta, bytelines strips the LF but leaves the CR (due to ANSI escape added by git). So now the line ends with CR ANSI but no LF. I think this causes the line to be overwritten by the terminal emulator.

Whereas with the usual pager the line ends CR ANSI LF.

yoichi commented 2 years ago

This (=CR erases the line) does not happen with the default pager. Why does this happen with delta?

I found right_fill_background_color's output overwrites the text before CR. Without interactive.diffFilter, there is only reset (ESC[m) after CR, the text before CR remains intact.

I've tried this by applying sample code in https://rosettacode.org/wiki/Check_output_device_is_a_terminal#Rust But it didn't work with git add -p.

git add -p (git-add--interactive.perl) sends diff to pipe (not directly to the terminal) then print it. So the terminal detection did not work. git diff with delta send output to less --RAW-CONTROL-CHARS, and less replaces control characters with "^%c" (the caret notation), but the feature is effective only for terminal output (https://github.com/gwsw/less/blob/4f5e45b4262009c6fcd87c0efc077725271c8c57/main.c#L222). Therefore it is not usable with git add -p.

I think one solution is to define delta's new command line option to replace control characters by caret notations (like less --RAW-CONTROL-CHARS on terminal) and use the option in the interactive.diffFilter value.

yoichi commented 2 years ago

I think one solution is to define delta's new command line option to replace control characters by caret notations (like less --RAW-CONTROL-CHARS on terminal) and use the option in the interactive.diffFilter value.

I've made a fix in this way and created a pull request. Would you please take a look? @norisio @dandavison https://github.com/dandavison/delta/pull/1000

dandavison commented 2 years ago

Is it true that the problem is occurring because

(1) git is placing an ANSI character between CR and LF: CR ANSI LF (2) bytelines is stripping the LF, leaving CR ANSI

?

If so, would one solution be for delta to strip trailing CR if it is followed by the trailing ANSI character? Presumably that would be very rare in real input.

In any case, delta's output is for humans to look at, not for machines to read, so it's not the end of the world if we occasionally remove a byte someone wanted as long as it really is rare.

norisio commented 2 years ago

git is placing an ANSI character between CR and LF

Only when used with -p, according to the images in https://github.com/dandavison/delta/issues/754#issue-1039048806, which show there are no CR-ANSI-LF .

norisio commented 2 years ago

Therefore this issue is not related to the CR-ANSI-LF problem, I guess. Simply because using bytelines to split the input stream. It always erases CR before LF.

dandavison commented 2 years ago

Only when used with -p

Thanks!

Therefore this issue is not related to the CR-ANSI-LF problem

Right. OK, so:

I think it is correct that delta's internal code paths work with a stripped line. So maybe the solution is we update bytelines so that it does not strip the line endings, or strips them but returns them to us with the line. Then, when delta is ready to print a line, it adds back the original line ending.

I do think it's worth bearing in mind that delta's output is for humans to look at, not for machines to parse: if you want machine-readable diff output then that's delta's input, not its output. And indeed, if you redirect git's output into a pipe or file, then delta doesn't even get invoked (git doesn't invoke core.pager).

HOWEVER:

  1. It seems a reasonable goal that delta prettifies a line with ANSI colors but nevertheless retains the original line ending
  2. delta has --raw, and it is very reasonable to expect that to retain the line endings even if delta didn't by default.
Atemu commented 10 months ago

What is the status on this?

it's worth bearing in mind that delta's output is for humans to look at, not for machines to parse

This is true but us humans also need to understand the output. If it show a line removed but the same line added again, that's extremely confusing.

You might be able to deduce it's due to line endings but then you're left wondering what changed about line endings.
Were they the correct ones but aren't anymore? The other way around? No idea, all you see is the same characters.

Line endings are already complex enough in git as is.