dandavison / delta

A syntax-highlighting pager for git, diff, grep, and blame output
https://dandavison.github.io/delta/
MIT License

UTF-8 problems on Windows #271

Closed StaticPH closed 3 years ago

StaticPH commented 4 years ago

I'm finding that either delta or my terminal emulator is having a hard time with the non-ascii characters used for decorations. Long lines also don't quite behave ideally. While I think delta is somehow the problem here, I'm not really sure, but I figured it would be good to at least file an issue in case anyone else encounters a similar problem. Even if I'm wrong, if someone could point me in the right direction, I'd appreciate it.

Assorted pertinent information:

- Running in the latest version of MSYS2 mintty on Windows 8.1
- I am using the version of less that comes with MSYS2; unsure if that is the same as what git-for-windows ships with.
- I installed both delta and bat via cargo, and neither are aliased.
- My bat config file contains only custom --map-syntax flags

![image](https://user-images.githubusercontent.com/7786502/89243122-38708200-d5d1-11ea-8f2b-4225a4d61fb3.png)

![image](https://user-images.githubusercontent.com/7786502/89244550-c732ce00-d5d4-11ea-840e-d53e972ed091.png)

Regardless of the value of core.pager:

![image](https://user-images.githubusercontent.com/7786502/89246907-21825d80-d5da-11ea-843a-b3e174f61dc0.png)

Raw git diff for an arbitrary git-managed file:

![image](https://user-images.githubusercontent.com/7786502/89249811-239bea80-d5e1-11ea-97c8-ffa868863089.png)

(for now, only the first few lines of it, because that's really all I expect will be needed and the whole thing is just over 200 lines)

Without any value set for core.pager: image

Side note regarding the width

I know that the issue with the width can be handled with `git diff .inputrc | delta -ns --width="$(tput cols)"`, but that still doesn't provide any means of displaying parts of a line that continue off screen.

![image](https://user-images.githubusercontent.com/7786502/89247654-d0736900-d5db-11ea-9a9b-bde638ea72ca.png)

![image](https://user-images.githubusercontent.com/7786502/89247822-2cd68880-d5dc-11ea-809e-ee13ce1ed17f.png)

The complete text of line 49 reads ` # If enabled, lists all matches in case multiple possible completions are possible.`. Please just disregard how odd "possible completions are possible" sounds.

After `git config --global core.pager delta`, the first few lines of `git diff .inputrc` are shown as:

image

And just for good measure, after adding the following to my global gitconfig:

[delta]
    features = side-by-side line-numbers
    syntax-theme = Monokai Extended
[delta "line-numbers"]
    line-numbers = true
[interactive]
    diffFilter = delta --color-only

It only gets worse:

image

Either way, there's still that little detail regarding the parts of the line that aren't initially shown.

Oh yeah, as I was getting screenshots and writing out this issue report, I discovered that trying to pipe either `git diff .inputrc | delta` or `git diff .inputrc` (with the previously mentioned gitconfig settings applied) through programs like head or tail only seems to pass along the default git diff output; not sure if that's supposed to happen.

dandavison commented 4 years ago

Hi @StaticPH,

Thanks for all the information! The short version is that I think that this will be fixed by installing a different version of less and replacing the one installed with git, as documented here: https://github.com/dandavison/delta/#using-delta-on-windows

The explanation is something like this (tell me if anything sounds wrong):

First, let's note that delta, by default, spawns a child less process to display output. It does this using the relative path name less, so it will pick up whatever less is on $PATH in the parent process of delta.

  1. With core.pager = delta unicode characters are not rendered correctly. In this situation, delta is run as a child process of git. So the git process has had an opportunity to set $PATH the way it wants. I believe that the less that's being picked up here may be a (bad) less on your system that is installed with git.

  2. But when you pipe git to delta, they are correct. In this situation, git has not had an opportunity to alter $PATH as seen by the delta process. So it is possible (and consistent with your observations) that a different (good) less is being picked up here.
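One way to test this explanation is to list every less on the search path in both situations. The following is a minimal sketch for an sh-compatible shell such as the one MSYS2 or Git Bash provides; `list_on_path` is a hypothetical helper, not part of delta or git.

```shell
# List every executable file named $1 in the directories of a PATH-style,
# colon-separated list ($2, defaulting to $PATH), in lookup order.
# The first line printed is what a bare command name would resolve to.
# The body runs in a subshell so the IFS and globbing changes do not leak.
list_on_path() (
    name="$1"
    path_list="${2:-$PATH}"
    IFS=:
    set -f                      # do not glob-expand PATH entries
    for dir in $path_list; do
        [ -n "$dir" ] || dir=.  # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

# Usage: compare the search order in your shell with the one git hands to
# its children (the `!` prefix makes git run the alias as a shell command).
#   list_on_path less
#   list_on_path less "$(git -c alias.showpath='!echo "$PATH"' showpath)"
```

If the first line differs between the two calls, the two situations above are resolving less to different executables. In bash, `type -a less` gives similar information for the current shell.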

> I discovered that trying to pipe either git diff .inputrc | delta or git diff .inputrc through programs like head or tail only seems to pass along the default git diff output;

That's correct and expected: by default git will only emit ANSI color escape sequences when output is directed to a terminal emulator; not to a pipe. To force git to emit ANSI sequences to a pipe, use git diff --color=always.

StaticPH commented 4 years ago

> Thanks for all the information! The short version is that I think that this will be fixed by installing a different version of less and replacing the one installed with git, as documented here: https://github.com/dandavison/delta/#using-delta-on-windows

I can still try the alternate less, but I was hoping I wouldn't need to do so. I confirmed that the less binaries included with git and MSYS2 are different in some regard, the former being some 7.8kB larger. If, as you say, delta is finding a "good" version of less (presumably the one from MSYS2), do you have any idea how I could force git to find the same one, instead of whatever it currently is?

It may be worth noting that I have also observed similar issues where other rust binaries are involved without git, so I'm not sure this encoding issue can simply be attributed to git finding some other less executable somewhere

Examples

![image](https://user-images.githubusercontent.com/7786502/89461722-5c9ba280-d73a-11ea-85ad-ede47a2bbd0c.png)

`desed -E fmt.sed snippet.txt` where fmt.sed contains `s#(.*: )([[:digit:]]{1,10})(.*)(,)# printf "%s%s%s" "\1" "$(date -d @\2)" "\4"#e`

![image](https://user-images.githubusercontent.com/7786502/89465177-a6d35280-d73f-11ea-9695-c1cb59d810dc.png)

> That's correct and expected: by default git will only emit ANSI color escape sequences when output is directed to a terminal emulator; not to a pipe. To force git to emit ANSI sequences to a pipe, use git diff --color=always.

I don't mean that the coloring reverts, I mean that the decorations, line numbering, and side-by-side view are also missing; it looks the same as default git diff --color=always output.

image

dandavison commented 4 years ago

> that still doesn't provide any means of displaying parts of a line that continue off screen.

Right, I have not yet implemented line-wrapping in side-by-side mode: as you can imagine, it would require some extra code complexity, with independent wrapping in left and right panels. What I suggest is setting width to a value larger than the number of columns in your terminal emulator, setting PAGER=less -RS (or otherwise ensuring less uses the -S flag), and then using less to scroll right when necessary.
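As a rough sketch of that workaround, the snippet below picks a width twice that of the terminal and leaves the delta invocation as a commented usage line. `wide_width` is a hypothetical helper, and the doubling factor is an arbitrary assumption.

```shell
# Sketch: choose a rendering width wider than the terminal so that less can
# scroll horizontally instead of the output being cut off.
wide_width() {
    # $1: terminal column count; defaults to `tput cols`, then to 80.
    cols="${1:-$(tput cols 2>/dev/null || echo 80)}"
    echo $(( cols * 2 ))
}

# Usage (requires git and delta; -S makes less chop long lines rather than
# wrap them, and the right and left arrow keys then scroll sideways):
#   git diff .inputrc | PAGER='less -RS' delta -ns --width="$(wide_width)"
```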

> I can still try the alternate less, but I was hoping I wouldn't need to do so. I confirmed that the less binaries included with git and MSYS2 are different in some regard, the former being some 7.8kB larger. If, as you say, delta is finding a "good" version of less (presumably the one from MSYS2), do you have any idea how I could force git to find the same one, instead of whatever it currently is?

The only way I know is to replace Git's less binary with the one you want to use, as described here: https://github.com/lzybkr/less/releases/tag/fix_windows_vt

There's some good information on Windows less and delta in this thread that you might find helpful: https://github.com/dandavison/delta/issues/197

> It may be worth noting that I have also observed similar issues where other rust binaries are involved without git, so I'm not sure this encoding issue can simply be attributed to git finding some other less executable somewhere

Indeed, so that proves that the incorrect display of, for example unicode box-drawing characters, is unrelated to delta, and also unrelated to less, correct? (Because cargo whatfeatures presumably doesn't invoke a pager). So that suggests to me that your terminal emulator is not handling utf-8 correctly.

(I added this rough shell script today to help with diagnosing utf-8 and ANSI color handling issues; if you feel like trying it and telling me what I should do to make it convenient for Windows users that would be fantastic: https://github.com/dandavison/delta/blob/master/etc/bin/diagnostics)


> > That's correct and expected: by default git will only emit ANSI color escape sequences when output is directed to a terminal emulator; not to a pipe. To force git to emit ANSI sequences to a pipe, use git diff --color=always.
>
> I don't mean that the coloring reverts, I mean that the decorations, line numbering, and side-by-side view are also missing; it looks the same as default git diff --color=always output.

OK, but in your command you are piping git into head. That causes git to recognize that its output is not going to a terminal emulator and in that situation it does not invoke core.pager at all. So it is expected that the output precisely resembles vanilla git output.
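The kind of check involved can be imitated in a few lines of shell. This is only an illustration of the terminal test, not git's actual code, and `stdout_kind` is a made-up name.

```shell
# Report where this process's standard output is going. git makes a similar
# decision before choosing whether to colorize and whether to start a pager.
stdout_kind() {
    if [ -t 1 ]; then
        echo terminal
    else
        echo pipe-or-file
    fi
}

# stdout_kind          # prints "terminal" when run interactively
# stdout_kind | cat    # prints "pipe-or-file"
#
# To keep delta's rendering and still truncate the output, delta has to be
# named in the pipeline explicitly (requires git and delta):
#   git diff .inputrc | delta | head
```

In the last usage line delta is an ordinary filter in the pipeline, so it should run regardless of whether git would have started a pager.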

dandavison commented 4 years ago

Hi @StaticPH, would you be able to run this shell script and post a screenshot of the output when you get a chance? https://github.com/dandavison/delta/blob/master/etc/bin/diagnostics I'd like to get to the bottom of the problems here.

StaticPH commented 4 years ago

I'm going to assume you meant for "env | grep -E '(less|pager|bat)'" to be "env | grep -Ei '(less|pager|bat)'", but if you didn't, then just ignore LESS and PATHEXT; I doubt the changes I have there would make any difference, but I included them just in case.

image

And while it is hard to see, there is definitely a difference between the 2nd and 3rd printing of 'text'; the characters themselves are a brighter blue, which I believe is what you were looking for there.

dandavison commented 4 years ago

> > do you have any idea how I could force git to find the same one, instead of whatever it currently is?
>
> The only way I know is to replace Git's less binary with the one you want to use, as described here: https://github.com/lzybkr/less/releases/tag/fix_windows_vt

Ah, I'm sorry, I think I do have a better idea: set either PAGER or BAT_PAGER to the absolute path to your "good" less executable. If delta sees either of those env vars it will use the executable that they point to.

To check whether it's a good less executable you could do ./diagnostics | less -R. If the unicode characters look good when piped into less, then which less should be an absolute path to a good less executable.
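If the diagnostics script is more than is needed, a single line of box-drawing characters is enough to exercise a pager. The following is a sketch with `box_sample` as a hypothetical helper; it assumes U+2500 is representative of the characters that fail to render.

```shell
# Print ten U+2500 BOX DRAWINGS LIGHT HORIZONTAL characters, written as the
# octal escapes of the UTF-8 bytes E2 94 80 so the script itself stays ASCII.
box_sample() {
    i=0
    while [ "$i" -lt 10 ]; do
        printf '\342\224\200'
        i=$((i + 1))
    done
    printf '\n'
}

# Usage: a solid horizontal rule means the pager and the terminal both treat
# the bytes as UTF-8.
#   box_sample | /usr/bin/less -R
```

When the same three bytes are decoded with an 8-bit code page such as CP437 instead of UTF-8, they show up as `ΓöÇ`.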

Feel free to post a screenshot of ./diagnostics | less -R

(Also, is it worth double-checking that the problems still occur if the LESS env var is not set at all?)

StaticPH commented 4 years ago

Output of ./diagnostics | /usr/bin/less -R --quit-if-one-screen with LESS set:

image

Output with LESS unset:

image

With both PAGER and BAT_PAGER set to /usr/bin/less, LESS unset, and less unaliased (so less is simply /usr/bin/less):

image

The result remains unchanged if using the absolute Windows-style path for less, both with and without the '.exe' extension. I also tried again after running git config --global --unset core.pager; git config --global pager.diff delta, just in case that would make a difference, but it didn't.

dandavison commented 4 years ago

Do you know where the less that comes with git is located? E.g. is it at C:\Program Files\Git\usr\bin\less.exe as suggested here? If so can you (a) pipe ./diagnostics into that, and (b) try temporarily moving it out of the way so that we are sure delta is not using it?

Another idea is, what happens if you add some of the problematic unicode characters to a file and then attempt to view the resulting diff with git (with delta out of the picture)?

And another idea is -- does bat work for you? It also uses unicode box-drawing characters, and also uses less by default.

Another idea is -- have you tried in a different terminal emulator?

I'm sorry this investigation is dragging on so long! But I don't think there's anything fundamentally wrong with the utf-8 that delta is emitting: lots of people are using it without problems on Windows and other platforms.

dandavison commented 4 years ago

Could it be some sort of default encoding or locale setting? Is the LANG environment variable relevant?

StaticPH commented 4 years ago

> Could it be some sort of default encoding or locale setting? Is the LANG environment variable relevant?

LANG has been set to "en_US.UTF-8" for my environment since forever; I've also tried with LC_ALL="en_US.UTF-8". Neither seems to have any impact on this for me.

dandavison commented 4 years ago

Any luck @StaticPH?

StaticPH commented 4 years ago

None, sadly, but I also haven't installed the latest update. Though from what I understand, said update shouldn't have any effect.

daniel-liuzzi commented 3 years ago

FWIW, this happens also when calling delta directly; even the exact same ΓöÇΓöÇΓöÇΓöÇΓ character sequence appears:

image

Interestingly enough, everything renders correctly when delta is called by Git.

image

Versions:

dandavison commented 3 years ago

> FWIW, this happens also when calling delta directly; even the exact same ΓöÇΓöÇΓöÇΓöÇΓ character sequence appears: ... Interestingly enough, everything renders correctly when delta is called by Git.

I'm thinking this is because there are two versions of less on your system. When git runs, I think it ensures that its own less has precedence in child processes, and thus delta picks up the good less in that context. So, can you locate the good less (perhaps something like C:\Program Files\Git\usr\bin\less.exe? according to this) and make delta use that (e.g. by setting the DELTA_PAGER env var to that absolute path)?
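To tell the two apart, it may help to ask each candidate for its version before pointing delta at one. A minimal sketch follows; `less_version` is a hypothetical helper, and it assumes the first line of `less --version` has the form `less <number> (...)`.

```shell
# Print the version number reported by the less executable at path $1,
# or nothing if it cannot be run or the output has an unexpected shape.
less_version() {
    "$1" --version 2>/dev/null | awk 'NR == 1 && $1 == "less" { print $2 }'
}

# Usage (both paths are placeholders; substitute the real locations):
#   less_version /path/to/one/less
#   less_version /path/to/another/less
#   export DELTA_PAGER=/path/to/the/good/less
```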

daniel-liuzzi commented 3 years ago

@dandavison so I checked and sure enough, there were two versions:

Your theory is right; delta uses the one in the system path, whereas Git uses its own.

Setting DELTA_PAGER to v551 fixes the problem. Thanks for the heads up.

daniel-liuzzi commented 3 years ago

And I spoke too soon. Setting DELTA_PAGER does fix the issue... while the less from Gow was still in the system path. The moment I uninstalled Gow (which I really don't need anyway), calling delta directly stopped working.

image

It's as though delta still needs less to be in system path (even when it ultimately ends up not using it.)

dandavison commented 3 years ago

@daniel-liuzzi the specific error in your screenshot is caused by the absence of the diff command -- I guess Gow was providing diff in addition to less.

dandavison commented 3 years ago

I'm hoping that the DELTA_PAGER fix does work and that delta a b will work with diff present!

Tangentially -- you might be interested in #197 which discusses Chocolatey and less on Windows.

daniel-liuzzi commented 3 years ago

> @daniel-liuzzi the specific error in your screenshot is caused by the absence of the diff command -- I guess Gow was providing diff in addition to less.

You're right again. I wrongly assumed "diff" in the error was probably git diff getting mangled up somehow. I didn't even know about the diff command! Anyway, since I was unable to source another diff, for now I just reinstalled Gow and now everything is good!

I used to use Chocolatey a few years ago, but have since switched to Scoop, which works beautifully.

dandavison commented 3 years ago

@StaticPH have you tried setting DELTA_PAGER (or BAT_PAGER if you're on an older version of delta) to the absolute path to less v551?

dandavison commented 3 years ago

@StaticPH I think that the incorrect output you're seeing must be due to an inadequate less version being picked up, or a problem with your terminal emulator. I'll close this for now but feel free to add more if you haven't got to the bottom of it.

StaticPH commented 3 years ago

Okay, so a number of things have changed on my end, and I'm not entirely certain which change(s) are responsible, but something changed and now I'm not seeing the issue anymore in mintty. And in a pleasant surprise, it even renders properly in the classic Windows Command Prompt (not pictured below). If nothing else, at least we can now establish a known-good environment for this.

image

Please pay no mind to the grammatical error in the screen capture I created at some ungodly hour of the morning. Also, the line decorations having small spaces between them is a side-effect of having rendered my terminal output as html to capture the whole thing as a single image. They're still nice and smooth in the terminal itself.