dandavison / delta

A syntax-highlighting pager for git, diff, grep, and blame output
https://dandavison.github.io/delta/
MIT License
24.28k stars 391 forks source link

🚀 Add --input-command option to tell delta the command line responsible for input #799

Open dandavison opened 2 years ago

dandavison commented 2 years ago

cc @th1000s Although delta tries, it's not always possible for it to identify the process (e.g. grep/blame/git show etc) responsible for generating its input. This is because when one does grep pattern file.txt | delta, the grep process may exit before delta interrogates the running system processes. (That outcome is somewhat less likely when one does grep pattern large_codebase/**/*.rs).

I suggest adding a command line option allowing users to do things like grep -n pattern file.txt | head -n 100 | delta --input-command 'grep -n'.

In terms of implementation, the parent process is cached on first call via lazy_static, and so we will need to find some way of getting the user options to that call.

bew commented 2 years ago

Although delta tries, it's not always possible for it to identify the process responsible for generating its input.

Why would delta need to identify that process?

dandavison commented 2 years ago

Delta does this for a few reasons now:

It does violate compositional purity of a unix pipeline! But delta is always the end of a pipeline -- delta output is for humans -- and I think the benefits of doing this win out.

th1000s commented 2 years ago

That would mean typing a lot characters twice (or knowing your shells Ctrl-K / Ctrl-Y tricks). What about making delta a wrapper for itself, e.g. delta --pipe grep -n pattern file.txt | head -n 100 | delta to remove the redundancy (delta-delta notwithstanding)?

The command line could be passed via some IPC mechanism, or by looking for other delta processes and just waiting log enough if the started grep child process was too fast.

dandavison commented 2 years ago

That would mean typing a lot characters twice (or knowing your shells Ctrl-K / Ctrl-Y tricks). What about making delta a wrapper for itself

I'm actually not sure that ergonomics/convenience needs to be a goal here. For that we have git grep and rg --json, and (without wishing to stray onto controversial ground) I think that between them those two options give users pretty much all the code searching functionality and speed they could want.

So here what I had in mind was more "making it possible, if inconvenient, to do things with 100% correctness", such as pipe grep/ag/ack into delta. I was thinking this feature would allow the small number of people who care about doing this to write wrapper shell scripts to overcome the lack of convenience.

It probably also came to mind because I was imagining the implementation would be quite simple, just applying the same parsing code to the CLI option before doing so to the strings obtained from sysinfo.

Saklad5 commented 2 years ago

Would it be worthwhile to ask Git itself to add a method of passing contextual information to the pager it invokes by way of environment variable?

That is, do you think it'd be a good idea if I looked into making that happen?

Saklad5 commented 2 years ago

Wait, Git already prepends this information doesn't it?

dandavison commented 2 years ago

Hi @Saklad5,

Looking at the output of git -c core.pager=env show (don't be tempted to pipe it to grep!), the env vars that git sets for the child process appear to be:

GIT_CONFIG_PARAMETERS='core.pager'='env'
GIT_EXEC_PATH=/usr/local/Cellar/git/2.35.1/libexec/git-core
GIT_PREFIX=

delta already makes use of GIT_CONFIG_PARAMETERS (so that -c delta.foo=bar works) and GIT_PREFIX (it contains the path to where git was invoked inside the repo, which delta uses to relativize paths).

That is, do you think it'd be a good idea if I looked into making that happen?

I believe that it would be helpful to delta and other git pagers if git were to set an env var containing (a suitable encoding of) the entire command line with which it was invoked. If git developers wanted to do that, l think I'd be happy about it!

Incidentally, delta development has come across some other areas where perhaps there's room for improvement in git. They both relate to git diff. Firstly it doesn't handle shell process substitution syntax https://github.com/dandavison/delta/issues/666:

$ git diff <(echo a) <(echo b)
error: /dev/fd/11: unsupported file type
fatal: cannot hash /dev/fd/11

Secondly is it a bit odd that the output of git diff /tmp/a /tmp/b (files outside repo) is the same as git diff tmp/a tmp/b (files inside repo) https://github.com/dandavison/delta/issues/1014?

$ git -P diff /tmp/a /tmp/b
diff --git a/tmp/a b/tmp/b
index 7898192..6178079 100644
--- a/tmp/a
+++ b/tmp/b
$ diff <(git diff --no-index /tmp/a /tmp/b) <(git -P diff --no-index tmp/a tmp/b); echo $?
0
Saklad5 commented 2 years ago

Incidentally, delta development has come across some other areas where perhaps there's room for improvement in git. They both relate to git diff. Firstly it doesn't handle shell process substitution syntax #666:

$ git diff <(echo a) <(echo b)
error: /dev/fd/11: unsupported file type
fatal: cannot hash /dev/fd/11

Secondly is it a bit odd that the output of git diff /tmp/a /tmp/b (files outside repo) is the same as git diff tmp/a tmp/b (files inside repo) #1014?

$ git -P diff /tmp/a /tmp/b
diff --git a/tmp/a b/tmp/b
index 7898192..6178079 100644
--- a/tmp/a
+++ b/tmp/b
$ diff <(git diff --no-index /tmp/a /tmp/b) <(git -P diff --no-index tmp/a tmp/b); echo $?
0

To support file descriptors properly, Git would need to read through them in a single pass. I'm not sure how easy that would be.

As for passing the entire command line, that seems a bit heavy-handed and insecure: most people probably wouldn't expect a pager to get all that. I hid those comments because I actually think it already provides this functionality in a form Delta could use: you can even take over the entire command from Git and control the internals yourself as an "external diff driver".

dandavison commented 2 years ago

To support file descriptors properly, Git would need to read through them in a single pass. I'm not sure how easy that would be.

There have been two patches submitted to git development lists that superficially seem to do this. One of them is here: https://marc.info/?l=git&m=141347274420939&w=2 (see links from #666)

I actually think it already provides this functionality in a form Delta could use: you can even take over the entire command from Git and control the internals yourself as an "external diff driver".

If you have time to expand this proposal that would be great. I've been meaning to learn more about that but have not done so yet. One thing to note is that delta handles git diff, git grep, git blame, git show, git log etc output.

dandavison commented 2 years ago

the env vars that git sets for the child process appear to be:

GIT_PID would be quite nice to have; we have already solved the problem of finding the git process and its command line, but it would be simpler with the pid.

Saklad5 commented 2 years ago

Git has two broad types of commands: "porcelain" (for humans) and "plumbing" (for machines). Porcelain commands are almost exclusively wrappers around plumbing commands, and plumbing commands allow you to do pretty much anything as you see fit.

Using GIT_EXTERNAL_DIFF basically overrides the normal git diff wrapper, passing all the input it usually gets for use as you see fit. As a bonus, the plumbing commands are specifically designed to be extremely stable between Git versions, so you don't have to worry about building a house of cards if you use them.

dandavison commented 2 years ago

OK, but which application would compute the diff if not git?

Saklad5 commented 2 years ago

OK, but which application would compute the diff if not git?

That's the thing: git diff is just a wrapper for the low-level commands, and it is outright encouraged for tools like Delta to run them directly. Take a look at this guide on your choices.

The basic idea here is that Delta gets configured as an external diff driver, and proceeds to run the low-level Git commands it needs to do what it wants. This also stops Git from doing unnecessary work like the usual formatting.

dgudim commented 3 months ago

Just stumbled upon the same issue. Can this behavior be tweaked somehow?