Open dandavison opened 2 years ago
Although delta tries, it's not always possible for it to identify the process responsible for generating its input.
Why would delta need to identify that process?
Delta does this for a few reasons now:
git grep
, rg
, etc. https://github.com/dandavison/delta/blob/master/src/handlers/grep.rs#L378
standard $file$sep$linenum$sep$code
grep output is for humans -- it cannot be parsed unambiguously -- and so delta only risks trying to parse it if it has reason to believe that it is receiving grep output (however, rg --json
is highlighted without process inspection)git show $revision:/path/to/file.ext
. Since this "code" is arbitrary bytes, delta needs to look at the input command to know that it should be doing this, and what the language is. https://github.com/dandavison/delta/blob/master/src/handlers/git_show_file.rs#L12git blame
output, since the file path with extension is not present in blame output https://github.com/dandavison/delta/blob/master/src/handlers/blame.rs#L80It does violate compositional purity of a unix pipeline! But delta is always the end of a pipeline -- delta output is for humans -- and I think the benefits of doing this win out.
That would mean typing a lot characters twice (or knowing your shells Ctrl-K
/ Ctrl-Y
tricks). What about making delta a wrapper for itself, e.g. delta --pipe grep -n pattern file.txt | head -n 100 | delta
to remove the redundancy (delta-delta notwithstanding)?
The command line could be passed via some IPC mechanism, or by looking for other delta processes and just waiting log enough if the started grep
child process was too fast.
That would mean typing a lot characters twice (or knowing your shells
Ctrl-K
/Ctrl-Y
tricks). What about making delta a wrapper for itself
I'm actually not sure that ergonomics/convenience needs to be a goal here. For that we have git grep
and rg --json
, and (without wishing to stray onto controversial ground) I think that between them those two options give users pretty much all the code searching functionality and speed they could want.
So here what I had in mind was more "making it possible, if inconvenient, to do things with 100% correctness", such as pipe grep/ag/ack
into delta. I was thinking this feature would allow the small number of people who care about doing this to write wrapper shell scripts to overcome the lack of convenience.
It probably also came to mind because I was imagining the implementation would be quite simple, just applying the same parsing code to the CLI option before doing so to the strings obtained from sysinfo.
Would it be worthwhile to ask Git itself to add a method of passing contextual information to the pager it invokes by way of environment variable?
That is, do you think it'd be a good idea if I looked into making that happen?
Wait, Git already prepends this information doesn't it?
Hi @Saklad5,
Looking at the output of git -c core.pager=env show
(don't be tempted to pipe it to grep
!), the env vars that git sets for the child process appear to be:
GIT_CONFIG_PARAMETERS='core.pager'='env'
GIT_EXEC_PATH=/usr/local/Cellar/git/2.35.1/libexec/git-core
GIT_PREFIX=
delta already makes use of GIT_CONFIG_PARAMETERS
(so that -c delta.foo=bar
works) and GIT_PREFIX
(it contains the path to where git was invoked inside the repo, which delta uses to relativize paths).
That is, do you think it'd be a good idea if I looked into making that happen?
I believe that it would be helpful to delta and other git pagers if git were to set an env var containing (a suitable encoding of) the entire command line with which it was invoked. If git developers wanted to do that, l think I'd be happy about it!
Incidentally, delta development has come across some other areas where perhaps there's room for improvement in git. They both relate to git diff
. Firstly it doesn't handle shell process substitution syntax https://github.com/dandavison/delta/issues/666:
$ git diff <(echo a) <(echo b)
error: /dev/fd/11: unsupported file type
fatal: cannot hash /dev/fd/11
Secondly is it a bit odd that the output of git diff /tmp/a /tmp/b
(files outside repo) is the same as git diff tmp/a tmp/b
(files inside repo) https://github.com/dandavison/delta/issues/1014?
$ git -P diff /tmp/a /tmp/b
diff --git a/tmp/a b/tmp/b
index 7898192..6178079 100644
--- a/tmp/a
+++ b/tmp/b
$ diff <(git diff --no-index /tmp/a /tmp/b) <(git -P diff --no-index tmp/a tmp/b); echo $?
0
Incidentally, delta development has come across some other areas where perhaps there's room for improvement in git. They both relate to
git diff
. Firstly it doesn't handle shell process substitution syntax #666:$ git diff <(echo a) <(echo b) error: /dev/fd/11: unsupported file type fatal: cannot hash /dev/fd/11
Secondly is it a bit odd that the output of
git diff /tmp/a /tmp/b
(files outside repo) is the same asgit diff tmp/a tmp/b
(files inside repo) #1014?$ git -P diff /tmp/a /tmp/b diff --git a/tmp/a b/tmp/b index 7898192..6178079 100644 --- a/tmp/a +++ b/tmp/b
$ diff <(git diff --no-index /tmp/a /tmp/b) <(git -P diff --no-index tmp/a tmp/b); echo $? 0
To support file descriptors properly, Git would need to read through them in a single pass. I'm not sure how easy that would be.
As for passing the entire command line, that seems a bit heavy-handed and insecure: most people probably wouldn't expect a pager to get all that. I hid those comments because I actually think it already provides this functionality in a form Delta could use: you can even take over the entire command from Git and control the internals yourself as an "external diff driver".
To support file descriptors properly, Git would need to read through them in a single pass. I'm not sure how easy that would be.
There have been two patches submitted to git development lists that superficially seem to do this. One of them is here: https://marc.info/?l=git&m=141347274420939&w=2 (see links from #666)
I actually think it already provides this functionality in a form Delta could use: you can even take over the entire command from Git and control the internals yourself as an "external diff driver".
If you have time to expand this proposal that would be great. I've been meaning to learn more about that but have not done so yet. One thing to note is that delta handles git diff
, git grep
, git blame
, git show
, git log
etc output.
the env vars that git sets for the child process appear to be:
GIT_PID
would be quite nice to have; we have already solved the problem of finding the git process and its command line, but it would be simpler with the pid.
Git has two broad types of commands: "porcelain" (for humans) and "plumbing" (for machines). Porcelain commands are almost exclusively wrappers around plumbing commands, and plumbing commands allow you to do pretty much anything as you see fit.
Using GIT_EXTERNAL_DIFF
basically overrides the normal git diff
wrapper, passing all the input it usually gets for use as you see fit. As a bonus, the plumbing commands are specifically designed to be extremely stable between Git versions, so you don't have to worry about building a house of cards if you use them.
OK, but which application would compute the diff if not git?
OK, but which application would compute the diff if not git?
That's the thing: git diff
is just a wrapper for the low-level commands, and it is outright encouraged for tools like Delta to run them directly. Take a look at this guide on your choices.
The basic idea here is that Delta gets configured as an external diff driver, and proceeds to run the low-level Git commands it needs to do what it wants. This also stops Git from doing unnecessary work like the usual formatting.
Just stumbled upon the same issue. Can this behavior be tweaked somehow?
cc @th1000s Although delta tries, it's not always possible for it to identify the process (e.g.
grep
/blame
/git show
etc) responsible for generating its input. This is because when one doesgrep pattern file.txt | delta
, the grep process may exit before delta interrogates the running system processes. (That outcome is somewhat less likely when one doesgrep pattern large_codebase/**/*.rs
).I suggest adding a command line option allowing users to do things like
grep -n pattern file.txt | head -n 100 | delta --input-command 'grep -n'
.In terms of implementation, the parent process is cached on first call via
lazy_static
, and so we will need to find some way of getting the user options to that call.