Closed infokiller closed 2 years ago
Hi @infokiller (and @nymous) thanks. Would you be able to use make benchmark
in the delta repository to obtain timings? You would need to install the Rust dev tool chain but that should be straightforward. If you can't install the dev tool chain, you can still obtain a timing for your installed version of delta by editing the Makefile target named benchmark
to change the path to the delta executable.
Thanks for the quick response @dandavison (and wonderful project of course!). Yes, I didn't know about the benchmark target, I'll run it and post results.
Great! (And you'll see you need to install hyperfine
).
What's your OS?
❯ uname -a
Linux desk18 5.14.18-hardened1-1-hardened #1 SMP PREEMPT Sun, 14 Nov 2021 16:10:03 +0000 x86_64 GNU/Linux
Benchmark results are not significantly different:
❯ make benchmark
cargo build --release
Compiling git-delta v0.10.2 (/home/user/src/config/delta)
Finished release [optimized] target(s) in 11.10s
git log -p 23c292d3f25c67082a2ba315a187268be1a9b0ab > /tmp/delta-benchmark-input.gitdiff
hyperfine 'target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
Time (mean ± σ): 2.563 s ± 0.074 s [User: 1.493 s, System: 0.935 s]
Range (min … max): 2.484 s … 2.737 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
make benchmark | 37.170 total | 67.89s user | 10.46s system | 210% cpu | 569 MB memory
❯ gco 0.11.0
Previous HEAD position was 2f21f08 Bump version
HEAD is now at 31b402a Document merge conflict handling
❯ make benchmark
cargo build --release
Compiling git-delta v0.11.0 (/home/user/src/config/delta)
Finished release [optimized] target(s) in 10.92s
git log -p 23c292d3f25c67082a2ba315a187268be1a9b0ab > /tmp/delta-benchmark-input.gitdiff
hyperfine --warmup 10 --min-runs 20 \
'target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
Time (mean ± σ): 2.523 s ± 0.057 s [User: 1.456 s, System: 0.882 s]
Range (min … max): 2.445 s … 2.659 s 20 runs
make benchmark | 1:26.93 total | 96.85s user | 27.67s system | 143% cpu | 589 MB memory
Update: scratch that, I think I can actually reproduce it, will post results soon
@dandavison I also tried to manually benchmark it by running delta a.txt b.txt
repeatedly in a shell but wasn't able to reproduce the slowness. Could it be due to different behavior when the output for delta is a terminal or a file?
It seems I can't reproduce the slowness with hyperfine (or the makefile), but I can do it with a simple shell script. Unfortunately, I'm unable to reproduce it in docker (which I wanted to provide you the full steps to reproduce). I'll try to see why I can't reproduce it in docker.
Without docker, here's hyperfine showing no difference:
host% hyperfine '~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt'
Benchmark 1: ~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt
Time (mean ± σ): 333.5 ms ± 47.4 ms [User: 111.6 ms, System: 62.6 ms]
Range (min … max): 260.3 ms … 422.7 ms 10 runs
host% hyperfine '~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt'
Benchmark 1: ~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt
Time (mean ± σ): 294.7 ms ± 36.4 ms [User: 118.6 ms, System: 52.6 ms]
Range (min … max): 240.2 ms … 365.4 ms 10 runs
Again without docker, I can reproduce it with a shell function (almost 10x slower):
❯ zsh --norcs
host% benchmark-command() {
local num_runs="${1:-100}"
shift
local before
before=$(($(date +%s%N) / 1000000))
printf 'Running command "%s" %d times\n' "$*" "${num_runs}"
for _ in $(seq 1 "${num_runs}"); do
eval -- "$@" > /dev/null
done
local after
after=$(($(date +%s%N) / 1000000))
local per_run
per_run=$(bc < <(printf 'scale=2; %d/%d\n' $((after - before)) "${num_runs}"))
echo "${per_run} ms per run ($((after - before)) ms total)"
}
host% benchmark-command 10 '~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt'
Running command "~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt" 10 times
172.20 ms per run (1722 ms total)
host% benchmark-command 10 '~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt'
Running command "~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt" 10 times
1661.10 ms per run (16611 ms total)
I suspected something in my environment is the issue but I get the same results with a clean env (running env -i bash --noprofile --norc
).
I will try to reproduce it later.
@infokiller do you only see the slowness when you are doing delta a b
and not when doing git diff
or git diff | delta
?
Could you try this branch and see if it changes your timings? 824-slowness-investigation
I wonder what is going on with the difference between hyperfine and your shell function. I'd be interested to see each of the 10 timings that are being averaged by your shell function, to see what the distribution is like.
I've used your shell function, and hyperfine, on delta src/delta.rs src/paint.rs
on intel and M1 MacOS and there's no difference similar to what you see.
(I did, I am fairly certain, discover the interesting but irrelevant fact that the binary from homebrew is 1.5x faster on the M1 machine than cargo build --release
-- I want to know why!)
@infokiller do you only see the slowness when you are doing
delta a b
and not when doinggit diff
orgit diff | delta
?Could you try this branch and see if it changes your timings?
824-slowness-investigation
I see it with git diff | delta
as well. I'll try that branch and give an update.
I wonder what is going on with the difference between hyperfine and your shell function.
Does delta behave differently when stdin is not a terminal?
I'd be interested to see each of the 10 timings that are being averaged by your shell function, to see what the distribution is like.
I can modify it to print it, but it may introduce some measurement overhead (which still exists now, but is averaged out across all runs).
I've used your shell function, and hyperfine, on
delta src/delta.rs src/paint.rs
on intel and M1 MacOS and there's no difference similar to what you see.(I did, I am fairly certain, discover the interesting but irrelevant fact that the binary from homebrew is 1.5x faster on the M1 machine than
cargo build --release
-- I want to know why!)
That's interesting indeed, if you figure out why let me know as I'm curious about it too.
Does delta behave differently when stdin is not a terminal?
Yes:
git diff --no-index arg1 arg2
, otherwise fail with an errror messageCould you wrap the call to delta (0.11) in a shell script and dump env | sort
somewhere both for the slow and the normal case? Are any noticeable variables changed when you diff the env dumps? Can you trigger it with --no-gitconfig
in your benchmark-command()
, or when setting HOME=/none
before calling delta?
I see this behavior too - can't reproduce in docker either but does happen on my normal machine 100% of the time. On the off chance that a more precise description of what I'm observing is helpful I can provide that, since I don't see one in here yet.
If running git diff | delta
with no gitconfig
, or just git diff
with pager.diff = delta
in gitconfig, the eventual first line of the diff appears on my pager instantly. In my dotfiles repository right now, that's Δ config/sx/sxrc
, correctly formatted in bold + underlined yellow text. It freezes with just that one line on the pager for about half of a second, then the screen instantly changes to the correct full diff.
Thanks @jebaum. Are you able to build the project from the git repo using cargo (instructions)? If so I would love to know whether you see the same problem on branch 824-slowness-investigation
.
(Also if you could let us know your OS)
Yep can do, will report back what I see later today. Thanks for being on top of it!
Running arch, kernel 5.15.3-arch1-1. git version 2.34.1 in case that matters
That branch fixes the issue for me
That branch fixes the issue for me
Thanks @jebaum! Is there any chance you'd be able to tell us which commit first has the problem, or which git tags do/don't have the problem? But you've already been very helpful, so no worries if you are busy.
Hm. Any thoughts @th1000s? The branch that fixes the issue is https://github.com/dandavison/delta/pull/825, which gets rid of the sysinfo::System
calls entirely.
I can definitely bisect it, shouldn't take too long. Will get to that by tomorrow. Lease I can do to give back a little bit to fantastic OSS
@infokiller if by any chance you'd be able to try branch #825 that would be fantastic.
Yes, that is sysinfo
querying absolutely all processes (and there being only 2 or 3 in a docker container it is not noticeable). If I only run that part of delta[3] (git log -p | time delta
), this takes ~ 250 ms, of which ~ 60 are delta startup time, so 190 to query everything. By comparison, time ps auxf
- which still has to print everything - is done in 30 here because it only needs 5k process related syscalls according to strace while delta needs 33k! That could certainly be trimmed down, probably even by calling ps
directly :)
Querying the entire process tree is the last fallback when looking for the related git process and this seems to be hit every time after trying the parent or sibling process. I only hit that path when I have a git wrapper in e.g. ~/bin/git
and never when delta is set as a pager in the git config directly (because then it is always the direct parent).
When you run git log -p | delta
, what does ps auxf | grep 'delt[a]' -B 3
show, A or B? And are the pids of the initial git and delta sequential? What should be tried in case of B is to look for child processes first, but this does not seem to be available in a portable way. Maybe implement that strategy on Linux by looking at /proc/<PID>/task/<PID>/children
directly or guessing more pids (like the sibling via $delta_pid - 1
) first.
A:
thomas 10002 [..] \_ git log -p
thomas 10003 [..] \_ delta
B:
thomas 350000 [..] \_ /bin/sh /home/thomas/bin/git log -p
thomas 350002 [..] | \_ /usr/bin/git log -p
thomas 350001 [..] \_ delta
3:
fn determine_calling_process() -> Option<CallingProcess> {
dbg!(calling_process_cmdline(ProcInfo::new(), describe_calling_process));
std::process::exit(1);
}
That's extremely useful @th1000s, thank you.
I'm worried that this might be affecting a non-negligible proportion of delta users. I'm thinking of releasing a hotfix which removes the last-resort call to find_sibling_process
, but leaves parent_process
and naive_sibling_process
. Does that seem like a reasonable course of action for now?
I've released 0.11.2
which should fix this, but with the cost that in some situations the calling process won't be identified (identification of calling process is needed for --word-diff
/--color-words
, git grep
, git show $commit:$file
, git blame
).
If anyone can confirm that the problem is not present on 0.11.2
that would be great.
I also encountered the slow problem.
I also encountered the slow problem.
Thanks @ttys3. Is it fixed in the latest release (0.11.2)?
I've released
0.11.2
which should fix this, but with the cost that in some situations the calling process won't be identified (identification of calling process is needed for--word-diff
/--color-words
,git grep
,git show $commit:$file
,git blame
).If anyone can confirm that the problem is not present on
0.11.2
that would be great.
I can confirm 0.11.2
fixes it for me! Here's the runtimes I'm seeing:
❯ git checkout 0.10.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Alias tip: gco 0.10.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
HEAD is now at 2f21f08 Bump version
cargo build --release
Compiling git-delta v0.10.2 (/home/user/src/config/delta)
Finished release [optimized] target(s) in 11.49s
make | 11.609 total | 52.93s user | 1.29s system | 467% cpu | 581 MB memory
Running command "./target/release/delta Makefile LICENSE" 5 times
180.20 ms per run (901 ms total)
❯ git checkout 0.11.0 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Alias tip: gco 0.11.0 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Previous HEAD position was 2f21f08 Bump version
HEAD is now at 31b402a Document merge conflict handling
cargo build --release
Compiling git-delta v0.11.0 (/home/user/src/config/delta)
Finished release [optimized] target(s) in 11.78s
make | 11.894 total | 53.73s user | 1.21s system | 461% cpu | 588 MB memory
Running command "./target/release/delta Makefile LICENSE" 5 times
2045.60 ms per run (10228 ms total)
❯ git checkout 0.11.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Alias tip: gco 0.11.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Previous HEAD position was 31b402a Document merge conflict handling
HEAD is now at fbd1227 Bump version
cargo build --release
Compiling git-delta v0.11.2 (/home/user/src/config/delta)
Finished release [optimized] target(s) in 12.03s
make | 12.146 total | 53.75s user | 1.22s system | 452% cpu | 584 MB memory
Running command "./target/release/delta Makefile LICENSE" 5 times
185.60 ms per run (928 ms total)
(And sorry for the slow response, got caught up with some other stuff)
That's extremely useful @th1000s, thank you.
I'm worried that this might be affecting a non-negligible proportion of delta users. I'm thinking of releasing a hotfix which removes the last-resort call to
find_sibling_process
, but leavesparent_process
andnaive_sibling_process
. Does that seem like a reasonable course of action for now?
Seems like the best course of action to me. It was a 10x slowdown for me (over 2 seconds latency for trivial diffs), so IMO fixing it is more important than the features that depend on identifying the calling process.
@ttys3 @infokiller, thank you!
@ttys3 that's extremely useful information about sysinfo
trying to determine cpu frequency. And it looks like it might only be doing that on Linux, explaining why I see no problem on MacOS.
But, OK, it seems that we still have a question to answer: you're both using Linux, so why does @infokiller see 0.11.2 go back to the same high speed as 0.10.2, whereas @ttys3 reports that 0.11.2 is still slow? How do we investigate this?
Could you wrap the call to delta (0.11) in a shell script and dump
env | sort
somewhere both for the slow and the normal case? Are any noticeable variables changed when you diff the env dumps? Can you trigger it with--no-gitconfig
in yourbenchmark-command()
, or when settingHOME=/none
before calling delta?
Probably not relevant anymore since the issue was identified, but you probably missed that I already said earlier that I reproduced it with a clean env:
I suspected something in my environment is the issue but I get the same results with a clean env (running env -i bash --noprofile --norc). I will try to reproduce it later.
For reference:
0.10.2
does not, in default code paths, call determine_calling_process
0.10.3
started calling it in a default code path (in order to identify git diff --word-diff
)0.11.2
changes the call to determine_calling_process
so that it avoids inspecting an arbitrary number of processes and only inspects 2 processes: parent
and predicted sibling($PID - 1
) in the hope that this would be much faster.Is it possible that something about @ttys3's system triggers the Rust sysinfo
crate to do more expensive things like looking at CPU and that is not the case on @infokiller's system?
Thanks for explaining @dandavison I'm now trying to reproduce the issue in docker (knowing that I need to spin up many processes for the speedup to show).
I wasn't able to reproduce the issue with 10,000 sleep processes running in the container. Is it only the number of processes that caused the slowness?
yes, it has 16 core.
@ttys3 is that sufficient to explain the difference between machines? Other machines might have, say, 4 cores so wouldn't they still exhibit obvious slowness?
yes, it has 16 core.
@ttys3 is that sufficient to explain the difference between machines? Other machines might have, say, 4 cores so wouldn't they still exhibit obvious slowness?
I have a CPU with 8 cores/16 threads.
Does delta behave differently when stdin is not a terminal?
Yes:
- stdin is a tty: if there are two positional arguments then run
git diff --no-index arg1 arg2
, otherwise fail with an errror message- stdin not a tty: run delta as normal on stdin
Is there a way to use delta a b
even when stdin is not a terminal?
Is there a way to use delta a b even when stdin is not a terminal?
No, not currently. What it does essentially is call git diff --no-index a b | delta
(cc @ttys3 so, for the record, it does involve git)
So what we really want is a way to inspect one or two processes that will be fast on all machines and OSs.
@ttys3 is it your conclusion currently that there's no way to do that with the public API of sysinfo
? If so, do you think there's a modification to the sysinfo
crate that it would be reasonable to propose upstream?
@dandavison yes. I think it could be done on the upstream.
Currently, I use my self build with this commit
https://github.com/ttys3/delta/commit/d147fe638bd95e6d765f54fe6aeda6599a315e04
works fine.
Is there a way to use delta a b even when stdin is not a terminal?
No, not currently. What it does essentially is call
git diff --no-index a b | delta
(cc @ttys3 so, for the record, it does involve git)
actully, I did not call git at all.
I just test it by
time path-to-my-rust-proj/delta/target/release/delta Cargo.toml.old Cargo.toml
time path-to-my-rust-proj/delta/target/release/delta Cargo.toml.old Cargo.toml
Yes what I was saying is that delta actually starts a child git process when you do that. It's not very relevant to our discussion though, I just thought you might want to know!
I was able to reproduce a 10x slowdown with docker (the issue was related to the stdin stuff). @dandavison can you see if this reproduces for you as well?
build_oci_img () {
docker build "$@" - <<'EOF'
FROM archlinux
RUN pacman -Sy > /dev/null
RUN pacman -S --noconfirm --needed coreutils curl gcc bc git > /dev/null
ENV CARGO_HOME=/usr/local
RUN curl -fsSL 'https://sh.rustup.rs' | sh -s -- -y --profile minimal
RUN /usr/local/bin/cargo install --version 0.10.2 git-delta && cp /usr/local/bin/delta /usr/local/bin/delta-0.10.2
RUN /usr/local/bin/cargo install --version 0.11.0 git-delta && cp /usr/local/bin/delta /usr/local/bin/delta-0.11.0
RUN /usr/local/bin/cargo install --version 0.11.2 git-delta && cp /usr/local/bin/delta /usr/local/bin/delta-0.11.2
WORKDIR /repo
EOF
}
build_oci_img
docker run --rm -i "$(build_oci_img -q)" bash <<'EOF'
benchmark-command() {
local num_runs="${1:-100}"
shift
local before
before=$(($(date +%s%N) / 1000000))
printf 'Running command "%s" %d times\n' "$*" "${num_runs}"
for _ in $(seq 1 "${num_runs}"); do
eval -- "$@" > /dev/null
done
local after
after=$(($(date +%s%N) / 1000000))
local per_run
per_run=$(bc < <(printf 'scale=2; %d/%d\n' $((after - before)) "${num_runs}"))
echo "${per_run} ms per run ($((after - before)) ms total)"
}
benchmark-delta-versions() {
for v in 0.10.2 0.11.0 0.11.2; do
echo "Benchmarking version: $v"
benchmark-command 10 "git diff --no-index a b | delta-${v} >| delta-${v}-result"
done
}
main() {
echo aaa > a
echo aab > b
git diff --no-index a b | delta
benchmark-delta-versions
echo "Spinning up 5000 processes"
for i in {1..5000}; do
nohup sleep 30 & &> /dev/null
done
echo "Process counts:"
ps --no-headers -e -o fname | sort | uniq -c
benchmark-delta-versions
}
main
EOF
Result:
Benchmarking version: 0.10.2
Running command "git diff --no-index a b | delta-0.10.2 >| delta-0.10.2-result" 10 times
84.00 ms per run (840 ms total)
Benchmarking version: 0.11.0
Running command "git diff --no-index a b | delta-0.11.0 >| delta-0.11.0-result" 10 times
84.60 ms per run (846 ms total)
Benchmarking version: 0.11.2
Running command "git diff --no-index a b | delta-0.11.2 >| delta-0.11.2-result" 10 times
85.30 ms per run (853 ms total)
Spinning up 5000 processes
Process counts:
1 bash
1 ps
5000 sleep
1 sort
1 uniq
Benchmarking version: 0.10.2
Running command "git diff --no-index a b | delta-0.10.2 >| delta-0.10.2-result" 10 times
86.50 ms per run (865 ms total)
Benchmarking version: 0.11.0
Running command "git diff --no-index a b | delta-0.11.0 >| delta-0.11.0-result" 10 times
829.80 ms per run (8298 ms total)
Benchmarking version: 0.11.2
Running command "git diff --no-index a b | delta-0.11.2 >| delta-0.11.2-result" 10 times
89.60 ms per run (896 ms total)
@dandavison you should be able to just copy-paste the command into bash/zsh (on your Mac).
Currently, I use my self build with this commit
ttys3@d147fe6
works fine.
@ttys3 than you very much for your investigation! The downside of disabling that call is that delta will not be able to handle any of the following:
git diff --word-diff
/ git diff --color-words
git grep
and other grep's (but it will handle rg --json
)git show $commit:$file
git blame
language detection So that would be a shame, and I'd like to find a solution whereby we can do the process inspection in a way that is fast for all users.
@infokiller awesome, thank you very much! I will try the docker script tonight.
I wasn't able to reproduce the issue with 10,000 sleep processes running in the container. Is it only the number of processes that caused the slowness?
It has nothing to do with number of process, it is determine CPU freq cost time
I wasn't able to reproduce the issue with 10,000 sleep processes running in the container. Is it only the number of processes that caused the slowness?
It has nothing to do with number of process, it is determine CPU freq cost time
From what I understand it is affected by the number of processes, and I was able to reproduce it in docker (previously I failed because of stdin not being a terminal which caused delta no to do what I wanted). Can you try the shell commands I posted in https://github.com/dandavison/delta/issues/824#issuecomment-988948010 and see if you get the same result?
@infokiller
could you try to generate a flamegraph to see if the resulted graph match mine?
and my new patch has verified my guess ( please see https://github.com/ttys3/delta/commit/8f5a8356a815bdf4aca3e73a0ee2c765637cad8a )
delta is much slower for me starting from
0.10.3
. I was able to reproduce that without any gitconfig customizations. I will try to add a docker command to reproduce this, but wanted to open an issue in case others run into this.