dandavison / delta

A syntax-highlighting pager for git, diff, grep, and blame output
https://dandavison.github.io/delta/
MIT License
24.03k stars 390 forks source link

🐛 delta is much slower starting with `0.10.3` (`0.11.0` included) #824

Closed infokiller closed 2 years ago

infokiller commented 2 years ago

delta is much slower for me starting from 0.10.3. I was able to reproduce that without any gitconfig customizations. I will try to add a docker command to reproduce this, but wanted to open an issue in case others run into this.

dandavison commented 2 years ago

Hi @infokiller (and @nymous) thanks. Would you be able to use make benchmark in the delta repository to obtain timings? You would need to install the Rust dev tool chain but that should be straightforward. If you can't install the dev tool chain, you can still obtain a timing for your installed version of delta by editing the Makefile target named benchmark to change the path to the delta executable.

infokiller commented 2 years ago

Thanks for the quick response @dandavison (and wonderful project of course!). Yes, I didn't know about the benchmark target, I'll run it and post results.

dandavison commented 2 years ago

Great! (And you'll see you need to install hyperfine).

dandavison commented 2 years ago

What's your OS?

infokiller commented 2 years ago
❯ uname -a
Linux desk18 5.14.18-hardened1-1-hardened #1 SMP PREEMPT Sun, 14 Nov 2021 16:10:03 +0000 x86_64 GNU/Linux
infokiller commented 2 years ago

Benchmark results are not significantly different:

❯ make benchmark
cargo build --release
   Compiling git-delta v0.10.2 (/home/user/src/config/delta)
    Finished release [optimized] target(s) in 11.10s
git log -p 23c292d3f25c67082a2ba315a187268be1a9b0ab > /tmp/delta-benchmark-input.gitdiff
hyperfine 'target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):      2.563 s ±  0.074 s    [User: 1.493 s, System: 0.935 s]
  Range (min … max):    2.484 s …  2.737 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

make benchmark | 37.170 total | 67.89s user | 10.46s system | 210% cpu | 569 MB memory
❯ gco 0.11.0
Previous HEAD position was 2f21f08 Bump version
HEAD is now at 31b402a Document merge conflict handling
❯ make benchmark
cargo build --release
   Compiling git-delta v0.11.0 (/home/user/src/config/delta)
    Finished release [optimized] target(s) in 10.92s
git log -p 23c292d3f25c67082a2ba315a187268be1a9b0ab > /tmp/delta-benchmark-input.gitdiff
hyperfine --warmup 10 --min-runs 20 \
        'target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null'
Benchmark 1: target/release/delta --no-gitconfig < /tmp/delta-benchmark-input.gitdiff > /dev/null
  Time (mean ± σ):      2.523 s ±  0.057 s    [User: 1.456 s, System: 0.882 s]
  Range (min … max):    2.445 s …  2.659 s    20 runs

make benchmark | 1:26.93 total | 96.85s user | 27.67s system | 143% cpu | 589 MB memory
infokiller commented 2 years ago

Update: scratch that, I think I can actually reproduce it, will post results soon

@dandavison I also tried to manually benchmark it by running delta a.txt b.txt repeatedly in a shell but wasn't able to reproduce the slowness. Could it be due to different behavior when the output for delta is a terminal or a file?

infokiller commented 2 years ago

It seems I can't reproduce the slowness with hyperfine (or the makefile), but I can do it with a simple shell script. Unfortunately, I'm unable to reproduce it in docker (which I wanted to provide you the full steps to reproduce). I'll try to see why I can't reproduce it in docker.

Without docker, here's hyperfine showing no difference:

host% hyperfine '~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt'
Benchmark 1: ~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt
  Time (mean ± σ):     333.5 ms ±  47.4 ms    [User: 111.6 ms, System: 62.6 ms]
  Range (min … max):   260.3 ms … 422.7 ms    10 runs

host% hyperfine '~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt'
Benchmark 1: ~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt
  Time (mean ± σ):     294.7 ms ±  36.4 ms    [User: 118.6 ms, System: 52.6 ms]
  Range (min … max):   240.2 ms … 365.4 ms    10 runs

Again without docker, I can reproduce it with a shell function (almost 10x slower):


❯ zsh --norcs
host% benchmark-command() {
  local num_runs="${1:-100}"
  shift
  local before
  before=$(($(date +%s%N) / 1000000))
  printf 'Running command "%s" %d times\n' "$*" "${num_runs}"
  for _ in $(seq 1 "${num_runs}"); do
    eval -- "$@" > /dev/null
  done
  local after
  after=$(($(date +%s%N) / 1000000))
  local per_run
  per_run=$(bc < <(printf 'scale=2; %d/%d\n' $((after - before)) "${num_runs}"))
  echo "${per_run} ms per run ($((after - before)) ms total)"
}

host% benchmark-command 10 '~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt'
Running command "~/.local/pkg/cargo/bin/delta-0.10.2 a.txt b.txt" 10 times
172.20 ms per run (1722 ms total)
host% benchmark-command 10 '~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt'
Running command "~/.local/pkg/cargo/bin/delta-0.11.0 a.txt b.txt" 10 times
1661.10 ms per run (16611 ms total)
infokiller commented 2 years ago

I suspected something in my environment is the issue but I get the same results with a clean env (running env -i bash --noprofile --norc). I will try to reproduce it later.

dandavison commented 2 years ago

@infokiller do you only see the slowness when you are doing delta a b and not when doing git diff or git diff | delta?

Could you try this branch and see if it changes your timings? 824-slowness-investigation

dandavison commented 2 years ago

I wonder what is going on with the difference between hyperfine and your shell function. I'd be interested to see each of the 10 timings that are being averaged by your shell function, to see what the distribution is like.

I've used your shell function, and hyperfine, on delta src/delta.rs src/paint.rs on intel and M1 MacOS and there's no difference similar to what you see.

(I did, I am fairly certain, discover the interesting but irrelevant fact that the binary from homebrew is 1.5x faster on the M1 machine than cargo build --release -- I want to know why!)

infokiller commented 2 years ago

@infokiller do you only see the slowness when you are doing delta a b and not when doing git diff or git diff | delta?

Could you try this branch and see if it changes your timings? 824-slowness-investigation

I see it with git diff | delta as well. I'll try that branch and give an update.

infokiller commented 2 years ago

I wonder what is going on with the difference between hyperfine and your shell function.

Does delta behave differently when stdin is not a terminal?

I'd be interested to see each of the 10 timings that are being averaged by your shell function, to see what the distribution is like.

I can modify it to print it, but it may introduce some measurement overhead (which still exists now, but is averaged out across all runs).

I've used your shell function, and hyperfine, on delta src/delta.rs src/paint.rs on intel and M1 MacOS and there's no difference similar to what you see.

(I did, I am fairly certain, discover the interesting but irrelevant fact that the binary from homebrew is 1.5x faster on the M1 machine than cargo build --release -- I want to know why!)

That's interesting indeed, if you figure out why let me know as I'm curious about it too.

dandavison commented 2 years ago

Does delta behave differently when stdin is not a terminal?

Yes:

th1000s commented 2 years ago

Could you wrap the call to delta (0.11) in a shell script and dump env | sort somewhere both for the slow and the normal case? Are any noticeable variables changed when you diff the env dumps? Can you trigger it with --no-gitconfig in your benchmark-command(), or when setting HOME=/none before calling delta?

jebaum commented 2 years ago

I see this behavior too - can't reproduce in docker either but does happen on my normal machine 100% of the time. On the off chance that a more precise description of what I'm observing is helpful I can provide that, since I don't see one in here yet.

If running git diff | delta with no gitconfig, or just git diff with pager.diff = delta in gitconfig, the eventual first line of the diff appears on my pager instantly. In my dotfiles repository right now, that's Δ config/sx/sxrc, correctly formatted in bold + underlined yellow text. It freezes with just that one line on the pager for about half of a second, then the screen instantly changes to the correct full diff.

dandavison commented 2 years ago

Thanks @jebaum. Are you able to build the project from the git repo using cargo (instructions)? If so I would love to know whether you see the same problem on branch 824-slowness-investigation.

(Also if you could let us know your OS)

jebaum commented 2 years ago

Yep can do, will report back what I see later today. Thanks for being on top of it!

Running arch, kernel 5.15.3-arch1-1. git version 2.34.1 in case that matters

jebaum commented 2 years ago

That branch fixes the issue for me

dandavison commented 2 years ago

That branch fixes the issue for me

Thanks @jebaum! Is there any chance you'd be able to tell us which commit first has the problem, or which git tags do/don't have the problem? But you've already been very helpful, so no worries if you are busy.

Hm. Any thoughts @th1000s? The branch that fixes the issue is https://github.com/dandavison/delta/pull/825, which gets rid of the sysinfo::System calls entirely.

jebaum commented 2 years ago

I can definitely bisect it, shouldn't take too long. Will get to that by tomorrow. Lease I can do to give back a little bit to fantastic OSS

dandavison commented 2 years ago

@infokiller if by any chance you'd be able to try branch #825 that would be fantastic.

th1000s commented 2 years ago

Yes, that is sysinfo querying absolutely all processes (and there being only 2 or 3 in a docker container it is not noticeable). If I only run that part of delta[3] (git log -p | time delta), this takes ~ 250 ms, of which ~ 60 are delta startup time, so 190 to query everything. By comparison, time ps auxf - which still has to print everything - is done in 30 here because it only needs 5k process related syscalls according to strace while delta needs 33k! That could certainly be trimmed down, probably even by calling ps directly :)

Querying the entire process tree is the last fallback when looking for the related git process and this seems to be hit every time after trying the parent or sibling process. I only hit that path when I have a git wrapper in e.g. ~/bin/git and never when delta is set as a pager in the git config directly (because then it is always the direct parent).

When you run git log -p | delta, what does ps auxf | grep 'delt[a]' -B 3 show, A or B? And are the pids of the initial git and delta sequential? What should be tried in case of B is to look for child processes first, but this does not seem to be available in a portable way. Maybe implement that strategy on Linux by looking at /proc/<PID>/task/<PID>/children directly or guessing more pids (like the sibling via $delta_pid - 1) first.

A:
thomas   10002  [..]          \_ git log -p
thomas   10003  [..]          \_ delta

B:
thomas     350000  [..]     \_ /bin/sh /home/thomas/bin/git log -p
thomas     350002  [..]     |   \_ /usr/bin/git log -p
thomas     350001  [..]     \_ delta

3:
fn determine_calling_process() -> Option<CallingProcess> {
    dbg!(calling_process_cmdline(ProcInfo::new(), describe_calling_process));
    std::process::exit(1);
}
dandavison commented 2 years ago

That's extremely useful @th1000s, thank you.

I'm worried that this might be affecting a non-negligible proportion of delta users. I'm thinking of releasing a hotfix which removes the last-resort call to find_sibling_process, but leaves parent_process and naive_sibling_process. Does that seem like a reasonable course of action for now?

dandavison commented 2 years ago

I've released 0.11.2 which should fix this, but with the cost that in some situations the calling process won't be identified (identification of calling process is needed for --word-diff/--color-words, git grep, git show $commit:$file, git blame).

If anyone can confirm that the problem is not present on 0.11.2 that would be great.

ttys3 commented 2 years ago

I also encountered the slow problem.

dandavison commented 2 years ago

I also encountered the slow problem.

Thanks @ttys3. Is it fixed in the latest release (0.11.2)?

infokiller commented 2 years ago

I've released 0.11.2 which should fix this, but with the cost that in some situations the calling process won't be identified (identification of calling process is needed for --word-diff/--color-words, git grep, git show $commit:$file, git blame).

If anyone can confirm that the problem is not present on 0.11.2 that would be great.

I can confirm 0.11.2 fixes it for me! Here's the runtimes I'm seeing:

❯ git checkout 0.10.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Alias tip: gco 0.10.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
HEAD is now at 2f21f08 Bump version
cargo build --release
   Compiling git-delta v0.10.2 (/home/user/src/config/delta)
    Finished release [optimized] target(s) in 11.49s
make | 11.609 total | 52.93s user | 1.29s system | 467% cpu | 581 MB memory
Running command "./target/release/delta Makefile LICENSE" 5 times
180.20 ms per run (901 ms total)
❯ git checkout 0.11.0 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Alias tip: gco 0.11.0 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Previous HEAD position was 2f21f08 Bump version
HEAD is now at 31b402a Document merge conflict handling
cargo build --release
   Compiling git-delta v0.11.0 (/home/user/src/config/delta)
    Finished release [optimized] target(s) in 11.78s
make | 11.894 total | 53.73s user | 1.21s system | 461% cpu | 588 MB memory
Running command "./target/release/delta Makefile LICENSE" 5 times
2045.60 ms per run (10228 ms total)
❯ git checkout 0.11.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Alias tip: gco 0.11.2 && make && benchmark-command 5 "./target/release/delta Makefile LICENSE"
Previous HEAD position was 31b402a Document merge conflict handling
HEAD is now at fbd1227 Bump version
cargo build --release
   Compiling git-delta v0.11.2 (/home/user/src/config/delta)
    Finished release [optimized] target(s) in 12.03s
make | 12.146 total | 53.75s user | 1.22s system | 452% cpu | 584 MB memory
Running command "./target/release/delta Makefile LICENSE" 5 times
185.60 ms per run (928 ms total)

(And sorry for the slow response, got caught up with some other stuff)

infokiller commented 2 years ago

That's extremely useful @th1000s, thank you.

I'm worried that this might be affecting a non-negligible proportion of delta users. I'm thinking of releasing a hotfix which removes the last-resort call to find_sibling_process, but leaves parent_process and naive_sibling_process. Does that seem like a reasonable course of action for now?

Seems like the best course of action to me. It was a 10x slowdown for me (over 2 seconds latency for trivial diffs), so IMO fixing it is more important than the features that depend on identifying the calling process.

dandavison commented 2 years ago

@ttys3 @infokiller, thank you!

@ttys3 that's extremely useful information about sysinfo trying to determine cpu frequency. And it looks like it might only be doing that on Linux, explaining why I see no problem on MacOS.

But, OK, it seems that we still have a question to answer: you're both using Linux, so why does @infokiller see 0.11.2 go back to the same high speed as 0.10.2, whereas @ttys3 reports that 0.11.2 is still slow? How do we investigate this?

infokiller commented 2 years ago

Could you wrap the call to delta (0.11) in a shell script and dump env | sort somewhere both for the slow and the normal case? Are any noticeable variables changed when you diff the env dumps? Can you trigger it with --no-gitconfig in your benchmark-command(), or when setting HOME=/none before calling delta?

Probably not relevant anymore since the issue was identified, but you probably missed that I already said earlier that I reproduced it with a clean env:

I suspected something in my environment is the issue but I get the same results with a clean env (running env -i bash --noprofile --norc). I will try to reproduce it later.

dandavison commented 2 years ago

For reference:

dandavison commented 2 years ago

Is it possible that something about @ttys3's system triggers the Rust sysinfo crate to do more expensive things like looking at CPU and that is not the case on @infokiller's system?

infokiller commented 2 years ago

Thanks for explaining @dandavison I'm now trying to reproduce the issue in docker (knowing that I need to spin up many processes for the speedup to show).

infokiller commented 2 years ago

I wasn't able to reproduce the issue with 10,000 sleep processes running in the container. Is it only the number of processes that caused the slowness?

dandavison commented 2 years ago

yes, it has 16 core.

@ttys3 is that sufficient to explain the difference between machines? Other machines might have, say, 4 cores so wouldn't they still exhibit obvious slowness?

infokiller commented 2 years ago

yes, it has 16 core.

@ttys3 is that sufficient to explain the difference between machines? Other machines might have, say, 4 cores so wouldn't they still exhibit obvious slowness?

I have a CPU with 8 cores/16 threads.

infokiller commented 2 years ago

Does delta behave differently when stdin is not a terminal?

Yes:

  • stdin is a tty: if there are two positional arguments then run git diff --no-index arg1 arg2, otherwise fail with an errror message
  • stdin not a tty: run delta as normal on stdin

Is there a way to use delta a b even when stdin is not a terminal?

dandavison commented 2 years ago

Is there a way to use delta a b even when stdin is not a terminal?

No, not currently. What it does essentially is call git diff --no-index a b | delta (cc @ttys3 so, for the record, it does involve git)

dandavison commented 2 years ago

So what we really want is a way to inspect one or two processes that will be fast on all machines and OSs.

@ttys3 is it your conclusion currently that there's no way to do that with the public API of sysinfo? If so, do you think there's a modification to the sysinfo crate that it would be reasonable to propose upstream?

ttys3 commented 2 years ago

@dandavison yes. I think it could be done on the upstream.

Currently, I use my self build with this commit

https://github.com/ttys3/delta/commit/d147fe638bd95e6d765f54fe6aeda6599a315e04

works fine.

ttys3 commented 2 years ago

Is there a way to use delta a b even when stdin is not a terminal?

No, not currently. What it does essentially is call git diff --no-index a b | delta (cc @ttys3 so, for the record, it does involve git)

actully, I did not call git at all.

I just test it by

time path-to-my-rust-proj/delta/target/release/delta Cargo.toml.old Cargo.toml
dandavison commented 2 years ago

time path-to-my-rust-proj/delta/target/release/delta Cargo.toml.old Cargo.toml

Yes what I was saying is that delta actually starts a child git process when you do that. It's not very relevant to our discussion though, I just thought you might want to know!

infokiller commented 2 years ago

I was able to reproduce a 10x slowdown with docker (the issue was related to the stdin stuff). @dandavison can you see if this reproduces for you as well?

build_oci_img () {
  docker build "$@" - <<'EOF'
  FROM archlinux
  RUN pacman -Sy > /dev/null
  RUN pacman -S --noconfirm --needed coreutils curl gcc bc git > /dev/null
  ENV CARGO_HOME=/usr/local
  RUN curl -fsSL 'https://sh.rustup.rs' | sh -s -- -y --profile minimal
  RUN /usr/local/bin/cargo install --version 0.10.2 git-delta && cp /usr/local/bin/delta /usr/local/bin/delta-0.10.2
  RUN /usr/local/bin/cargo install --version 0.11.0 git-delta && cp /usr/local/bin/delta /usr/local/bin/delta-0.11.0
  RUN /usr/local/bin/cargo install --version 0.11.2 git-delta && cp /usr/local/bin/delta /usr/local/bin/delta-0.11.2
  WORKDIR /repo
EOF
}

build_oci_img
docker run --rm -i "$(build_oci_img -q)" bash <<'EOF'
benchmark-command() {
  local num_runs="${1:-100}"
  shift
  local before
  before=$(($(date +%s%N) / 1000000))
  printf 'Running command "%s" %d times\n' "$*" "${num_runs}"
  for _ in $(seq 1 "${num_runs}"); do
    eval -- "$@" > /dev/null
  done
  local after
  after=$(($(date +%s%N) / 1000000))
  local per_run
  per_run=$(bc < <(printf 'scale=2; %d/%d\n' $((after - before)) "${num_runs}"))
  echo "${per_run} ms per run ($((after - before)) ms total)"
}

benchmark-delta-versions() {
  for v in 0.10.2 0.11.0 0.11.2; do
    echo "Benchmarking version: $v"
    benchmark-command 10 "git diff --no-index a b | delta-${v} >| delta-${v}-result"
  done
}

main() {
  echo aaa > a
  echo aab > b
  git diff --no-index a b | delta
  benchmark-delta-versions
  echo "Spinning up 5000 processes"
  for i in {1..5000}; do
    nohup sleep 30 & &> /dev/null
  done
  echo "Process counts:"
  ps --no-headers -e -o fname | sort | uniq -c
  benchmark-delta-versions
}

main
EOF

Result:

Benchmarking version: 0.10.2
Running command "git diff --no-index a b | delta-0.10.2 >| delta-0.10.2-result" 10 times
84.00 ms per run (840 ms total)
Benchmarking version: 0.11.0
Running command "git diff --no-index a b | delta-0.11.0 >| delta-0.11.0-result" 10 times
84.60 ms per run (846 ms total)
Benchmarking version: 0.11.2
Running command "git diff --no-index a b | delta-0.11.2 >| delta-0.11.2-result" 10 times
85.30 ms per run (853 ms total)
Spinning up 5000 processes
Process counts:
      1 bash
      1 ps
   5000 sleep
      1 sort
      1 uniq
Benchmarking version: 0.10.2
Running command "git diff --no-index a b | delta-0.10.2 >| delta-0.10.2-result" 10 times
86.50 ms per run (865 ms total)
Benchmarking version: 0.11.0
Running command "git diff --no-index a b | delta-0.11.0 >| delta-0.11.0-result" 10 times
829.80 ms per run (8298 ms total)
Benchmarking version: 0.11.2
Running command "git diff --no-index a b | delta-0.11.2 >| delta-0.11.2-result" 10 times
89.60 ms per run (896 ms total)
infokiller commented 2 years ago

@dandavison you should be able to just copy-paste the command into bash/zsh (on your Mac).

dandavison commented 2 years ago

Currently, I use my self build with this commit

ttys3@d147fe6

works fine.

@ttys3 than you very much for your investigation! The downside of disabling that call is that delta will not be able to handle any of the following:

So that would be a shame, and I'd like to find a solution whereby we can do the process inspection in a way that is fast for all users.

dandavison commented 2 years ago

@infokiller awesome, thank you very much! I will try the docker script tonight.

ttys3 commented 2 years ago

I wasn't able to reproduce the issue with 10,000 sleep processes running in the container. Is it only the number of processes that caused the slowness?

It has nothing to do with number of process, it is determine CPU freq cost time

infokiller commented 2 years ago

I wasn't able to reproduce the issue with 10,000 sleep processes running in the container. Is it only the number of processes that caused the slowness?

It has nothing to do with number of process, it is determine CPU freq cost time

From what I understand it is affected by the number of processes, and I was able to reproduce it in docker (previously I failed because of stdin not being a terminal which caused delta no to do what I wanted). Can you try the shell commands I posted in https://github.com/dandavison/delta/issues/824#issuecomment-988948010 and see if you get the same result?

ttys3 commented 2 years ago

@infokiller

could you try to generate a flamegraph to see if the resulted graph match mine?

and my new patch has verified my guess ( please see https://github.com/ttys3/delta/commit/8f5a8356a815bdf4aca3e73a0ee2c765637cad8a )