git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/

Extremely slow 'git status --ignored' since 2.27.0, issue not present in 2.26.1 or Linux versions (via WSL) #3318

Open jmiserez opened 3 years ago

jmiserez commented 3 years ago

(Scroll to the bottom for the GitHub issue boilerplate)

Problem description

Summary

Git for Windows 2.27.0 appears to have introduced a performance regression that was not present in 2.26.1 and does not occur with Git on Linux; it is still present in Git for Windows 2.27.0 and higher. The symptom is extremely slow performance (15.1s in 2.32.0 vs. 0.09s in 2.26.1) when running git status --ignored on a repository with deep folder structures (~25 folders deep) containing symlinks (specifically pnpm folders). git status without --ignored works fine (~0.04s), as the folders in question are then ignored. git status --ignored on Linux via WSL (same filesystem, machine, folder) runs fine as well (~0.55s).

In my tests, I've reproduced the issue when running Git for Windows 2.27.0 and later on Windows 10 (on a Windows filesystem), but not when running the identical Linux/Ubuntu version on the same Windows filesystem (via WSL). The issue is present for a large number of users in our organization across many different Windows 10 machines, but is never present when using Git for Windows 2.26.1 or earlier on the same machines or when using Linux git. The issue is present even with AV disabled.

EDIT: Repo demonstrating the issue here: https://github.com/git-for-windows/git/issues/3318#issuecomment-882048836 ~I'm sorry that I don't have a specific repository/project to demonstrate this issue, but it does happen across tens of users and several repositories at our organization. We have reduced the depth of our pnpm folder structures as much as possible (was around 100 before), but the issue remains as you can see in the traces below. It's pretty hard to come up with a public repository that replicates the issue cleanly enough to demonstrate, but maybe this information gives you an idea of what could be wrong.~

Traces: Git 2.32.0 Windows vs git 2.32.0 via WSL, no AV

This is with the newest Git. The repo contains symlinked pnpm node_modules directories ~25 folders deep. AV is disabled.

Git for Windows (git version 2.32.0.windows.1) via Git Bash:

$ GIT_TRACE_PERFORMANCE=1 git status
14:10:13.007377 read-cache.c:2381       performance: 0.000646200 s:  read cache .git/index
14:10:13.036649 preload-index.c:159     performance: 0.027805900 s:   preload index
14:10:13.037648 read-cache.c:1682       performance: 0.027982300 s:  refresh index
14:10:13.038648 diff-lib.c:262          performance: 0.000062400 s:  diff-files
14:10:13.040650 unpack-trees.c:1687     performance: 0.000023400 s:    traverse_trees
14:10:13.040650 unpack-trees.c:413      performance: 0.000001600 s:    check_updates
14:10:13.040650 unpack-trees.c:1773     performance: 0.000060800 s:   unpack_trees
14:10:13.040650 diff-lib.c:606          performance: 0.000109000 s:  diff-index
14:10:13.041649 name-hash.c:607         performance: 0.000849500 s:  initialize name hash
14:10:13.047648 trace.c:487             performance: 0.043241000 s: git command: 'C:\Users\jms\AppData\Local\Programs\Git\mingw64\bin\git.exe' status
$ GIT_TRACE_PERFORMANCE=1 git status --ignored
15:48:35.456280 read-cache.c:2381       performance: 0.001131500 s:  read cache .git/index
15:48:35.532280 preload-index.c:159     performance: 0.072352200 s:   preload index
15:48:35.532280 read-cache.c:1682       performance: 0.072507300 s:  refresh index
15:48:35.536281 diff-lib.c:262          performance: 0.000073500 s:  diff-files
15:48:35.540280 unpack-trees.c:1687     performance: 0.000025900 s:    traverse_trees
15:48:35.540280 unpack-trees.c:413      performance: 0.000001600 s:    check_updates
15:48:35.540280 unpack-trees.c:1773     performance: 0.000102400 s:   unpack_trees
15:48:35.540280 diff-lib.c:606          performance: 0.000154700 s:  diff-index
15:48:35.541280 name-hash.c:607         performance: 0.000875600 s:  initialize name hash
15:48:50.643532 trace.c:487             performance: 15.102252600 s: git command: 'C:\Users\jms\AppData\Local\Programs\Git\mingw64\bin\git.exe' status --ignored

Traces: Git on Ubuntu (git version 2.32.0) via WSL on same filesystem/folder:

$ GIT_TRACE_PERFORMANCE=1 git status
14:11:28.677854 read-cache.c:2368       performance: 0.001113400 s:  read cache .git/index
14:11:28.952387 preload-index.c:154     performance: 0.273454800 s:   preload index
14:11:28.952482 read-cache.c:1670       performance: 0.273552600 s:  refresh index
14:11:28.954700 diff-lib.c:262          performance: 0.000066200 s:  diff-files
14:11:28.958355 unpack-trees.c:1685     performance: 0.000017200 s:    traverse_trees
14:11:28.958377 unpack-trees.c:413      performance: 0.000001600 s:    check_updates
14:11:28.958386 unpack-trees.c:1771     performance: 0.000100500 s:   unpack_trees
14:11:28.958391 diff-lib.c:606          performance: 0.000154900 s:  diff-index
14:11:28.959852 name-hash.c:607         performance: 0.000790600 s:  initialize name hash
14:11:29.074925 trace.c:487             performance: 0.402027600 s: git command: git status
$ GIT_TRACE_PERFORMANCE=1 git status --ignored
14:10:31.481193 read-cache.c:2368       performance: 0.001076200 s:  read cache .git/index
14:10:31.754160 preload-index.c:154     performance: 0.271709500 s:   preload index
14:10:31.754254 read-cache.c:1670       performance: 0.271806400 s:  refresh index
14:10:31.756489 diff-lib.c:262          performance: 0.000066000 s:  diff-files
14:10:31.760105 unpack-trees.c:1685     performance: 0.000017100 s:    traverse_trees
14:10:31.760129 unpack-trees.c:413      performance: 0.000001700 s:    check_updates
14:10:31.760138 unpack-trees.c:1771     performance: 0.000090000 s:   unpack_trees
14:10:31.760143 diff-lib.c:606          performance: 0.000145400 s:  diff-index
14:10:31.761585 name-hash.c:607         performance: 0.000789100 s:  initialize name hash
14:10:32.031897 trace.c:487             performance: 0.556233200 s: git command: git status --ignored

Older traces from Git 2.29.0

In Git 2.29.0 there was also an additional "dir.c:2824" entry in the trace just before the last line, where the bulk of the time was spent. This isn't shown in the trace anymore with 2.32.0, but maybe this is relevant.
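Since the 2.32.0 trace no longer names the region where the time goes, one way to quantify the gap is to subtract the top-level regions from the total on the final "git command:" line. The following is a minimal sketch, not part of the original report: `unaccounted_time` is a hypothetical helper name, and it assumes that nesting in GIT_TRACE_PERFORMANCE output is encoded by the number of spaces after "s:", as it appears in the traces above.

```shell
# Sum the top-level regions of a GIT_TRACE_PERFORMANCE log and compare them
# with the total reported on the final "git command:" line.
# Assumption: nesting is encoded by the number of spaces after "s:"
# (1 = whole command, 2 = top-level region, 3+ = nested inside a parent).
#
# Usage (the trace is written to stderr):
#   GIT_TRACE_PERFORMANCE=1 git status --ignored 2>&1 >/dev/null | unaccounted_time
unaccounted_time() {
    awk '
    /performance: / {
        rest = substr($0, index($0, "performance: ") + 13)
        cut = index(rest, " s:")
        if (cut == 0) next
        secs = substr(rest, 1, cut - 1) + 0
        label = substr(rest, cut + 3)
        spaces = 0
        while (substr(label, spaces + 1, 1) == " ") spaces++
        if (spaces == 1) total = secs
        else if (spaces == 2) accounted += secs
    }
    END {
        printf "total=%.3f accounted=%.3f unaccounted=%.3f\n",
               total, accounted, total - accounted
    }'
}
```

Applied to the 2.32.0 `git status --ignored` trace above, the five top-level regions add up to about 0.075 s of the 15.102 s total, which leaves roughly 15.03 s that the trace does not attribute to any region.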

Traces: For reference Git 2.26.1 Windows on same repository

$ GIT_TRACE_PERFORMANCE=1 git status
16:18:28.043282 read-cache.c:2308       performance: 0.000733000 s:  read cache .git/index
16:18:28.076282 preload-index.c:152     performance: 0.031700700 s:   preload index
16:18:28.076282 read-cache.c:1622       performance: 0.031870900 s:  refresh index
16:18:28.077750 diff-lib.c:251          performance: 0.000067800 s:  diff-files
16:18:28.080759 unpack-trees.c:1596     performance: 0.000021800 s:    traverse_trees
16:18:28.080759 unpack-trees.c:377      performance: 0.000000000 s:    check_updates
16:18:28.080759 unpack-trees.c:1695     performance: 0.000077500 s:   unpack_trees
16:18:28.080759 diff-lib.c:537          performance: 0.000142600 s:  diff-index
16:18:28.081785 name-hash.c:600         performance: 0.000854700 s:   initialize name hash
16:18:28.084407 dir.c:2720              performance: 0.003891700 s:  read directory
16:18:28.088414 trace.c:475             performance: 0.049136600 s: git command: 'C:\Users\jms\AppData\Local\Programs\Git\mingw64\bin\git.exe' status
$ GIT_TRACE_PERFORMANCE=1 git status --ignored
16:18:32.286790 read-cache.c:2308       performance: 0.000658700 s:  read cache .git/index
16:18:32.316127 preload-index.c:152     performance: 0.027360500 s:   preload index
16:18:32.316297 read-cache.c:1622       performance: 0.027553800 s:  refresh index
16:18:32.317306 diff-lib.c:251          performance: 0.000055400 s:  diff-files
16:18:32.319305 unpack-trees.c:1596     performance: 0.000020900 s:    traverse_trees
16:18:32.319305 unpack-trees.c:377      performance: 0.000000100 s:    check_updates
16:18:32.319305 unpack-trees.c:1695     performance: 0.000054500 s:   unpack_trees
16:18:32.319305 diff-lib.c:537          performance: 0.000099400 s:  diff-index
16:18:32.320304 name-hash.c:600         performance: 0.000851800 s:   initialize name hash
16:18:32.376260 dir.c:2720              performance: 0.056734000 s:  read directory
16:18:32.379295 trace.c:475             performance: 0.095973600 s: git command: 'C:\Users\jms\AppData\Local\Programs\Git\mingw64\bin\git.exe' status --ignored

GitHub issue boilerplate info

Setup

$ git --version --build-options
git version 2.32.0.windows.1
cpu: x86_64
built from commit: 4c204998d0e156d13d81abe1d1963051b1418fc0
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
$ cmd.exe /c ver
Microsoft Windows [Version 10.0.19042.1083]
"%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
Editor Option: Nano
Custom Editor Path: 
Default Branch Option:  
Path Option: Cmd
SSH Option: OpenSSH
Tortoise Option: false
CURL Option: OpenSSL
CRLF Option: LFOnly
Bash Terminal Option: MinTTY
Git Pull Behavior Option: Rebase
Use Credential Manager: Core
Performance Tweaks FSCache: Enabled
Enable Symlinks: Disabled
Enable Pseudo Console Support: Disabled
Enable FSMonitor: Disabled

Details

The issue is present regardless of terminal: it occurs both from Git Bash and when Git is launched directly by IntelliJ.

Running git status --ignored on a repository with deep folder structures (~25 folders) containing symlinks, specifically pnpm node_modules folders.

git status --ignored

Similar performance to Git for Windows 2.26.1 or any recent Git version on Linux.

Extremely slow performance (15.1s) when compared to Git for Windows 2.26.1 (0.09s), or any recent Linux version via WSL (0.55s).

Unfortunately at this time I have not been able to create a suitable public repository with such a deep PNPM folder structure, as many of the artifacts and repositories in question are not public.

jmiserez commented 3 years ago

I've updated the issue with better numbers:

So while our AV (Trend Micro) does have a large 4x impact, the issue itself looks to be responsible for a ~157x impact in this case, when comparing Git for Windows 2.32.0 vs Git for Windows 2.26.1.
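The ~157x figure can be checked against the "git command" totals in the traces above. This is a small sketch with a hypothetical helper name (`slowdown`); the AV factor of 4x cannot be recomputed here because its underlying timings are not included in the thread.

```shell
# Ratio of two wall-clock timings, rounded to a whole factor.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0fx\n", slow / fast }'
}

slowdown 15.102252600 0.095973600   # Git for Windows 2.32.0 vs. 2.26.1
slowdown 15.102252600 0.556233200   # Git for Windows 2.32.0 vs. Git 2.32.0 in WSL
```

The first call reproduces the 157x comparison between the two Git for Windows versions; the second puts the same 2.32.0 run at roughly 27x slower than Git in WSL on the same folder.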

dscho commented 3 years ago

@jmiserez could you revert https://github.com/git-for-windows/git/commit/515ff6a7d597fd8083c63a983f624ccd78bb2a4c and test the result? There are several ways to do that:

  1. install Git for Windows' SDK,
  2. sdk cd git,
  3. git revert 515ff6a7d597fd8083c63a983f624ccd78bb2a4c
  4. build Git via make -j$(nproc)
  5. test in-place via ./git --exec-path="$PWD" -C <directory> <command>?
  6. open a PR?

or

  1. clone https://github.com/git-for-windows/git
  2. create a new branch
  3. git revert 515ff6a7d597fd8083c63a983f624ccd78bb2a4c
  4. edit .github/workflows/git-artifacts.yml by inserting a push: line before the workflow_dispatch trigger
  5. commit
  6. push to your fork and let GitHub Actions create an installer and portable Git
  7. test
  8. open a PR?

jmiserez commented 3 years ago

@dscho Thank you very much for the help. I assume you meant https://github.com/git-for-windows/git/commit/2f55cc471e41028d08bd6ac33c25c1b587bb2660 rather than https://github.com/git-for-windows/git/commit/515ff6a7d597fd8083c63a983f624ccd78bb2a4c?

Either way, neither of the 2 commits is the culprit. And my initial hypothesis at https://github.com/git-for-windows/git/pull/2637#issuecomment-878931871 was wrong. I've removed the PR link from the issue above and edited the title.

But I found the offending commit: https://github.com/git-for-windows/git/commit/8d92fb292706fd8d13cfe55353b2ec9345153a3e ("dir: replace exponential algorithm with a linear one") is where the performance regression happened.
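The thread does not say how this commit was located. One common way to narrow down a regression between two releases is git bisect, and the sketch below shows what that could look like here. Everything in it is an assumption made for illustration: the tag names, the REPRO_REPO path, the 5-second limit, and the helper names `within_limit` and `bisect_step`. The in-place invocation `./git --exec-path="$PWD" -C <directory>` is the one suggested earlier in the thread.

```shell
# Assumptions: run from a git-for-windows/git checkout inside the SDK.
# REPRO_REPO, LIMIT_SECONDS, GOOD and BAD are hypothetical values for this sketch.
REPRO_REPO=${REPRO_REPO:-/c/dev/node_modules_slow}
LIMIT_SECONDS=${LIMIT_SECONDS:-5}
GOOD=${GOOD:-v2.26.1.windows.1}
BAD=${BAD:-v2.27.0.windows.1}

# Succeeds when the elapsed seconds ($1) are within the limit ($2),
# i.e. the commit under test counts as "good".
within_limit() {
    [ "$1" -le "$2" ]
}

# One bisect step: build the checked-out commit, time `status --ignored`
# with the freshly built binary, and classify the commit.
bisect_step() {
    make -j"$(nproc)" >/dev/null 2>&1 || return 125   # 125 asks bisect to skip
    start=$(date +%s)
    ./git --exec-path="$PWD" -C "$REPRO_REPO" status --ignored >/dev/null 2>&1
    end=$(date +%s)
    within_limit "$((end - start))" "$LIMIT_SECONDS"
}

# Driver, shown as comments because it rewrites the working tree:
#   git bisect start "$BAD" "$GOOD"
#   git bisect run sh -c '. /path/to/this-file.sh; bisect_step'
#   git bisect reset
```

Whole-second resolution from `date +%s` is enough for this case, because the slow and fast runs differ by well over ten seconds.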

So this looks like it isn't actually a Git for Windows specific bug, but unfortunately only manifests on Windows. At the moment I don't know exactly which part of the code is the problem, as it's quite a substantial rewrite. I think we really need an example/test repository to get to the root cause; I'll see if I can come up with one that I can share. Maybe the original author of that commit has an idea of what happened.

To make sure I also tested the other 2 commits:

Sidenote: the downloadable SDK environment is pretty awesome and worked smoothly right out of the box.

dscho commented 3 years ago

> So this looks like it isn't actually a Git for Windows specific bug, but unfortunately only manifests on Windows.

I am not actually sure that this is true...

> replace exponential algorithm with a linear one

This yields a couple of hits on the Git mailing list: https://lore.kernel.org/git/?q=%22replace+exponential+algorithm+with+a+linear+one%22

A couple of ideas: it might be the removal of the resolve_gitlink_ref() function mentioned here, or it might be something else.

But I do wonder whether this really only manifests on Windows...

jmiserez commented 3 years ago

A colleague has built a repository that demonstrates the issue. It would be great if someone else could verify/reproduce this.

Steps to reproduce:

For the moment creating the symlinked structure requires a (portable) install of Node.js. I've added short install/uninstall instructions in case you/anyone else is interested in reproducing this on a machine. In the future it should be possible to create just the folders with a git testcase.

Download and extract Node.js, install pnpm (portable):

  1. Download Node.js: Specifically the "Windows Binary (.zip), 64-bit" version from the official page: https://nodejs.org/en/download/, currently node-v14.17.3-win-x64.zip.
  2. Extract the zip somewhere, e.g. C:\dev\node\node-v14.17.3-win-x64
  3. Add the folder "C:\dev\node\node-v14.17.3-win-x64" to your PATH environment variable (Start Menu -> Run... -> rundll32 sysdm.cpl,EditEnvironmentVariables, or search for "Edit environment variables for your account").
  4. Open cmd.exe, chdir to C:\dev\node\node-v14.17.3-win-x64
  5. Install PNPM into the folder: npm install -g pnpm (see https://pnpm.io/installation). The folder should now contain pnpm alongside node.

To uninstall:

  1. Delete the unzipped folder and remove it from your $PATH, this deletes Node.js and pnpm.
  2. Delete the .pnpm-store folder at %USERPROFILE%\.pnpm-store, this deletes the pnpm cache of packages.

Check out the repo and setup:

  1. git clone https://github.com/mvilliger/node_modules_slow
  2. Open cmd.exe, chdir to node_modules_slow\root
  3. pnpm_install.cmd (which just runs "pnpm install --ignore-scripts")

Reproduce the slow git commands

Run either from cmd.exe or Git Bash with GIT_TRACE_PERFORMANCE=1:

git status -> **fast**
git status --ignored -> **slow since 2.27.0**

To get rid of long path warnings:

git -c "core.longpaths=true" status --ignored -> **slow since 2.27.0**
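As a possible starting point for a reproduction that does not need Node.js or pnpm, the sketch below builds an ignored node_modules tree of a chosen depth with one directory symlink per level. It is untested against the regression itself: the layout is a simplification and not the real pnpm structure, and `make_deep_tree`, the `levelN` folder names and the `store/pkg` target are all hypothetical. In Git Bash, `ln -s` may create copies rather than real symlinks unless native symlink support is enabled, so the tree may need to be created from WSL instead.

```shell
# Hypothetical helper (not part of the original report): create ROOT with an
# ignored node_modules tree DEPTH levels deep, one directory symlink per level.
#
# Example (assumes git is on PATH):
#   make_deep_tree /tmp/deep-repo 25
#   cd /tmp/deep-repo && git init && GIT_TRACE_PERFORMANCE=1 git status --ignored
make_deep_tree() {
    mkdir -p "$1"
    root=$(cd "$1" && pwd)            # absolute path, so symlink targets resolve
    depth=$2
    target="$root/store/pkg"          # shared target, stands in for the pnpm store
    mkdir -p "$target"
    echo 'module.exports = {};' > "$target/index.js"
    printf 'node_modules/\nstore/\n' > "$root/.gitignore"

    dir="$root/node_modules"
    i=1
    while [ "$i" -le "$depth" ]; do
        dir="$dir/level$i"
        mkdir -p "$dir"
        ln -s "$target" "$dir/pkg"    # directory symlink at every level
        i=$((i + 1))
    done
}
```

A depth of 25 roughly matches the "~25 folders deep" structure described in the report; larger values could be used to check whether the runtime grows with depth.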

Results

It takes around 3s on WSL, but several minutes on Windows. I have not yet tested a physical Linux machine but will do so shortly.

A quick analysis using Sysinternals Process Monitor shows that the syscalls look different for Windows (git.exe) vs. WSL (git) when running GIT_TRACE_PERFORMANCE=1 git -c "core.longpaths=true" status --ignored. I'm not sure why the CreateFile REPARSE calls aren't necessary via WSL, but I'm also not sure if that is actually the issue. Full disclosure: These screenshots are taken with AV enabled, as I currently don't have access to a "clean" Windows machine. So the timings are off by 4x here.

Git for Windows 2.32.0 git: [Process Monitor screenshot, not reproduced here]

Git on Ubuntu WSL 2.32.0: [Process Monitor screenshot, not reproduced here]