lin-xianming opened 2 years ago
This behavior is most likely due to the extra checks we have to go through (see 6ad3d3db7372717de578088ce65f6262c37ec20c) to determine the type of each symlink. On Linux/Unix, symlinks do not distinguish between file targets and directory targets, but on Windows they do. And since too much of Git still assumes Linux semantics, we have to work extra hard to accommodate that.
Is there an easy way to figure out the types of the symlinks contained in your repository? If so, it might make sense to declare them (either in a `.gitattributes` file that is contained in the repository, or in a `.git/info/attributes` file that is local to your checkout; in the latter case you will want to clone with `--no-checkout`, then initialize that file, then call `git checkout <branch>`).
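To make that concrete, here is a minimal sketch of the `--no-checkout` route, assuming a git-annex-style layout where all symlinks point at files. The local bare repository stands in for the real remote, and the `* symlink=file` pattern is an assumption; you would narrow it to the paths that actually hold symlinks:

```shell
set -e
git init -q --bare upstream.git                 # stand-in for the real remote URL
git clone -q --no-checkout upstream.git wt      # clone without populating the working tree
mkdir -p wt/.git/info
# Declare the symlink type locally so Git for Windows need not probe each target
printf '%s\n' '* symlink=file' > wt/.git/info/attributes
# git -C wt checkout <branch>                   # then check out as usual
cat wt/.git/info/attributes
```

Committing the same line to a `.gitattributes` file inside the repository would make the declaration apply to every clone instead of just this one.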
An alternative, if that is not a viable approach, would be to perform a parallelized checkout that uses all of your CPU's cores (or uses an even higher number if the operation is I/O bound).
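For reference, parallel checkout is controlled by two configuration settings (available since Git 2.32). A quick sketch of setting them per repository; the values shown are examples, not recommendations:

```shell
set -e
git init -q demo
# A value below 1 means "one worker per available logical core"
git -C demo config checkout.workers -1
# Minimum number of files before checkout parallelizes at all (100 is the default)
git -C demo config checkout.thresholdForParallelism 100
git -C demo config checkout.workers
```

The same settings can be passed one-off via `git -c checkout.workers=-1 checkout <branch>` without touching the repository configuration.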
The extra checks for each symlink do not explain why checkout speed slows down over time. I'm testing on an SSD and there is no disk bottleneck. With `symlink=file`, there is no difference in checkout speed. With `checkout.workers=-1`, a second core was briefly loaded before checkout slowed down to the same speed as before, with only a single core loaded.
> With `symlink=file`, there are no differences in checkout speed.
Hmm. That's funny. Could you investigate further, e.g. by instrumenting the code with Trace2 statements?
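As a starting point before adding any Trace2 statements to the code, Git's existing Trace2 instrumentation can be captured by pointing `GIT_TRACE2_PERF` at a file. A minimal sketch; the throwaway repository and `git status` stand in for the slow checkout here:

```shell
set -e
git init -q t2demo
# Perf-format events (command name, regions, timings) are appended to the file
GIT_TRACE2_PERF="$PWD/trace2-perf.txt" git -C t2demo status > /dev/null
# The capture should contain at least the cmd_name event for the command
grep -c 'cmd_name' trace2-perf.txt
```

Running the actual slow checkout under `GIT_TRACE2_PERF` would show where the time accumulates; custom regions added in the symlink-handling code would then show up in the same file.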
Git is repeatedly accessing symlinks that are already checked out, so checkout gets slower the more symlinks have been checked out. One thing of note: for a repository like the example given in the bug report, none of the symlink targets exist when the repository is cloned.
That should not happen when the `symlink=file` Git attribute is configured.
This problem seems to be specific to git-annex repositories with a large number of symlinks, like the one linked in the bug report. I created a repository with 10000 symlinks with non-existent targets using `for i in {1..10000}; do ln -s ../$i $i; done` and did not experience any problems with cloning and checkout. I also deleted most of the symlinks from the example repository and did not see any repeated access with procmon.
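For anyone else trying to reproduce, here is a self-contained version of that setup, scaled down to 200 links so it runs quickly; the commit identity is a placeholder:

```shell
set -e
git init -q linkrepo
(
  cd linkrepo
  # Symlinks whose targets do not exist, as in a fresh git-annex clone
  for i in $(seq 1 200); do ln -s "../$i" "$i"; done
  git add .
  git -c user.name=test -c user.email=test@example.com commit -qm 'dangling symlinks'
)
git -C linkrepo ls-files | wc -l
```

Pushing this to a Windows machine and timing `git checkout` there (with and without `symlink=file`) would show whether dangling targets alone are enough to trigger the slowdown.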
@lin-xianming it would be really interesting to learn what you figure out investigating this further.
Setup

Windows developer mode is enabled so symlinks can be created without privilege elevation.

Details

Shell: Bash
Expected: the repository would be cloned and the working tree checked out in a reasonable amount of time.
Actual: the repository was cloned and checkout began normally, but after a few seconds it slowed down to 5-20 files per second, with one CPU core fully loaded. There are 14693 files to check out in total, and about 7000 remained when it slowed down; at 20 files per second the rest would have taken 5.8 more minutes. The same clone and checkout on Linux took less than 5 seconds.
For example http://psydata.ovgu.de/forrest_gump/.git. This seems to affect any git-annex repository with a large number of symlinks.