Closed smithkl42 closed 7 years ago
BTW, I see that this is largely a duplicate of https://github.com/git-lfs/git-lfs/issues/1611 (which I looked for but didn't see initially). However, since that one is closed, and it still seems to be an issue, I figure it's worth keeping this one open.
:wave: Hi @smithkl42, thanks for opening this.
Our repo has about 3000 standard git files, and a hundred or so git-lfs tracked files (for a combined size of about 1 GB).
This is interesting. We recently shipped an optimization to the git lfs track command that skips walking directories that are .gitignore'd, and the .git directory itself. Even so, with only 3000 files (and hopefully a proportional number of Git objects), the naive filepath.Walk implementation should be fast enough.
That all being said, I'm not sure what's going on here. If you're able to share the contents of your repository, a link would be immensely helpful :sparkles:. If not, some information about the directory structure should suffice: how many top-level directories are there in your repository, are the sub-directories deeply nested, etc. Just enough information to give us a picture of what your repository looks like.
I'm thinking that what's probably happening here is that we're leaking goroutines. I would need a little more time to look at the fast walk implementation to find the cause, but that's at least one theory on what could be going on here. Another theory: one (or more) of the alive goroutines could be spinning and thrashing the CPU. Going to :pager: @sinbad in on this one, since his help would be very valuable here. 🙇
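One way to test the leaking-goroutines theory is to compare runtime.NumGoroutine() before and after the suspect operation. The sketch below is a hypothetical illustration, not git-lfs code. leakyFanOut abandons senders that stay blocked forever. safeFanOut waits on a sync.WaitGroup and uses a buffered channel, so all of its workers exit. A goroutine that is alive but spinning (the second theory) would not change this count; it would show up as CPU time instead, for example in a pprof profile.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// leakyFanOut starts one goroutine per item but only receives the first
// result; the other senders block forever on the unbuffered channel.
func leakyFanOut(items []int) int {
	results := make(chan int)
	for _, it := range items {
		go func(v int) { results <- v * 2 }(it)
	}
	return <-results
}

// safeFanOut buffers the channel and waits for every worker, so all
// goroutines have finished before it returns.
func safeFanOut(items []int) int {
	results := make(chan int, len(items))
	var wg sync.WaitGroup
	for _, it := range items {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v * 2
		}(it)
	}
	wg.Wait()
	close(results)
	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

// goroutineDelta reports how many extra goroutines are still alive after f.
func goroutineDelta(f func()) int {
	before := runtime.NumGoroutine()
	f()
	time.Sleep(50 * time.Millisecond) // let finished goroutines actually exit
	return runtime.NumGoroutine() - before
}

func main() {
	items := []int{1, 2, 3, 4, 5}
	fmt.Println("leaky:", goroutineDelta(func() { leakyFanOut(items) }))
	fmt.Println("safe:", goroutineDelta(func() { safeFanOut(items) }))
}
```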
Unfortunately, can't share the repo - sorry. (I know that makes troubleshooting more difficult.)
We've only got about 20 top-level directories, and maybe a couple hundred git-tracked folders 6 or so layers deep - nothing particularly interesting there. We do have about 1600 git-ignored folders sitting under .\Swyfft.Web\node_modules, another 2000 git-ignored folders sitting under .\packages\, and probably another thousand or so folders sitting under various git-ignored bin folders. All of those sound like a lot, but I think it's mostly a factor of how NuGet and npm manage things - though I could imagine that they're interfering with an attempt to walk the file tree.
Let me know if you'd like more information.
(Would it make any difference where and how we're implementing the .gitignore?)
It would, actually. We have to re-implement git's file pattern matching, and it's possible that a bug is causing LFS to scan directories that should be ignored. Right now, it reads .gitignore at the root, and then at each subdirectory, assembling more gitignore rules the deeper it goes down the tree.
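Here is a minimal sketch of how rules might be accumulated while descending the tree. It assumes a hypothetical gitignores map from each directory to the patterns in its .gitignore. Only the root patterns in the example come from this thread; the nested one is made up. The sketch gathers rules in root-first order and keeps each rule's base directory, because nested patterns are relative to the directory their .gitignore lives in. It does not do any matching.

```go
package main

import (
	"fmt"
	"path"
)

// rule is one pattern plus the directory whose .gitignore it came from
// ("" for the repository root).
type rule struct {
	Base    string
	Pattern string
}

// rulesFor gathers the rules that apply to path p (slash-separated, relative
// to the repo root): the root .gitignore first, then each ancestor
// directory's .gitignore, so deeper rules come later and can take precedence.
func rulesFor(gitignores map[string][]string, p string) []rule {
	dirs := []string{""}
	var ancestors []string
	for d := path.Dir(p); d != "." && d != "/"; d = path.Dir(d) {
		ancestors = append([]string{d}, ancestors...) // prepend: root-first order
	}
	dirs = append(dirs, ancestors...)
	var rules []rule
	for _, d := range dirs {
		for _, pat := range gitignores[d] {
			rules = append(rules, rule{Base: d, Pattern: pat})
		}
	}
	return rules
}

func main() {
	// Hypothetical contents: the nested "*.tmp.js" rule is invented for the demo.
	gitignores := map[string][]string{
		"":               {"*.orig", "[Bb]in/"},
		"Swyfft.Web/App": {"*.tmp.js"},
	}
	for _, r := range rulesFor(gitignores, "Swyfft.Web/App/main.js") {
		fmt.Printf("%q from %q\n", r.Pattern, r.Base)
	}
}
```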
It'd be 👍 if you could share both the contents of the .gitignore and the actual filenames it's ignoring, as accurately as possible. If you don't want to share real names, then changing words to other words is probably fine, so long as the structure and the mix of any separators/special characters etc. remain the same. I tested our new path walker with hundreds of thousands of ignored files and it made a huge difference.
Here are the contents of our single .gitignore file:
################################################################################
# This .gitignore file was automatically created by Microsoft(R) Visual Studio.
################################################################################
#Swyfft-specific stuff
*.orig
/swyf-tests/reports/*
/swyf-tests/*.log
/TestResults/
*.RData
*.RHistory
desktop.ini
.DS_Store
# Jenkins Stuff
**/[Oo]ut/*
JenkinsResults.trx
# User-specific files
*.user
*.suo
tmp*.tmp
*.userosscache
*.sln.docstates
# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
build/
bld/
[Bb]in/
[Oo]bj/
Swyfft.Web/js/**/*.min.js
Swyfft.Web/css/**/*.min.css
Swyfft.Web/css/Dist/
# Visual Studio 2015 cache/options directory
.vs/
# ReSharper
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user
# Ignore NuGet Packages
*.nupkg
# Ignore the packages folder
**/packages/*
# except build/, which is used as an MSBuild target.
!**/packages/build/
# except bootstrap
!**/packages/bootstrap.3.3.4
# Uncomment if necessary however generally it will be regenerated when needed
#!**/packages/repositories.config
~*.xlsm
.metadata/*
.recommenders/*
# Ignore the npm package directory
**/node_modules/*
# Visual Studio profiler
*.psess
*.vsp
*.vspx
# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
# SQL Server files
*.mdf
*.ldf
# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings
# Microsoft Fakes
FakesAssemblies/
# Node.js Tools for Visual Studio
.ntvs_analysis.dat
# Visual Studio 6 build log
*.plg
# Visual Studio 6 workspace options file
*.opt
lastMigrationSeedDate
Swyfft.Web/css/BuyerFwdStyles.css
Swyfft.Web/css/BuyerFwdStyles.css
Swyfft.Web/App/Dist
xunit-results-Swyfft.Common.Tests.xml
xunit-results-Swyfft.Seeding.Tests.xml
xunit-results-Swyfft.Services.AcceptanceTests.Critical.xml
xunit-results-Swyfft.Services.Tests.xml
xunit-results-Swyfft.Web.AcceptanceTests.Critical.xml
xunit-results-Swyfft.Web.Tests.xml
Let me know if that helps, or if there's anything else I can do to help troubleshoot. (And sorry we can't share the whole repository.)
Thanks. The only thing that jumps out at me immediately is that we don't support the negation operator (!) yet, although if anything that would just cause it to skip extra things, which is the reverse of what would make it slower.
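To illustrate what negation support involves: in git, patterns are checked in order, the last one that matches wins, and a leading ! re-includes a path that an earlier pattern excluded. Git's documentation also notes that a file cannot be re-included if one of its parent directories is excluded. The hypothetical ignored function below is a minimal sketch of the last-match-wins rule only. It is limited to basename globs via path.Match.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// ignored applies gitignore-style patterns in order: the last matching
// pattern decides, and a leading "!" re-includes a previously ignored path.
// Only basename globs are supported (a deliberate simplification).
func ignored(patterns []string, p string) bool {
	result := false
	name := path.Base(p)
	for _, pat := range patterns {
		negate := strings.HasPrefix(pat, "!")
		pat = strings.TrimPrefix(pat, "!")
		if ok, err := path.Match(pat, name); err == nil && ok {
			result = !negate
		}
	}
	return result
}

func main() {
	pats := []string{"*.log", "!keep.log"}
	fmt.Println(ignored(pats, "logs/debug.log")) // true
	fmt.Println(ignored(pats, "logs/keep.log"))  // false
}
```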
To help us reproduce, could you also:
- send us the output of find . - that will dump all the filenames in an easy-to-consume format, and I can set up a script to create a duplicate structure that way
- send us your .gitattributes file, so we can reproduce what you're actually tracking
Thanks
Aha, thinking about it again, I actually think it might be the [Dd]ebug etc. lines. Our regex won't leave those intact, which means it's probably looking in your build dirs. It would still be useful to have your file structure as above anyway, to prove that.
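Here is a minimal sketch of the failure mode suspected here. As noted later in the thread, it turned out not to be the problem. Escaping a glob wholesale with regexp.QuoteMeta turns [Dd]ebug into a literal match, whereas a converter that copies bracket expressions through keeps them as character classes. globToRegexp is a hypothetical helper, not git-lfs's implementation. It handles only *, ? and simple [...] classes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp converts a simple basename glob into an anchored regexp:
// '*' and '?' never cross '/', bracket classes such as [Dd] are copied
// through (with glob "[!" negation mapped to regexp "[^"), and every other
// byte is escaped with regexp.QuoteMeta.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(glob); i++ {
		switch glob[i] {
		case '*':
			b.WriteString("[^/]*")
		case '?':
			b.WriteString("[^/]")
		case '[':
			end := strings.IndexByte(glob[i+1:], ']')
			if end < 0 {
				b.WriteString(`\[`) // unterminated class: treat '[' literally
				continue
			}
			class := glob[i : i+end+2] // e.g. "[Dd]"
			if strings.HasPrefix(class, "[!") {
				class = "[^" + class[2:]
			}
			b.WriteString(class)
			i += end + 1
		default:
			b.WriteString(regexp.QuoteMeta(glob[i : i+1]))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

func main() {
	naive := regexp.MustCompile("^" + regexp.QuoteMeta("[Dd]ebug") + "$")
	fmt.Println(naive.MatchString("Debug")) // false: the brackets were escaped
	re, err := globToRegexp("[Dd]ebug")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("Debug"), re.MatchString("debug")) // true true
}
```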
Thanks, Steve, I'll send the file structure to your email, if that's all right.
Whoops, not sure how I managed to close this accidentally. Re-opening.
Just to update on what's going on in email: I've reconstructed the file structure with a script (38K files excluding the git repo files; contents are just random data), but discovered there are a bunch of nested .gitignores in there as well, which I've requested copies of. I just managed to panic git-lfs, since I had filled the file list with random data and it tried to read the garbage .gitignore files in the list!
No luck reproducing this so far. This is the tool I used to reconstruct the same repo file structure, with the same .gitattributes, all the .gitignores and 38K files: https://github.com/sinbad/git-lfs-1750
The result when running git lfs track is that it's actually very quick:
time git lfs track
Listing tracked patterns
*.zip (.gitattributes)
*.xlsm (.gitattributes)
*.kml (.gitattributes)
git lfs track 2.89s user 0.34s system 675% cpu 0.479 total
This is on my Mac, I'll test on Windows tomorrow but I don't imagine it'll be vastly different.
Oh BTW, the [Dd]ebug bit wasn't a problem; we already handled that correctly. I just added a test to prove that in #1768.
It's just as fast for me on Windows (it reports faster, in fact, despite being in a VM, but I think that's MSys's time lying, since perceptually it's still two and a bit seconds):
steve@STEVENSTREED3AA MINGW64 ~/temp/issue1750 (master)
$ time git lfs track
Listing tracked patterns
*.zip (.gitattributes)
*.xlsm (.gitattributes)
*.kml (.gitattributes)
real 0m1.316s
user 0m0.015s
sys 0m0.000s
This is definitely the right file structure:
$ find . -type f | wc -l
38302
$ find . | grep .gitignore
./.gitignore
./Swyfft.Web/App/.gitignore
./Swyfft.Web/bower_components/iframe-resizer/.gitignore
./Swyfft.Web/bower_components/jquery-impromptu/.gitignore
./Swyfft.Web/obj/Beta/Package/PackageTmp/bower_components/iframe-resizer/.gitignore
./Swyfft.Web/obj/Beta/Package/PackageTmp/bower_components/jquery-impromptu/.gitignore
./Swyfft.Web/obj/Development/Package/PackageTmp/bower_components/iframe-resizer/.gitignore
./Swyfft.Web/obj/Development/Package/PackageTmp/bower_components/jquery-impromptu/.gitignore
./Swyfft.Web/obj/Release/Package/PackageTmp/bower_components/iframe-resizer/.gitignore
./Swyfft.Web/obj/Release/Package/PackageTmp/bower_components/jquery-impromptu/.gitignore
./Swyfft.Web/obj/Staging/Package/PackageTmp/bower_components/iframe-resizer/.gitignore
./Swyfft.Web/obj/Staging/Package/PackageTmp/bower_components/jquery-impromptu/.gitignore
So I don't know where to go from here. The contents of the files are irrelevant to traversal speed (except for .gitignore and .gitattributes), since only the directory entries are checked. Given that I now have an identical file structure, I should be seeing the same thing as you, but for me it's fine.
I think I have a minimal repro for this:
$ cat .gitignore
node_modules/
$ find . | grep -v .git
.
./foo
./foo/node_modules
./foo/node_modules/bar.txt
Git ignores bar.txt, but when tracing file access by git-lfs you can see that it stats bar.txt (OS X fs_usage trace attached).
(edit: git-lfs/1.5.3 (GitHub; darwin amd64; go 1.7.4))
That's very odd. All the devs running 1.5.3 are reporting that it's slow. On my machine - a fairly beefy Asus Zenbook with an SSD and 4 cores - it takes 20+ seconds and spikes all 4 processors.
smith@ken-asuslaptop MINGW64 /c/source/swyfft (feature/ks-EFDeductibleTests)
$ time git lfs track
Listing tracked patterns
*.zip (.gitattributes)
*.xlsm (.gitattributes)
*.kml (.gitattributes)
real 0m24.354s
user 0m0.000s
sys 0m0.015s
But ... I discovered that if I drop back to an earlier version of git lfs (1.4.4), it only takes about two seconds:
smith@ken-asuslaptop MINGW64 /c/source/swyfft (feature/ks-EFDeductibleTests)
$ time git lfs track
Listing tracked paths
*.zip (.gitattributes)
*.xlsm (.gitattributes)
*.kml (.gitattributes)
real 0m2.520s
user 0m0.031s
sys 0m0.000s
smith@ken-asuslaptop MINGW64 /c/source/swyfft (feature/ks-EFDeductibleTests)
$ git lfs version
git-lfs/1.4.4 (GitHub; windows amd64; go 1.7.3; git cbf91a9)
Is there a specific version you'd like me to check timing on? (Just tried 1.5.2 and it takes ~25 seconds.)
@smithkl42: @sgrankin may be onto something (thanks!). Although the file structure you sent me didn't have anything in the folders that would have been skipped by this, perhaps that's because you executed find on a clean checkout? In a 'used' working copy there might be a lot more files in places I don't have.
To test this, can you edit your root .gitignore and change all entries which end in dir/ to dir/** instead?
@sinbad To clarify, foo/node_modules/bar.txt is NOT tracked by git; it's ignored due to the node_modules/ in the root .gitignore. Here's a zip of the repo with the work tree so you can debug: gitlfsbug.zip
I also tried switching the pattern to node_modules/** in the .gitignore (after creating the above zip), but the lstat64 on foo/node_modules/bar.txt still shows up in the trace.
@sinbad - Well, we're part-way there. Changing the various dir/ excludes to **/dir/** cut the time for git lfs track in half - down from about 25 seconds to 12 seconds.
@sinbad - Oh, and the file structure I sent you was from a very dirty working copy - one with build artifacts and what-not that have accumulated over at least a month or two (since we switched to git-lfs, in fact).
@sgrankin:
I also tried switching the pattern to node_modules/** in the .gitignore (after creating the above zip) but the lstat64 on foo/node_modules/bar.txt still shows up in the trace.
Yes, since node_modules isn't in the root, it would need to be **/node_modules/**. The problem here, it seems, is that the docs for .gitignore don't totally reflect what git actually does; I'll have to add more special cases.
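Here is a minimal sketch of the directory-matching rule as git's gitignore documentation describes it:
- A trailing / only restricts a pattern to directories.
- A pattern with no other / may match at any depth, so node_modules/ matches foo/node_modules, which is consistent with git ignoring bar.txt in the repro above.
- A pattern with a / at the start or in the middle is anchored to the directory containing the .gitignore.

matchesDir is hypothetical and ignores ** entirely.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesDir reports whether a gitignore pattern excludes the directory dir
// (slash-separated, relative to the .gitignore's own directory). A trailing
// "/" only limits the pattern to directories; a pattern with no other "/"
// matches the last path component at any depth; otherwise it is anchored.
// "**" is not handled in this sketch.
func matchesDir(pattern, dir string) bool {
	pattern = strings.TrimSuffix(pattern, "/")
	if !strings.Contains(pattern, "/") {
		ok, _ := path.Match(pattern, path.Base(dir))
		return ok
	}
	ok, _ := path.Match(strings.TrimPrefix(pattern, "/"), dir)
	return ok
}

func main() {
	fmt.Println(matchesDir("node_modules/", "foo/node_modules"))           // true
	fmt.Println(matchesDir("/node_modules/", "foo/node_modules"))          // false
	fmt.Println(matchesDir("Swyfft.Web/App/Dist", "Swyfft.Web/App/Dist")) // true
}
```

A walker that applies this check to each directory before descending, and prunes on a match, never has to stat anything under foo/node_modules.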
@smithkl42:
Oh, and the file structure I sent you was from a very dirty working copy
OK, odd that it's still fast here having recreated that structure. I was just a bit suspicious since there was nothing in the debug/release/bin folders that were listed in your gitignore.
I also can't understand how the old version of git-lfs could possibly be faster for you, since 1.4.x didn't respect .gitignore at all and would walk absolutely every folder (including the .git repo!) in a single thread, whereas 1.5.x skips everything matched by .gitignore and parallelises across many threads; in my tests it was 100x+ faster on folders of up to 100K files. I feel like there's something we're missing here. I think we need to give you a tracing build to isolate exactly what's going on.
I'm short of time until Monday, do you build git-lfs from source or will you need me to provide custom binaries?
I think I figured out how to build it under Windows - at least, I've got a git-lfs.exe sitting in the ./bin folder which reports a 1.5.0 version number. (Is that right?)
But here's the weird bit. When I run git lfs track with that version, it returns almost instantaneously.
smith@ken-asuslaptop MINGW64 /c/source/swyfft (development)
$ time git lfs track
Listing tracked patterns
*.zip (.gitattributes)
*.xlsm (.gitattributes)
*.kml (.gitattributes)
real 0m0.338s
user 0m0.000s
sys 0m0.000s
smith@ken-asuslaptop MINGW64 /c/source/swyfft (development)
$ git lfs version
git-lfs/1.5.0 (GitHub; windows amd64; go 1.7.4; git 96dba412)
Switching back to the 1.5.3 I downloaded takes the expected (?) ~13 seconds.
But here's the weird bit. When I run git lfs track with that version, it returns almost instantaneously.
Something is definitely weird here. I double-checked the assets that I released for Windows, confirming that they were built against the right commits and that the fast directory walk is reachable within that tree. All was good there, so maybe this is a $PATH issue?
at least, I've got a git-lfs.exe sitting in the ./bin folder which reports a 1.5.0 version number. (Is that right?)
Ah, you're on master. We don't tweak the version number until right before a release. We anticipated major code changes for the next big LFS release, so we immediately branched off release-1.5 when we shipped v1.5.0. All of the v1.5.x releases have been cut from that branch, and have missed all the newer stuff in master.
I wonder if the speed boost is from #1696? We could try backporting that. If that helps this much, I think it'd be worth a v1.5.4 release.
Sorry for being a Go newbie, but when I try to build off the 1.5.3 branch, I get this error:
smith@ken-asuslaptop MINGW64 /c/source/git-lfs (release-1.5)
$ script/bootstrap
script\build.go:19:2: cannot find package "github.com/git-lfs/git-lfs/tools/longpathos" in any of:
C:\Go\src\github.com\git-lfs\git-lfs\tools\longpathos (from $GOROOT)
c:\source\src\github.com\git-lfs\git-lfs\tools\longpathos (from $GOPATH)
Switching back to master lets me build - sort of:
smith@ken-asuslaptop MINGW64 /c/source/git-lfs (master)
$ script/bootstrap
Installing goversioninfo to embed resources into Windows executables...
Creating the resource.syso version information file...
git-lfs.go:1: running "goversioninfo": exec: "goversioninfo": executable file not found in %PATH%
smith@ken-asuslaptop MINGW64 /c/source/git-lfs (master)
$ go run script/*.go -cmd build "$@"
Using go1.7.4
Go is strict about its GOPATHs. Try running go get github.com/git-lfs/git-lfs to make sure it's in $GOPATH/src/github.com/git-lfs/git-lfs, and add $GOPATH/bin to your $PATH.
Got it - that worked.
Yeah, building 1.5.3 that way resulted in the same 13-second git lfs track. So it's not just an issue with the build environment.
@smithkl42 Can you try against #1782?
That seems to have done it. The version is flagged as 1.5.3, but it comes back in about 0.35 seconds.
smith@ken-asuslaptop MINGW64 /c/source/swyfft (development)
$ git lfs version
git-lfs/1.5.3 (GitHub; windows amd64; go 1.7.4; git 0d96b04d)
smith@ken-asuslaptop MINGW64 /c/source/swyfft (development)
$ time git lfs track
Listing tracked patterns
*.zip (.gitattributes)
*.xlsm (.gitattributes)
*.kml (.gitattributes)
real 0m0.349s
user 0m0.000s
sys 0m0.000s
@technoweenie Nice catch; it looks like my original change had a regex bottleneck that didn't show up in my bulk tests. I guess it more adversely affected larger gitignores, and in those cases it could outweigh the gains from the faster file walking.
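We can't tell from this thread exactly where the regex bottleneck was or what #1782 changed. As a hedged illustration of one common way regex cost scales badly with a large ignore file, the sketch below contrasts recompiling every pattern for every path with compiling the patterns once up front. matcher, newMatcher and slowMatch are hypothetical names, and the patterns are invented for the demo.

```go
package main

import (
	"fmt"
	"regexp"
)

// matcher compiles every pattern exactly once and reuses the compiled
// *regexp.Regexp values for every path it checks.
type matcher struct {
	res []*regexp.Regexp
}

func newMatcher(patterns []string) (*matcher, error) {
	m := &matcher{}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, err
		}
		m.res = append(m.res, re)
	}
	return m, nil
}

func (m *matcher) Match(p string) bool {
	for _, re := range m.res {
		if re.MatchString(p) {
			return true
		}
	}
	return false
}

// slowMatch is the pattern to avoid: it pays the compile cost again for
// every (path, pattern) pair, which grows with both the file count and the
// size of the ignore file.
func slowMatch(patterns []string, p string) bool {
	for _, pat := range patterns {
		if regexp.MustCompile(pat).MatchString(p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{`\.orig$`, `(^|/)node_modules(/|$)`}
	m, err := newMatcher(patterns)
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"a/b.orig", "foo/node_modules/bar.txt", "src/app.js"} {
		fmt.Println(p, m.Match(p), slowMatch(patterns, p))
	}
}
```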
Thanks. Shout out to @smithkl42 for trying it on master in the first place :) We should be able to get a fix out this week. If not this week, then early 2017 :)
@technoweenie - Heh. Not the first time my incompetence has provided an inadvertent public service :-).
Thanks for the dogged work on fixing this. My team appreciates it quite a bit.
Hi all - thanks a ton for the work done here! I found this bug after issues with SourceTree after switching to LFS.
Is there anything that can be done in the short-term? I saw the bit about adding /** to directories but wanted to be sure there wasn't anything else that could help at this time.
@otherdave The original update in 1.5.x may have helped if your issue was very large, very deep directory structures, since that's what I tested against in my original optimisation. If it's still slow after that, then you're hitting the problem Rick identified, where large gitignores generate too much regex overhead with my approach; that's currently only fixed in master. If you can build a version for yourself from source you could do that; otherwise I'm afraid it's a case of waiting for the next update, which I'm guessing will be after the holidays. [edit] Although, if you fall into the latter case (still slow with the latest release), then deleting any lines in your .gitignores that you don't need will probably help as a workaround.
@sinbad I've been using the git command line for a while, so it's not a huge pain waiting. I don't mind trying to build from master, but I'm on Windows/Cygwin/babun, so I'm not sure what it would take to get a build environment ready. If there's documentation for that, I'm happy to give it a shot and provide some feedback in case it helps.
@otherdave Go is relatively friendly, so if you're willing to give it a shot:
- Install Go and make sure go is on your path (the installer should sort that out)
- Create a directory for Go source and set the GOPATH environment variable to that dir
- Run go get github.com/git-lfs/git-lfs
- Add %GOPATH%\bin to your PATH, before any other location that has git-lfs.exe in it
That should do it, I think. I usually build Go from source, so I'm not 100% sure how much the installer does for you, but those are the main steps. go get downloads the source into %GOPATH% and automatically builds it, putting the result in %GOPATH%\bin, which is why just that step should get you the latest version.
@sinbad thanks! I was far too lazy to edit my path, so after pulling down the git-lfs binaries via go get, I just dropped git-lfs.exe into c:\program files\Git LFS.
My results? My old git-lfs.exe took 37 seconds to run git lfs track. The new one took 0.5 seconds :)
For SourceTree, it looks like I need to drop the new one into c:\users\dstahl\AppData\Local\Atlassian\SourceTree\git_extras\git-lfs.exe, and now ST is faster than ever.
The same is true for our repository. It dropped from >30s with the version before 1.5, to 5s with 1.5, and now to a really fast 0.2s for a pretty huge repository (>500k files, most of them ignored, ~12k files under version control). Great work! Looking forward to the 1.5.4 release now to use git lfs in production.
:wave: Hi all! Thanks for being patient: I'll have a new release of LFS that'll include https://github.com/git-lfs/git-lfs/pull/1782 early next week 👍
Confirmed. This finishes subsecond for me.
Running a simple git lfs track command on our repo takes about 30 seconds, and the git-lfs process chews up nearly 100% of CPU during that time period. Our repo has about 3000 standard git files, and a hundred or so git-lfs tracked files (for a combined size of about 1 GB).
Not being clear on the internals of git-lfs, I'm not sure what you can do about it, but since SourceTree kicks off a git lfs track every time you stage a file, it makes for a pretty miserable experience. And given the output - it doesn't look like it's doing anything more complicated than reading from a text file, though I'm sure the reality is much more complex - it seems like this shouldn't take so long.
I should note that I'm running git-lfs 1.5.3.