Closed by legoktm 5 years ago
The test failure (https://travis-ci.org/Kentzo/git-archive-all/jobs/431358381) looks like an issue with Travis CI.
Alternatively, git-archive-all could aggregate all the files and then check them all at once.
Perhaps it should do both.
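To make the "check them all at once" idea concrete, here is a rough sketch of how it could look with a single `git check-attr --stdin -z` call. This is not what git-archive-all does today. The function names are made up for illustration, `export-ignore` is assumed to be the attribute of interest, and it needs Python 3.5+ for `subprocess.run`.

```python
import subprocess


def parse_check_attr_z(data):
    """Turn NUL-separated `git check-attr -z` output into {path: value}."""
    fields = data.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # the output ends with a trailing NUL
    # Records are flat triples: path, attribute name, value.
    return {fields[i]: fields[i + 2] for i in range(0, len(fields) - 2, 3)}


def excluded_paths(repo_path, paths, attr="export-ignore"):
    """Return the subset of paths (bytes) that have attr set, using one git call."""
    if not paths:
        return set()
    proc = subprocess.run(
        ["git", "check-attr", "--stdin", "-z", attr],
        input=b"\0".join(paths) + b"\0",  # NUL-separated paths on stdin
        stdout=subprocess.PIPE,
        cwd=repo_path,
        check=True,
    )
    values = parse_check_attr_z(proc.stdout)
    return {path for path, value in values.items() if value == b"set"}
```

With this shape, the cost of starting git is paid once per archive rather than once per file, and no extra library is required.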
@legoktm Please try the version from the check-attr branch.
For repositories with many files, is_file_excluded() is the biggest bottleneck, since it has to be called once per file. git-check-attr itself is pretty fast, so most of the time is spent on the overhead of shelling out to git.
We can use the pygit2 library (a wrapper around libgit2), if it's available, for a much faster check. In my testing of MediaWiki tarball generation (a rather large case), is_file_excluded() went from 117 seconds of wall clock time to 1.5 seconds!
pygit2 can be a bit tricky to install because it needs a matching libgit2 version, so this only uses it if it's already installed and otherwise falls back to the current behavior of shelling out.
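Roughly, the optional-dependency pattern looks like the sketch below. It is simplified and not the exact code in this PR: `make_excluded_checker` and `parse_check_attr_line` are illustrative names, and `export-ignore` is assumed to be the attribute being checked. `Repository.get_attr()` is the pygit2 call that replaces the subprocess.

```python
import subprocess

try:
    import pygit2  # optional; needs a matching libgit2 to be installed
except ImportError:
    pygit2 = None


def parse_check_attr_line(line):
    """Split one 'path: attr: value' line of `git check-attr` output."""
    # rsplit, because the path itself may contain ': '.
    path, attr, value = line.rstrip("\n").rsplit(": ", 2)
    return path, attr, value


def make_excluded_checker(repo_path, attr="export-ignore"):
    """Return a function that tells whether a file has attr set."""
    if pygit2 is not None:
        repo = pygit2.Repository(repo_path)  # opened once, reused per file

        def check(file_path):
            # get_attr returns True/False, None if unspecified, or a string.
            return repo.get_attr(file_path, attr) is True
    else:
        def check(file_path):
            # Fallback: one git process per file (the current behavior).
            out = subprocess.check_output(
                ["git", "check-attr", attr, "--", file_path],
                cwd=repo_path,
                universal_newlines=True,
            )
            return parse_check_attr_line(out)[2] == "set"

    return check
```

Callers only ever see the returned `check` function, so nothing else has to know which backend is in use.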
Some other calls to git could also use pygit2. In my profiling, though, none of them show up as hotspots, and the cost of shelling out is negligible compared to the time the command itself takes.
I hope it's OK to optionally depend upon an external library like this. It wasn't very straightforward for me to install (I had to manually install a slightly older version, since Fedora doesn't ship the very latest libgit2), so I didn't think a hard dependency on it would be a good idea.