Closed schwern closed 10 years ago
I'm surprised that Git::Repository is so slow. Can you point me at the code, so that I can see if I can improve either Git::Repository or the code itself.
Profiling reveals that a huge amount of time is spent in readline both in Git::Repository and in System::Command.
Git/Repository/Command.pm line 167
chomp( @output = <$stdout> );
# spent 402s making 438 calls to Git::Repository::Command::CORE:readline, avg 917ms/call
System/Command.pm line 113
my $mesg = do { local $/; <$stat_r> };
# spent 271s making 439 calls to System::Command::CORE:readline, avg 618ms/call
That could be because that run involved a large push, it's hard to tell. I will run another profile without a push and see.
I'm uploading the whole nytprof directory for your examination. It should be fully uploaded in a few minutes.
False alarm, sorry. Gitpan::Git->push
took 360s.
...except that doesn't explain why tagging was so slow.
The relevant information is in gitpan-nytprof-2014-08-11T02-13/nytprof/Git-Repository-pm-1172-line.html#182
which shows that Git::Repository->run
is being called in lots of different ways and taking over 600ms for all of them. In contrast, git tag foo
on my command line takes 14ms.
A lot of the time is in Git::Repository->new
and traced down to System::Command->$_spawn
.
Here is a fresh profile run, this time with no push to avoid network interference. It shows that calling "git tag" is eating 45% of the run time.
The line you point at in the profiler:
# System/Command.pm line 113
my $mesg = do { local $/; <$stat_r> };
is actually used by the parent to catch errors the child might have had when doing the fork+exec dance (I copied it from IPC::Open3, IIRC). Given that the command ran successfully, this sounds like a timeout before actually finding out the pipe has been closed.
I wonder if there's a better (faster) way to communicate failure to the parent.
Could you try timing with and without the setpgrp()
at line 128?
If this improves things for you, I could always make it toggable in System::Command
and disable it in Git::Repository::Command
.
Some explanation: I've done some manual measurements on System::Command
(with the help of Time::HiRes
and print
), and the call to setpgrp
is the action the kid does before the actual exec
that takes the longest (by a factor of about 20 compared to the rest of what the child does before exec
).
Since it happens before the exec
, and the parent waits on the $stat_r
filehandle to either be closed (thanks to close-on-exec) or have something written to it, this setpgrp
call is slowing down both the parent and the child. The call was added in book/System-Command@c7d64e12ff7141a386f2a405b4d4b4baa0c07117 for the sole purpose of enabling kill -SIGKILL, $cmd->pid;
, hence my proposal to make it optional.
System::Command 1.110 is now on CPAN, with a setpgrp
option to disable (or enable) the call to setpgrp()
in the child process (it's enabled by default).
Since Git::Repository passes the options hashes directly to Git::Repository::Command, which is a subclass of System::Command, you can do:
my $r = Git::Repository->new( { setpgrp => 0 } );
and none of the Git commands spawned by $r
will call setpgrp()
.
Let me know if that improves things.
Sorry, no help, takes the same amount of time. I upgraded System::Command to 1.110, Git::Repository to 1.312 and changed the call to
return Git::Repository->new(
work_tree => $self->repo_dir.'',
{
env => {
GIT_COMMITTER_EMAIL => $config->committer_email,
GIT_COMMITTER_NAME => $config->committer_name,
GIT_AUTHOR_EMAIL => $config->committer_email,
GIT_AUTHOR_NAME => $config->committer_name,
},
setpgrp => 0,
},
);
The feature/Git-Raw branch is using Git::Raw for the most important operations, particularly tagging. While Devel::NYTProf says the Git::Raw branch is much faster, real world trials don't show a significant difference.
It's possible Devel::NYTProf is interfering with Git::Repository or System::Command's performance?
I merged it, performance or not, mostly because I'm interested in using Git::Raw and libgit2.
I don't know if it's Devel::NYTProf or something else, but in terms of speed, Git::Repository is as fast a git itself plus the overhead of fork+exec and the option handling (obviously). The setpgrp
option made the fork+exec a tiny bit faster (by removing some overhead), but I'm not surprised it doesn't have a big influence on the overall speed.
It's interesting for me to see that Git::Raw is not significantly faster.
(Timing is based on NYTProf for importing the 71 releases of Math-BigInt which takes 883s)
Surprisingly it takes 137s just to tag the 71 releases (213
git tag
calls) which should take no time at all. Git::Repository is slow. Most of that time is reading git command output.Git::Raw, which uses libgit2, might prove better. Try it and see.