evalEmpire / gitpan

Git repositories for all of CPAN
http://github.com/gitpan
73 stars 12 forks source link

Try Git::Raw #60

Closed schwern closed 10 years ago

schwern commented 10 years ago

(Timing is based on NYTProf for importing the 71 releases of Math-BigInt which takes 883s)

Surprisingly it takes 137s just to tag the 71 releases (213 git tag calls) which should take no time at all. Git::Repository is slow. Most of that time is reading git command output.

Git::Raw, which uses libgit2, might prove better. Try it and see.

book commented 10 years ago

I'm surprised that Git::Repository is so slow. Can you point me at the code, so that I can see if I can improve either Git::Repository or the code itself.

schwern commented 10 years ago

Profiling reveals that a huge amount of time is spent in readline both in Git::Repository and in System::Command.

Git/Repository/Command.pm line 167
    chomp( @output = <$stdout> );
# spent 402s making 438 calls to Git::Repository::Command::CORE:readline, avg 917ms/call
System/Command.pm line 113
    my $mesg = do { local $/; <$stat_r> };
# spent 271s making 439 calls to System::Command::CORE:readline, avg 618ms/call

That could be because that run involved a large push, it's hard to tell. I will run another profile without a push and see.

I'm uploading the whole nytprof directory for your examination. It should be fully uploaded in a few minutes.

schwern commented 10 years ago

False alarm, sorry. Gitpan::Git->push took 360s.

schwern commented 10 years ago

...except that doesn't explain why tagging was so slow.

schwern commented 10 years ago

The relevant information is in gitpan-nytprof-2014-08-11T02-13/nytprof/Git-Repository-pm-1172-line.html#182 which shows that Git::Repository->run is being called in lots of different ways and taking over 600ms for all of them. In contrast, git tag foo on my command line takes 14ms.

A lot of the time is in Git::Repository->new and traced down to System::Command->$_spawn.

schwern commented 10 years ago

Here is a fresh profile run, this time with no push to avoid network interference. It shows that calling "git tag" is eating 45% of the run time.

book commented 10 years ago

The line you point at in the profiler:

# System/Command.pm line 113
my $mesg = do { local $/; <$stat_r> };

is actually used by the parent to catch errors the child might have had when doing the fork+exec dance (I copied it from IPC::Open3, IIRC). Given that the command ran successfully, this sounds like a timeout before actually finding out the pipe has been closed.

I wonder if there's a better (faster) way to communicate failure to the parent.

book commented 10 years ago

Could you try timing with and without the setpgrp() at line 128?

If this improves things for you, I could always make it toggable in System::Command and disable it in Git::Repository::Command.

book commented 10 years ago

Some explanation: I've done some manual measurements on System::Command (with the help of Time::HiRes and print), and the call to setpgrp is the action the kid does before the actual exec that takes the longest (by a factor of about 20 compared to the rest of what the child does before exec).

Since it happens before the exec, and the parent waits on the $stat_r filehandle to either be closed (thanks to close-on-exec) or have something written to it, this setpgrp call is slowing down both the parent and the child. The call was added in book/System-Command@c7d64e12ff7141a386f2a405b4d4b4baa0c07117 for the sole purpose of enabling kill -SIGKILL, $cmd->pid;, hence my proposal to make it optional.

book commented 10 years ago

System::Command 1.110 is now on CPAN, with a setpgrp option to disable (or enable) the call to setpgrp() in the child process (it's enabled by default).

Since Git::Repository passes the options hashes directly to Git::Repository::Command, which is a subclass of System::Command, you can do:

my $r = Git::Repository->new( { setpgrp => 0 } );

and none of the Git commands spawned by $r will call setpgrp().

Let me know if that improves things.

schwern commented 10 years ago

Sorry, no help, takes the same amount of time. I upgraded System::Command to 1.110, Git::Repository to 1.312 and changed the call to

  return Git::Repository->new(
      work_tree => $self->repo_dir.'',
      {
          env => {
              GIT_COMMITTER_EMAIL => $config->committer_email,
              GIT_COMMITTER_NAME  => $config->committer_name,
              GIT_AUTHOR_EMAIL    => $config->committer_email,
              GIT_AUTHOR_NAME     => $config->committer_name,
          },
          setpgrp => 0,
      },
  );
schwern commented 10 years ago

The feature/Git-Raw branch is using Git::Raw for the most important operations, particularly tagging. While Devel::NYTProf says the Git::Raw branch is much faster, real world trials don't show a significant difference.

It's possible Devel::NYTProf is interfering with Git::Repository or System::Command's performance?

schwern commented 10 years ago

I merged it, performance or not, mostly because I'm interested in using Git::Raw and libgit2.

book commented 10 years ago

I don't know if it's Devel::NYTProf or something else, but in terms of speed, Git::Repository is as fast a git itself plus the overhead of fork+exec and the option handling (obviously). The setpgrp option made the fork+exec a tiny bit faster (by removing some overhead), but I'm not surprised it doesn't have a big influence on the overall speed.

It's interesting for me to see that Git::Raw is not significantly faster.