gitbucket / gitbucket

A Git platform powered by Scala with easy installation, high extensibility & GitHub API compatibility
https://gitbucket.github.io/
Apache License 2.0
9.16k stars 1.25k forks source link

Performance: JGitUtil.getCommitLog returns too many commits on force-push #3476

Open ziggystar opened 9 months ago

ziggystar commented 9 months ago

JGitUtil.getCommitLog returns too many commits

Impacted version: current

We have performance problems with our gitbucket instance. It's particularly bad when force-pushing a rebased branch. The biggest problem currently appears to be this:

When force-pushing, JGitUtil.getCommitLog is called with two commit ids. The first one (from), I think, is the commit of the old branch tip (before rebasing), the other one (to) is the new branch tip. The current implementation then walks backwards from the new branch tip to the root (full repository).

I don't know where to fix this correctly. Probably the caller when doing the push should provide correct parameters to the method.

ziggystar commented 9 months ago

But I also tried the following code, which I found on the internet. Maybe that's a better method to get the commit infos. But it cannot easily be swapped, as the behavior is different in non-standard cases.

  def getCommitLogFast(git: Git, from: String, to: String): List[CommitInfo] = {
    val since = git.getRepository.resolve(from)
    val until = git.getRepository.resolve(to)
    git.log.addRange(since, until).call.asScala.map(new CommitInfo(_)).toList.reverse
  }

I wrote the following tests, note the comments next to the cases. The 00000 I encountered when pushing a lot of commits to my stale test GB installation. I don't know what that means and whether this must be supported.

  test("getCommitLogFast returns correct commits") {
    withTestRepository { git =>
      val c1 = createFile(git, Constants.HEAD, "README.md", "body1", message = "commit1")
      val c2 = createFile(git, Constants.HEAD, "README.md", "body2", message = "commit2")
      val c3 = createFile(git, Constants.HEAD, "README.md", "body3", message = "commit3")

      for (
        (from, to, msg) <- List(
          (c1.name, c3.name, "1-3"),  // OK
          (c2.name, c3.name, "2-3"),  // OK
          (c1.name, c2.name, "1-2"),  // OK
          (c3.name, c1.name, "3-1"),  // FAIL: fast: NIL, old: 1 commit
          ("000000000000000", c3.name, "000000-3") // FAIL: fast: NPE b/c of unknown commit
        )
      ) {
        val fast = JGitUtil.getCommitLogFast(git, from, to)
        val old = JGitUtil.getCommitLog(git, from, to)
        assert(fast == old, msg)
      }
    }
  }

Overall, I don't know if it's worth changing to this method.

ziggystar commented 9 months ago

I have a PR with a new implementation of getCommits that works for rebases (it calculates a merge-base/common ancestor).

Problem is, sometimes (e.g. new repo), I get commit names 000000000...0, and those can't be resolved. Where are those coming from? What's the correct behavior there? Are those names generated by GB?