gitext-rs / git-stack

Stacked branch management for Git
Apache License 2.0

Speed up commit operations with partial index updates #151

Open epage opened 2 years ago

epage commented 2 years ago
    /// Cherry-pick a commit in memory and return the resulting tree.
    ///
    /// The `libgit2` routines operate on entire `Index`es, which contain one
    /// entry per file in the repository. When operating on a large repository,
    /// this is prohibitively slow, as it takes several seconds just to write
    /// the index to disk. To improve performance, we reduce the size of the
    /// involved indexes by filtering out any unchanged entries from the input
    /// trees, then call into `libgit2`, then add back the unchanged entries to
    /// the output tree.

See https://github.com/arxanas/git-branchless/blob/331b7cf2a37d00a3d54ba5df00f3d702aa7b3948/src/git/repo.rs#L774-L789
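The filtering idea in that doc comment can be sketched without `libgit2` at all. The block below is a minimal illustration, not the git-branchless implementation: a flattened `BTreeMap` of path to blob id stands in for a tree or index, `three_way_merge` stands in for the expensive `libgit2` call, and every name (`Tree`, `Conflict`, `cherry_pick_filtered`, `tree`) is hypothetical. It relies on the observation that a path the cherry-picked commit did not touch always keeps the target's version, so only the touched paths need to go through the merge.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a git tree or index: path -> blob id.
type Tree = BTreeMap<String, String>;

/// Hypothetical error type: conflicts are reported, never written anywhere.
#[derive(Debug, PartialEq)]
struct Conflict {
    path: String,
}

/// Convenience constructor for the sketch.
fn tree(entries: &[(&str, &str)]) -> Tree {
    entries.iter().map(|(p, id)| (p.to_string(), id.to_string())).collect()
}

/// Stand-in for the full-index merge. Its cost grows with the number of
/// entries it is given, which is what the filtering below tries to shrink.
fn three_way_merge(base: &Tree, ours: &Tree, theirs: &Tree) -> Result<Tree, Conflict> {
    let paths: BTreeSet<&String> =
        base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut out = Tree::new();
    for path in paths {
        let (b, o, t) = (base.get(path), ours.get(path), theirs.get(path));
        let merged = if o == t {
            o
        } else if o == b {
            t
        } else if t == b {
            o
        } else {
            return Err(Conflict { path: path.clone() });
        };
        if let Some(id) = merged {
            out.insert(path.clone(), id.clone());
        }
    }
    Ok(out)
}

/// Cherry-pick `commit` (whose parent tree is `parent`) onto `target`,
/// merging only the paths that the commit actually changed.
fn cherry_pick_filtered(parent: &Tree, commit: &Tree, target: &Tree) -> Result<Tree, Conflict> {
    // 1. Paths touched by the commit's patch.
    let changed: BTreeSet<&String> = parent
        .keys()
        .chain(commit.keys())
        .filter(|p| parent.get(*p) != commit.get(*p))
        .collect();

    // 2. Shrink all three inputs to just those paths.
    let restrict = |t: &Tree| -> Tree {
        t.iter()
            .filter(|(p, _)| changed.contains(p))
            .map(|(p, id)| (p.clone(), id.clone()))
            .collect()
    };
    let merged = three_way_merge(&restrict(parent), &restrict(target), &restrict(commit))?;

    // 3. Add the unchanged entries back by overlaying the result on `target`.
    let mut result = target.clone();
    for &path in &changed {
        match merged.get(path) {
            Some(id) => {
                result.insert(path.clone(), id.clone());
            }
            None => {
                result.remove(path);
            }
        }
    }
    Ok(result)
}
```

In this model, calling `cherry_pick_filtered(&parent, &commit, &target)` gives the same tree as running `three_way_merge(&parent, &target, &commit)` over the full trees, but the merge itself only sees the handful of paths in the patch. With real `libgit2` trees, the filtering and re-adding steps would have to walk nested tree objects rather than a flat map.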

arxanas commented 2 years ago

There's also some implementation commentary here: https://github.com/libgit2/libgit2/issues/6036 (cc @bcongdon)

epage commented 2 years ago

I had always thought the speedup was just in operating on the index directly, rather than touching the working tree, which I also do. I didn't know there were additional optimizations. Thanks!

epage commented 2 years ago

I wonder if we can and should create a crate for sharing complex details like this. It could serve the dual purpose of also documenting how to implement higher-level git operations with libgit2, something that is lacking regardless of language. Pretty much the only resource I found was pygit's recipes.

The big risk is that people will want these functions customized in so many ways that we might as well not provide them. If we can find a strict definition of what we accept (no touching the working tree, conflicts are reported as errors, etc.), it'd help.

Maybe we could create an org to collaborate in, and I could move https://github.com/crate-ci/git-config-env over there (I put it there just to not have it in my personal repos, which I try to avoid).

arxanas commented 2 years ago

I would be open to splitting the stuff under git-branchless/git into its own crate — I kind of wanted to do that anyways to improve compilation times. It wraps a lot of the libgit2 stuff into its own APIs, as per the commentary here, so it's not clear what functionality we wouldn't want to wrap.

That being said, it seems like it would slow down development to split up that crate from the rest of the repository, since development does tend to cross-cut at times.

There's also some neat, generally useful stuff like the rebase engine which could be used across projects, but it's harder to see how to split the API boundary, since it depends on details of the DAG like obsolete commits. (And the fact that it uses the Eden DAG would force that component to be GPLed.)

epage commented 2 years ago

I very much see libgit2 wrappers being too coupled to individual projects to be worth splitting out, unless they get very mature.

I was more envisioning a git-recipes crate that provides one-off snippets that people can find useful like

So it's mostly a question of whether the algorithms can be split out from the abstractions.

epage commented 2 years ago

Naming is hard :). I was looking to create an org to put these under, but it looks like someone has the username git-rs. If anyone has alternative names for this common ground for working on git-related libraries and tools, I'm open to ideas.