gitext-rs / git-stack

Stacked branch management for Git
Apache License 2.0
496 stars 19 forks source link

Speed up protected branch detection #222

Open epage opened 2 years ago

epage commented 2 years ago

Please complete the following tasks

Description

In #216, we found that git stack can be slow calculating protected bases when some of them are far from each other or the topic branches.

Version

v0.8.0

Steps to reproduce

git clone https://github.com/westermo/linux/
cd linux
time git stack
git checkout 8dc84dc
git checkout -b foo
gvim COPYING
git commit -a -m "foo"
time git stack
git co master
git branch -D foo
time git stack

Actual Behaviour

Takes 6s

Expected Behaviour

Should take <1s

Debug Output

No response

epage commented 2 years ago

git-branchless originally cached merge-bases on-disk

https://github.com/arxanas/git-branchless/blob/b48ed600777f73e410732e613b199a09e6dafaf2/src/core/mergebase.rs

Then switched to Eden. @arxanas is it all eden or is it also the hooks incrementally updating eden's database?

it now uses a data structure called the "segmented changelog" to represent its commit graph. See https://github.com/quark-zju/gitrevset/issues/1 and commit message at https://github.com/arxanas/git-branchless/commit/f6c540fea8392223d604c4994b081b603b3df850 for more details. It answers O(log(n)) merge-base queries with very good real performance (~1ms per query). That particular implementation is GPL-3-only, unfortunately.

epage commented 2 years ago

@arxanas

arxanas commented 2 years ago
  • Improve performance of git sl after rebase. In large repos, it can take several seconds to compute.
    • This could probably be done using the post-update hook. There should be something about the lattice merge-base structure which we can exploit to quickly calculate new merge-bases for all the visible commits, something like if mb(local, main) = x and mb(main, main') = x then mb(local, main') = x (but there are some edge-cases due to the fact that nodes can have multiple parents).
epage commented 2 years ago

Yeah, Eden sounds nice; too bad its GPL. Thats weird for me to say as I used to be a fan but there are enough people / companies that restrict GPL that I don't want the work I do to be restricted and I'm unlikely to create the next major piece of software that will be abused for its license. This also means I'm participating in the viral nature of license biases.

Thats a good point that incremental updates wouldn't be a big deal, even without hooks (I also want to avoid hooks). With a graph data structure, it seems like it'd be relatively easy to identify and trim all roots that aren't known (ie cache invalidation). Might be interesting to play with this at a later time if there is an alternative viable graph that supports serialization.

@arxanas how does git-branchless do for initializing on Linux? I'm assuming thats much worse than gecko-dev.

arxanas commented 2 years ago

@epage here's the timing results:

$ (cd linux && rm -rf .git/branchless/dag/ && sudo purge && /usr/bin/time git sl)
       35.89 real        29.53 user         1.89 sys
$ (cd linux && git rev-list --count HEAD)
1078940
$ (cd gecko-dev && rm -rf .git/branchless/dag/ && sudo purge && /usr/bin/time git sl)
       21.52 real        17.32 user         1.35 sys
$ (cd gecko-dev && git rev-list --count HEAD)
787973

That's a little bit longer than you would expect just based on commit numbers (21.52s * 1078940 commits / 787973 commits = 29.47s, but it actually takes 35s). I'm guessing that it's somehow pessimized by linux's use of many merge commits in contrast to gecko-dev's avoidance of merge commits.