arxanas / git-branchless

High-velocity, monorepo-scale workflow for Git
Apache License 2.0

git branchless seems to spin and then die after running `git commit` #937

Closed whoopsmith closed 1 year ago

whoopsmith commented 1 year ago

Description of the bug

After installing git branchless in my repo, git commit hangs for a long time and then eventually reports:

error: git-branchless died of signal 9
branchless: Failed to process reference transaction!
branchless: Some events (e.g. branch updates) may have been lost.
branchless: This is a bug. Please report it.

Expected behavior

Commit operation should complete.

Actual behavior

Operation appears to stall. Top shows git branchless using 100% of CPU and large amounts of RAM. Eventually the following happens:

error: git-branchless died of signal 9
branchless: Failed to process reference transaction!
branchless: Some events (e.g. branch updates) may have been lost.
branchless: This is a bug. Please report it.

Version of rustc

rustc 1.69.0 (84c898d65 2023-04-16)

Automated bug report

This command runs for a very long time and generates a large amount of output. branchless_bug-report.txt

Version of git-branchless

git-branchless-opts 0.7.0

Version of git

git version 2.35.3

arxanas commented 1 year ago

Hi @whoopsmith, how large is your repo roughly in terms of commits?

Either git-branchless is having trouble rendering the smartlog (the implementation is not smart) or it's having some other issue processing the commit graph. I would recommend that you simplify the commit graph that git-branchless has to process. First, you can try setting branchless.smartlog.defaultRevset to something smaller and seeing if the operations complete (try stack() or just HEAD).

If it's not a smartlog issue, then you would have to profile git-branchless to see what's taking so long. You can find profiling instructions here: https://github.com/arxanas/git-branchless/wiki/Runbook#with-tracing

You can also try jj, which I think has an efficient implementation of the smartlog.

arxanas commented 1 year ago

Can you also try disabling the branchless.undo.createSnapshots option and see if that improves the performance?
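For reference, the option can be switched off with plain git config. This is a minimal sketch: the helper names disable_snapshots and show_snapshots_setting are made up for illustration, and only the option name comes from this thread.

```shell
# Hypothetical helper: set branchless.undo.createSnapshots to false
# in one repository. $1 is the repository path (defaults to ".").
disable_snapshots() {
  git -C "${1:-.}" config branchless.undo.createSnapshots false
}

# Hypothetical helper: print the current value of the option
# (prints nothing if the option is unset).
show_snapshots_setting() {
  git -C "${1:-.}" config --get branchless.undo.createSnapshots || true
}
```

Running disable_snapshots with no argument inside the affected repo changes only that repo's local config, so it is easy to revert.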

whoopsmith commented 1 year ago

Hi @whoopsmith, how large is your repo roughly in terms of commits?

Total all branches: 57217
Size of the upstream base branch I was working with: 4197

First, you can try setting branchless.smartlog.defaultRevset to something smaller and seeing if the operations complete (try stack() or just HEAD).

Can you also try disabling the branchless.undo.createSnapshots option and see if that improves the performance?

You can also try jj, which I think has an efficient implementation of the smartlog.

Thanks. I have it disabled in that repo for now but I'll re-enable it in the near future and try the 3 things above. If those don't make a difference then I'm not sure when I might have the time to work with some profiling but I'll try.

whoopsmith commented 1 year ago

First, you can try setting branchless.smartlog.defaultRevset to something smaller and seeing if the operations complete (try stack() or just HEAD).

The docs say this option is only available from v0.8.0 but the latest I get from cargo install git-branchless is 0.7.0. Is there a later release I can get 0.8.x from without building from the development version?

Can you also try disabling the branchless.undo.createSnapshots option and see if that improves the performance?

Setting this option to false did not make a difference.

whoopsmith commented 1 year ago

You can also try jj, which I think has an efficient implementation of the smartlog.

I have jj installed and operational in this repo but I'm not clear on what you want me to do. There is no jj smartlog command.

martinvonz commented 1 year ago

jj's regular log command is quite customizable. It defaults to showing commits that are not on any remote, but you can specify a different set. You can also customize what's displayed for each commit using templates. I think @arxanas was just asking if it performs better in your case.

whoopsmith commented 1 year ago

jj's regular log command is quite customizable. It defaults to showing commits that are not on any remote, but you can specify a different set. You can also customize what's displayed for each commit using templates. I think @arxanas was just asking if it performs better in your case.

Thanks for the info. I'm a total noob for both of these projects so I don't actually know what 'performs better' means. I can tell what seems to work and what doesn't though. :)

When I run git branchless smartlog it says it's walking the commits for about 30 seconds and then stalls for about 50 seconds and produces a bunch of output. It doesn't appear to be using my pager but perhaps that's expected. If I pipe the output to wc -l I get 64689 lines of output.

The first time I ran jj log in the same place, it seemed to perform several operations (console updates were too fast to tell what) and then produced output (which used the pager) in about 1-2 seconds. Repeated runs of the same command are pretty much instant.

So from my UI-only POV jj is certainly more performant. But I don't know if those operations are comparable.

I'll be happy to run any other comparisons but I'll need some cli recipes for exactly the commands to be used.
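One possible recipe for such a comparison is to time each command and count its output lines. This is a rough sketch, not something suggested by the maintainers: the measure helper is hypothetical, and it only has whole-second resolution.

```shell
# Hypothetical helper: run a command, count the lines it writes to stdout
# (stderr is discarded), and report elapsed wall-clock time in whole seconds.
measure() {
  start=$(date +%s)
  lines=$("$@" 2>/dev/null | wc -l | tr -d ' ')
  end=$(date +%s)
  echo "$((end - start))s, $lines lines: $*"
}

# Example invocations (each requires the respective tool to be installed):
#   measure git branchless smartlog
#   measure git branchless smartlog 'stack()'
#   measure jj log
```

Because the output is piped, neither tool should start a pager, so the numbers reflect generation time rather than time spent in the pager.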

claytonrcarter commented 1 year ago

Out of curiosity, how many local branches do you have in your repo? And how many commits are on those branches that aren’t merged back into main/master?

I ask because I have a colleague that can’t use branchless because they maintain something like 13,000+ local branches (for their own reasons) and the smartlog hangs during generation every time. Plus, I doubt that it would be very useful with so many branches!

Also, do your local branches contain many merge commits back-and-forth? I believe there used to be an issue with smartlog duplicating commits on merged branches, although I think that was fixed in a recent release.

Finally, and I may have missed it if you’ve already tried this, but can you try running something like “git sl HEAD” to see if you can render a smart log with even just a single commit?
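The branch and commit counts asked about above can be gathered with stock git. This is a sketch under the assumption that the main branch name is passed in explicitly; the two function names are invented for this example.

```shell
# Hypothetical helper: count local branches in the repository at $1.
count_local_branches() {
  git -C "$1" for-each-ref --format='%(refname)' refs/heads | wc -l | tr -d ' '
}

# Hypothetical helper: count commits reachable from any local branch
# but not from the main branch named in $2.
count_unmerged_commits() {
  git -C "$1" rev-list --count --branches --not "$2"
}

# Example invocations from inside the repo:
#   count_local_branches .
#   count_unmerged_commits . main
```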

whoopsmith commented 1 year ago

Out of curiosity, how many local branches do you have in your repo?

Currently 2848. We don't delete feature branches so I have a lot.

And how many commits are on those branches that aren’t merged back into main/master?

All of them. :) We use a 100% rebased workflow.

I ask because I have a colleague that can’t use branchless because they maintain something like 13,000+ local branches (for their own reasons) and the smartlog hangs during generation every time. Plus, I doubt that it would be very useful with so many branches!

Ah... well one of the reasons I'm looking at things like branchless (and now jujutsu) is to explore other workflow options. The revsets type workflow is particularly interesting to me. For it to be useful to my team however I'd need to be able to stay compatible with our existing setups.

Also, do your local branches contain many merge commits back-and-forth?

Zero merge commits. All linear history.

Finally, and I may have missed it if you’ve already tried this, but can you try running something like “git sl HEAD” to see if you can render a smart log with even just a single commit?

This works fine and doesn't stall.

arxanas commented 1 year ago

This works fine and doesn't stall.

Oops, I should have suggested running git-branchless smartlog with a revset argument first to check this, thanks @claytonrcarter for pointing this out. Since this works, it sounds like the only problem in your case is that you have too many branches for the smartlog to render (simply due to its inefficient implementation; but even if it were efficient, it would probably be useless to render 3,000 branches). The smartlog is rendered automatically after some operations, like git sw or git move.

The docs say this option is only available from v0.8.0 but the latest I get from cargo install git-branchless is 0.7.0. Is there a later release I can get 0.8.x from without building from the development version?

If you are okay with using cargo install, then you can get the latest development version with cargo as well. See https://github.com/arxanas/git-branchless/wiki/Runbook#manual-testing for details. You can run this:

cargo install --locked --git https://github.com/arxanas/git-branchless git-branchless

Once you have the latest development version, you can try setting branchless.smartlog.defaultRevset to something more manageable. For example, to include just the current commit stack in the smartlog, you could use stack(), or to include only work which has been modified in the last two weeks, you could use something like draft() & committer.date('after: two weeks ago'). (cc @claytonrcarter: your coworker could also try that.)
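As a concrete sketch, the setting can be written with git config. The helper name set_smartlog_revset is hypothetical; the option name and both revsets are the ones mentioned above. Note that git config will store the key on any version, so whether git-branchless honors it depends on the installed build.

```shell
# Hypothetical helper: set the smartlog's default revset.
# $1 is the repository path, $2 is the revset expression.
set_smartlog_revset() {
  git -C "$1" config branchless.smartlog.defaultRevset "$2"
}

# Example invocations from inside the repo:
#   set_smartlog_revset . 'stack()'
#   set_smartlog_revset . "draft() & committer.date('after: two weeks ago')"
```

The second revset needs double quotes in the shell because the expression itself contains single quotes.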

If neither jj nor git-branchless works for your workflow, you could also try Sapling.

whoopsmith commented 1 year ago

If you are okay with using cargo install, then you can get the latest development version with cargo as well.

I seem to be missing something.

$ cargo install --locked --git https://github.com/arxanas/git-branchless git-branchless
    Updating git repository `https://github.com/arxanas/git-branchless`
  Installing git-branchless v0.7.0 (https://github.com/arxanas/git-branchless#367205d6)
....
Install seems to have worked but

$ git branchless --version
git-branchless-opts 0.7.0

I don't see a 0.8.x branch or any tag for v0.8.0.

whoopsmith commented 1 year ago

Installing git-branchless v0.7.0 (https://github.com/arxanas/git-branchless#367205d6)

ah. The hash matches master HEAD so I guess this is just that the version info is stale. I'll try to see if the option works.

arxanas commented 1 year ago

@whoopsmith did setting branchless.smartlog.defaultRevset work for you?

whoopsmith commented 1 year ago

@arxanas Sorry for my delay in response. Yes, that does seem to work. I've had this set to stack() since then and have been able to go about my normal workflow. I've not seen any long stalls or errors. I do see some slight pauses sometimes (maybe .5 to 1s) though. I know branchless is operating because I see the messages it generates.

I've not had a chance to figure out what that setting actually means yet or how I might change my workflow to try and be more branchless. :) None of that is related to this report though so I suppose you can mark this as resolved with a workaround.

Thanks for all your help looking into this.