arxanas / git-branchless

High-velocity, monorepo-scale workflow for Git
Apache License 2.0

Too many open files #260

Open KevinWuWon opened 2 years ago

KevinWuWon commented 2 years ago

Description of the bug

I got the following error while trying to do a move. I retried a few times and got the same error each time, so I ended up using git rebase instead.

❯ g move -s 3a1b4f32 -d green
The application panicked (crashed).
Message:  A fatal error occurred:
   0: Git error GenericError: could not open '/Users/kevinwuwon/work/.git/objects/ec/e53826228db6bfdbe45285a18fad18f8d48c99': Too many open files

Location:
   src/git/repo.rs:45

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SPANTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

   0: branchless::git::repo::find_tree with self=<Git repository at: "/Users/kevinwuwon/work/.git/"> oid=NonZeroOid(ece53826228db6bfdbe45285a18fad18f8d48c99)
      at src/git/repo.rs:1083
   1: branchless::git::tree::get_changed_paths_between_trees with repo=<Git repository at: "/Users/kevinwuwon/work/.git/"> lhs=Some(Tree { id: f356c54deafa9098cd7df4b8022a589e13824b73 }) rhs=Some(Tree { id: 0dd521a0fdf599f49516098abe7fa40b60071d69 })
      at src/git/tree.rs:270
   2: branchless::git::repo::get_paths_touched_by_commit with self=<Git repository at: "/Users/kevinwuwon/work/.git/"> commit=Commit { inner: Commit { id: a17ae10bd61939cffe5f7403dec915af6597260e, summary: "[PAY-2762] Separate CHARGEBACK from REFUNDED status in PurchaseHistory (#240240)" } }
      at src/git/repo.rs:590

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Location: src/commands/mod.rs:262

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Expected behavior

No response

Actual behavior

No response

Version of git-branchless

02bc9b5e1bb756cf6ab0a4fd81653d6146fd9540

Version of git

git version 2.34.0

Version of rustc

rustc 1.56.1 (59eed8a2a 2021-11-01)

arxanas commented 2 years ago

Thanks for the report @KevinWuWon. I haven't seen an issue like this before.

This comment https://github.com/rust-lang/git2-rs/issues/626#issuecomment-763633357 suggests that resources aren't freed until the owning Repository is freed, which could be a problem for this use-case. In https://github.com/arxanas/git-branchless/blob/f680141a0d258799e644e228ca2111a9a43724ea/src/git/tree.rs#L76, we do a dual depth-first search of the two trees, and I expected the allocated git2::TreeEntrys to be freed after returning from each function call, but that might not be the case.

KevinWuWon commented 2 years ago

What OS are you using?

macOS 11.6.2

What's the maximum number of open files for your system? (You might be able to find this information with ulimit.)

ulimit says unlimited

How big is the diff for the commit(s) which you're applying?

The git diff of the 3 commits I'm moving is 483 lines with 4 changed files.

Does your repository have a lot of commits/files?

200k commits, 385k files

Are there any directories which have very many files in them?

No, there are 110 files descendent from the directory it touches.

Are any changed paths very deep into the directory hierarchy?

No, 6 levels deep.

I restarted my computer and the problem went away, so I suspect it's a resource leak. The computer had been on for a few weeks.

KevinWuWon commented 2 years ago

I'm still getting "Too many open files" even after a system restart. This time on git amend:

❯ g amend
The application panicked (crashed).
Message:  A fatal error occurred:
   0: could not open '/Users/kevinwuwon/work/.git/objects/92/bfc18cfa594ae75d0839df956707bdab5fd6a2': Too many open files; class=Os (2)

Location:
   /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/result.rs:1915

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Location: src/commands/mod.rs:262

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

This file is not particularly big, nor is it in a directory with many files. git commit -a --amend --no-edit worked fine as an alternative.

arxanas commented 2 years ago

How many branches and references do you have in your repository? You can try running this and report the results:

$ find .git/refs | wc -l
$ find .git/refs/heads | wc -l
$ find .git/refs/remotes | wc -l
$ find .git/refs/branchless | wc -l

After that, can you try running git branchless gc? That just cleans up dangling references under .git/refs/branchless/. Maybe we're somehow holding onto the objects pointed to by those references.
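A minimal sketch of the same counts in one pass, assuming the default `.git` layout. Note that plain `find` also counts directories, so this version restricts itself to files with `-type f`. The helper name `count_loose_refs` is hypothetical, and refs stored in `.git/packed-refs` are not counted here.

```shell
# Count loose ref files under a refs directory; prints 0 if the directory is missing.
count_loose_refs() {
  if [ -d "$1" ]; then
    # -type f skips directories, which a bare `find | wc -l` would include.
    find "$1" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Assumption: the repository metadata lives in ./.git unless GIT_DIR is set.
git_dir=${GIT_DIR:-.git}
for ns in "" heads remotes branchless; do
  printf '%-18s %s\n' "refs/$ns" "$(count_loose_refs "$git_dir/refs/$ns")"
done
```

Run it from the top of the working copy. The first row is the total over all of `refs/`.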

I don't have a hypothesis as to what's opening all the files. I think it's 50/50 whether the commit-rewrite process itself opens too many files at once, or some earlier operation opens too many files and holds onto them.

KevinWuWon commented 2 years ago

❯ find .git/refs | wc -l
    2881

❯ find .git/refs/heads | wc -l
      22

❯ find .git/refs/remotes | wc -l
     445

❯ find .git/refs/branchless | wc -l
    2408

Running git branchless gc didn't help.

However, someone suggested that I run ulimit -n 10240 (even though ulimit reports "unlimited"), and that fixed the problem.

arxanas commented 2 years ago

I think ulimit shows the "hard" limit, but the "soft" limit is different. ulimit -n indicates that the soft limit on my machine (macOS 11.6.3) is 65535. I don't really have a good idea of where we could avoid opening so many files, and in any case, 1024 seems a little low.
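A minimal sketch for inspecting both limits on open file descriptors and raising the soft one for the current shell session. As far as I know, a bare `ulimit` in bash and zsh reports the file-size limit (`-f`), not the open-files limit, which would explain an "unlimited" answer. The open-files limit is selected with `-n`, and `-S`/`-H` select the soft and hard values. The target of 10240 is just the value mentioned earlier in the thread.

```shell
# Soft limit: the value a process actually runs into.
soft=$(ulimit -Sn)
# Hard limit: the ceiling up to which the soft limit may be raised without privileges.
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"

want=10240
# Never ask for more than the hard limit allows.
if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
  want=$hard
fi
# Only raise the soft limit; leave it alone if it is already high enough.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$want" ]; then
  ulimit -Sn "$want" || echo "could not raise the soft limit to $want" >&2
fi
echo "open files: soft limit is now $(ulimit -Sn)"
```

The change applies only to the current shell and the processes it starts, so it has to be repeated (or added to the shell startup file) for new sessions.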

martinvonz commented 2 years ago

Did you figure out which files are opened? Is it refs or loose objects or something else? I suppose strace could help you at least figure that part out.

martinvonz commented 2 years ago

Oh, regular git gc should help if the problem is with either refs or objects -- the command packs both refs and objects.
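One way to check whether loose objects are a plausible culprit before and after packing is sketched below. It relies on `git count-objects -v`, whose `count:` line is the number of loose objects. The helper name `loose_objects` and the `RUN_GC` switch are hypothetical, and this does not prove which files git-branchless actually holds open.

```shell
# Print the number of loose (unpacked) objects in the repository at $1.
loose_objects() {
  git -C "$1" count-objects -v | awk '/^count:/ { print $2 }'
}

repo=${REPO:-.}
if git -C "$repo" rev-parse --git-dir >/dev/null 2>&1; then
  echo "loose objects before: $(loose_objects "$repo")"
  # Packing rewrites the object store, so it only runs when RUN_GC=1 is set.
  if [ "${RUN_GC:-0}" = "1" ]; then
    git -C "$repo" gc
    echo "loose objects after:  $(loose_objects "$repo")"
  fi
else
  echo "$repo is not a Git repository (or git is not installed)" >&2
fi
```

For example, `RUN_GC=1 REPO=~/work sh count.sh` (with the sketch saved as the hypothetical `count.sh`) would print the count on both sides of the `git gc` run.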