facebook / sapling

A Scalable, User-Friendly Source Control System.
https://sapling-scm.com
GNU General Public License v2.0
6.18k stars 284 forks

Actions become substantially slower over time #936

Open ianb-pomelo opened 3 months ago

ianb-pomelo commented 3 months ago

I have a checkout of a git repo in Sapling that I have been using for a while, and as I've used it more and more, commands such as sl status and sl have become substantially slower. I cloned a fresh copy of the repo with Sapling: sl is almost 2x slower in my older checkout, and sl status is ~8x slower. I suspect it has something to do with all the hidden history I have locally. Is there a way to clean up the checkout to remove these stale references that I don't want to bring back? Or do I just have to routinely check out new copies?

bolinfest commented 2 months ago

I'm not sure about sl status, but in terms of interacting with Git, I just ran:

git --git-dir .sl/store/git fsck

which appeared to clean some things up?

nicktogo commented 2 months ago

Had the same issue after upgrading to the latest version a while ago. I re-cloned with SL and it worked fine until recently; now it's slower again.

quark-zju commented 2 months ago

Is your repo a .git repo or a .sl repo? Which directory does the repo root have?

If it's a .git/ repo, does running git status make subsequent git/sl status faster? If it's a .sl/ repo, what's the output of sl debugstatus and sl debugtree? Do you have watchman installed? Does --config workingcopy.use-rust=true --config status.use-rust=true (or false) make a difference?

status should not be impacted by commit history.
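(For reference, a trivial way to tell the two layouts apart — a sketch run from the repo root; Sapling-native clones keep their state in .sl/, while "dotgit" mode sits on top of an existing .git/ directory:)

```shell
# Check which backing directory the repo root contains.
if [ -d .sl ]; then
  echo ".sl repo"
elif [ -d .git ]; then
  echo ".git repo"
else
  echo "not a repo root"
fi
```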

ianb-pomelo commented 2 months ago

Mine is a .git repo. I tried running git status with git --git-dir .sl/store/git status but got an error: fatal: this operation must be run in a work tree.

I've attached the debug logs for sl status showing the couple-second delay in checking the status:

2024-09-18T14:30:50.932032Z DEBUG run: status_info: status_mode="rust"
2024-09-18T14:30:50.932036Z  INFO run:status: workingcopy::workingcopy: enter
2024-09-18T14:30:50.933227Z  INFO run:status:pending_changes: workingcopy::filesystem::watchmanfs::watchmanfs: enter
2024-09-18T14:30:50.933743Z DEBUG progress_model::registry: registering ProgressBar connecting watchman
2024-09-18T14:30:50.946475Z  INFO run:status:pending_changes: workingcopy::filesystem::watchmanfs::watchmanfs: exit
2024-09-18T14:30:50.946499Z DEBUG run:status: watchman_info: watchmanfallback=1
2024-09-18T14:30:50.946506Z  WARN run:status: workingcopy::filesystem::watchmanfs::watchmanfs: watchman error - falling back to slow crawl err=While invoking the watchman CLI to discover the server connection details: No such file or directory (os error 2), stderr=``
2024-09-18T14:30:50.946571Z  INFO run:status:pending_changes: workingcopy::filesystem::physicalfs: enter
2024-09-18T14:30:50.946586Z DEBUG run:status:pending_changes: workingcopy::filesystem::physicalfs: working copy parents: [Ok(HgId("a1b73645a2d513b0b07fbbb8105a0537a04e3f8a"))]
2024-09-18T14:30:50.946737Z DEBUG run:status:pending_changes: progress_model::registry: registering ProgressBar comparing
2024-09-18T14:30:50.946744Z  INFO run:status:pending_changes: workingcopy::filesystem::physicalfs: exit
2024-09-18T14:30:50.946749Z  INFO run:status:compute_status: workingcopy::status: enter
2024-09-18T14:30:50.946758Z  INFO run:status:compute_status:compute_status: workingcopy::status: enter
2024-09-18T14:30:50.955040Z DEBUG progress_model::registry: removing 1 orphan ProgressBar
2024-09-18T14:30:52.747772Z  INFO run:status:compute_status:compute_status:into_iter: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.747795Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.747802Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}:files: manifest_tree: enter
2024-09-18T14:30:52.748272Z DEBUG tree::store::prefetch{ids="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: enter
2024-09-18T14:30:52.748327Z DEBUG tree::store::prefetch{ids="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: exit
2024-09-18T14:30:52.748335Z DEBUG tree::store::get{id="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: enter
2024-09-18T14:30:52.748374Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}:files: manifest_tree: exit
2024-09-18T14:30:52.748540Z DEBUG tree::store::get{id="9a7be5236f176977ec136e3847dea0a38140c2ee"}: manifest_tree::store: exit
2024-09-18T14:30:52.748652Z  INFO run:status:compute_status:compute_status:into_iter:check manifest{lookups=0}: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.748848Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.748949Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}:get_content_stream: workingcopy::filechangedetector: enter
2024-09-18T14:30:52.749031Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}:get_content_stream: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749039Z  INFO run:status:compute_status:compute_status:into_iter:compare contents{keys=0}: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749045Z  INFO run:status:compute_status:compute_status:into_iter: workingcopy::filechangedetector: exit
2024-09-18T14:30:52.749054Z  INFO run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: enter
2024-09-18T14:30:52.749060Z DEBUG run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: pending_change_count=0 timeout_secs=Some(0)
2024-09-18T14:30:52.749065Z DEBUG run:status:compute_status:compute_status:maybe_flush_treestate: treestate::dirstate: skipping treestate flush - it is not dirty
2024-09-18T14:30:52.749068Z  INFO run:status:compute_status:compute_status:maybe_flush_treestate: workingcopy::util: exit
2024-09-18T14:30:52.750157Z  INFO run:status:compute_status:compute_status{pending_changes_count=0 manifest_files_count=0}: workingcopy::status: exit
2024-09-18T14:30:52.750163Z  INFO run:status:compute_status: workingcopy::status: exit
2024-09-18T14:30:52.750167Z  INFO run:status{status_len=0}: workingcopy::workingcopy: exit
2024-09-18T14:30:52.751429Z DEBUG run: clidispatch::context: maybe starting pager enable_pager=true reason="auto"
2024-09-18T14:30:52.755092Z DEBUG run: progress_model::registry: removing 1 orphan ProgressBar
2024-09-18T14:30:52.755551Z  INFO run: commands::run: exit
quark-zju commented 2 months ago

I tried running git status with git --git-dir .sl/store/git status

That would be a .sl repo. A .git repo is one where you ran git init or git clone and then run sl commands in it; that's a separate issue (duplicate of #929).

Could you try the .sl instructions above?

ianb-pomelo commented 2 months ago

Ah sorry, the output is

sl debugstatus
len(dirstate) = 7915
len(nonnormal) = 4
len(filtered nonnormal) = 4
clock = None
sl debugtree
dirstate v2 (using treestate/60f847df-7593-4f1f-b826-66c1f92973c4, offset 2811706, 7915 files tracked)

I do not have watchman installed

quark-zju commented 2 months ago

sl debugstatus
len(dirstate) = 7915
len(nonnormal) = 4
len(filtered nonnormal) = 4
clock = None

Thanks. It looks normal. Something might have regressed without us noticing (non-watchman, non-edenfs use cases are rare in production). I can take a look when I get time.

quark-zju commented 2 months ago

@ianb-pomelo What is the version of Sapling you're using? Are you building sl from source? Have you tried

sl status --config workingcopy.use-rust=true --config status.use-rust=true

?

ianb-pomelo commented 2 months ago

I installed Sapling via Homebrew and am using Sapling 0.2.20240718-145624-f4e9df48.

I tried that command and performance was about the same as without that config.

I have also tried installing watchman, and that substantially sped things up, so for me that setup works.

nicktogo commented 2 months ago

I'm using .sl now and I have watchman running. Here is my debug output:

sl debugstatus
len(dirstate) = 519805
len(nonnormal) = 1
len(filtered nonnormal) = 1
clock = c:1726596135:5724:1:46787
sl debugtree
dirstate v2 (using treestate/37b9b29e-9233-4a38-b688-96be70bd6b3c, offset 43057603, 519805 files tracked)

Setting the flags to true is faster than setting them to false. Plain sl status seems to behave the same as setting them to true:

sl status --config workingcopy.use-rust=true --config status.use-rust=true

nicktogo commented 2 months ago

@quark-zju do you have any suggestions on how to work around this, other than re-cloning?

destroycomputers commented 1 month ago

I have encountered this, too.

A quick run under Instruments showed that in my case the issue was due to multi-pack index validation. The issue only appears after some time because there is originally no multi-pack index, as the pack-file count is low. This also means I could fix it temporarily by running git's garbage collection on the .sl/store/git directory.
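(For anyone wanting to try the same temporary fix — a sketch, assuming the repo root contains .sl/store/git and git is on PATH:)

```shell
# Run git's garbage collection against Sapling's backing git store.
# Repacking consolidates the many small pack files whose growth
# triggers multi-pack-index creation in the first place.
git --git-dir .sl/store/git gc
```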

The slowness seems to be due to excessive validation on multi-pack index loading, as described in https://github.com/libgit2/libgit2/issues/6166, which has been fixed upstream. Indeed, simply recompiling Sapling with the git2 Rust dependency bumped from version 0.14 to the latest 0.19 fixed the problem for me.

nicktogo commented 1 month ago

Thanks for sharing! @destroycomputers this works for me too!!

quark-zju commented 1 month ago

@quark-zju do you have some suggestions how to work around this, other than re-cloning?

As a temporary solution, you can try disabling the multi-pack index by appending the following to .sl/store/git/config after #### End maintained by Sapling ####:

[maintenance "gc"]
enabled = true
[maintenance "incremental-repack"]
enabled = false

Then delete the multi-pack index file:

rm .sl/store/git/objects/pack/multi-pack-index

The underlying complexity: with --config git.shallow=1 (a Sapling config that matches git's "filter" concept, not git's "shallow"), the default gc can error out saying some object is not found (reported internally), but multi-pack repack works. If you're not using the git.shallow feature, you don't need the multi-pack index and can use the regular gc.

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git). So we might want to abandon its usage altogether. To do that, we might need to move off libgit2, or off the use of the git native store. Those are non-trivial and would take some time...
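(The two workaround steps above can be combined into a small script — a sketch; it assumes you run it from the repo root and does not guard against appending the section twice:)

```shell
# 1) Append the maintenance overrides after the Sapling-managed section
#    of the backing store's git config.
cat >> .sl/store/git/config <<'EOF'
[maintenance "gc"]
	enabled = true
[maintenance "incremental-repack"]
	enabled = false
EOF

# 2) Remove the existing multi-pack index so it is no longer loaded.
rm -f .sl/store/git/objects/pack/multi-pack-index
```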

destroycomputers commented 1 month ago

It seems the multi-pack index is causing multiple issues: the performance issues reported here, and "object not found" issues reported internally ("not found" by libgit2, but found by git). So we might want to abandon its usage altogether. To do that, we might need to move off libgit2, or off the use of the git native store. Those are non-trivial and would take some time...

@quark-zju in the meantime, would it be possible to bump the git2 dependency to the latest version? That would at least fix this specific issue, and given that the currently used version is over two years old, it may have other positive effects as well.

quark-zju commented 1 month ago

would it be possible to bump the version of the git2 dependency to latest?

Good catch. I didn't realize we were using an old version. I tried upgrading it, but it's not as trivial as it looks, since other third-party dependencies in the monorepo (like cargo, sqlite, etc.) want the older version. Upgrading them could be an option, but if it's too hard we might try different approaches.

UPDATE: It is more complex than I thought. Let's say it's not feasible to upgrade for now. Our plan might be to migrate off libgit2.