facebook / watchman

Watches files and records, or triggers actions, when they change.
https://facebook.github.io/watchman/
MIT License
12.58k stars 990 forks source link

Add Git support #934

Closed arxanas closed 3 years ago

arxanas commented 3 years ago

Git already supports Watchman via the fsmonitor hook. For performance reasons, I want to add Git support directly:

facebook-github-bot commented 3 years ago

Hi @arxanas!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

arxanas commented 3 years ago

@wez pls fix build? I just want a macOS executable with these changes πŸ₯ΊπŸ™ .

facebook-github-bot commented 3 years ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

facebook-github-bot commented 3 years ago

@fanzeyi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot commented 3 years ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

wez commented 3 years ago

@wez pls fix build? I just want a macOS executable with these changes πŸ₯ΊπŸ™ .

watchman isn't my main focus these days, but @fanzeyi and the rest of the team are doing a great job looking after it!

facebook-github-bot commented 3 years ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

facebook-github-bot commented 3 years ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

arxanas commented 3 years ago

@facebook-github-bot You're welcome!

facebook-github-bot commented 3 years ago

@mroch merged this pull request in facebook/watchman@b6034cefbd7e3e6a41aeedda9c3baed38597c72d.

arxanas commented 3 years ago

@mroch Thanks for cleaning this up and merging! And thanks @chadaustin for reviewing!

dmose commented 3 years ago

This looks exciting! What does one need to do to an existing git repo monitored by the fsmonitor hook to take advantage of this new integration?

wez commented 3 years ago

https://facebook.github.io/watchman/docs/scm-query.html is the main feature that this integration most immediately unlocks.

@arxanas @mroch: looks like those docs could do with being updated to mention the new new scm-git capability!

dmose commented 3 years ago

My reading of https://github.com/facebook/watchman/pull/934#issue-953330495 was that it would also improve performance. Do I misunderstand?

arxanas commented 3 years ago

@wez / @mroch I'm happy to update documentation, but I think that the current implementation is buggy. Git::getFilesChangedSinceMergeBaseWith seems to only return committed files and not uncommitted files. (Regrettably, there appears to be no single git diff command which will include both committed and uncommitted files?)

I had difficulty updating the tests in the OSS build. I tried getdeps.py test, but if I recall correctly, it seemed to be ignoring the updates to the test cases which I wrote, and I couldn't get it to pick up my new test. Is there a guide on testing Watchman for OSS?

@dmose This functionality could be used to improve performance, but it needs to be adopted by tools which plan to make specific use of it. It also probably won't be too useful unless your repository is big enough. I'm curious how large your repository is?

I personally plan to update https://github.com/arxanas/git-branchless to use it for commit/status/diff operations one day. As per this comment, I think there is room for improvement:

[...] because of bugs / misfeatures in the current git-side implementation it can take hundreds of milliseconds until we make sense of all of that. This is because git doesn't really "know" about watchman/inotify, it just uses it as a replacement for walking the FS, but on big repositories other stuff can still be expensive.

To go over the points in my original comment:

dmose commented 3 years ago

It's quite large: https://github.com/mozilla/gecko-dev (it contains all of firefox desktop and gecko, which includes codebases like webrtc, spidermonkey, lots of vendored-in rust etc).

I guess I was imagining a world where git itself used this interface instead of the fs-notify hook to speed up normal operations (git status, etc).

I'm now suspecting that someone would have to sign up to do that work on the git side. Is that correct?

wez commented 3 years ago

The changes in this PR allow watchman to reason about git state by calling out to git and caching information. git can't make use of these changes directly, because this particular feature is asking git for information. This PR is aimed at tools downstream of watchman that need to move more processing to be O(local-changes) in the face of an update/rebase operation that may change many files.

To speed up the git fsmonitor integration, the logical first step would be to eliminate that python process overhead mentioned above by eg: rewriting that hook in say Rust using the watchman_client crate.

My recollection is that fsmonitor only accelerates a subset of git compared to the deeper integration in the Mercurial fsmonitor extension. I believe that it may be technically possible to more deeply integrate the fsmonitor concept in git, but the architecture of git likely makes that a fairly high effort engineering project.

arxanas commented 3 years ago

@dmose I do performance testing for https://github.com/arxanas/git-branchless on gecko-dev (see e.g. https://github.com/libgit2/libgit2/issues/6036). If you're interested in using the tool, I recommend you install directly from master for now, since it includes unreleased significant performance improvements for large repos, such as for git move. If you still see poor performance on large repositories, feel free to open a discussion thread there.

You can add yourself as a watcher to the repository and get notified when I publish a release that adds a replacement for git status. The intention is to 1) talk directly to Watchman, and 2) avoid using the index as a data structure, since it makes operations O(repo). I don't expect to release something like that soon, though.

You might also want to try the sparse-checkout/sparse-index features of Git: https://git-scm.com/docs/git-sparse-checkout. I think that would be the best bet for improving your performance on a monorepo. It should be under active development, so you might want to keep an eye on the Git release notes.

If you also have the problem of an fsmonitor hook with slow startup, you might be able to try https://github.com/jgavris/rs-git-fsmonitor. This isn't an option for me, unfortunately.

You could also try developing on gecko-dev with Mercurial instead. Presumably "deeper integration" with Watchman for Mercurial would make the performance better than with Git?