Closed arxanas closed 3 years ago
Hi @arxanas!
Thank you for your pull request.
We require contributors to sign our Contributor License Agreement, and yours needs attention.
You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed
. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
@wez pls fix build? I just want a macOS executable with these changes π₯Ίπ .
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
@fanzeyi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
@wez pls fix build? I just want a macOS executable with these changes π₯Ίπ .
watchman isn't my main focus these days, but @fanzeyi and the rest of the team are doing a great job looking after it!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
@facebook-github-bot You're welcome!
@mroch merged this pull request in facebook/watchman@b6034cefbd7e3e6a41aeedda9c3baed38597c72d.
@mroch Thanks for cleaning this up and merging! And thanks @chadaustin for reviewing!
This looks exciting! What does one need to do to an existing git repo monitored by the fsmonitor hook to take advantage of this new integration?
https://facebook.github.io/watchman/docs/scm-query.html is the main feature that this integration most immediately unlocks.
@arxanas @mroch: looks like those docs could do with being updated to mention the new new scm-git
capability!
My reading of https://github.com/facebook/watchman/pull/934#issue-953330495 was that it would also improve performance. Do I misunderstand?
@wez / @mroch I'm happy to update documentation, but I think that the current implementation is buggy. Git::getFilesChangedSinceMergeBaseWith
seems to only return committed files and not uncommitted files. (Regrettably, there appears to be no single git diff
command which will include both committed and uncommitted files?)
I had difficulty updating the tests in the OSS build. I tried getdeps.py test
, but if I recall correctly, it seemed to be ignoring the updates to the test cases which I wrote, and I couldn't get it to pick up my new test. Is there a guide on testing Watchman for OSS?
@dmose This functionality could be used to improve performance, but it needs to be adopted by tools which plan to make specific use of it. It also probably won't be too useful unless your repository is big enough. I'm curious how large your repository is?
I personally plan to update https://github.com/arxanas/git-branchless to use it for commit/status/diff operations one day. As per this comment, I think there is room for improvement:
[...] because of bugs / misfeatures in the current git-side implementation it can take hundreds of milliseconds until we make sense of all of that. This is because git doesn't really "know" about watchman/inotify, it just uses it as a replacement for walking the FS, but on big repositories other stuff can still be expensive.
To go over the points in my original comment:
git --no-optional-locks status
.git status
(which is probably the worst time for interactive use?). On the other hand, you could ask Watchman to keep track of all the file hashes as the files change, so you could avoid a lot of I/O when you want to ask about the status of the repo.git diff
from a certain commit tree, rather than re-processing all changed files.It's quite large: https://github.com/mozilla/gecko-dev (it contains all of firefox desktop and gecko, which includes codebases like webrtc, spidermonkey, lots of vendored-in rust etc).
I guess I was imagining a world where git itself used this interface instead of the fs-notify hook to speed up normal operations (git status, etc).
I'm now suspecting that someone would have to sign up to do that work on the git side. Is that correct?
The changes in this PR allow watchman to reason about git state by calling out to git and caching information.
git can't make use of these changes directly, because this particular feature is asking git for information. This PR is aimed at tools downstream of watchman that need to move more processing to be O(local-changes)
in the face of an update/rebase operation that may change many files.
To speed up the git fsmonitor integration, the logical first step would be to eliminate that python process overhead mentioned above by eg: rewriting that hook in say Rust using the watchman_client crate.
My recollection is that fsmonitor only accelerates a subset of git compared to the deeper integration in the Mercurial fsmonitor extension. I believe that it may be technically possible to more deeply integrate the fsmonitor concept in git, but the architecture of git likely makes that a fairly high effort engineering project.
@dmose I do performance testing for https://github.com/arxanas/git-branchless on gecko-dev
(see e.g. https://github.com/libgit2/libgit2/issues/6036). If you're interested in using the tool, I recommend you install directly from master
for now, since it includes unreleased significant performance improvements for large repos, such as for git move
. If you still see poor performance on large repositories, feel free to open a discussion thread there.
You can add yourself as a watcher to the repository and get notified when I publish a release that adds a replacement for git status
. The intention is to 1) talk directly to Watchman, and 2) avoid using the index as a data structure, since it makes operations O(repo). I don't expect to release something like that soon, though.
You might also want to try the sparse-checkout
/sparse-index
features of Git: https://git-scm.com/docs/git-sparse-checkout. I think that would be the best bet for improving your performance on a monorepo. It should be under active development, so you might want to keep an eye on the Git release notes.
If you also have the problem of an fsmonitor hook with slow startup, you might be able to try https://github.com/jgavris/rs-git-fsmonitor. This isn't an option for me, unfortunately.
You could also try developing on gecko-dev
with Mercurial instead. Presumably "deeper integration" with Watchman for Mercurial would make the performance better than with Git?
Git already supports Watchman via the
fsmonitor
hook. For performance reasons, I want to add Git support directly: