Closed jbedard closed 2 months ago
Thanks @jbedard for running with https://github.com/bazelbuild/bazel-gazelle/pull/1820 and getting it to a state where it will hopefully be acceptable. I've confirmed that this has subtantially the same performance as the original PR (except for the Configure2
bit, which we can submit separately). The parallelization is indeed important, it's roughly a 2x speedup for our repo.
In addition, Jason and I have discussed how this opens up the path to potentially reading BUILD files in the initial traversal, which can parallelize additional work as well as allow to respect exclude
directives during the parallel walk.
Interestingly enough, when I profile the approach here compared to fastwalk
, I am seeing that fastwalk
reimplemented os.Readdir
in a faster way, and that's responsible for around .6s of our 4s runtime, i.e. 15%. However, fastwalk
appears to spend more time in contention/orchestration of goroutines than the errgroup approach, so the overall runtime is around the same. Potentially os.Readdir
can be further optimized in upstream Go stdlib.
@tyler-french wdyt of this approach? We have been trying to land this perf improvement in one shape or another for months now
@fmeum have you had a chance to look into this and compare with the original PR from @dzbarsky?
@tyler-french wdyt of this approach? We have been trying to land this perf improvement in one shape or another for months now
I'm also interested to see some movement on this PR. I'm working with some rather large repositories, and these changes should solve some of the performance bottlenecks I've been investigating.
We wanted to double-check the perf numbers. Here's what I see on Airtable's repo:
No parallel walk:
fastwalk implementation from #1820
This PR's parallel walk:
So this PR is roughly similar to fastwalk, maybe a little faster, and much simpler. Looks like 15% speedup overall on our repo. Let's merge it!
LGTM, should I merge (PR is marked as draft)?
I wanted to double check the improvements were still the same as when we last measured. Looks like @DavidZbarsky-at has confirmed it's still 15% with his tests 👍
What type of PR is this? Other
What package or component does this PR mostly affect? all
What does this PR do? Why is it needed?
Walk the bazel workspace fs directories in parallel to build a structure representing the workspace before invoking the gazelle phases.
Similar to https://github.com/bazelbuild/bazel-gazelle/pull/1820 from @dzbarsky including using
the samea simplifiedpathTrie
struct but using a simple customwalkDir
method instead of introducing a new external dep.Which issues(s) does this PR fix?
Fixes #1819 Closes #1820