Open keithl-stripe opened 2 days ago
We used to have something like this for an internal output-formatter implementation: we'd attach a FIFO (pipe) so we could pipeline result processing. Unfortunately, we ran into a lot of issues with pipes, Java, and interrupt handling, so we dropped it in favor of reading results directly from Blaze's gRPC interface. That was much faster than reading them via the Bazel C++ client (the bottleneck at that point), but it does require knowing how to talk to Bazel directly over gRPC. This was a while ago, so I'm not sure how any of these options perform today.
Anyway, I bring this up in case you were considering any sort of similar pipelining using this flag.
Description of the feature request:
Our repository contains about 700,000 targets. We use the output of `bazel query` to improve CI performance by restricting the Bazel build to changed targets and their transitive dependencies (similar to bazel-diff).
Specifically, we run:
This produces a 6.8 GB file and takes (~cold):
We'd like to speed up this last step, as it’s 74% of wall time.
Through Java profiling (via YourKit and Java Flight Recorder) we've noticed that Bazel spends a lot of CPU and wall time marshaling the query output to gRPC to send back to the Bazel client. This would be eliminated by writing directly to a file.
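For readers unfamiliar with the CI use case, the target-selection step described above can be sketched roughly as follows. This is a minimal illustration, not our actual tooling: the `edges` dictionary, its labels, and the `transitive_closure` helper are hypothetical stand-ins for a graph that would in practice be parsed from the query output, and the edge direction is an assumption (a reverse-dependency map would be traversed the same way).

```python
from collections import deque


def transitive_closure(edges, seeds):
    """Return the seeds plus every target reachable from them via `edges`."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        target = queue.popleft()
        for nxt in edges.get(target, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy graph: target label -> labels it has an edge to.
edges = {
    "//app:bin": ["//lib:core", "//lib:util"],
    "//lib:core": ["//lib:util"],
    "//lib:util": [],
    "//tools:lint": [],
}

# Only the changed target and what it reaches are selected for the build.
to_build = transitive_closure(edges, ["//lib:core"])
print(sorted(to_build))
```

With hundreds of thousands of targets, the traversal itself is cheap compared with producing and reading the graph, which is why the query output step dominates.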
Which category does this issue belong to?
Core, Performance
What underlying problem are you trying to solve with this feature?
Improve `bazel query` performance when the output is destined for a file
Which operating system are you running Bazel on?
Linux Ubuntu 24.04.1
What is the output of `bazel info release`?
release 7.2.0
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
No response
What's the output of `git remote get-url origin; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response