bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.23k stars 4.07k forks

Add option to write `bazel query` output directly to a file #24293

Open keithl-stripe opened 2 days ago

keithl-stripe commented 2 days ago

Description of the feature request:

Our repository contains about 700,000 targets. We use the output of bazel query to improve CI performance by restricting the Bazel build to changed targets and their transitive dependencies (similar to bazel-diff).

Specifically, we run:

bazel query --output=streamed_proto //...

This produces a 6.8 GB file and takes (~cold):

We'd like to speed up this last step, as it’s 74% of wall time.

Through Java profiling (via YourKit and Java Flight Recorder) we've noticed that Bazel spends a lot of CPU and wall time marshaling the query output to gRPC to send back to the Bazel client. This would be eliminated by writing directly to a file.
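For readers unfamiliar with the format: `--output=streamed_proto` emits a length-delimited stream, where each message is preceded by its size as a base-128 varint. A minimal sketch of how a downstream consumer might split such a file into raw message payloads is below. The function names are illustrative and not part of Bazel, and decoding each payload into a `Target` message (from Bazel's `build.proto`) is left out, since that requires generated protobuf bindings.

```python
import io
from typing import BinaryIO, Iterator, Optional


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one base-128 varint. Returns None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # EOF exactly at a message boundary
            raise EOFError("stream ended inside a varint length prefix")
        value = byte[0]
        result |= (value & 0x7F) << shift  # low 7 bits carry data
        if not value & 0x80:               # high bit clear: last byte
            return result
        shift += 7


def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed payload from the stream as raw bytes."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message payload")
        yield payload


if __name__ == "__main__":
    # Synthetic stand-in for query output: three framed payloads.
    demo = io.BytesIO(b"\x03abc" + b"\x00" + b"\xac\x02" + b"x" * 300)
    for message in iter_delimited(demo):
        print(len(message))
```

Because the reader processes one message at a time, it does not need to hold the whole 6.8 GB file in memory; each payload could be handed to a protobuf parser and discarded.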

Which category does this issue belong to?

Core, Performance

What underlying problem are you trying to solve with this feature?

Improve bazel query performance when the output is destined for a file

Which operating system are you running Bazel on?

Linux Ubuntu 24.04.1

What is the output of bazel info release?

release 7.2.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

michajlo commented 1 day ago

We actually used to have something like this for an internal output-formatter implementation, and we'd attach a FIFO (pipe) so we could pipeline result processing. Unfortunately, we ran into a lot of issues with pipes, Java, and interrupt handling, so we dropped it in favor of reading results directly from Blaze's gRPC interface. That was much faster than reading them through the Bazel C++ client (the bottleneck at that point), but it does require knowing how to talk to Bazel directly over gRPC. This was a while ago, so I'm not sure what the current state of performance is for all of these things.

Anyway, I bring this up in case you were considering any sort of similar pipelining using this flag.
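To illustrate the kind of FIFO pipelining described above, here is a rough sketch on a POSIX system, in which a consumer reads from a named pipe while a producer is still writing to it. The producer thread is a hypothetical stand-in for whatever process would write query results; nothing here reflects an actual Bazel flag or the internal implementation mentioned in the comment.

```python
import os
import tempfile
import threading
from typing import Iterable


def produce(path: str, chunks: Iterable[bytes]) -> None:
    """Stand-in producer: open() blocks until a reader opens the FIFO."""
    with open(path, "wb") as writer:
        for chunk in chunks:
            writer.write(chunk)


def consume(path: str) -> int:
    """Read the FIFO until the writer closes it; return bytes received."""
    total = 0
    with open(path, "rb") as reader:
        while True:
            chunk = reader.read(65536)
            if not chunk:  # empty read: writer closed its end
                break
            total += len(chunk)
    return total


def run_pipeline(chunks: Iterable[bytes]) -> int:
    """Create a temporary FIFO and stream chunks through it."""
    with tempfile.TemporaryDirectory() as directory:
        fifo = os.path.join(directory, "query.fifo")
        os.mkfifo(fifo)  # POSIX only
        producer = threading.Thread(target=produce, args=(fifo, chunks))
        producer.start()
        received = consume(fifo)
        producer.join()
        return received


if __name__ == "__main__":
    print(run_pipeline([b"abc", b"de"]))
```

This sketch ignores the hard parts the comment warns about: if either side is interrupted or never opens the pipe, the other side can block indefinitely, which is one reason such setups are fragile in practice.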