bazelbuild / remote-apis

An API for caching and execution of actions on a remote system.
Apache License 2.0

Standardize Build Event Protocol #318

Open sluongng opened 3 weeks ago

sluongng commented 3 weeks ago

Today, among the various mature build tools, there exist several build event protocols that enable build telemetry use cases:

On top of these, many build tools and CI systems in the wild have started adopting more generic telemetry systems (OpenTelemetry, Prometheus) for their CI/CD telemetry needs:

So I want to start a discussion about a standardized Build Event Protocol so that different client and server implementations can agree on a common specification moving forward, and reduce overall fragmentation.

Please comment below if you are interested in adopting such a spec.

EdSchouten commented 3 weeks ago

Just to clarify: are we talking about build_event_stream.proto a.k.a. Build Event Protocol, or publish_build_event.proto a.k.a. Build Event Service?

Standardizing the latter (BES) might make sense. The former (BEP), I'm not convinced that's a good idea. The reason being that it exposes information in a schema that corresponds to Bazel's data model. For example, is it realistic to assume that Pants, Buck2, etc. etc. etc. all have the equivalent of a "ConvenienceSymlinksIdentified" event? I don't think so.
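To illustrate the distinction drawn above: a BES-style transport only needs to know which stream an event belongs to and in what order it arrived, while the payload itself can stay opaque. The Python sketch below is a simplified, hypothetical model. StreamId, OrderedEvent and StreamCollector are illustrative names only loosely inspired by publish_build_event.proto, not its actual API. It enforces per-stream ordering without ever decoding the tool-specific bytes, which is why the transport layer may generalize even where a Bazel-shaped event schema does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamId:
    # Identifies one event stream (hypothetical; loosely inspired by the
    # build/invocation IDs used in publish_build_event.proto).
    build_id: str
    invocation_id: str


@dataclass(frozen=True)
class OrderedEvent:
    # Transport envelope: routing and ordering metadata plus an opaque payload.
    stream_id: StreamId
    sequence_number: int
    type_url: str   # names the payload schema (e.g. a Bazel BEP message type)
    payload: bytes  # serialized tool-specific event, never decoded here


class StreamCollector:
    """Hypothetical server that validates ordering but not payload contents."""

    def __init__(self) -> None:
        self._next_seq: dict[StreamId, int] = {}
        self.events: dict[StreamId, list[OrderedEvent]] = {}

    def publish(self, event: OrderedEvent) -> None:
        # In this sketch every stream starts at sequence number 1.
        expected = self._next_seq.get(event.stream_id, 1)
        if event.sequence_number != expected:
            raise ValueError(
                f"out-of-order event: got {event.sequence_number}, "
                f"expected {expected}"
            )
        self._next_seq[event.stream_id] = expected + 1
        self.events.setdefault(event.stream_id, []).append(event)
```

Events from different tools (say, a Pants payload and a Buck2 payload) can share the same collector, because only the envelope fields are interpreted.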

sluongng commented 3 weeks ago

Agree. I don't think we want to make Bazel-specific events a standardized spec.

I think a good starting point would be a new event protocol that meets all the common needs of existing tools:

  1. Creating an invocation with an ID
  2. Command line, workspace information
  3. Timing data
  4. ???

And leave an Any field for different tools to implement domain-specific events. Over time, as we identify common needs between tools (i.e. more than 2 tools interested in the same thing), we can add more event types to the spec.
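A minimal sketch of the proposal above, assuming a tiny common vocabulary (invocation start and finish carrying an ID, command line, workspace and timestamps) plus an extension payload with the same two fields as protobuf's google.protobuf.Any (type_url and value). All class and function names here are hypothetical, and millisecond timestamps are an assumption of the sketch. It shows how a generic consumer can derive common timing data while passing over tool-specific events it does not understand.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AnyPayload:
    # Mirrors the two fields of google.protobuf.Any: a type URL naming the
    # message type, and the serialized message bytes.
    type_url: str
    value: bytes


@dataclass(frozen=True)
class InvocationStarted:
    invocation_id: str
    command_line: tuple[str, ...]
    workspace: str
    start_time_ms: int  # epoch milliseconds (assumption of this sketch)


@dataclass(frozen=True)
class InvocationFinished:
    invocation_id: str
    end_time_ms: int
    exit_code: int


# A tool-specific event travels as an AnyPayload; consumers that do not
# recognize its type_url can skip it without failing.
Event = Union[InvocationStarted, InvocationFinished, AnyPayload]


def summarize(events: list[Event]) -> dict:
    """Derive common timing data using only the standardized events."""
    start = end = None
    skipped = 0
    for ev in events:
        if isinstance(ev, InvocationStarted):
            start = ev
        elif isinstance(ev, InvocationFinished):
            end = ev
        else:
            skipped += 1  # domain-specific event, opaque to this consumer
    if start is None or end is None:
        raise ValueError("incomplete invocation")
    return {
        "invocation_id": start.invocation_id,
        "duration_ms": end.end_time_ms - start.start_time_ms,
        "exit_code": end.exit_code,
        "tool_specific_events": skipped,
    }
```

Promoting a tool-specific event into the spec would then mean adding a new dataclass to the Event union, while older consumers keep treating it as an opaque payload.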

sluongng commented 3 weeks ago

cc: @philwo @aherrmann @bergsieker who might be interested in this topic.

EdSchouten commented 3 weeks ago

My concern is that if we attempt to standardize anything that is in excess of the Build Event Service, it would severely suffer from an inner-platform effect.

bergsieker commented 2 weeks ago

You might notice that even within Google we have (at least) two different interfaces for this. When we looked at standardizing them years ago, we found that BEP/BES didn't map well onto the Chromium build lifecycle. I don't recall the details, but certainly at least part of it was due to hierarchical builds, where one build initiates another, and you want to be able both to track them separately and to provide a rollup view. Both Bazel and Chrome had too much entrenched usage to make changing them realistic.
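The hierarchical-build case can be sketched by giving each invocation an optional parent ID, so that every build remains individually trackable while a consumer can still compute a rollup view. This is a hypothetical illustration (Invocation and rollup are made-up names, and the build graph is assumed to be acyclic), not how either of Google's interfaces actually models it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Invocation:
    invocation_id: str
    parent_id: Optional[str]  # build that started this one; None if top-level
    failed: bool


def rollup(invocations: list[Invocation], root_id: str) -> dict:
    """Aggregate a root invocation with every build it started, transitively."""
    by_id = {inv.invocation_id: inv for inv in invocations}
    children: dict[str, list[Invocation]] = {}
    for inv in invocations:
        if inv.parent_id is not None:
            children.setdefault(inv.parent_id, []).append(inv)

    count = failures = 0
    stack = [by_id[root_id]]
    while stack:  # iterative depth-first walk; assumes no cycles
        inv = stack.pop()
        count += 1
        failures += int(inv.failed)
        stack.extend(children.get(inv.invocation_id, []))
    return {"invocations": count, "failures": failures}
```

Each child is still a complete invocation in its own right, so it can be tracked separately; the rollup is only a consumer-side query over the parent links.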

My gut feeling here is that BEP doesn't generalize well to other tools. BES might generalize but I'm not sure. However, Bazel is unlikely to move to a new protocol due to the significant infrastructure that we've built internally around BEP.

I'd suggest exploring what this looks like when built on top of an existing framework like OpenTelemetry. It's possible there could be enough momentum from non-Bazel tools to get that off the ground, and leveraging existing open standards is good when possible.

sluongng commented 1 week ago

Added ReClient's events to the issue's description.

TheGrizzlyDev commented 1 week ago

I've just opened up a proposal to add BES (or something equivalent) to Buck2: https://github.com/facebook/buck2/pull/806