buildkite / agent

The Buildkite Agent is an open-source toolkit written in Go for securely running build jobs on any device or network
https://buildkite.com/
MIT License

OpenTelemetry: Propagate TraceID and SpanID to steps #1663

Open ojkelly opened 2 years ago

ojkelly commented 2 years ago

Is your feature request related to a problem? Please describe. The new OpenTelemetry (OTEL) experiment is helpful, but it does not make it possible to get the current traceId and spanId.

If we could get access to these, we can then link our OTEL instrumentation inside each step with the trace that exists for the whole job.

Describe the solution you'd like Extend the existing BUILDKITE_TRACE_CONTEXT found in https://github.com/buildkite/agent/blob/848a053b3131ebced93d829c3215b31bddd19f72/tracetools/propagate.go to each step, when a tracing backend is used.

Describe alternatives you've considered I've hacked around the codebase to enable access to TraceID and SpanID as individual variables, but the OTEL libraries are all focussed around propagating from a single trace context.
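To illustrate the "single trace context" point: the W3C Trace Context format packs the trace ID and span ID into one `traceparent` string, so one variable could carry both. Below is a minimal Go sketch of that encoding, using only the standard library. `FormatTraceparent` and `checkHex` are hypothetical helpers written for this example and are not part of the Buildkite agent or the OTEL libraries; exposing the result as `TRACEPARENT` is likewise an assumption.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// FormatTraceparent builds a W3C Trace Context "traceparent" value
// (version 00) from a hex-encoded 16-byte trace ID and 8-byte span ID.
// Hypothetical helper for illustration; not part of the Buildkite agent.
func FormatTraceparent(traceID, spanID string, sampled bool) (string, error) {
	if err := checkHex(traceID, 16); err != nil {
		return "", fmt.Errorf("trace ID: %w", err)
	}
	if err := checkHex(spanID, 8); err != nil {
		return "", fmt.Errorf("span ID: %w", err)
	}
	flags := "00"
	if sampled {
		flags = "01" // bit 0 of the flags byte marks the trace as sampled
	}
	// The format requires lowercase hex.
	return fmt.Sprintf("00-%s-%s-%s",
		strings.ToLower(traceID), strings.ToLower(spanID), flags), nil
}

// checkHex verifies that s decodes to exactly n bytes and is not all zeros.
func checkHex(s string, n int) error {
	b, err := hex.DecodeString(s)
	if err != nil {
		return err
	}
	if len(b) != n {
		return fmt.Errorf("want %d bytes, got %d", n, len(b))
	}
	for _, v := range b {
		if v != 0 {
			return nil
		}
	}
	return fmt.Errorf("all-zero ID is invalid")
}

func main() {
	tp, err := FormatTraceparent(
		"4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	if err != nil {
		panic(err)
	}
	// A parent process could append this to a step's environment.
	fmt.Println("TRACEPARENT=" + tp)
}
```

The IDs in `main` are sample values only. In a real agent they would come from the active span rather than being hard-coded.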

Additional context This helps build a deeper understanding of what's happening over the course of a whole job, and helps link telemetry from each step together.

I'm willing to put up a PR for this, if it's likely to be merged.

ojkelly commented 2 years ago

I've found a very relevant issue in the OTEL repo: https://github.com/open-telemetry/opentelemetry-specification/issues/740

It goes into a bit of depth about other CI tooling, and how they've mostly settled on using TRACEPARENT and TRACECONTEXT as the environment variables to share trace information.
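As a rough sketch of what a step could do with such a variable: assuming the agent exported a `TRACEPARENT` value in the W3C `traceparent` format (it does not today, per this issue), a step could recover the trace ID and span ID as shown below. `ParseTraceparent` and the `TraceParent` type are hypothetical names for this example, and the parser only handles the version `00` layout.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// TraceParent holds the fields of a version-00 W3C traceparent value.
// Hypothetical type for illustration only.
type TraceParent struct {
	TraceID string
	SpanID  string
	Sampled bool
}

// version-traceid-spanid-flags, all lowercase hex.
var traceparentRE = regexp.MustCompile(
	`^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$`)

// ParseTraceparent validates and splits a traceparent string.
func ParseTraceparent(v string) (TraceParent, error) {
	m := traceparentRE.FindStringSubmatch(v)
	if m == nil {
		return TraceParent{}, fmt.Errorf("malformed traceparent %q", v)
	}
	if m[1] != "00" {
		return TraceParent{}, fmt.Errorf("unsupported version %q", m[1])
	}
	if m[2] == strings.Repeat("0", 32) || m[3] == strings.Repeat("0", 16) {
		return TraceParent{}, fmt.Errorf("all-zero trace or span ID")
	}
	flags, err := strconv.ParseUint(m[4], 16, 8)
	if err != nil {
		return TraceParent{}, err
	}
	// Bit 0 of the flags byte is the "sampled" flag.
	return TraceParent{TraceID: m[2], SpanID: m[3], Sampled: flags&1 == 1}, nil
}

func main() {
	raw := os.Getenv("TRACEPARENT") // assumed variable name
	if raw == "" {
		fmt.Println("no parent trace context; starting a new trace")
		return
	}
	tp, err := ParseTraceparent(raw)
	if err != nil {
		fmt.Println("ignoring invalid TRACEPARENT:", err)
		return
	}
	fmt.Printf("trace=%s parent-span=%s sampled=%t\n",
		tp.TraceID, tp.SpanID, tp.Sampled)
}
```

In practice a step would more likely hand the raw value to its OTEL SDK's propagator than parse it by hand. The manual parse here just makes the layout of the value visible.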

moskyb commented 2 years ago

Hey @ojkelly! This is definitely something that we're looking to work on. It was intentionally left out of the initial release so that we could get something out, but we'll definitely be looking to include it in the near future. There's some internal work that I'm doing that's ahead of trace propagation in the queue, but it will be coming soon.

I've also been across the issue you linked. Whatever we end up doing, it'll probably end up being compliant with the (non-) standard laid out there.

jlisee commented 1 year ago

Any update on this? Maybe some tips on how to accomplish this in the code so I could patch our agent and put something up for others as well?

I am integrating Buildkite at a new organization, so I was hoping to leverage the built-in OpenTelemetry support. Previously I have used a totally external trace generation system (AWS EventBridge -> API Call -> Trace Data), and that worked under an implicit API of making the trace ID match the Buildkite build ID and the span ID match the Buildkite job ID.

ozdenyilmaz commented 1 year ago

Hi everyone,

I am from the product team at Buildkite. Unfortunately this is still in the backlog for us, so it will be a while before we pick up this piece of work. The team is working on security work like signed agents, which will delay this from being picked up. Sorry about that.