Azure / durabletask

Durable Task Framework allows users to write long-running, persistent workflows in C# using the async/await capabilities.
Apache License 2.0

Request for Activity.Baggage propagation for Distributed Tracing #1073

Status: Open. tundwed opened this issue 7 months ago.

tundwed commented 7 months ago

My team is using DTFx in a resource provider, and we would like built-in propagation of Activity.Baggage from the TaskHubClient all the way through to the orchestration running on the worker.

cgillum commented 7 months ago

@jviau your thoughts on this?

jviau commented 7 months ago

Baggage propagation is part of the OTel spec: https://opentelemetry.io/docs/concepts/signals/baggage/

There are some discrepancies between the OTel baggage spec and Activity.Baggage, so we should verify what those are.

jviau commented 7 months ago

Chatted a bit with an OTel expert:

The gap with Activity.Baggage is that its baggage is tied to a span. The OTel Baggage API (from the OTel packages) is separate from the span; you then opt in by selectively copying entries from OTel Baggage to span tags. This separation matters because baggage can keep flowing even when the span is sampled out. However, it requires using the OTel packages directly, which I don't think we should do in our core packages. The OTel Baggage APIs may eventually move into the System.Diagnostics.DiagnosticSource package, but that is a low priority.

The W3C baggage propagation standard is also still a draft: https://w3c.github.io/baggage/. In addition, many .NET SDKs, including almost all of the track 2 Azure SDKs, do not support baggage. Given how unsettled baggage still is, the main message I took from the chat was "It's okay to wait a year".

I see a few options for us:

  1. Adopt Activity.Baggage directly. We would mostly be on our own here, but we could follow the draft W3C baggage spec.
  2. Do the above, but add an abstraction layer for baggage handling. We could then introduce a DurableTask.OpenTelemetry package with a hook from the client/worker that handles baggage via the OTel APIs.
    • This could always be done later, after option 1.
  3. Do nothing for now and wait until there is more guidance on proper baggage handling in .NET.
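Options 1 and 2 can be combined into a small sketch: a hypothetical `IBaggagePropagator` interface (the abstraction layer from option 2) with a default implementation that reads and writes `Activity.Baggage` directly (option 1). The encoding loosely follows the draft W3C baggage format (`key=value` list members separated by commas, with values percent-encoded). `IBaggagePropagator`, `ActivityBaggagePropagator`, `Capture`, and `Restore` are illustrative names, not DTFx APIs, and the sketch assumes keys are already valid W3C tokens; it skips key validation, member properties, and size limits.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical abstraction (option 2): DTFx core would depend only on this interface,
// so a separate package could later plug in an OTel-Baggage-backed implementation.
public interface IBaggagePropagator
{
    // Serialize the ambient baggage into a string that can be persisted with the orchestration.
    string? Capture();

    // Re-apply previously captured baggage onto the activity for the current work item.
    void Restore(string? captured, Activity target);
}

// Hypothetical default implementation (option 1): uses Activity.Baggage directly and
// encodes it roughly like the draft W3C baggage header: key1=value1,key2=value2.
public sealed class ActivityBaggagePropagator : IBaggagePropagator
{
    public string? Capture()
    {
        Activity? current = Activity.Current;
        if (current is null) return null;

        var seenKeys = new HashSet<string>();
        var members = new List<string>();
        foreach (KeyValuePair<string, string?> item in current.Baggage)
        {
            // Keep only the first occurrence of each key, i.e. the value GetBaggageItem returns.
            if (!seenKeys.Add(item.Key)) continue;
            // Percent-encode values so ',', '=', ';' and spaces cannot break the list format.
            members.Add($"{item.Key}={Uri.EscapeDataString(item.Value ?? string.Empty)}");
        }
        return members.Count == 0 ? null : string.Join(",", members);
    }

    public void Restore(string? captured, Activity target)
    {
        if (string.IsNullOrEmpty(captured)) return;
        foreach (string member in captured.Split(','))
        {
            int eq = member.IndexOf('=');
            if (eq <= 0) continue; // skip malformed list members
            string key = member.Substring(0, eq).Trim();
            string value = Uri.UnescapeDataString(member.Substring(eq + 1).Trim());
            target.AddBaggage(key, value);
        }
    }
}
```

The client would call `Capture()` when scheduling an orchestration and the worker would call `Restore(...)` before running it. A DurableTask.OpenTelemetry package could then supply a second `IBaggagePropagator` backed by the OTel Baggage API without the core packages taking that dependency.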

Also to note: baggage propagation can already be accomplished via orchestration tags. It requires some effort on the user's part: they can add their own extension methods to TaskHubClient and OrchestrationContext that copy baggage into tags, and then orchestration and activity middleware that copies those tags back into baggage.
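A minimal sketch of the tag-based workaround, assuming orchestration tags are a string-to-string dictionary persisted with the orchestration instance. `BaggageTags`, its two methods, and the `baggage.` prefix are hypothetical; calling them from `TaskHubClient`/`OrchestrationContext` extension methods and from orchestration and activity middleware is left to the application, since the exact hook signatures depend on the DTFx version in use.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helpers for carrying Activity.Baggage through orchestration tags (not part of DTFx).
public static class BaggageTags
{
    // Assumed prefix marking which tags carry baggage, so they do not collide
    // with tags the application sets for other purposes.
    public const string Prefix = "baggage.";

    // Client side: copy the ambient Activity.Baggage into a tags dictionary that is
    // passed along when the orchestration (or sub-orchestration) is scheduled.
    public static Dictionary<string, string> FromCurrentActivity(IDictionary<string, string>? existingTags = null)
    {
        var tags = existingTags is null
            ? new Dictionary<string, string>()
            : new Dictionary<string, string>(existingTags);

        Activity? current = Activity.Current;
        if (current is null) return tags;

        foreach (KeyValuePair<string, string?> item in current.Baggage)
        {
            // TryAdd keeps the first value seen: an explicit tag that already exists,
            // otherwise the first baggage item for that key (what GetBaggageItem returns).
            tags.TryAdd(Prefix + item.Key, item.Value ?? string.Empty);
        }
        return tags;
    }

    // Worker side (e.g. inside orchestration or activity middleware): copy the prefixed
    // tags back onto the activity that represents the current work item.
    public static void CopyToBaggage(IDictionary<string, string>? tags, Activity target)
    {
        if (tags is null) return;
        foreach (KeyValuePair<string, string> tag in tags)
        {
            if (tag.Key.StartsWith(Prefix, StringComparison.Ordinal))
            {
                target.AddBaggage(tag.Key.Substring(Prefix.Length), tag.Value);
            }
        }
    }
}
```

The prefix is a design choice: it lets the middleware restore only baggage entries and leave unrelated tags alone, at the cost of reserving a key namespace in the user's tags.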

My recommendation: if we have time, we should do option 1, but I think it's a low priority at the moment compared to stabilizing the rest of distributed tracing.