Open danielkhan opened 1 year ago
We must find a migration path that lets us first emit sentry-trace and trace-parent in parallel. This can't be turned on by default because of CORS, though
This is only a concern in FE JS, all other platforms can in theory send both headers today.
We also need to ensure that we can pick up an incoming trace-parent header
The question will be what will take precedence for the moment, likely sentry-trace
.
The question will be what will take precedence for the moment, likely sentry-trace
Yes, sentry-trace
it is.
React-Native will get this from JS.
Hi! I'mvery glad to see this being worked on.
We have a large mixed OTel-and-Sentry microservice system. Is there any docs/guidance on how we can get end-to-end distributed tracing working? (doesn't have to be polished, even just links to GH issues or pointers would be helpful)
I'm also wondering what direction we should take re evolving our current setup, in terms of matching the direction that Sentry is going... Should we be adding the Sentry SDK to the systems that are currently instrumented with Otel, or the reverse and moving to a "fully-OTel but with Sentry exporter" system.
The sense I'm getting is that the Sentry OTel story is still evolving, and so it may be too early for us to move to the latter direction without experiencing a ton of "bleeding-edge paper cuts" ? But at the same time I don't really feeling like doing two migrations (go all-Sentry now and then Sentry-on-OTel later) since there are so many micro services that need touching.
The trace header interior/propagation story is kinda messy in a legacy mixed system like ours. And for more fun, we also have legacy Honeycomb usage (with their proprietary headers) in the mix for some of the services, so there are 3 standards in play 🥴
Thanks!
Hi @ericsampson, the direct answer is switching to Sentry SDKs is more beneficial. We're planning to support OTLP ingest this year, but OTel's data often doesn't match Sentry's quality, particularly for errors.
I suggest using our SDKs as we're integrating OTel for its valuable instrumentation. For instance, our Node.js experimental package already utilizes OTel for auto-instrumentation while maintaining compatibility with the OTel tracing SDK and providing Sentry's error tracking.
Throughout this year, we'll introduce more SDKs that integrate OTel. If installing our SDK isn't an option, you'll still be able to send data via OTLP, which is part of the improvements we're discussing here.
thanks for the info @danielkhan !
Snippet to generate traceparent
with the JavaScript SDKs, should work in both v7 & v8:
import * as Sentry from '@sentry/node';
function getTraceparentString() {
const span = Sentry.getActiveSpan();
if (!span) {
return undefined;
}
return`00-${span.spanContext().traceId}-${span.spanContext().spanId}-0${span.spanContext().traceFlags}`;
(for context, traceFlags is either 0 or 1)
Snippet to generate traceparent
with the Cocoa SDKs.
func getTraceParent() -> String? {
guard let span = SentrySDK.span else { return nil }
return "00-\(span.traceId.sentryIdString)-\(span.spanId.sentrySpanIdString)-0\(span.sampled == .yes ? 1 : 0)"
}
Java
public static String getTraceParent() {
final ISpan span = Sentry.getSpan();
if (span != null) {
final SpanContext context = span.getSpanContext();
return "00-" + context.getTraceId() + "-" + context.getSpanId() + "-0" +
((context.getSampled() != null && context.getSampled()) ? "1" : "0");
}
return null;
}
Kotlin
fun getTraceParent(): String? = Sentry.getSpan()?.spanContext?.let { context ->
return "00-${context.traceId}-${context.spanId}-0${if (context.sampled == true) "1" else "0"}"
}
While we are moving to a world where we may process, propagate, and ingest pure OpenTelemetry (OTLP) data, we have to reconsider our trace context propagation schema and adopt W3C Trace Context fully.
Project Board
About W3C Trace Context
W3C Trace Context is a standard for trace context propagation. It defines three headers.
traceparent
(W3C): to capture the relationship between spans and propagate sampling decisionstracestate
(W3C): A vendor-prefixed key-value list to hold vendor-specific (meta) data, like tenant IDs or vendor-local trace IDsbaggage
(W3C): A userland read- and writeable key-value list of properties that can propagate arbitrary information—E.g., for a/b tests or for propagating a customer loyalty status.Sentry has a very similar trace propagation concept. It defines
sentry-trace
, which is almost similar totrace-parent
, except missing some flags W3C introducedbaggage
where we follow the spec, yet we are using it to propagate essential sentry-specific metadata like the dynamic sampling context (DSC). This can have unexpected consequences as the user can remove entries or create a new Baggage object altogether (see Baggage interface in OTel).We are not using a concept like
trace-state
.Problem Description
The problem arises in mixed scenarios where one tier might be instrumented purely with OpenTelemetry.
In this example, the web frontend sends sentry trace headers to the Node middleware where no Sentry SDK or propagator is present, which leads to context loss ( a broken trace). There is also a Python backend tier with a Sentry SDK that receives a
trace-parent
and needs to continue this trace.Out Of Scope
While we are currently not using the
baggage
header to the spec, this problem is out-of-scope for this project.Next steps
sentry-trace
andtrace-parent
in parallel. This can't be turned on by default because ofCORS,
thoughtrace-parent
headersentry-trace
A quick TL;DR: How to Implement
Add support continuing a trace from the
traceparent
header. If bothsentry-trace
andtraceparent
headers are present,sentry-trace
takes precedence. On outgoing HTTP requests, attach atraceparent
header next to thesentry-trace
header by default. Users can opt out by settingtrace_propagation_tragets
to[]
. The W3Ctraceparent
header only has two sampling states; hence, deferred or not-to-sample decisions are both represented with-00
. Add a new globalgetW3CTraceparent()
function as well asSpan::toW3CTraceparent()
.Get inspired by the PHP SDK PR #1 & #2 and Laravel SDK PR.
Time constraints
We will soon be able to ingest OTel traces and this turns out to be rather critical. So we should have a plan by end of Q4 and implement this in Q1.
Stakeholder(s)
Team(s)
Web-Backend, Mobile, Web-Frontend