Closed Steffen911 closed 1 month ago
Hi @Steffen911, thanks for reporting this issue!
Yes, I think you're right that we should use parentContext
in that line. (Could you send a PR about it? 🙏)
However, I'm not sure if that will solve the problem you're encountering. In that line, we create a context for the newly created span. But said span was created using trace.startSpan(..., parentContext)
. If the trace flags were being honored, shouldn't that span have been created as a NonRecordingSpan
? 🤔
Do give it a try! If it solves the problem, I'm happy to merge it.
Hey @unflxw,
I'll open the PR for the parentContext, because I think it would be the correct setup anyway. It doesn't seem to fix the issue though based on my local experiments. It seems that the TraceFlags aren't interpreted when creating the span in https://github.com/appsignal/opentelemetry-instrumentation-bullmq/blob/main/src/instrumentation.ts#L474 as the resulting span has a _spanContext.traceFlags = 1
and isRecording() === true
.
Do you have any other ideas what it could be?
I added https://github.com/appsignal/opentelemetry-instrumentation-bullmq/pull/14, but it doesn't seem to address this issue.
@Steffen911 Do you have any other ideas what it could be?
I think the default sampling behavior is to sample all traces, which is what you're seeing, regardless of the trace flags. I think you may need to configure OpenTelemetry to use the ParentBasedSampler
, so that it takes into account whether the parent span was sampled. Likely in the way that is described in their documentation for the TraceIDRatioBasedSampler
, but for the ParentBasedSampler
instead.
@unflxw Thank you so much! That was actually the perfect hint. We used a TraceIDRatioBasedSampler in all of our services which overwrote the ParentBasedSampler (which seems to be the default). So not using the TraceIDRatioBasedSampler fixed it for us.
What happened
We added sampling to our application to reduce the total amount of spans ingested. We use head based sampling and make the decision on the producer side. We use the new
useProducerSpanAsConsumerParent
flag which means that we'd expect that the sample decision propagates down to the consumer spans. In our case, we still see consumer spans in our observability tool, even though the TraceFlag is set to00
. We validate the latter using thetraceparent
on the job opts.What we expect to happen
When the TraceFlags on the producerContext are set to
00
we expect the same thing on the consumer, i.e. the trace should be excluded from recording, if the producerContext is to be used.Solution approach
@unflxw I think the root cause might be that we still use the consumerContext as active here: https://github.com/appsignal/opentelemetry-instrumentation-bullmq/blob/main/src/instrumentation.ts#L511. Perhaps we should use the
parentContext
variable in this row as well?