Open mahwy opened 1 month ago
Hey @mahwy! Thanks for the note.
We intentionally don't support Raw Message Delivery because, although we can pass trace context, we rely on the SNS and SQS metadata to provide the timestamps used for when the message was first sent to SNS and again when SNS delivered it to SQS. Without that, we're not able to determine when the message was originally published, only that it showed up to SQS at the time in the SQS wrapper.
Instead we've asked people to disable raw message delivery if they want to trace through both SNS and SQS. We could certainly enable trace context extraction if that's something critical for you, but it won't create a coherent trace the way it does without raw message delivery.
Can you help me understand why raw message delivery is critical to your workflow?
Thanks! AJ
@astuyve I'm receiving trace context even with raw message delivery. (Screenshot attached)
messageAttributes._datadog
was injected by dd-trace-js
I think the issue can be fixed by letting the SQSEventTraceExtractor read binaryValue
in addition to stringValue
Hi @mahwy yes trace context is there but that is not enough. We need the timestamps for tracing to fully work, see: https://github.com/DataDog/datadog-lambda-js/blob/main/src/trace/span-inferrer.ts#L295
which isn't present when raw message delivery is enabled.
Can you help me understand why it's required for your use case?
We have existing AWS Lambda functions subscribing a SQS topic. Events are first published to a SNS topic and the SQS topic is subscribing the SNS topic. These SQS subscriber lambda functions are unable extract the trace contexts from the events creating two separate traces for each request chain.
As a workaround, I've added some code to inject messageAttributes._datadog
with StringValue when publishing messages to the SNS topic, which fixes the problem.
const params: SNS.Types.PublishInput = {...};
const activeSpan = tracer.scope()?.active();
if (activeSpan) {
const ddInfo = {};
tracer.inject(activeSpan, 'text_map', ddInfo);
if (Object.keys(ddInfo).length !== 0) {
if (!params.MessageAttributes) {
params.MessageAttributes = {};
}
params.MessageAttributes._datadog = {
DataType: 'String',
StringValue: JSON.stringify(ddInfo),
};
}
}
await sns.publish(params)
I just ran across this same issue. We have a number of instances where we desire raw message delivery for the ease of use and efficiency (especially if we have messages that near the message size limit), plus the ability to use message filtering.
The timestamps in the message are only the SQS ones, and the SNS ones are left out? is that the main issue? I wonder if theres a way we could 'opt in' to still being able to pull these in?
The side problem is I tried the workaround proposed by @mahwy, but the auto instrumentation injection is overriding it I think, and making it still a binaryValue. Is there a way I can opt out of that conditionally on a call? Or another way to work around this?
@onyxraven right, it doesn't work with dd-trace auto-instrumentation. I had to exclude aws-sdk
from esbuild external config. You might be able to disable aws-sdk instrumentation via DD_TRACE_DISABLED_PLUGINS env var.
Hi @onyxraven, thanks for the note
I just ran across this same issue. We have a number of instances where we desire raw message delivery for the ease of use and efficiency (especially if we have messages that near the message size limit), plus the ability to use message filtering.
As far as I know the SNS message attributes are fixed length and quite small, and you still can use SNS based message filtering, but I may be mistaken! Please correct me if I am.
We can add support to extract trace context from raw message delivery, but again the spans won't line up timing wise (because the metadata from SNS is missing – we're blind to it), and the connection would be from an SDK call to SNS -> [gap] -> SQS.
If that's something which would work for you all, I can help prioritize this effort.
IMO, thats a reasonable thing to opt into if possible? a config flag or env var we can set?
Yeah I think it's totally reasonable. I think we'll try to make it automatic and then when customers ask why SNS is missing we can point them to this discussion and explain that raw message deliver needs to be disabled to fully trace SNS. But that seems like a fine tradeoff to unblock you folks!
Expected Behavior
Tracing context injected to
MessageAttributes._datadog.BinaryValue
by dd-trace-js should be extracted.Actual Behavior
TraceContextExtractor.getTraceEventExtractor() is unable to extract tracing context from SNS-SQS Events with Raw Message Delivery
Steps to Reproduce the Problem
Root Causes
When publishing a SNS message, dd-trace-js injects the tracing context under
MessageAttributes._datadog
->BinaryValue
Problem 1
Since an event from raw message delivery is missing
Type
andTopicArn
, isSNSSQSEvent returnsfalse
and isSQSEvent returnstrue
. So it picks SQSEventTraceExtractor, which only checks stringValue.Problem 2
Manually setting
Type
andTopicArn
in the event body would force [SNSSQSEventTraceExtractor] (https://github.com/DataDog/datadog-lambda-js/blob/main/src/trace/context/extractors/sns-sqs.ts) to be picked, which supports both String and Binary data type. But the extractor is reading from MessageAttributes._datadog.Value instead ofMessageAttributes._datadog.BinaryValue
so no context is extracted.Specifications