DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM
https://docs.datadoghq.com/tracing/
Apache License 2.0

AWS SQS: message size limit exceeded after adding attributes #4499

Open johnnycbryant opened 11 months ago

johnnycbryant commented 11 months ago

We encountered a bug where Datadog appends trace information to the SQS message attributes after we have already checked the message size. When a message is close to the 256 KiB SQS maximum, the added trace attribute pushes the total message size over the limit and the send fails with an error.

Proposal: Before appending the attribute, the code here should check not only the 10-attribute limit but also the total message size. Also, if the attribute isn't appended, a warning should be emitted so that it is clear why propagation isn't happening.
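To make the proposal concrete, here is a minimal sketch of the check, written in Python for brevity rather than in the tracer's C#. It is not the tracer's implementation. It assumes the message body is available at the injection point, that attributes use the `{"DataType": ..., "StringValue": ...}` shape, and that the trace attribute is named `_datadog`; the function names are hypothetical. SQS counts the body plus each attribute's name, data type, and value toward the size limit.

```python
import logging

logger = logging.getLogger("sqs_propagation_sketch")

SQS_MAX_MESSAGE_BYTES = 256 * 1024  # limit cited in this issue
SQS_MAX_ATTRIBUTES = 10             # per-message attribute limit
TRACE_ATTRIBUTE_NAME = "_datadog"   # assumed name of the trace attribute


def attribute_size(name, attr):
    """Bytes one attribute contributes: name + data type + value."""
    value = attr.get("StringValue")
    if value is not None:
        value_bytes = len(value.encode("utf-8"))
    else:
        value_bytes = len(attr.get("BinaryValue", b""))
    return (len(name.encode("utf-8"))
            + len(attr["DataType"].encode("utf-8"))
            + value_bytes)


def message_size(body, attributes):
    """Total bytes counted against the limit: body plus all attributes."""
    return len(body.encode("utf-8")) + sum(
        attribute_size(name, attr) for name, attr in attributes.items())


def try_inject_trace_context(body, attributes, trace_json):
    """Add the trace attribute only if both limits still hold.

    Returns True if the attribute was added, False (with a warning) if not.
    """
    if len(attributes) >= SQS_MAX_ATTRIBUTES:
        logger.warning("Trace context not injected: %d attributes already set",
                       len(attributes))
        return False

    candidate = {"DataType": "String", "StringValue": trace_json}
    projected = (message_size(body, attributes)
                 + attribute_size(TRACE_ATTRIBUTE_NAME, candidate))
    if projected > SQS_MAX_MESSAGE_BYTES:
        logger.warning("Trace context not injected: message would be %d bytes "
                       "(limit %d)", projected, SQS_MAX_MESSAGE_BYTES)
        return False

    attributes[TRACE_ATTRIBUTE_NAME] = candidate
    return True
```

In both skip paths the attributes are left untouched, so the message is sent exactly as the caller built it and the warning explains why propagation did not happen.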

I was considering opening a PR for this. However, at the point in the code where the check seems most natural, the interface being passed around exposes only the message attributes and not the full message body, so I figured I'd open an issue and see what others thought.

https://github.com/DataDog/dd-trace-dotnet/blob/8bf7761d3905dfd95a8b6fb7eac280bb46e80f0e/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/SQS/ContextPropagation.cs#L74

johnnycbryant commented 11 months ago

SQS message size limits are listed in the AWS documentation linked below. It's worth calling out that this is an issue regardless of whether the Extended Client Library is used, because that library decides whether to use S3 before this injection happens. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/quotas-messages.html

lucaspimentel commented 11 months ago

Thanks for reporting this, @johnnycbryant. The team will take a look and report back here soon.

lucaspimentel commented 11 months ago

@johnnycbryant: if this issue is preventing you from enabling the .NET tracer, as a temporary workaround you can disable AWS SQS instrumentation with this tracer configuration: DD_TRACE_AwsSqs_ENABLED=false.

See Configuring the .NET Core Tracing Library for more details.

johnnycbryant commented 11 months ago

@lucaspimentel Thanks. We were already checking message sizes and using the extended client, so as a workaround we simply reduced the maximum size we allow before sending, which leaves room for the DD attribute on messages that are close to the limit. We'd still do this even if you add logic to skip the attribute when the message is too big, so that no messages miss out on DD attributes. It would be useful to have a warning in logs or traces when the attribute isn't added because of the 10-attribute limit or the size limit, to help debug missed context propagation.
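The headroom workaround described above can be sketched roughly as follows, again in Python for brevity. The reserve of 1024 bytes is a hypothetical value, not a measured one: the real size of the trace attribute depends on which headers the tracer injects, so it should be sized from observed messages. The function names are illustrative only.

```python
SQS_MAX_MESSAGE_BYTES = 256 * 1024  # limit cited in this issue
TRACE_HEADROOM_BYTES = 1024         # hypothetical reserve for the trace attribute


def effective_limit(headroom=TRACE_HEADROOM_BYTES):
    """Largest message the application allows itself to send directly."""
    return SQS_MAX_MESSAGE_BYTES - headroom


def fits_with_headroom(body, attributes_bytes=0, headroom=TRACE_HEADROOM_BYTES):
    """True if body plus existing attributes leave room for the trace attribute.

    attributes_bytes is the caller's own count of bytes already used by
    message attributes (names, data types, and values).
    """
    used = len(body.encode("utf-8")) + attributes_bytes
    return used <= effective_limit(headroom)
```

A sender would call `fits_with_headroom` where it previously compared against the full 256 KiB, and route anything that fails the check through its existing oversize path (for example, the extended client's S3 offload).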