getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Automatic Distributed Traces for SNS/SQS #69712

Open cstavitsky opened 5 months ago

cstavitsky commented 5 months ago

Problem Statement

Right now, Sentry supports automatic distributed tracing for services communicating via HTTP requests.

For services that communicate via SQS/SNS, distributed tracing requires manual custom instrumentation.
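Here is what that manual instrumentation involves. Over HTTP, Sentry SDKs carry trace context in the `sentry-trace` and `baggage` headers, so a producer publishing to SQS/SNS has to copy those same values into the message. The sketch below assumes they travel as String message attributes named after the headers. `sentryTraceHeader` and `withTraceAttributes` are hypothetical helpers, not Sentry or AWS SDK APIs, and the IDs and queue URL are placeholders. In a real service the values would come from the SDK's active span rather than being hard-coded.

```javascript
// Hypothetical helper: formats a Sentry trace header value.
// Format: <32-hex trace id>-<16-hex span id>[-<0|1 sampled flag>]
function sentryTraceHeader(traceId, spanId, sampled) {
  const flag = sampled === undefined ? '' : `-${sampled ? 1 : 0}`;
  return `${traceId}-${spanId}${flag}`;
}

// Hypothetical helper: returns a copy of SendMessage/Publish-style params
// with the trace context added as String message attributes.
function withTraceAttributes(params, { traceId, spanId, sampled, baggage }) {
  const attributes = { ...(params.MessageAttributes || {}) };
  attributes['sentry-trace'] = {
    DataType: 'String',
    StringValue: sentryTraceHeader(traceId, spanId, sampled),
  };
  if (baggage) {
    attributes['baggage'] = { DataType: 'String', StringValue: baggage };
  }
  return { ...params, MessageAttributes: attributes };
}

// Usage: placeholder params that would be passed to an SQS SendMessageCommand.
const params = withTraceAttributes(
  { QueueUrl: 'https://sqs.example/queue', MessageBody: '{"orderId":42}' },
  {
    traceId: '771a43a4192642f0b136d5159a501700',
    spanId: 'b0e6f15b45c36b12',
    sampled: true,
    baggage: 'sentry-trace_id=771a43a4192642f0b136d5159a501700,sentry-sample_rate=1',
  },
);
console.log(params.MessageAttributes['sentry-trace'].StringValue);
```

The SNS `PublishCommand` accepts `MessageAttributes` in the same `DataType`/`StringValue` shape. SQS allows at most 10 message attributes per message, so this approach uses two of them.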

When a customer has 1,000 Lambda functions, instrumenting distributed tracing by hand in each one isn't feasible. I'm not sure how technically difficult this would be, but it would be awesome to have a solution that works automatically for SNS/SQS.

Solution Brainstorm

Automatically instrumented tracing between services communicating via SQS/SNS

Product Area

Performance

getsantry[bot] commented 5 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 5 months ago

Routing to @getsentry/product-owners-performance for triage ⏲️

gggritso commented 5 months ago

@cstavitsky thanks for raising this! It sounds like we need to propagate Sentry's trace IDs through SNS/SQS payloads. Is that right? If yes, this sounds like it would be addressed by SDK improvements. If that's correct, I think raising this with the SDK team would be a good start. If I'm misunderstanding, could you clarify the use-case?
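On the consuming side, each Lambda would need to read those values back before starting its transaction. The following is a minimal sketch. It assumes the producer stored them as String message attributes named `sentry-trace` and `baggage`, and that the consumer is an SQS-triggered Lambda. When SNS sits in front of SQS with raw message delivery turned off, the attributes arrive inside the JSON envelope in the body rather than on the record, so the sketch checks both places. `readAttribute` and `parseSentryTrace` are hypothetical helpers. In practice the raw strings would be handed to the SDK (for example `Sentry.continueTrace` in the JS SDK) instead of being parsed by hand. The parser here only shows what the header carries.

```javascript
// Hypothetical helper: reads a String message attribute from a Lambda SQS record.
// Checks raw SQS delivery first, then an SNS notification envelope in the body.
function readAttribute(record, name) {
  const direct = record.messageAttributes && record.messageAttributes[name];
  if (direct && direct.stringValue) return direct.stringValue;
  try {
    const envelope = JSON.parse(record.body);
    const viaSns = envelope && envelope.MessageAttributes && envelope.MessageAttributes[name];
    if (viaSns && viaSns.Value) return viaSns.Value;
  } catch {
    // Body is not JSON, so there is no SNS envelope to inspect.
  }
  return undefined;
}

// Hypothetical helper: splits "<traceId>-<spanId>[-<sampled>]" into parts.
function parseSentryTrace(value) {
  const match = /^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$/.exec(value || '');
  if (!match) return undefined;
  return {
    traceId: match[1],
    parentSpanId: match[2],
    sampled: match[3] === undefined ? undefined : match[3] === '1',
  };
}

// Usage: one record delivered straight to SQS, one delivered via SNS.
const rawRecord = {
  body: '{"orderId":42}',
  messageAttributes: {
    'sentry-trace': { stringValue: '771a43a4192642f0b136d5159a501700-b0e6f15b45c36b12-1', dataType: 'String' },
  },
};
const snsRecord = {
  body: JSON.stringify({
    Type: 'Notification',
    Message: '{"orderId":42}',
    MessageAttributes: {
      'sentry-trace': { Type: 'String', Value: '771a43a4192642f0b136d5159a501700-b0e6f15b45c36b12' },
    },
  }),
  messageAttributes: {},
};
console.log(parseSentryTrace(readAttribute(rawRecord, 'sentry-trace')));
console.log(parseSentryTrace(readAttribute(snsRecord, 'sentry-trace')));
```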

cstavitsky commented 4 months ago

@gggritso Thanks for taking a look 👍

> It sounds like we need to propagate Sentry's trace IDs through SNS/SQS payloads. Is that right?

Yep, that's correct.

> If yes, this sounds like it would be addressed by SDK improvements. If that's correct, I think raising this with the SDK team would be a good start.

Agreed that this would be addressed by SDK improvements, and raising it with the SDK team sounds good. However, I'm not clear on the best path to do that. Are you suggesting I:

AbhiPrasad commented 4 months ago

Are they only looking to propagate the trace, or do they also want spans/transactions that represent the time SNS/SQS is taking?

How do they expect SNS/SQS to propagate the trace further from there?
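One rough way to measure the time a message waits in SQS applies to an SQS-triggered Lambda. It compares the record's `SentTimestamp` attribute (epoch milliseconds, as a string) with the time the handler starts. `queueLatencyMs` is a hypothetical helper, not an SDK API. With SNS in front, `SentTimestamp` marks when SQS received the message from SNS, so the SNS leg is not included. The two timestamps also come from different clocks, so small values are approximate.

```javascript
// Hypothetical helper: milliseconds between SQS accepting the message
// (SentTimestamp, epoch ms as a string) and the consumer starting work.
function queueLatencyMs(record, nowMs = Date.now()) {
  const sent = Number(record.attributes && record.attributes.SentTimestamp);
  if (!Number.isFinite(sent)) return undefined;
  // Clamp at zero in case the consumer's clock is slightly behind SQS's.
  return Math.max(0, nowMs - sent);
}

// Usage with a fixed "now" so the result is deterministic.
const record = { attributes: { SentTimestamp: '1700000000000' } };
console.log(queueLatencyMs(record, 1700000000250)); // 250
```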

For the v8 Node SDK (@sentry/node), you can use the OpenTelemetry AWS SDK instrumentation to add spans/tracing for AWS client-side libraries, but that only tracks the client-side operations.

```javascript
import * as Sentry from '@sentry/node';
import { AwsInstrumentation } from '@opentelemetry/instrumentation-aws-sdk';

Sentry.init({ dsn: process.env.SENTRY_DSN, tracesSampleRate: 1.0 });

// Register the AWS SDK instrumentation only after Sentry.init has run.
Sentry.addOpenTelemetryInstrumentation(new AwsInstrumentation());
```

AwsInstrumentation is added automatically if you use the @sentry/aws-serverless SDK, which is available in the v8 JS SDKs.