DataDog / datadog-lambda-extension

Datadog Lambda Extension
Apache License 2.0

Missing lambda spans when called from distributed map in step functions #268

Open adatob opened 3 months ago

adatob commented 3 months ago

Hi,

I'm struggling with missing spans in Datadog when a Lambda function is called from a Step Functions state machine that uses a Distributed Map state. According to the AWS developer guide, Step Functions doesn't support X-Ray tracing for the child workflow executions started by a Distributed Map state. I can still re-enable X-Ray tracing by creating a new segment inside the function, but those traces are not pulled in by the DD X-Ray integration. Moreover, the child spans created with the dd-trace tracer are no longer visible. Only the spans created by the extension layer still show up in DD: one or two, depending on whether there was a cold start.

With a manual Lambda invocation, the spans and all associated X-Ray traces are visible in DD.

I wonder if there is any alternative solution that:

  1. makes the traces created by the new X-Ray segment visible in DD;
  2. allows creating new dd-trace spans and associating them with a given Lambda call made by the Step Function.

current environment details:

I will be grateful for the community's support on this topic,

Adam.

agocs commented 3 months ago

Hi Adam, unfortunately, Datadog also can't trace across a distributed map state. The distributed map state doesn't pass enough context to the Lambda function for it to reconstruct a trace ID. Distributed map states can also produce, in theory, very wide traces, which is a problem for both X-Ray and our APM system. We are working to solve both of these problems, but I suspect it won't be quick.
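To make the context problem concrete, here is a minimal Python sketch. It assumes a trace ID is derived by hashing an execution identifier that the state machine passes in the Lambda payload. The `Execution` / `Id` keys, the `derive_trace_id` helper, and the hashing scheme are illustrative assumptions, not a description of how the Datadog libraries actually work.

```python
import hashlib
from typing import Optional


def derive_trace_id(event: dict) -> Optional[int]:
    """Return a deterministic trace ID from the payload, or None if the
    payload carries no execution context (hypothetical scheme)."""
    execution = event.get("Execution")  # assumed key, for illustration only
    if not isinstance(execution, dict):
        return None
    execution_id = execution.get("Id")
    if not execution_id:
        return None
    digest = hashlib.sha256(execution_id.encode("utf-8")).digest()
    # Keep 63 bits so the value fits in a signed 64-bit integer.
    return int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF


# Payload that carries execution context: every invocation in the same
# execution derives the same trace ID and can be stitched together.
with_context = {
    "Execution": {"Id": "arn:aws:states:us-east-1:123456789012:execution:demo:abc"}
}

# Payload holding only the map item: nothing to derive a trace ID from.
without_context = {"item": 42}

same_id_twice = derive_trace_id(with_context) == derive_trace_id(dict(with_context))
missing = derive_trace_id(without_context)
```

With a payload like `without_context`, the function has nothing that ties it back to the parent execution, so any span it emits starts a new, unrelated trace.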

If you send an email to support@datadoghq.com, you can ask them to add you to the feature request for span linking across distributed map states.

Sorry and all the best! -Chris

adatob commented 3 months ago

Thanks for the quick response.

I fully accept the argument about the properties of distributed maps. However, do you see any way to configure the X-Ray integration so that it pulls the manually created segments (those created in addition to active tracing), accepting that they may not be correlated with traces created by the Datadog extension or with spans created within the step function?

I'm wondering whether the Lambda traces could be completely independent of the step function trace, as with a manual function invocation. In my case, that would still be better than nothing.

Best regards, Adam

agocs commented 3 months ago

Hi @adatob, do you mean something like this?

[image]

It seems like it should just work. You could use our X-Ray integration or instrument the Lambda functions with Datadog instrumentation, whichever you prefer.

nine5two7 commented 3 months ago

Hello. We are currently working with AWS on supporting Distributed Map states. There have been a couple of sync meetings between AWS and us recently. We need their help to inject some useful data into the execution logs so that span linking becomes possible.

As for the X-Ray solution, AWS engineers previously told us that it cannot support Distributed Map states either. X-Ray faces the same problem we do, and it may have additional concerns about the scale of the fan-out.