DataDog / datadog-lambda-js

The Datadog AWS Lambda Library for Node
Apache License 2.0
107 stars 35 forks source link

CustomTraceExtractor does not work as expected #500

Closed MartinLoeper closed 5 months ago

MartinLoeper commented 5 months ago

Expected Behavior

I am trying to do custom trace propagation in the following scenario: AWS Lambda -> Event Bridge -> AWS Lambda The trace of the second lambda function should have the same trace id as the first lambda and the aws.lambda span of the first lambda should be the parent of the seconds aws.lambda span.

Actual Behavior

The datadog-lambda-js picks up the _datadog attribute which is sent to the second lambda as event payload:

{"status":"debug","message":"datadog:Extracted trace context from Custom Extractor","traceContext":{"traceId":"10134166690075585874","sampleMode":2,"source":"event","parentId":"999770304986695473"}}

It also says:

{"status":"debug","message":"datadog:Added trace context to Xray metadata","trace":{"spanContext":{"_traceId":"10134166690075585874","_spanId":"999770304986695473","_parentId":null,"_isFinished":false,"_tags":{},"_sampling":{"priority":"2"},"_baggageItems":{},"_noop":null,"_trace":{"started":[],"finished":[],"tags":{}}},"source":"event"}}

{"status":"debug","message":"datadog:Attempting to find parent for the aws.lambda span"}

However, the next log says it sets another traceId:

Inject into carrier: {"x-datadog-trace-id":"7913624874872765514","x-datadog-parent-id":"7913624874872765514","x-datadog-sampling-priority":"1"}

The following logs confirms it:

Encoding payload: [{"trace_id":"63f0beb1823f8e26","span_id":"63f0beb1823f8e26","parent_id":"0000000000000000","name":"dns.lookup","resource":"0.0.0.0","error":0,"meta":{"_dd.p.tid":"65ec478500000000","_dd.p.dm":"-0","_dd.origin":"lambda","service":"regex-parser","env":"production","version":"1.0.0","runtime-id":"ccea07aa-3139-40e6-8b4e-a587a4513c66","component":"dns","span.kind":"client","dns.hostname":"0.0.0.0","dns.address":"0.0.0.0","language":"javascript"},"metrics":{"_dd.agent_psr":1,"_dd.top_level":1,"_dd.measured":1,"process_id":16,"_sampling_priority_v1":1},"start":1709983621547005400,"duration":133043213,"links":[],"service":"regex-parser"}]

Why isn't the traceId which was obtained by traceExtractor used by the dd-trace library?

I configured the extractor as follows:

export const traceExtractor = (event) => {
    const datadog = event._datadog;

    if (!datadog) {
        return {
            sampleMode: 2,
            source: "ddtrace",
        };
    }

    const tracingContext = {
        traceId: datadog["x-datadog-trace-id"],
        sampleMode: 2,
        source: "event",
        parentId: datadog["x-datadog-parent-id"],
    };

    console.log("tracingContext", tracingContext);

    return tracingContext;
};

Is event the proper source mode in my scenario? It is hard to find any documentation on this. It looks like dd-trace just ignores the tracingContext obtained like this.

Steps to Reproduce the Problem

-

Specifications

duncanista commented 5 months ago

Hey @MartinLoeper,

My first question is, it seems that #501 and #502 might be duplicates of this issue, is this correct, or am I missing something? If they are, I will close them in favor of #500 (here).

Now, on

I am trying to do custom trace propagation in the following scenario: AWS Lambda -> Event Bridge -> AWS Lambda

Is there a problem on using the out-of-the-box trace propagation? Lambda -> EventBridge -> Lambda should work without any custom extraction.

Is event the proper source mode in my scenario? It is hard to find any documentation on this. It looks like dd-trace just ignores the tracingContext obtained like this.

This shouldn't affect any of the process, so it's safe to ignore, since it's mainly for internal purposes.

Are you using a specific dd-trace version? Would it be possible for you to upgrade to the latest version 107? There has been a couple bug fixes since 101.

MartinLoeper commented 5 months ago

Apologies for opening multiple issues. There was a network error and I hit page reload twice.

Is there a problem on using the out-of-the-box trace propagation? Lambda -> EventBridge -> Lambda show work without any custom extraction.

Since we are using lambda destinations, the datadog context is not propagated automatically. Thus, I attached it to the $.detail.responsePayload manually. I guess lambda destinations are currently not supported out-of-the-box, are they?

I am using 4.29.0 as per pnpm-lock.yaml.

Thanks for looking into this @duncanista!

joeyzhao2018 commented 5 months ago

Hi @MartinLoeper, I ran into similar behavior before. And in my case, there were two tracers running and causing such overrides. To be more specific, there's already a dd-trace in the lambda layer, yet the lambda code also has dd-trace in its dependency and packaged it. So my fix was to npm remove dd-trace from my dependency. Would you mind check if it's the same issue in your case? Thank you.

MartinLoeper commented 5 months ago

Thanks for the hint @joeyzhao2018!

I had to remove the dd-tracer from my dependencies and update the datadog-lambda-js layer version to latest to make it work.

Is it possible to access the built-in tracer and to add some tags to the root span?

duncanista commented 5 months ago

Nice catch @joeyzhao2018! Now we have to figure out why that is happening, since having two tracers shouldn't be causing issues.

Is it possible to access the built-in tracer and to add some tags to the root span?

Yes, here is a docs example of how to add tags to the current span.

MartinLoeper commented 5 months ago

Now we have to figure out why that is happening, since having two tracers shouldn't be causing issues.

That's a good point! I ran into this issue in the first place because I wanted to tag the spans and the example you referenced suggests the following:

const tracer = require('dd-trace');

Does that mean I have to import a second tracing library to tag the spans?

duncanista commented 5 months ago

Does that mean I have to import a second tracing library to tag the spans?

Not at all, this layer comes packaged with a version of dd-trace already! If you don't want to use layers, you could also install both packages, yet, having the layer should be enough, since those packages will be resolved in your AWS Lambda.

MartinLoeper commented 5 months ago

Not at all, this layer comes packaged with a version of dd-trace already! If you don't want to use layers, you could also install both packages, yet, having the layer should be enough, since those packages will be resolved in your AWS Lambda.

I see. Since dd-trace is shipped with the lambda layer, it can be located by the javascript runtime. The reason I included it into our project is that we are using the AWS-CDK with Typescript. For the esbuild bundling to work, any transitive dependency must be present in the package.json. Putting it into the package.json installs another version of dd-tace into the deployment package (aka zip file) which shadows the dd-trace version shipped by the layer. I told esbuild to treat dd-trace as externalModule and thus not bundle it into the zip file. Now everything works as expected. Looks like it is not easy at all to get all of this right using aws-cdk.

duncanista commented 5 months ago

I'm sorry this is inconvenient to work with @MartinLoeper, I'll will look forward to fix this issue.

Since you are using esbuild, I'd overall recommend marking it as an external module as you did! Here are some docs about it: https://docs.datadoghq.com/serverless/guide/serverless_tracing_and_bundlers/ (although we don't have an example for the aws-cdk listed here yet).

Sometimes, native modules might need to be marked as external, you can read more in the docs of dd-trace.

astuyve commented 5 months ago

Hi @MartinLoeper - chiming in here as we've also noticed some issues using CDK.

CDK does not honor the ESBuild API design which supports ESBuild plugins; a feature supported and designed by ESBuild for this purpose and something we rely on for instrumentation to function, as noted above.

There are multiple issues documenting this in both projects: https://github.com/evanw/esbuild/issues/884 https://github.com/aws/aws-cdk/issues/18470

Right now we recommend marking dependencies as external, or following the guide Jordan linked above and packaging your application manually with esbuild, then passing that artifact to CDK.

But as this is no longer an issue with this library and we've diverged significantly from the original topic, I'm going to close this issue.

Feel free to open a new issue or reach out to support directly with any questions.

Thanks again!

MartinLoeper commented 5 months ago

Thanks for the clarification guys. Happy this works now!