DataDog / serverless-plugin-datadog

Serverless plugin to automagically instrument your Lambda functions with Datadog
Apache License 2.0

Enabling DD_LOGS_INJECTION when Using Lambda Extension (addExtension: true) #519

Closed melalonso closed 2 months ago

melalonso commented 2 months ago

Description

We are currently using the serverless-plugin-datadog in our projects, and we noticed an issue with log correlation when the Lambda extension is enabled (addExtension: true). Specifically, the trace_id is not being injected into our logs when the extension is used.

We understand that when addExtension: true is set, the Datadog Lambda extension disables DD_LOGS_INJECTION by default. However, this creates a problem for us because we rely on having the trace_id present in our logs for effective trace-log correlation in Datadog.

Issue:

With the Lambda extension enabled, DD_LOGS_INJECTION is disabled and the logs no longer include the trace_id field, so we cannot search for logs by trace_id in Datadog. We know we can search and trace by the Lambda request ID instead. However, we want to filter logs by trace_id directly, because it gives more consistent trace-log correlation in Datadog, especially across services.

Steps to Reproduce the Problem

  1. Enable the Lambda extension in serverless.yml:
    custom:
      datadog:
        addExtension: true
        logLevel: "info"
  2. Attempt to search logs in Datadog for the trace_id field.
  3. Observe that the logs do not include trace_id, making trace-log correlation difficult.

Question

Is there a way to force DD_LOGS_INJECTION=true while still using the Lambda extension, or is there another recommended approach to ensure the trace_id is included in logs while keeping the Lambda extension enabled?

Are there any upcoming features or updates planned for the plugin that might address this issue?

Current Workarounds

We have considered the following workarounds, but they come with trade-offs:

  1. Disabling the Lambda extension (addExtension: false), which allows DD_LOGS_INJECTION to function as expected. However, we prefer to keep the extension for more efficient log handling.
  2. Manually injecting the trace_id into logs using custom logging logic in our application, but this adds extra complexity.
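For reference, the second workaround could look roughly like the sketch below. It assumes structured JSON logs and that Datadog correlates logs with traces through a dd.trace_id / dd.span_id attribute pair. The names TraceContext, withTraceContext, and formatLog are hypothetical and not part of any Datadog API. In a real handler the IDs would have to come from the tracer (for example, the active span's context in dd-trace); here they are passed in explicitly so the sketch stays self-contained.

```typescript
// Hypothetical helper types and functions: illustrative only, not a Datadog API.
interface TraceContext {
  traceId: string;
  spanId: string;
}

type LogFields = Record<string, unknown>;

// Returns a copy of `fields` with a `dd` object carrying the trace and span IDs.
// When no trace context is available, the fields are returned unchanged.
function withTraceContext(fields: LogFields, ctx: TraceContext | undefined): LogFields {
  if (ctx === undefined) {
    return { ...fields };
  }
  return { ...fields, dd: { trace_id: ctx.traceId, span_id: ctx.spanId } };
}

// Serializes one structured log line as JSON.
function formatLog(message: string, ctx?: TraceContext, extra: LogFields = {}): string {
  return JSON.stringify(withTraceContext({ message, ...extra }, ctx));
}
```

A handler would then call something like console.log(formatLog("order processed", ctx)) wherever it logs. This is the extra complexity mentioned above, since every log call site needs access to the current trace context.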

Specifications

We appreciate any insights or guidance you can provide on how to resolve this while maintaining the Lambda extension.

purple4reina commented 2 months ago

You can always set the environment variable yourself, as follows:

provider:
  name: aws
  region: us-east-1
  environment:
    DD_LOGS_INJECTION: true

Also, try updating your plugin version. You're on 5.1.1 and the most recent is 5.70.0.

purple4reina commented 2 months ago

Closing as this is not an issue with the most recent version of the plugin.

melalonso commented 2 months ago

@purple4reina unfortunately, it didn't work for me. I set it up as you indicated:

provider:
  name: aws
  region: us-east-1
  environment:
    DD_LOGS_INJECTION: true

I also set it directly on the Lambda function, but after deployment the final value of DD_LOGS_INJECTION is false. It gets overwritten and ends up being false.

I updated the plugin version to 5.70.0 and I also made sure I'm using the latest extension versions:

arn:aws:lambda:us-west-2:464622532012:layer:Datadog-Extension:64
arn:aws:lambda:us-west-2:464622532012:layer:Datadog-Node18-x:115

Are there any other insights or suggestions you can provide on how to resolve this while keeping the Lambda extension?

brentmitchell25 commented 2 weeks ago

This still seems to be an issue. We are currently sending messages asynchronously through queueing systems and extracting the trace information from them in each consuming Lambda function. The trace shows correctly in the flame graph; however, the logs are associated with the invoking Lambda rather than with the extracted trace. Manually setting DD_LOGS_INJECTION=true in the console and testing does connect logs and traces correctly through the whole system.