DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

[BUG] [Serverless] AWS lambda timeout missed final logs and APM trace when using datadog-lambda-extension #14068

Open pvicente opened 1 year ago

pvicente commented 1 year ago

Agent Environment

datadog-lambda-extension v29, sending logs to Datadog.

Describe what happened: The final logs of the Lambda execution are missing, and so is the whole trace in APM.

Describe what you expected: All logs are reported, and the trace of the whole execution appears in APM.

Steps to reproduce the issue: Use datadog-lambda-extension and let your Lambda function run for longer than the timeout you set (in my case, 850 seconds).
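A minimal sketch of such a reproduction, assuming a Python handler. The names `work` and `handler` and the 60-second overshoot are hypothetical and not taken from the original function; `context.get_remaining_time_in_millis()` is the standard method on the Lambda context object.

```python
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def work(total_seconds, step_seconds=1.0, sleep=time.sleep):
    """Log one line per step for roughly total_seconds; return the number of lines logged."""
    steps = int(total_seconds / step_seconds)
    for i in range(steps):
        logger.info("step %d of %d", i + 1, steps)
        sleep(step_seconds)
    return steps


def handler(event, context):
    # Hypothetical reproduction: deliberately run about 60 seconds longer
    # than the time Lambda reports as remaining, so the invocation times out.
    remaining_seconds = context.get_remaining_time_in_millis() / 1000.0
    work(remaining_seconds + 60)
    # Not expected to be reached: Lambda stops the runtime at the timeout.
    logger.info("finished")
    return {"status": "done"}
```

With this shape, the last few "step" lines written just before the timeout are the ones to compare between CloudWatch Logs and Datadog.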

Additional environment details (Operating System, Cloud provider, etc): See the child issue at https://github.com/DataDog/datadog-lambda-extension/issues/89. I am using datadog-lambda-extension v29 with the following Datadog libraries:

datadog-lambda 4.63.0 The Datadog AWS Lambda Library
├── datadog >=0.41,<0.42
│   ├── decorator >=3.3.2 
│   └── requests >=2.6.0 
│       ├── certifi >=2017.4.17 
│       ├── charset-normalizer >=2,<3 
│       ├── idna >=2.5,<4 
│       └── urllib3 >=1.21.1,<1.27 
├── ddtrace >=1.4.1,<2.0.0
│   ├── attrs >=19.2.0 
│   ├── bytecode * 
│   ├── cattrs * 
│   │   ├── attrs >=20 (circular dependency aborted here)
│   │   └── exceptiongroup * 
│   ├── ddsketch >=2.0.1 
│   │   ├── protobuf >=3.0.0 
│   │   └── six * 
│   ├── envier * 
│   ├── jsonschema * 
│   │   ├── attrs >=17.4.0 (circular dependency aborted here)
│   │   └── pyrsistent >=0.14.0,<0.17.0 || >0.17.0,<0.17.1 || >0.17.1,<0.17.2 || >0.17.2 
│   ├── packaging >=17.1 
│   │   └── pyparsing >=2.0.2,<3.0.5 || >3.0.5 
│   ├── protobuf >=3 (circular dependency aborted here)
│   ├── six >=1.12.0 (circular dependency aborted here)
│   ├── tenacity >=5 
│   ├── typing-extensions * 
│   └── xmltodict >=0.12 
└── wrapt >=1.11.2,<2.0.0

Env variables

    # Datadog instrumentation
    "DD_SITE"                      : "datadoghq.com"
    "DD_API_KEY_SECRET_ARN"        :  "arn:aws:secretsmanager:${var.region}:${var.account_id[var.environment]}:secret:${var.environment}/risk-stream/${var.dd_api_key[var.environment]}"
    "DD_TRACE_ENABLED"             : true
    "DD_FLUSH_TO_LOG"              : true
    "DD_TRACE_DEBUG"               : true
    "DD_PROFILING_ENABLED"         : true
    "DD_PROFILING_IGNORE_PROFILER" : true
    "DD_PROFILING_ENABLE_CODE_PROVENANCE": true
    "DD_CAPTURE_LAMBDA_PAYLOAD"    : true
    "DD_ENV"                       : var.environment
    "DD_SERVICE"                   : var.service
    "DD_VERSION"                   : var.image_tag

Last log received in Datadog:

Screenshot 2022-10-27 at 10 35 16

Missing logs, highlighted in CloudWatch Logs:

Screenshot 2022-10-27 at 10 18 09

inyutin commented 1 year ago

I think this is expected behavior. When a Lambda function hits its timeout, all of its code stops: nothing more can execute in your function. Without being able to make progress, Datadog can't send logs, which is why you don't see anything after the timeout.

rdsedmundo commented 1 year ago

Couldn't the AWS Lambda Telemetry API help in this situation?
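For context, a rough sketch of what a Telemetry API subscription request from an extension looks like. This is an assumption-laden illustration, not how datadog-lambda-extension is implemented: the helper name `build_telemetry_subscription`, the listener URI, and the buffering values are hypothetical, and the schema version is one I believe is valid but should be checked against the AWS documentation. The request is only built here, not sent.

```python
import json
import os
import urllib.request


def build_telemetry_subscription(extension_id, listener_uri, runtime_api=None):
    """Build (but do not send) a Telemetry API subscription request.

    extension_id is the identifier returned when the extension registers;
    listener_uri is a local HTTP listener run by the extension (hypothetical).
    """
    runtime_api = runtime_api or os.environ["AWS_LAMBDA_RUNTIME_API"]
    body = {
        "schemaVersion": "2022-12-13",  # assumed schema version
        "types": ["platform", "function"],  # platform events plus function logs
        "buffering": {"maxItems": 1000, "maxBytes": 262144, "timeoutMs": 100},
        "destination": {"protocol": "HTTP", "URI": listener_uri},
    }
    return urllib.request.Request(
        url=f"http://{runtime_api}/2022-07-01/telemetry",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Lambda-Extension-Identifier": extension_id,
        },
    )
```

With a subscription to the `platform` stream, the extension should receive platform events describing how the invocation ended (including, as far as I know, a timeout status), which it might use to flush buffered logs and traces during shutdown. Whether the time Lambda allows extensions at that point is enough would need to be verified.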

nicolas-serbin commented 7 months ago

Any news on this?