DataDog / datadog-lambda-go

The Datadog AWS Lambda package for Go
Apache License 2.0

Duplicated traces when API Gateway AWS_PROXY enabled and using XRAY #175

Open xrn opened 3 months ago

xrn commented 3 months ago

Hey team,

My project is using

github.com/DataDog/datadog-lambda-go v1.17.0
gopkg.in/DataDog/dd-trace-go.v1 v1.64.1
https://github.com/DataDog/datadog-lambda-extension/releases/tag/v58

Lambda config

DD_API_KEY ....
DD_APM_DD_URL http://xxx.xxx.xxx.xxx:3835
DD_CAPTURE_LAMBDA_PAYLOAD true
DD_ENV PROD
DD_LOGS_CONFIG_LOGS_DD_URL http://xxx.xxx.xxx.xxx:3835
DD_LOGS_CONFIG_LOGS_NO_SSL true
DD_LOGS_CONFIG_USE_COMPRESSION true
DD_LOGS_CONFIG_USE_HTTP true
DD_LOGS_ENABLED true
DD_MERGE_XRAY_TRACES true
DD_SITE datadoghq.eu
DD_TRACE_ENABLED true
DD_TRACE_STARTUP_LOGS false
DD_UNIVERSAL_INSTRUMENTATION true
DD_URL http://xxx.xxx.xxx.xxx:3835

We are using API Gateway and Lambda, and I found behavior that looks incorrect from an end-user perspective. When API Gateway integrates with Lambda using the AWS integration type, everything works well: I get a nice tree of connected spans for the more than 50 endpoints we have. It is worth noting that the API is represented as https://domain.com, with properties like

{
   "api":{
      "endpoint":{
         "method":"GET",
         "registered":"true",
         "route":"/v0/.....",
         "route_id":"/v0/...."
      }
   },
   "api_gateway":{
      "account_id":"...",
      "request_id":"....",
      "rest_api_id":".....",
      "stage":"v0"
   },
   "aws_api_id":"....",
   "aws_api_stage":"v0",
   "aws_base_path":"v0",
   "aws_domain":"....",

But there are 2 endpoints that use the AWS_PROXY integration type, as they return binary files and that integration works better for us. For these, the traces are not so perfect: we get a new service, domain.com, instead of https://domain.com, with a completely different set of properties that does not include the "api" JSON object. In some cases we see only that span connected to the lambda span; in others we see that plus the full trace from a regular execution, combined together.

I am not sure if this is the right repo, or whether I should report this in the Agent repo, but something is wrong here: switching from the AWS integration type to AWS_PROXY should not have that impact.

In the end, this leads to:

  1. In traces I have 2 services: https://domain.com and domain.com
  2. In API Catalog I have an endpoint without an assigned API, which is also incorrect and happens only for AWS_PROXY endpoints

purple4reina commented 3 months ago

Hey @xrn, this is definitely the right repo for your question!

I agree that this is not a good user experience. I'll take a deeper look into our code to see if we can improve this.

purple4reina commented 3 months ago

Hi again @xrn, so I think I have an idea of what is going on and have some ideas on how to proceed.

I think the reason you're seeing two different sets of tags is due to the way we determine the lambda event type. Using the inbound payload, we look at the structure of the json and given a series of expectations, we determine what type of event it is.

My suspicion is that your AWS_PROXY functions are not being recognized in the same way as your other functions. You can confirm this by looking at the different inbound payloads which are saved on the aws.lambda span.
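
To illustrate the kind of shape-based detection described above, here is a minimal sketch in Go. It is not the library's actual code: the `EventKind` type, the `classify` function, and the specific fields it checks are assumptions chosen for illustration. The field names (`resource`, `httpMethod`, `requestContext.stage`) are the ones API Gateway sends with a REST API proxy integration; a payload built by a custom mapping template (AWS integration type) may not contain them, so it would be classified differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventKind is a hypothetical label for the detected event type.
type EventKind string

const (
	KindAPIGatewayProxy EventKind = "api-gateway-proxy"
	KindUnknown         EventKind = "unknown"
)

// classify inspects the structure of the inbound payload and guesses
// the event type. Only one shape is checked here to keep the sketch short.
func classify(payload []byte) EventKind {
	var probe struct {
		Resource       string `json:"resource"`
		HTTPMethod     string `json:"httpMethod"`
		RequestContext *struct {
			Stage string `json:"stage"`
		} `json:"requestContext"`
	}
	if err := json.Unmarshal(payload, &probe); err != nil {
		return KindUnknown
	}
	// A proxy-integration event carries all three of these fields.
	if probe.Resource != "" && probe.HTTPMethod != "" &&
		probe.RequestContext != nil && probe.RequestContext.Stage != "" {
		return KindAPIGatewayProxy
	}
	return KindUnknown
}

func main() {
	proxy := []byte(`{"resource":"/v0/files","httpMethod":"GET","requestContext":{"stage":"v0"}}`)
	custom := []byte(`{"id":"123"}`) // e.g. output of a custom mapping template
	fmt.Println(classify(proxy))
	fmt.Println(classify(custom))
}
```

Comparing the payloads saved on the aws.lambda span for an AWS endpoint and an AWS_PROXY endpoint against a check like this should show which fields differ between the two.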

As far as fixes go, I do not have enough information about your setup to give a full recommendation. However, I would suggest taking a look at DD_APM_REPLACE_TAGS which might help you remap some of your tags.
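
For reference, DD_APM_REPLACE_TAGS takes a JSON array of rules, each with a `name`, a `pattern`, and a `repl`. The sketch below only builds such a value in Go so the quoting is right; the tag name `http.url` and the pattern are placeholders, not a confirmed fix for this issue, and you would need to check which tag actually carries the domain in your traces.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replaceRule mirrors one entry of the DD_APM_REPLACE_TAGS JSON array.
type replaceRule struct {
	Name    string `json:"name"`    // tag to rewrite
	Pattern string `json:"pattern"` // regular expression to match
	Repl    string `json:"repl"`    // replacement string
}

// buildReplaceTags serializes the rules into the value expected by
// the DD_APM_REPLACE_TAGS environment variable.
func buildReplaceTags(rules []replaceRule) (string, error) {
	b, err := json.Marshal(rules)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder rule: the tag name and pattern are assumptions.
	rules := []replaceRule{
		{Name: "http.url", Pattern: `^domain\.com`, Repl: "https://domain.com"},
	}
	value, err := buildReplaceTags(rules)
	if err != nil {
		panic(err)
	}
	fmt.Println("DD_APM_REPLACE_TAGS=" + value)
}
```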

If that doesn't work for you, I suggest opening a support ticket so we can take a closer look at your use case.

xrn commented 3 months ago

Hey @purple4reina, thank you for your response. There is a support ticket for this case, 1703458. I hope you are able to access it; please let me know if you can.

My suspicion is that your AWS_PROXY functions are not being recognized in the same way as your other functions. You can confirm this by looking at the different inbound payloads which are saved on the aws.lambda span.

Is this something you would eventually like to solve at the library level?

xrn commented 2 months ago

I want to share one additional observation. When a Lambda is connected to API Gateway and X-Ray is enabled, the Traces section at the bottom of Lambda -> Monitor properly displays https://{domain}/{stage}/{path} in the URL Address column for both integration types, AWS and AWS_PROXY. To me this is further evidence that the two should be aligned, and I think you should be able to reproduce it easily.

xrn commented 1 week ago

@hghotra @purple4reina I would like to check with you whether there is a chance that the above will be solved in the foreseeable future.