aws / aws-xray-sdk-node

The official AWS X-Ray SDK for Node.js.
Apache License 2.0
272 stars 155 forks source link

Nodejs10.x Error: Missing AWS Lambda trace data for X-Ray. #143

Closed EGrun closed 3 years ago

EGrun commented 5 years ago

After updating to the 'nodejs10.x' runtime environment, I am now seeing this error message at the start of the cloudwatch logs:

2019-06-14T21:24:54.713Z    undefined   ERROR   Error: Missing AWS Lambda trace data for X-Ray. Expected _X_AMZN_TRACE_ID to be set.
at Object.contextMissingLogError [as contextMissing] (/var/task/handler.js:37802:19)
at Segment.resolveLambdaTraceData (/var/task/handler.js:167833:43)
at Object.getSegment (/var/task/handler.js:37870:17)
at Object.resolveSegment (/var/task/handler.js:37849:19)
at captureOutgoingHTTPs (/var/task/handler.js:94553:31)
at Object.captureHTTPsRequest [as request] (/var/task/handler.js:94638:12)
at Promise (/var/runtime/RAPIDClient.js:91:31)
at new WrappedPromise (/var/task/handler.js:161080:18)
at RAPIDClient.nextInvocation (/var/runtime/RAPIDClient.js:90:12)
at Runtime.handleOnce (/var/runtime/Runtime.js:52:51)

We are using, AWSXRay.captureHTTPsGlobal(require('http')); to capture all http requests and this is the source of the error. It is also capturing all calls to 127.0.0.1 as part of the runtime event loop.

A fix was to move the captureHTTPsGlobal command inside of the lambda function function handler. Before, it was running from the global module level. We are still getting x-ray results for calls to localhost for the lambda runtime environment. Is there a way to ignore requests to 127.0.0.1?

awssandra commented 5 years ago

Hi EGrun,

Lambda does not have the X-Ray header until the function handler is run. You should be able to put AWSXRay.captureHTTPsGlobal(require('http')) outside your handler, but using the http client will have to be in the function handler code.

https://docs.aws.amazon.com/lambda/latest/dg/downstream-tracing.html

With captureHTTPsGlobal, the http library itself is patched to capture on every http client initialized. There is not a concept to opt out matching request URLs today. However, you can use captureHTTPs, to enable tracing on a single http client instance, where ever you need the call captured.

Hope this helps, Sandra

Imran99 commented 5 years ago

Hi Guys,

I'm having the same issue but I noticed we have closed this one and #701 is only a partial fix.

You should be able to put AWSXRay.captureHTTPsGlobal(require('http')) outside your handler, but using the http client will have to be in the function handler code.

This works in node 8.10 but not node 10.x. The exact same code for node 10.x is giving me Error: Missing AWS Lambda trace data for X-Ray. Expected _X_AMZN_TRACE_ID to be set.

Im using the captureHTTPglobal code from the docs and im only using the http client inside the handler.

Can we reopen this as its preventing us from upgrading to node10.

hotgazpacho commented 5 years ago

By any chance, are you enabling http keep-alives for the AWS SDK, per https://theburningmonk.com/2019/02/lambda-optimization-tip-enable-http-keep-alive/

We're doing this in a single file that we include for every handler, and we see this issue on cold starts.

Imran99 commented 5 years ago

@hotgazpacho yup we are, so I guess thats partly the issue but for some reason it did not cause a problem on node 8.

hotgazpacho commented 5 years ago

@Imran99 trying to maybe help diagnose here... In what order do you set up:

My thinking here is that the order that these things happen is important.

Imran99 commented 5 years ago

Thanks @hotgazpacho we have two internal npm packages that do it in this order:

normal aws sdk tracing package:

and another http client with tracing package that does:

hotgazpacho commented 5 years ago

So, I initialize thusly:

const AWSXRay = require('aws-xray-sdk-core');
const AWS = AWSXRay.captureAWS(require('aws-sdk'));
AWSXRay.captureHTTPsGlobal(require('https'));

function getKeepAliveAgent() {
  const options = {
    keepAlive: true,
    maxSockets: 50, // from aws-sdk
    rejectUnauthorized: true, // from aws-sdk
  };
  const agent = new https.Agent(options);
  agent.setMaxListeners(0); // from aws-sdk
  return agent;
}

AWS.config.update({
    httpOptions: {
      agent: getKeepAliveAgent(),
    },
  });

Where I'm seeing this error is during cold starts of my Lambda function, wherein I use SecretsManager outside of the handler in the global scope to get the secrets config once and keep it around for the lifetime of the function instance. This global initialization is a pretty common scenario in Lambda.

Shahab96 commented 5 years ago

Are there any updates on this issue? We are experiencing this as well and with Node 8 being EoL at year end this would become a problem

willarmiros commented 5 years ago

We are actively looking into this issue and discussing it with the Lambda team. Unfortunately this change would be a fairly substantial refactor for them. In the meantime, we recommend using the non-global HTTP capture API.

If Lambda is unable to complete the change, we will consider adding some kind of string matching to not trace these requests in the SDK. The issue with this is 1) It is an issue with Lambda so that feature would need to be replicated across all X-Ray SDKs and 2) special-case not-capturing with string matching is not exactly a best practice, so we'd like to avoid it if possible. Hope this clarifies some confusion about this issue.

tqhoughton commented 5 years ago

So basically we have to choose between being able to update Lambda functions or being able to use X-Ray.

I think it's obvious what most organizations are going to choose. Can you ask the Lambda team if they can extend the deadline to upgrade Lambda functions to Node 10.x until this is resolved?

willarmiros commented 5 years ago

Lambda has told us a potential solution is underway. I will update this thread once it is released. If the solution does not work, our team will add the filtering logic to ignore these internal calls before Node 8 EOL is reached.

willarmiros commented 4 years ago

Lambda has released what should be a fix for this problem globally! Please follow up if this issue persists, otherwise I'll close this issue.

kwongkz commented 4 years ago

Now actually can directly use env - AWS_NODEJS_CONNECTION_REUSE_ENABLED to 1, no longer need these couple of lines of code New way to keepalive

Imran99 commented 4 years ago

I've noticed that the errors have disappeared but I'm not seeing http calls in traces at all anymore, could be doing something wrong on our end haven't had a chance to investigate yet. Anyone else seeing this?

willarmiros commented 4 years ago

Hi @Imran99, If you are still not seeing HTTP traces please follow up in a separate issue. I'm going to close this issue since no one else seems to be experiencing it anymore.

chearmstrong commented 4 years ago

@willarmiros I'm getting Error: Missing AWS Lambda trace data for X-Ray. Expected _X_AMZN_TRACE_ID to be set. when upgrading to version 3.0.0-alpha.1.

I'm using Node v12, and I'm using HTTP keep-alive.

I didn't get it with version 2.5.0-experimental.1 btw.

willarmiros commented 4 years ago

Hi @chearmstrong, I'm not able to reproduce this error on a Lambda running node 12 and the X-Ray SDK 3.0.0-alpha.1. I initialized my HTTPS client in the same way that @hotgazpacho did and made a request with it and everything ran & was instrumented as expected.

Can you please post a snippet that reproduces this issue?

chearmstrong commented 4 years ago

Hi @willarmiros - my code is something like this.

const __AWS__ = require('aws-sdk')
const AWSXRay = require('aws-xray-sdk')
// ...require some other stuff

const AWS = AWSXRay.captureAWS(__AWS__)

// ...setup xray
AWSXRay.captureHTTPsGlobal(require('http'))
AWSXRay.captureHTTPsGlobal(require('https'))
AWSXRay.capturePromise()

// ... other code/helper functions

// ...lambda handler
const handler = async (event, context = {}) => {
  return AWSXRay.captureAsyncFunc('handler', subSegment => {
    subSegment.addMetadata('body', event.body || {}, 'request')
    subSegment.addMetadata('headers', event.headers || {}, 'request')
    subSegment.addAnnotation('userId', getUserId(event))

    return handleEvent(event)
      .then(response => {
        subSegment.addMetadata('body', response, 'response')
        subSegment.close()

        return getSuccessResponse(response)
      })
      .catch(error => {
        subSegment.addAnnotation('error', error.message || 'UNKNOWN')
        subSegment.close(error)

        return getErrorResponse(error)
      })
  })
}

Worth nothing - I'm only seeing it on cold starts too.

willarmiros commented 4 years ago

Hi @chearmstrong,

I resolved the issue by moving AWSXRay.capturePromise() into the function handler, immediately before your return statement. Feel free to reopen if you still have issues.

chearmstrong commented 4 years ago

@willarmiros Sorry for the delay - I will give this a try, thanks.

awilson28 commented 4 years ago

@willarmiros I'm experiencing the same issue - why is it necessary to move AWSXRay.capturePromise()?

cluedtke commented 4 years ago

This remains an issue for us, even after moving AWSXRay.captureHTTPsGlobal(require('https')); into the handler.

Versions of aws-xray-sdk-core tried:

Before -- we see "Missing AWS Lambda trace data for X-Ray"

const { ApolloServer } = require("apollo-server-lambda");
const xray = require('aws-xray-sdk-core');
xray.captureHTTPsGlobal(require('https'), false);

// ... other stuff

const server = new ApolloServer({ /** config **/ })
exports.handler = server.createHandler();

After -- we still see "Missing AWS Lambda trace data for X-Ray"

const { ApolloServer } = require("apollo-server-lambda");
const xray = require('aws-xray-sdk-core');

// ... other stuff

const server = new ApolloServer({ /** config **/ })
const createHandler = function() {
  const handler = server.createHandler();
  xray.captureHTTPsGlobal(require('https'), false);
  return handler;
}

exports.handler = createHandler();
chearmstrong commented 4 years ago

I resolved the issue by moving AWSXRay.capturePromise() into the function handler, immediately before your return statement. Feel free to reopen if you still have issues.

This worked for me, but only with v3. Thanks.

aripalo commented 4 years ago

I am using node-fetch with X-Ray SDK and could not get it working initially.

Finally https://github.com/aws/aws-xray-sdk-node/issues/143#issuecomment-595580064 solved it, but that seems a bit "annoying trick" 😅 Moving this kind of "initialization code" in the end of handler just feels a bit wrong to me.

aripalo commented 4 years ago

Well… since I'm using Middy I refactored this into middleware so I don't feel so dirty 🙂

In case anyone's interested:

// middlewares/instrument.ts
import { Handler } from 'aws-lambda'
import { NextFunction } from "middy";
const AWSXRay = require('aws-xray-sdk');

const instrumentationMiddleware = {
  after: (handler: Handler, next: NextFunction) => {
    AWSXRay.capturePromise();
    next();
  },
}

export default instrumentationMiddleware;
// index.ts
import { APIGatewayEvent, Context, ProxyResult } from 'aws-lambda';
const wrap = require('@dazn/lambda-powertools-pattern-basic');
import instrumentationMiddleware from "./middlewares/instrument";

export const handler = wrap(async (event: APIGatewayEvent, context: Context): Promise<ProxyResult> => {
  /* ... */
  return response;
})
.use(instrumentationMiddleware);
padzikm commented 3 years ago

Hi, can you explain why capturePromise() should be inside lambda handler or otherwise it throws 'Missing AWS Lambda trace data for X-Ray. Expected _X_AMZN_TRACE_ID to be set' error (only during lambda's cold start)? In aws doc (https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-nodejs-httpclients.html) there is no mention of need for capturePromise invocation, but in aws-xray for node readme (https://github.com/aws/aws-xray-sdk-node/tree/master/packages/core#capture-all-outgoing-axios-requests) there is example for capturing axios with capturePromise invoked on the top of a module, which is opposite to this issue's solving. According to this issue's solution capturePromise should be called at the end of a handler, but capturePromise patches promise's then method, so in my opinion it should be called right at the beginning of a handler to capture all requests inside handler - why you recommend it to be called at the end? It's a bit confusing, so can you please explain what is correct way of capturing http(s) requests and why?

willarmiros commented 3 years ago

Hi @padzikm,

This issue has gotten fairly cluttered with tangentially related problems that we attempted to fix, but the original issue that you're experiencing turns out to still be unresolved, as pointed out in #315. We have investigated with the Lambda team and we've found the issue to be a bug in the SDK. I will be working on getting a fix out and released as soon as possible. I will post further updates in #315.

DavidTanner commented 3 years ago

The problem with calling things like capturePromises is that it is adding another function to the .then. Depending on how many calls a lambda gets, it could end up adding a lot of functions to the Promise prototype slowing things down.

function patchPromise(Promise) {
  var then = Promise.prototype.then;
  Promise.prototype.then = function(onFulfilled, onRejected) {
    if (contextUtils.isAutomaticMode()
      && tryGetCurrentSegment()
    ) {
      var ns = contextUtils.getNamespace();

      onFulfilled = onFulfilled && ns.bind(onFulfilled);
      onRejected = onRejected && ns.bind(onRejected);
    }

    return then.call(this, onFulfilled, onRejected);
  };
}

https://github.com/aws/aws-xray-sdk-node/blob/aws-xray-sdk%403.1.0/packages/core/lib/patchers/promise_p.js#L12-L26

frosas commented 3 years ago

(commenting here as I'm not sure this capturePromise() issue is related to #315)

As others already stated, it doesn't look correct to have to call capturePromise() within the handler in order to work around that function triggering the misleading "context missing" message when a Promise then() is called outside the handler.

Would it be possible to not have this error message logged under this scenario instead?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in next 7 days. Thank you for your contributions.

shellscape commented 3 years ago

Not stale, still relevant.

Dgarc359 commented 2 years ago

Experiencing the same issue with a middy implementation using AWS Xray. If I'm following my cloudwatch logs correctly, it seems that the missing _X_AMZN_TRACE_ID is only logged as missing during a cold start invocation of my lambda. The curious thing is, is that the error doesn't have any actual discernible impact on the logic we're using (Which is: to create an xray trace url from the _X_AMZN_TRACE_ID in the lambda's env vars). The url is correctly created, so I believe that the error is coming about as a side effect of some sort when calling .captureAWS() (this is just a guess, I'm in the process of debugging but I have no real clue where the issue is actually coming from)

If it makes any difference at all, the project which builds the dependencies for the mono repo is using lerna and aws cdk. This issue hasn't popped up in implementations using middy on lambdas bundled using serverless. Currently, I have only seen the issue crop up in lambda's created using nodejs construct from aws-lambda-nodejs. The only other notable difference I have found is that these assets are being bundled with esbuild as opposed to the native bundling used for assets in the aws-lambda module.

All this is to say, this issue is still persisting, but it's not having an impact on the business logic (that I can find). I would love to get to the bottom of /why/ it's happening, but since it's not seemingly having a noticable effect, I'm left wondering if there is a simple way to silence the log for the missing trace_id.

willarmiros commented 2 years ago

so I believe that the error is coming about as a side effect of some sort when calling .captureAWS()

@Dgarc359 Do you use any AWS SDK clients to make requests in your initialization (cold start) code? If so, that is the reason you're receiving the error. captureAWS is trying to instrument those requests, but the _X_AMZN_TRACE_ID env var is not set for cold start code.

Dgarc359 commented 2 years ago

@willarmiros Here is the full stack trace of the error

2021-11-22T18:54:21.378Z    undefined   ERROR   Error: Missing AWS Lambda trace data for X-Ray. Expected _X_AMZN_TRACE_ID to be set.
    at Object.contextMissingLogError [as contextMissing] (/opt/nodejs/node_modules/aws-xray-sdk-core/lib/context_utils.js:26:19)
    at Segment.resolveLambdaTraceData (/opt/nodejs/node_modules/aws-xray-sdk-core/lib/env/aws_lambda.js:93:43)
    at Object.getSegment (/opt/nodejs/node_modules/aws-xray-sdk-core/lib/context_utils.js:94:17)
    at Object.resolveSegment (/opt/nodejs/node_modules/aws-xray-sdk-core/lib/context_utils.js:73:19)
    at features.constructor.captureAWSRequest [as customRequestHandler] (/opt/nodejs/node_modules/aws-xray-sdk-core/lib/patchers/aws_p.js:66:29)
    at features.constructor.addAllRequestListeners (/opt/nodejs/node_modules/aws-sdk/lib/service.js:279:34)
    at features.constructor.makeRequest (/opt/nodejs/node_modules/aws-sdk/lib/service.js:203:10)
    at features.constructor.svc.<computed> [as describeTable] (/opt/nodejs/node_modules/aws-sdk/lib/service.js:685:23)
    at main (/node_modules/dynamoose/lib/aws/ddb/internal.ts:30:29)
    at func (/node_modules/dynamoose/lib/Model/index.ts:102:60)
    at getTableDetails (/node_modules/dynamoose/lib/Model/index.ts:111:10)
    at check (/node_modules/dynamoose/lib/Model/index.ts:172:33)
    at /node_modules/dynamoose/lib/Model/index.ts:190:3
    at new WrappedPromise (/opt/nodejs/node_modules/async-listener/es6-wrapped-promise.js:13:18)
    at /node_modules/dynamoose/lib/Model/index.ts:167:30
    at /node_modules/dynamoose/lib/Model/index.ts:344:41

I believe that what's happening here is that dynamoose is being used to call dynamoDB, and I believe our Middy implementation is adding the .captureAWSRequest [as customRequestHandler] part of that stack. It seems there is some interaction between our middy code, xray, dynamoose, and cold starts that hasn't been accounted for. I'm guessing your suggestion would be to define some behavior in which the _X_AMZN_TRACE_ID isn't attempted to be accessed during cold start?

Dgarc359 commented 2 years ago

I resolved my issue. I'm not going to say I 100% understand the underlying interactions that were causing the error to be thrown up, but from my understanding, it seems like Dynamoose during cold start code would try to make a connection to a DynamoDB instance, and our middy implementation was attempting to capture that request and pull the trace id. Like you said, this would cause an issue due to the env var not being there during cold starts.

The issue was resolved when I declared our middy code as an external module in the bundling prop of the node JS lambda construct. I was originally modifying esbuild to declare our middy code as external, which, to me at least, does the same thing, but it seems like there's something happening under the hood when you declare an external module in the aws-cdk construct as opposed to the actual esbuild node module that was resolved when declaring the module in the construct itself. Can't say more without having a more holistic understanding of the underlying code though