Closed blimmer closed 2 years ago
If you run @blimmer's example code and add the following to both Lambda functions:
console.log(`Trace ID: ${process.env["_X_AMZN_TRACE_ID"]}`)
Then we can see that:
Root=1-61a7045c-079046695b7fe0fb189338c4;Parent=6464c3a27a7a348e;Sampled=1
Root=1-61a7045e-fbbd717121ef450f63b65376;Parent=762ce5a82bd1b4de;Sampled=0
The Sampled=0
the 2nd Lambda's _X_AMZN_TRACE_ID
appears to be the culprit as afaiu that effectively disables x-ray tracing. Furthermore, it seems that for a given call chain the trace id that shows up in the second Lambda (async invoke) does not share the same root segment id (Root=xxx
) as the 1st Lambda.
Confirmed the root cause. If you:
Sampled=1
in its trace id + instrumented downstream calls with captureAWS
.Then when the async Lambda runs it gets a trace id with Sampled=0
which disables pushing tracing data to X-Ray. Here's the output of the async Lambda's logs with AWS_XRAY_DEBUG_MODE
set to true
.
2021-12-01T15:24:04.855Z undefined DEBUG Starting the AWS X-Ray SDK in automatic mode (default).
2021-12-01T15:24:04.856Z undefined DEBUG AWS_XRAY_CONTEXT_MISSING is set. Configured context missing strategy to LOG_ERROR.
2021-12-01T15:24:04.862Z undefined DEBUG Subsegment streaming threshold set to: 0
2021-12-01T15:24:04.862Z undefined INFO Disabling centralized sampling in Lambda environment.
START RequestId: 50c21b8c-a136-42cb-8c7b-a7303091374d Version: $LATEST
2021-12-01T15:24:05.137Z 50c21b8c-a136-42cb-8c7b-a7303091374d INFO This is the triggered lambda
2021-12-01T15:24:05.148Z 50c21b8c-a136-42cb-8c7b-a7303091374d INFO { "invokingFunctionRequestId": "a554f4f3-fa0e-4f97-b467-80de44fe8cb5" }
2021-12-01T15:24:05.148Z 50c21b8c-a136-42cb-8c7b-a7303091374d INFO Trace ID: Root=1-61a79394-d838411846ec1c5f5a43937c;Parent=27ef63f91ef7bfef;Sampled=0
2021-12-01T15:24:05.148Z 50c21b8c-a136-42cb-8c7b-a7303091374d INFO Listing buckets to generate a segment...
2021-12-01T15:24:05.291Z 50c21b8c-a136-42cb-8c7b-a7303091374d DEBUG Lambda trace data found: Root=1-61a79394-d838411846ec1c5f5a43937c;Parent=27ef63f91ef7bfef;Sampled=0
2021-12-01T15:24:05.329Z 50c21b8c-a136-42cb-8c7b-a7303091374d DEBUG Segment started: { "root": "1-61a79394-d838411846ec1c5f5a43937c", "parent": "27ef63f91ef7bfef", "sampled": "0" }
2021-12-01T15:24:06.090Z 50c21b8c-a136-42cb-8c7b-a7303091374d DEBUG Ignoring flush on subsegment fa120c0150f16b3d. Associated segment is marked as not sampled.
Huh, interesting. Is the aws-xray-sdk
for Node somehow setting Sampled=0
? I wonder why this is different in Python. If you've got the time, you might add the AWS_XRAY_DEBUG_MODE
to the use-python
branch of the example repository to check it out. I'm happy to merge PRs to my sample repo to make debugging easier for others, as well @jcarver989 😄
Hi @blimmer and @jcarver989,
Thank you for this investigation, this is very strange indeed! The SDK's code for running in lambda mode doesn't ever set the SAMPLED
state, so I suspect this is a result of some different behavior in the Node.js Lambda runtime (which would also explain why this isn't seen in Python). I will reach out the Lambda team and keep you posted.
Hi.
For some additional information; I am in the same place as you @blimmer, and was interested in you example. Was able to replicate your issue, and made an attempt toupgrade to aws-sdk v3.
This solved the issue, and generates the same trace as Python.
The new setups are:
import { Lambda } from "@aws-sdk/client-lambda";
import { S3 } from "@aws-sdk/client-s3";
import { captureAWSv3Client } from "aws-xray-sdk";
const lambda = new Lambda({});
const s3 = new S3({});
captureAWSv3Client(lambda);
captureAWSv3Client(s3);
const handler = async (event:any, context:any) => {
console.info("This is the triggering lambda");
console.info(JSON.stringify(event));
console.info("Listing buckets to generate a segment...");
const { Buckets } = await s3.listBuckets({});
console.info(`Found ${Buckets?.length || 0} S3 buckets.`);
console.info("Async invoking the lambda...");
await lambda
.invoke({
FunctionName: process.env.TRIGGERED_FUNCTION_NAME!,
Payload: Buffer.from(JSON.stringify({
invokingFunctionRequestId: context.awsRequestId,
})),
InvocationType: 'Event'
});
console.info("Invoked lambda. Ending!");
};
export { handler };
and
import { captureAWSv3Client } from "aws-xray-sdk";
import { S3 } from "@aws-sdk/client-s3";
const s3 = new S3({});
captureAWSv3Client(s3);
const handler = async (event:any) => {
console.info("This is the triggered lambda");
console.info(JSON.stringify(event));
console.info("Listing buckets to generate a segment...");
const { Buckets } = await s3.listBuckets({});
console.info(`Found ${Buckets?.length || 0} S3 buckets.`);
};
export { handler };
In package.json
"@aws-sdk/client-lambda": "^3.76.0",
"@aws-sdk/client-s3": "^3.76.0",
while uninstalling aws-sdk
The conversion to v3 might be a rough and out of scope, but hope this helpst with identifying the issue.
Hello,
After investigating with the Lambda team, it turns out this issue is the result of using the InvokeAsync
API for invoking the Lambda function. Using InvokeAsync is deprecated in favor of Invoke with InvocationType set to 'Event', the deprecated API does not guarantee the same behavior as using the recommended API: https://docs.aws.amazon.com/lambda/latest/dg/API_InvokeAsync.html
We've confirmed this issue is fixed when using the Invoke
API.
Background
Consider a fairly simple application containing:
In both lambdas, the
captureAWS
method fromaws-xray-sdk
is used as recommended by this documentation.Observed Behavior
The triggering lambda was run from the console.
As you can see, the last segment is a
202
HTTP status code, meaning that the async invocation of the test lambda succeeded.However, the triggered lambda produced no traces at all:
Even though it was definitely invoked:
Expected Behavior
Because these two lambdas are triggered by the same "logical request", I expected the X-Ray tracing segment generated by the triggering lambda to be passed as metadata to the async invocation of the triggered lambda.
At the very least, even if the traces were not linked, I expected a trace to be produced by the triggered lambda.
Python Behavior
I'm pretty confident that this is an issue with the Node version of the
aws-xray-sdk
package because everything works as expected when using the Python version ofaws-xray-sdk
.You'll see that the X-Ray trace includes the asynchronous invocation in the same trace, which is the expected behavior for Node, as well.
Reproduction Repository
I created a sample aws-cdk repository to demonstrate this issue. Please reference the README on https://github.com/blimmer/xray-async-invocation to deploy the problematic Node lambdas.
This repo also has a branch to deploy Python versions of these lambdas (
triggering.py
&triggered.py
) to show it working properly when using Python: https://github.com/blimmer/xray-async-invocation/tree/use-python . Follow this section of the README to deploy the Python versions of the lambdas.Other Things I've Tried
I tried some of the various other
capture
methods (capturePromises
&captureHTTPsGlobal(require("http"))
) with the Node SDK and they didn't fix the problem.