Open RealityCtrl opened 3 years ago
Hi @RealityCtrl
I'm not very familiar with axios
but it seems to be an async HTTP client. My guess is that your async request is taking about 1.06s to complete, but in the meantime execution moves forward and the Lambda segment is closed before the request completes, so the corresponding subsegment is emitted later on.
But I'm also confused by the fact that even if the HTTP request is taking that long, once the lambda finishes execution in 269ms, this subsegment shouldn't be emitted after the function terminates.
I'll let @willarmiros take a stab at this one since he may have a better idea.
You are correct that axios.get is asynchronous. I can't share the code, but we do await the result of the API call. To our knowledge these API calls are completing, and the downstream system does not report a response time matching the longer duration. We also see the same thing when using captureAsyncFunc to instrument a connection to Redis on ElastiCache, where the reported times exceed the configured maximum function execution time of 15s.
For example, one trace apparently took 17 seconds to disconnect from ElastiCache although the function completed in 612ms, and another apparently took 22 seconds although the function completed in 887ms.
Hi @RealityCtrl,
The only explanation I can think of is that somehow the Lambda function is completing its invocation and being frozen before the subsegments are closed, causing them to be closed during the next invocation and reflect a time longer than the function invocation. It will be very difficult to investigate further without code reproducing the issue, and I know these production Lambda issues tend to be very difficult to reproduce.
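To make that hypothesis concrete, here is a minimal sketch that only simulates the timing. It does not use the aws-xray-sdk API: `FakeSubsegment` and `simulate` are hypothetical names, and the 16388ms idle gap between invocations is an assumed value chosen so the numbers line up with the 612ms / 17s trace mentioned above.

```typescript
// Hypothetical model of a subsegment: it records a start time and only
// computes a duration once close() has been called.
class FakeSubsegment {
  private readonly startMs: number;
  private endMs: number | undefined = undefined;

  constructor(startMs: number) {
    this.startMs = startMs;
  }

  // The first close wins; later calls are ignored.
  close(nowMs: number): void {
    if (this.endMs === undefined) {
      this.endMs = nowMs;
    }
  }

  durationMs(): number | undefined {
    return this.endMs === undefined ? undefined : this.endMs - this.startMs;
  }
}

// Simulates one invocation followed by a freeze and a thaw.
// runMs: how long the handler runs; frozenMs: assumed idle time while frozen.
function simulate(
  runMs: number,
  frozenMs: number,
  closeBeforeReturn: boolean
): { invocationMs: number; subsegmentMs: number | undefined } {
  let nowMs = 0;
  const subsegment = new FakeSubsegment(nowMs);

  nowMs += runMs; // handler does its work
  if (closeBeforeReturn) {
    subsegment.close(nowMs); // closed while the handler is still running
  }
  const invocationMs = nowMs; // handler returns, environment is frozen

  nowMs += frozenMs; // wall clock keeps moving while frozen
  subsegment.close(nowMs); // leftover close runs on thaw; no-op if already closed

  return { invocationMs, subsegmentMs: subsegment.durationMs() };
}

console.log(simulate(612, 16388, false)); // { invocationMs: 612, subsegmentMs: 17000 }
console.log(simulate(612, 16388, true)); // { invocationMs: 612, subsegmentMs: 612 }
```

If this is what is happening, the reported subsegment time would track the idle gap between invocations rather than anything the downstream system did, which would be consistent with the downstream system not seeing the longer response time.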
I would also recommend trying out the AWS Distro for OpenTelemetry JavaScript Lambda layer. It can auto-instrument your downstream requests and send trace data to X-Ray without modifying your code. Maybe the issue won't appear with that.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in next 7 days. Thank you for your contributions.
I can report that I am experiencing the same issue in one of our environments. Is there any plan to look further into this issue? We are using aws-xray-sdk 3.6.0.
I will check out if we can change to OpenTelemetry, but I still would expect that an issue like this deserves a fix in a library like this.
We have a serverless typescript lambda function which makes API calls to other APIs. We are using the aws-xray-sdk to instrument downstream API calls that are made with axios. Currently using version 3.3.3 of the SDK.
We are seeing the execution time of those downstream API calls reported as substantially higher than the actual lambda function execution time, and the downstream APIs do not see a response time corresponding to the reported execution time.
In the example below the API call is listed as 1.06 seconds execution time but the function only executed for 275ms.
What could be causing this discrepancy?
The CloudWatch log message also shows a 269 ms execution time.
Code snippets for the implementation of tracing.