Open vcschapp opened 3 years ago
Hi @vcschapp ,
Thanks for sharing detailed repro steps. I think one of the reason why there are no trace sent to x-ray could be because of the timeout you set for your request. I see that you have set request timeout for 100 seconds. You probably have to check your lambda function timeout as well. I am guessing default lambda timeout is 3 seconds. So looks like your lambda function is exited even before your request completes and therefore SDK never gets a chance to close the segment and send it to X-Ray service. I would recommend you to try out request timeout which is less than lambda function timeout and see if you're still encountering this issue. If yes, then I would be happy to help.
Hi @bhautikpip, Thanks for such a quick reply.
I can confirm the Lambda function is not timing out. This is a bug in the way the X-Ray SDK is working with the request. (I see the point of confusion was the request timeout. It is set to 100 milliseconds, not seconds!)
I'm able to work around it by forcing the X-Ray SDK to emit the segment with the dangling sub-segments by setting its ContextDone
flag to true
. However I'm obviously misusing ContextDone
and would prefer not to.
From what I can observe, this seems to be happening on the lambdas where the lambda timeout (~5 mins) is way longer than the HTTP client timeout (60s). Wonder if anyone else is seeing this given how old this issue is?
Issue Type
Bug.
Expected Behavior
I have decorated an
http.Client
usingxray.Client
. When I make an HTTP request using the decorated client and the request times out due to a client-side timeout value, an X-Ray trace should be published and it should include the timeout error.Actual Behavior
No X-Ray trace is published. It is as if the HTTP request never happened at all.
Reproduction Steps
See the example code at bottom for a self-contained Lambda function which repros the problem.
Root Cause (Hypothesis)
The root cause of this problem seems to be that when a timeout error happens (or the HTTP request's root context is cancelled for any other reason), some of the sub-subsegments under
HTTPSubsegments
are left dangling.In particular, if the timeout or context cancellation happens during one of the connection setup phases (DNS, TCP connect, TLS handshake) then a sub-sub-segment below the
connect
sub-segment is left open. For whatever reason, this results in X-RayThe AWS SDK for X-Ray (Go) code tries to address this issue by calling
GotConn
in case of error (see in xray/client.go) but the problem is this doesn't address open sub-sub-segments ofconnect
.Notes
xray.Client
code. I am using the AWS X-Ray SDK for Go to write my own X-Ray plugin for a retrying HTTP framework for Go called httpx (my plugin is on GitHub at gogama/aws-xray-http). For my use case, it would be helpful if you simply extendxray.HTTPSubsegments
with a new exported method calledError(err error)
which just "does the right thing" and walks the sub-segment tree closing off every sub-segment which is open in an appropriate manner.Example Lambda Function
{"url":"https://jsonplaceholder.typicode.com/todos/1","timeout": 100}