Open Zhenye-Na opened 2 years ago
Hi @Zhenye-Na,
Thank you for raising this. You're on the right track. The X-Ray SDK stores segment context in a ThreadLocal. It uses this context to capture outgoing AWS SDK requests and generate a subsegment for each of them. If no context is available, the SDK throws a SegmentNotFoundException. If transferManager creates a thread pool and sends requests from new threads, the X-Ray SDK will attempt to capture those requests and fail because the ThreadLocal on those threads is empty, which causes this exception.
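To make the ThreadLocal behaviour concrete, here is a minimal sketch using only the JDK. It does not use the X-Ray SDK; `CONTEXT` and `ThreadLocalContextDemo` are hypothetical stand-ins for the SDK's per-thread segment context. It illustrates only the general mechanism: a value set on the calling thread is not visible on a thread owned by a pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalContextDemo {
    // Hypothetical stand-in for the SDK's per-thread segment context.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Reads CONTEXT from a pool thread; returns null if nothing was set on that thread.
    static String readOnPoolThread() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = CONTEXT::get;
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        CONTEXT.set("segment-1"); // set on the calling thread only
        System.out.println("calling thread: " + CONTEXT.get());
        System.out.println("pool thread:    " + readOnPoolThread());
    }
}
```

The calling thread sees `segment-1`, while the pool thread sees `null`. That is the situation in which the SDK finds no segment to attach a subsegment to.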
To ignore this error, you can set the environment variable AWS_XRAY_CONTEXT_MISSING=IGNORE_ERROR, though of course this will cause some requests to not be instrumented. I'm not sure if the AWS SDK exposes enough of its implementation for us to hook into the new thread pool and capture these requests, nor do I think we'd have the bandwidth to extend our instrumentation to support this case. However, I would recommend you open this feature request in the OpenTelemetry Java repo as well, since they have an AWS SDK instrumentation that could be extended to support this.
Hello @willarmiros
Thank you so much for your reply and for confirming the results of my experiments. After this, we decided to temporarily bypass the SegmentNotFoundException by using the low-level API that the S3 team provides for multipart uploads, and X-Ray has worked well with it so far.
I will open a feature request in the repo you mentioned above. However, I am not very familiar with the terminology or the detailed process for solving this problem. Do you mind if I cc you later in the new issue I raise for the OpenTelemetry team?
Thank you so much!
Merry Xmas 🎅
Adding some details on my own experiments for anyone who comes to this issue:
AWSXRay.withContextMissingStrategy(IgnoreErrorXXXStrategy): this does not throw any exceptions, which is nice, but the request times out.
Letting transferManager create the thread pool, then trying to retrieve the traceEntity of the GlobalRecorder and call beginSubsegment() in each thread that transferManager created: either timed out or threw an exception.
Do you mind if I cc you later in the new issue I raised for OpenTelemetry team?
No problem
but the request is timed out.
Hmm so just adding X-Ray instrumentation and the ignore error strategy caused the request to time out? That's strange. It might have something to do with how transferManager works. Feel free to post some reproduction code, but glad you have a workaround for now!
https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/6104
Issue created in OpenTelemetry Java, let's see how this goes.
I also raised a ticket in AWS SDK v2 to see if we get the chance to fix this.
I am wondering if this issue will be included in the roadmap?
Or are there any workarounds if we would like to continue using X-Ray in a multi-threaded environment?
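For thread pools you create yourself, one general pattern is to capture the context on the submitting thread and restore it inside the task. The sketch below uses only the JDK; `CONTEXT`, `withCurrentContext`, and `runOnPool` are hypothetical names. With the X-Ray SDK, the corresponding calls would presumably be `AWSXRay.getTraceEntity()` on the submitting thread and `AWSXRay.setTraceEntity(...)` on the worker. Nothing in this thread establishes that this can be wired into the pool transferManager creates internally, and the similar experiment reported above ended in a timeout or an exception, so treat this as an illustration of the idea rather than a confirmed fix.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextPropagationSketch {
    // Hypothetical stand-in for the per-thread trace context.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Captures the submitting thread's context now and restores it around the task later.
    static <T> Callable<T> withCurrentContext(Callable<T> task) {
        String captured = CONTEXT.get(); // read on the submitting thread
        return () -> {
            String previous = CONTEXT.get(); // whatever the worker thread already had
            CONTEXT.set(captured);
            try {
                return task.call();
            } finally {
                // Put the worker thread back the way it was, so pooled threads do not leak context.
                if (previous == null) {
                    CONTEXT.remove();
                } else {
                    CONTEXT.set(previous);
                }
            }
        };
    }

    // Runs a task that reads CONTEXT on a pool thread, with or without propagation.
    static String runOnPool(boolean propagate) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> readContext = CONTEXT::get;
            Callable<String> task = propagate ? withCurrentContext(readContext) : readContext;
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

With `CONTEXT` set to some value on the calling thread, `runOnPool(true)` returns that value and `runOnPool(false)` returns `null`. The `finally` block matters because pool threads are reused, and a leftover context would attach later, unrelated work to the wrong trace.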
Hello
We are currently using X-Ray for the services we own. One of the APIs involves file transfer, so I added a dependency on the S3 transferManager. However, this throws X-Ray's "SegmentNotFoundException".
I spent a little time checking the root cause, and it turns out that transferManager creates a thread pool and X-Ray is not able to gather context for the threads that transferManager created.
I am wondering whether there is any available solution for this already. I have checked the following resources, but had no luck.
resources: