aws / aws-xray-java-agent

The official AWS X-Ray Auto Instrumentation Agent for Java.
Apache License 2.0
42 stars 18 forks source link

Not emitting segment named 'demo' as it parents in-progress subsegments #73

Open wesleymooiman opened 3 years ago

wesleymooiman commented 3 years ago

While implementing the xray java agent i noticed an issue with multithreaded code. My application (API) builds a large response which consist of multiple content blocks. Each content block makes one or more request to multiple services. The creation of a block is wrapped in a java.util.concurrent.Future.

The aws x ray agent is not always closing the subsegments which are created for the outgoing apache client calls. Because of this the parent segment is never send since it has in-progress subsegments. However the outgoing calls/segments all have been finished this is clearly visible when debugging. I created a really basic demo application which also shows this behaviour. This simplified demo directly calls the feignclient while the actual application uses Resilience4j to wrap the feign calls. The agent has no trouble closing subsegments which are wrapped by Resilience4j and also run in a separate thread.

The first time http://localhost:8080/posts is called it seems to actually close all the subsegments and sends the udp packet to the deamon however when you make another call to http://localhost:8080/posts it fails. You can tweak how many external calls are made by tweaking the IntStream.range(1, 10).

Here you can see that even though the first request seems ok it actually thinks most of the sub segments are still in progress. Image

https://github.com/wesleymooiman/xray-agent-demo (https://github.com/wesleymooiman/xray-agent-demo/blob/master/src/main/java/com/example/demo/post/PostController.java)

2020-12-18 12:10:04,791 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5351c1c1cb2f1b10 DEBUG         TargetPoller:125  - Applicable default rule: Default
2020-12-18 12:10:04,791 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5351c1c1cb2f1b10 DEBUG      CentralizedRule:202  - Sampling target has expired for rule Default. Borrowing a request.
2020-12-18 12:10:04,792 1-5fdc8dff-7743171376d0ed885bf6d8af@6a9efe6eab992a14 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,792 1-5fdc8dff-7743171376d0ed885bf6d8af@6a9efe6eab992a14 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,792 1-5fdc8dff-7743171376d0ed885bf6d8af@6a9efe6eab992a14 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,792 1-5fdc8dff-7743171376d0ed885bf6d8af@6a9efe6eab992a14 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,792 1-5fdc8dff-7743171376d0ed885bf6d8af@6a9efe6eab992a14 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,792 1-5fdc8dff-7743171376d0ed885bf6d8af@6a9efe6eab992a14 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,792 1-5fdc8dff-7743171376d0ed885bf6d8af@6a9efe6eab992a14 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,836 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@03a17fae4a223484 DEBUG XRayTransactionContext:79   - Ending subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,836 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@19c5b115e7e65c92 DEBUG XRayTransactionContext:79   - Ending subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,836 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@72ae23b1caa741dd DEBUG XRayTransactionContext:79   - Ending subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,836 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@03a17fae4a223484 DEBUG       SubsegmentImpl:52   - Subsegment named 'jsonplaceholder.typicode.com' ending. Parent segment named 'demo' has reference count 7
2020-12-18 12:10:04,836 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@72ae23b1caa741dd DEBUG       SubsegmentImpl:52   - Subsegment named 'jsonplaceholder.typicode.com' ending. Parent segment named 'demo' has reference count 7
2020-12-18 12:10:04,836 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@19c5b115e7e65c92 DEBUG       SubsegmentImpl:52   - Subsegment named 'jsonplaceholder.typicode.com' ending. Parent segment named 'demo' has reference count 7
2020-12-18 12:10:04,843 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5351c1c1cb2f1b10 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,844 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5351c1c1cb2f1b10 DEBUG XRayTransactionContext:64   - Beginning subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,848 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5209a45bf387f937 DEBUG XRayTransactionContext:79   - Ending subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,848 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5209a45bf387f937 DEBUG       SubsegmentImpl:52   - Subsegment named 'jsonplaceholder.typicode.com' ending. Parent segment named 'demo' has reference count 6
2020-12-18 12:10:04,876 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5cb887f741db0ba5 DEBUG XRayTransactionContext:79   - Ending subsegment named: jsonplaceholder.typicode.com
2020-12-18 12:10:04,877 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5cb887f741db0ba5 DEBUG       SubsegmentImpl:52   - Subsegment named 'jsonplaceholder.typicode.com' ending. Parent segment named 'demo' has reference count 5
2020-12-18 12:10:04,890 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5351c1c1cb2f1b10 DEBUG      AWSXRayRecorder:539  - Ending segment named 'demo'.
2020-12-18 12:10:04,890 1-5fdc8e0c-dfcd49e48d80e11caabab3ad@5351c1c1cb2f1b10 DEBUG      AWSXRayRecorder:549  - Not emitting segment named 'demo' as it parents in-progress subsegments.
willarmiros commented 3 years ago

Hi @wesleymooiman,

Sorry for the delay on this. I'm not sure if the screenshot corresponds to the same segment as the logs, but it seems like in the case of that screenshot, the other 9 async requests are ending after the request to localhost:8080/posts is handled. This is problematic because when the request is handled, the segment ends, which removes it from the context. The subsegments, in turn, try to find the segment in the context when they are supposed to end but cannot find it because it's after the request is handled.

The only workaround to get X-Ray working, which is for obvious reasons probably not tenable, is to hold off on completing the request until all requests are joined. In short, we probably need to rethink how we're managing context for this scenario. Thank you for adding repro code, I'll test it out and post an update when able.