DataDog / dd-trace-java

Datadog APM client for Java
https://docs.datadoghq.com/tracing/languages/java
Apache License 2.0

No APM traces logged on canceled requests in Reactor #1100

Closed: catchin closed this issue 4 years ago

catchin commented 4 years ago

I have a Spring WebFlux application with Datadog APM enabled, and traces are usually logged fine. However, some requests are canceled by the HTTP client (for example, because of a timeout configured on the client or a load-balancer timeout), and those traces never appear in APM. This is unfortunate, because exactly these requests need investigation; in my case, I want to know why they took so long.

By logging onCancel events on the Mono, I confirmed that the subscription gets canceled for these requests. Looking through the implementation, I saw that ReactorCoreAdviceUtils finishes the span only in the onComplete and onError cases. Unfortunately, CoreSubscriber has no onCancel callback that could handle the span in the canceled case. Do you see any other way to make such spans visible in APM?
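To make the gap concrete, here is a minimal sketch that models it with the JDK's `java.util.concurrent.Flow` interfaces (Java 9+). These interfaces mirror the Reactive Streams `Subscriber`/`Subscription` pair that Reactor's `CoreSubscriber` builds on. Cancellation travels upstream through `Subscription.cancel()`, and the Reactive Streams contract does not require any terminal signal to follow it, so a wrapper that finishes its span only in `onComplete`/`onError` leaves the span open. `Span`, `TerminalOnlySubscriber`, `NeverPublisher` and `CancellingSubscriber` are hypothetical names for illustration, not dd-trace-java or Reactor APIs.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelDemo {

    // Hypothetical stand-in for a tracer span (not the dd-trace-java API).
    static final class Span {
        final AtomicBoolean finished = new AtomicBoolean(false);
        void finish() { finished.set(true); }
    }

    // Finishes the span only on terminal signals, like the behavior described in the issue.
    static final class TerminalOnlySubscriber<T> implements Flow.Subscriber<T> {
        private final Flow.Subscriber<? super T> delegate;
        private final Span span;

        TerminalOnlySubscriber(Flow.Subscriber<? super T> delegate, Span span) {
            this.delegate = delegate;
            this.span = span;
        }

        @Override public void onSubscribe(Flow.Subscription s) { delegate.onSubscribe(s); }
        @Override public void onNext(T item) { delegate.onNext(item); }
        @Override public void onError(Throwable t) { span.finish(); delegate.onError(t); }
        @Override public void onComplete() { span.finish(); delegate.onComplete(); }
    }

    // A slow source that never emits; it only records that it was canceled.
    static final class NeverPublisher implements Flow.Publisher<String> {
        final AtomicBoolean canceled = new AtomicBoolean(false);

        @Override public void subscribe(Flow.Subscriber<? super String> s) {
            s.onSubscribe(new Flow.Subscription() {
                @Override public void request(long n) { /* never emits */ }
                @Override public void cancel() { canceled.set(true); }
            });
        }
    }

    // Downstream (e.g. the HTTP layer) that cancels when its timeout fires.
    static final class CancellingSubscriber implements Flow.Subscriber<String> {
        Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
        @Override public void onNext(String item) { }
        @Override public void onError(Throwable t) { }
        @Override public void onComplete() { }
        void timeout() { subscription.cancel(); }
    }

    // Returns whether the span was finished after the downstream canceled.
    static boolean spanFinishedAfterCancel() {
        Span span = new Span();
        NeverPublisher source = new NeverPublisher();
        CancellingSubscriber downstream = new CancellingSubscriber();
        source.subscribe(new TerminalOnlySubscriber<String>(downstream, span));
        downstream.timeout();
        if (!source.canceled.get()) {
            throw new IllegalStateException("cancel did not reach the source");
        }
        // Cancel reached the source, but no onComplete/onError ever reached the wrapper.
        return span.finished.get();
    }

    public static void main(String[] args) {
        System.out.println("span finished: " + spanFinishedAfterCancel()); // prints "span finished: false"
    }
}
```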

randomanderson commented 4 years ago

It's strange that onComplete isn't called when a subscription gets canceled. The solution might be as simple as adding a doOnCancel to the Mono/Flux returned by ReactorCoreAdviceUtils.setPublisherSpan. We'll look into it.
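Along the lines of the doOnCancel idea, here is a minimal sketch of a cancel-aware wrapper, again modeled with `java.util.concurrent.Flow` rather than Reactor. The wrapper hands itself downstream as the `Subscription`, so it sees `cancel()` and can finish the span there as well. An `AtomicBoolean` guard makes finishing idempotent, because a cancel may arrive after, or race with, `onComplete`. All class names are hypothetical. In Reactor itself, `doOnCancel(Runnable)` or `doFinally(Consumer<SignalType>)` (which runs once after complete, error or cancel) would be the natural hooks; this sketch is not necessarily how dd-trace-java resolved the issue.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CancelAwareSpanDemo {

    // Hypothetical span stand-in; counts finish() calls so a double finish would show up.
    static final class Span {
        final AtomicInteger finishCount = new AtomicInteger();
        void finish() { finishCount.incrementAndGet(); }
    }

    // Wraps both sides of the subscription so the span finishes exactly once on
    // complete, error, or cancel.
    static final class SpanFinishingSubscriber<T> implements Flow.Subscriber<T>, Flow.Subscription {
        private final Flow.Subscriber<? super T> delegate;
        private final Span span;
        private final AtomicBoolean done = new AtomicBoolean(false);
        private Flow.Subscription upstream;

        SpanFinishingSubscriber(Flow.Subscriber<? super T> delegate, Span span) {
            this.delegate = delegate;
            this.span = span;
        }

        private void finishOnce() {
            if (done.compareAndSet(false, true)) {
                span.finish();
            }
        }

        @Override public void onSubscribe(Flow.Subscription s) {
            upstream = s;
            delegate.onSubscribe(this); // downstream talks to us, so we see its cancel()
        }
        @Override public void onNext(T item) { delegate.onNext(item); }
        @Override public void onError(Throwable t) { finishOnce(); delegate.onError(t); }
        @Override public void onComplete() { finishOnce(); delegate.onComplete(); }

        @Override public void request(long n) { upstream.request(n); }
        @Override public void cancel() { finishOnce(); upstream.cancel(); }
    }

    // Source that emits one item and completes on first request, or never signals.
    static final class TestPublisher implements Flow.Publisher<String> {
        private final boolean completes;
        TestPublisher(boolean completes) { this.completes = completes; }

        @Override public void subscribe(Flow.Subscriber<? super String> s) {
            s.onSubscribe(new Flow.Subscription() {
                private boolean emitted;
                @Override public void request(long n) {
                    if (completes && !emitted) {
                        emitted = true;
                        s.onNext("response");
                        s.onComplete();
                    }
                }
                @Override public void cancel() { }
            });
        }
    }

    static final class Downstream implements Flow.Subscriber<String> {
        Flow.Subscription subscription;
        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
        @Override public void onNext(String item) { }
        @Override public void onError(Throwable t) { }
        @Override public void onComplete() { }
    }

    // Subscribes, then cancels as a timed-out client would; returns how often the span finished.
    static int finishCount(boolean sourceCompletes) {
        Span span = new Span();
        Downstream downstream = new Downstream();
        new TestPublisher(sourceCompletes).subscribe(new SpanFinishingSubscriber<String>(downstream, span));
        downstream.subscription.cancel();
        return span.finishCount.get();
    }

    public static void main(String[] args) {
        // Canceled request: finished once by cancel(). Completed request: finished once by
        // onComplete(); the later cancel() is ignored.
        System.out.println(finishCount(false) + " " + finishCount(true)); // prints "1 1"
    }
}
```

The guard matters as much as the cancel hook: without it, a client that cancels right after the response completes would finish the same span twice.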

devinsba commented 4 years ago

@catchin I'm working on something related right now. If it's not too much trouble, could you create a simple example app or Reactor flow that shows this behavior? If so, I can add it to my test cases or adjust our existing ones.

catchin commented 4 years ago

Hi @devinsba, I created a small Spring Boot app that shows this behavior; see the README for an explanation: https://github.com/catchin/reactor-canceled-mon

devinsba commented 4 years ago

Thanks. I'll drop this test into #1203 as soon as I get a chance.

github-actions[bot] commented 4 years ago

:robot: This issue has been addressed in the latest release. See full details in the Release Notes.

howardem commented 1 year ago

Hi @devinsba, it has been almost 4 years since @catchin reported this issue, and it's still open. Does the Datadog engineering team have any plans to solve this problem soon? Many development teams use WebFlux for their reactive Spring Boot microservices. In my honest opinion, Datadog, as an exhibitor at VMware Explore 2023 and a presence at previous editions of SpringOne Platform, should have come up with a solution a long time ago.

I added a comment on the issue "Traces not captured for webflux netty.request when request is canceled" to let you know that, as of September 2023, this is still a pressing issue for engineering teams using Datadog.

cc: @mcculls, @richardstartin, @bantonsson