DataDog / dd-trace-java

Datadog APM client for Java
https://docs.datadoghq.com/tracing/languages/java
Apache License 2.0
570 stars 283 forks source link

Span trees do not maintain correct relationships in ratpack #2403

Open mrtubby opened 3 years ago

mrtubby commented 3 years ago

An instrumented Ratpack server handling multiple requests making its own requests out to another service via parallelBatches does not keep its spans straight.

I have put together an example with a server that has an endpoint that will make N parallel calls to a copy of itself. Each of those calls will sleep for a time before returning the input value.

Reproduction Steps:

  1. clone example project: https://github.com/mrtubby/ratpackddapmfork
  2. build it using the gradle shadowJar task
  3. stand up a support server for the example to call out to java -Dport=5051 -jar {outputJar}
  4. stand up an instrumented example server java -javaagent:{path_to_dd-java-agent.jar} -Ddd.profiling.enabled=true -Ddd.writer.type=LoggingWriter -Dport=5050 -jar {outputJar}
  5. Make some concurrent requests to the example server: curl http://localhost:5050/remoteParallelSlow/10 & curl http://localhost:5050/remoteParallelSlow/9 & curl http://localhost:5050/remoteParallelSlow/8 & curl http://localhost:5050/remoteParallelSlow/1 &

Expected output: The spans written to stdout should have 4 traceIds each with 1 netty.request span 1 ratpack.handler span N netty.client.request spans (where N is the final argument in the curl path)

Actual output: 4 traceIds each with 1 netty.request span, but from there it becomes non-deterministic

mrtubby commented 3 years ago

Checking in to see if there has been any progress on this. Even as much as confirmation of bug

lpriima commented 3 years ago

Could you please also post couple your non-deterministic output examples(screenshots) here?

mrtubby commented 3 years ago

Sure, I didn't think screenshots of the output would be very easy to deal with. I am attaching some files containing json objects - each numbered key is a datadog.trace.agent.common.writer.LoggingWriter - write(trace) log line from the output. Both trials are the output of independent sets of requests (aka pasting the command from step 5 above into a terminal)

Trial1.txt In this output item "0" is for traceId: 3575870512172022080, and it contains spans for all 0-9 subrequests that it should, plus spans for another 0-6 subrequests that when combined with the spans from item "3" all together make up the subrequests for traceId: 5097975924928825270 from item "1"

Trial2.txt Item "0" is missing all subrequest spans, as well as the ratpack.handler span for this request. TraceId: 9041937206269558385 shows up in several of the log items (1,2,4) and contains spans that belong to 3 of the 4 requests made

mrtubby commented 3 years ago

Checking in to see if I can offer any further assistance on this.

tylerbenson commented 3 years ago

Hey @mrtubby. I tried your sample app and the steps documented above several times and unfortunately was unable to reproduce the problem. I just submitted a PR that I think might help: #2740. Please try the following build and let me know: https://154452-89221572-gh.circle-artifacts.com/0/libs/dd-java-agent-0.81.0-SNAPSHOT.jar

mrtubby commented 3 years ago

Sorry about the delay in my response here I had moved on to other projects.

The snapshot agent does appear to keep better track of spans in my application, and in the dorky test app I put together.

mrtubby commented 3 years ago

@tylerbenson I recognize that there is push back on making these changes from the PR comments. Is this still being looked at?

richardstartin commented 3 years ago

The pushback is specifically because the PR does not demonstrate that the change will fix your issue.

tylerbenson commented 3 years ago

@mrtubby part of the issue is that instrumentation connected unrelated traces for another app when we deployed it. As a result, it seems too risky.

I was also unsuccessful in reproducing the problem with your sample app. Perhaps you could distill it down somehow?