Open mrtubby opened 3 years ago
Checking in to see if there has been any progress on this. Even as much as confirmation of bug
Could you please also post couple your non-deterministic output examples(screenshots) here?
Sure, I didn't think screenshots of the output would be very easy to deal with. I am attaching some files containing json objects - each numbered key is a datadog.trace.agent.common.writer.LoggingWriter - write(trace)
log line from the output. Both trials are the output of independent sets of requests (aka pasting the command from step 5 above into a terminal)
Trial1.txt In this output item "0" is for traceId: 3575870512172022080, and it contains spans for all 0-9 subrequests that it should, plus spans for another 0-6 subrequests that when combined with the spans from item "3" all together make up the subrequests for traceId: 5097975924928825270 from item "1"
Trial2.txt Item "0" is missing all subrequest spans, as well as the ratpack.handler span for this request. TraceId: 9041937206269558385 shows up in several of the log items (1,2,4) and contains spans that belong to 3 of the 4 requests made
Checking in to see if I can offer any further assistance on this.
Hey @mrtubby. I tried your sample app and the steps documented above several times and unfortunately was unable to reproduce the problem. I just submitted a PR that I think might help: #2740. Please try the following build and let me know: https://154452-89221572-gh.circle-artifacts.com/0/libs/dd-java-agent-0.81.0-SNAPSHOT.jar
Sorry about the delay in my response here I had moved on to other projects.
The snapshot agent does appear to keep better track of spans in my application, and in the dorky test app I put together.
@tylerbenson I recognize that there is push back on making these changes from the PR comments. Is this still being looked at?
The pushback is specifically because the PR does not demonstrate that the change will fix your issue.
@mrtubby part of the issue is that instrumentation connected unrelated traces for another app when we deployed it. As a result, it seems too risky.
I was also unsuccessful in reproducing the problem with your sample app. Perhaps you could distill it down somehow?
An instrumented Ratpack server handling multiple requests making its own requests out to another service via parallelBatches does not keep its spans straight.
I have put together an example with a server that has an endpoint that will make N parallel calls to a copy of itself. Each of those calls will sleep for a time before returning the input value.
Reproduction Steps:
java -Dport=5051 -jar {outputJar}
java -javaagent:{path_to_dd-java-agent.jar} -Ddd.profiling.enabled=true -Ddd.writer.type=LoggingWriter -Dport=5050 -jar {outputJar}
curl http://localhost:5050/remoteParallelSlow/10 & curl http://localhost:5050/remoteParallelSlow/9 & curl http://localhost:5050/remoteParallelSlow/8 & curl http://localhost:5050/remoteParallelSlow/1 &
Expected output: The spans written to stdout should have 4 traceIds each with 1 netty.request span 1 ratpack.handler span N netty.client.request spans (where N is the final argument in the curl path)
Actual output: 4 traceIds each with 1 netty.request span, but from there it becomes non-deterministic