Closed vsingal-p closed 1 year ago
Is it possible to provide an example of how you are making the webclient call?
Also, I am curious if the following option improves the situation:
-Ddd.integration.reactor-hooks.enabled=true
I am uploading a sample project - https://github.com/vsingal-p/dd-java-agent-webclient-issue-demo
Here are the jvm args
-javaagent:dd-java-agent.jar -Ddd.propagation.style.inject=B3 -Ddd.propagation.style.extract=B3 -Ddd.integration.reactor-hooks.enabled=true
Here's the request I make
curl --location --request GET 'http://localhost:8080/demo' \
--header 'x-b3-traceid: 3469035142037265' \
--header 'x-b3-spanid: 803698824837'
I am facing 2 issues -
dd.trace_id
and dd.span_id
The first issue is because b3 IDs need to be 16 characters[^1] hex encoded. The equivalents of what you posted above are: 000C5311AF5ABF11
, 000000BB202ED685
We will have to look into the second
[^1]: Trace ID can be either 16 or 32
Okay, thanks. I was skeptical but couldn't find any sample
Here's the project - https://github.com/vsingal-p/dd-java-agent-webclient-issue-demo ~For some reason I cannot edit my previous comment~ Updated original as well
The hex strings worked (partially). Using them I am able to get same dd.trace_id
in all calls made to server. However I get different dd.span_id
for each call. Is that expected? Shouldn't it use x-b3-spanid
?
Good catch, x-b3-span ID in our implementation is used as the parent ID for the server span
Any reason we don't support all 3 headers in dd-java-agent?
x-b3-traceid
x-b3-spanid
x-b3-parentspanid
We don't support the zipkin shared spans concept, in zipkin the "transport" span is shared by the client and server, which is why spanid and parent are needed so that the server side copy of the span knows it's parent. Since we don't share spans, we only need the span ID from the client side and use that as the server side parent ID.
Gotcha! Makes sense. I was just getting easy with sleuth+brave, so found this change in datadog a bit weird.
Summarising the issues till now -
You can configure sleuth the have the same behavior by using spring.sleuth.supports-join=false
in case you need to run in a mixed environment
Sorry for barging in heavy, but facing one more 😓
When I make 2 webclient calls one after the other, dd.span_id
changes in MDC. span_id
for a server call should not change throughout the call I guess.
2022-02-07 22:05:14,063 thread=[reactor-http-nio-4] level=INFO logger=WebClientApi dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=8160433720268064940
GET:
URL:https://google.com:
Headers:[]
Attributes:{org.springframework.web.reactive.function.client.WebClient.uriTemplate=https://google.com, org.springframework.web.reactive.function.client.ClientRequest.LOG_ID=ccedc68}
-2022-02-07 22:05:15,958 thread=[reactor-http-nio-4] level=INFO logger=WebClientApi dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=8449475067196755785
GET:
URL:https://google.com:
Headers:[]
Attributes:{org.springframework.web.reactive.function.client.WebClient.uriTemplate=https://google.com, org.springframework.web.reactive.function.client.ClientRequest.LOG_ID=30bfd31d}
Ah, OK. I see now what you are seeing. The reason there aren't headers in the webflux request object is that we add them in the netty instrumentation. If you send the request to https://httpbin.org/get
and print the response you should see headers in the payload that was sent
I see. Interesting! I have been scratching my head around this till now 😲 So the main concern for which this issue was raised is working fine 👍
Remaining open concerns -
dd.span_id
in MDC changing after multiple webclient calls~Also is there any way we can see the spanId for the client call being made in logs?~ (DD does not share parent-child span so no span_id is generated for client call)
@devinsba thoughts on above issues?
Span ID on client side becomes Parent Span ID on server side.
Each webclient call will get it's own Span ID so the MDC will be different.
As for logging the span ID for the client call, you probably need to turn on request logging for netty. I'm not sure the best way to do so
Span ID on client side becomes Parent Span ID on server side.
Got it. so dd.span_id
in MDC should correspond to this parent span, right? so that all the logs within that server call can be correlated
Each webclient call will get it's own Span ID so the MDC will be different.
I meant dd.span_id in server logs changes upon multiple webclient calls. That shouldn't happen. (resonates with above comment)
server receives call with x-b3-traceid=abc123456def x-b3-spanid=zzz00000www
log.info("Log1")
log.info("Log2")
webclient.downstreamCallOne()
log.info("Log3")
log.info("Log4")
webclient.downstreamCallOne()
Above snippet generates logs
logger=LoggingServerStartInterceptor level=info dd.trace_id=abc123456def dd.span_id=xxx12345yyy message=Server call start
logger=demo level=info dd.trace_id=abc123456def dd.span_id=xxx12345yyy message=Log1
logger=demo level=info dd.trace_id=abc123456def dd.span_id=xxx12345yyy message=Log2
logger=webclientInstance level=info dd.trace_id=abc123456def dd.span_id=xxx67890yyy message=call succeeded
logger=demo level=info dd.trace_id=abc123456def dd.span_id=xxx67890yyy message=Log1
logger=demo level=info dd.trace_id=abc123456def dd.span_id=xxx67890yyy message=Log1
logger=webclientInstance level=info dd.trace_id=abc123456def dd.span_id=xxx334455yyy message=call succeeded
logger=LoggingServerFinishInterceptor level=info dd.trace_id=abc123456def dd.span_id=xxx334455yyy message=Server call complete
I believe dd.span_id should remain same throughout the server call? which should be xxx12345yyy
Ok, that does sound like a bug. If you can create a reproducer then we can look at making a failing test to work against
Sure, let me work something with the demo app I created above to reproduce this
Updated the project with a sample code - https://github.com/vsingal-p/dd-java-agent-webclient-issue-demo.git
jvm args
-javaagent:dd-java-agent.jar -Ddd.propagation.style.inject=B3 -Ddd.propagation.style.extract=B3 -Ddd.integration.reactor-hooks.enabled=true
Request
curl --location --request GET 'http://localhost:8080/demo' \
--header 'x-b3-traceid: 000C5311AF5ABF11' \
--header 'x-b3-spanid: 000000BB202ED685'
Logs
2022-02-09 01:24:12,130 thread=[reactor-http-nio-4] level=INFO logger=LoggingServerInterceptor dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=7662802724461287617 Server call starts
-2022-02-09 01:24:12,679 thread=[reactor-http-nio-4] level=INFO logger=DemoController dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=7662802724461287617 Log1
-2022-02-09 01:24:12,679 thread=[reactor-http-nio-4] level=INFO logger=DemoController dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=7662802724461287617 Log2
-2022-02-09 01:24:15,229 thread=[reactor-http-nio-4] level=INFO logger=WebClientApi dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=4419075444443843519 webclient call complete
-2022-02-09 01:24:15,295 thread=[reactor-http-nio-4] level=INFO logger=DemoController dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=7662802724461287617 Log3
-2022-02-09 01:24:15,295 thread=[reactor-http-nio-4] level=INFO logger=DemoController dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=7662802724461287617 Log4
-2022-02-09 01:24:16,369 thread=[reactor-http-nio-4] level=INFO logger=WebClientApi dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=4451157002518768285 webclient call complete
-2022-02-09 01:24:16,398 thread=[reactor-http-nio-4] level=INFO logger=LoggingServerInterceptor dd.trace_id=3469035142037265, dd.service=com.example.demo.DdJavaAgentWebclientDemoApplicationKt, dd.span_id=994518264780377097 Server call complete
There are different from the example I gave above. I am trying to reproduce that scenario, in the meanwhile above should also work as seems like it also has some discrepancies
The original example I gave was from a springboot + grpc + kotlin app. It's based on my organisation's SDK and it's a bit difficult to replicate the same. Let me know if above example (springboot + reactor + kotlin) is not sufficient. I'll try to create a springboot-grpc-kotlin demo in that case
Any update on this? We are still facing the issue in kotlin+coroutines application. I see other open issues as well https://github.com/DataDog/dd-trace-java/issues/1123 and https://github.com/DataDog/dd-trace-java/issues/931
Can we get this picked up? My org has shifted to kotlin+coroutines for all our apps and this issue is a major blocker
^ Kotlin integration support has been added which has fixed this issue
I am facing an issue with using auto instrumentation with spring webflux + webclient. Headers are not being propagated to downstream call.
In the following commit,
propagate()
line was removed. I suspect this could be the cause. However I couldn't verify as I am not able to build dd-java-agent locally with changes. https://github.com/DataDog/dd-trace-java/pull/1876/files#diff-b53db5fbb0c565c82f99286c8d966b127e12ba58525f4a2fdf69a4b01832f902L55Versions spring-webflux : 5.3.13 dd-java-agent: 0.88.0