@vvasilevbosch are you sampling 100% of all requests? And how much load are we talking about?
Because I would assume that some traced requests are dropped before tracing slows down the service itself or overwhelms the OTEL endpoint. The Logback Logstash appender we use behaves the same way: under heavy load, not all log statements may be available.
Maybe this is even configurable in Kamon, the library Ditto uses for tracing. Did you check?
@thjaeckle I have the following setup: 1,000,000 things; 8 instances each of connectivity, policies and things; 1 things-search and 1 gateway; 1 Kafka connection with 8 clients, to which I send 5000 modifyThing messages per second. I will further check the Kamon configuration. Thanks!
Ok, with this load I would expect that you have to scale your Jaeger backend. Every command causes at least 5 spans per trace, reported from at least 3 Ditto services.
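Rough math: at 5000 commands per second and at least 5 spans per command, that is at least 25,000 spans per second which the Jaeger collector has to ingest and store.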
More realistic, IMO, would be to configure sampling so that only e.g. 1% of requests are traced.
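A minimal sketch of what that could look like, assuming the standard Kamon 2.x `kamon.trace` configuration keys (the exact keys may differ depending on the Kamon version Ditto bundles):

```hocon
# Sketch, assuming Kamon 2.x configuration keys: switch to the random
# sampler and keep roughly 1% of all traces instead of sampling everything.
kamon.trace {
  sampler = "random"

  random-sampler {
    # Probability that a new trace is sampled; 0.01 corresponds to ~1%.
    probability = 0.01
  }
}
```

With a random sampler the decision is made once when the root span is created and then propagated, so a trace is either kept as a whole or not at all, which should avoid partially recorded traces.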
I tried increasing the buffer size of the tracing reporter, but it seems there is a bug in the Kamon library; I have raised an issue in their repo: https://github.com/kamon-io/Kamon/issues/1281
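For reference, this is a sketch of the kind of override I mean, assuming the `reporter-queue-size` setting from Kamon's reference configuration:

```hocon
# Sketch of the attempted override, assuming Kamon's reporter-queue-size
# key (4096 by default, if I read the reference configuration correctly).
# Spans that arrive while the reporter queue is full are dropped, so a
# larger queue should tolerate short bursts of spans better.
kamon.trace {
  reporter-queue-size = 40960
}
```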
Closing this issue
Incomplete tracing is observed while load testing the service with modify thing commands via a Kafka connection. What can be seen in the traces is that there are spans with invalid parent span IDs, as well as many missing spans that should be there. I attach a JSON export of two traces (one complete, one incomplete) as well as a screenshot from the Jaeger UI: traces.zip