SmartThingsOSS / smartthings-brave

Experimental extensions for the Open Zipkin Brave library
9 stars 8 forks source link

IOTEVENT-24: kafka client interceptors #2

Closed edwardsoo-ss closed 7 years ago

edwardsoo-ss commented 7 years ago

This PR is just a place to get some comment, not meant to be final @llinder take a look if you have some time please

edwardsoo-ss commented 7 years ago

trace example https://tracingd.smartthingstlm.com/traces/fba876dc8882fa32 producer usage: https://git.vandevlab.com/iot/bouncer/merge_requests/72 consumer usage: https://git.vandevlab.com/iot/pusher/merge_requests/61

still has some problems:

edwardsoo-ss commented 7 years ago

@Bijnagte please have a look

llinder commented 7 years ago

Over all this looks great. The whole config map thing leave a bit to be desired but I don't see a way around that since its the way Kafka interceptor configuration works.

I'm a bit concerned that there isn't a way to pass in the current trace context into the ProducerInterceptor. If that isn't possible the whole interceptor pattern is going to be a non starter since that is the only way to create a new child span rather than starting a whole new trace.

Bijnagte commented 7 years ago

I think it is nice to have this more generic approach. How will we migrate? I would think that we need to actually create a second topic in order to do this as there is not a good way to sniff the message to know when the cutover happened. I will comment on the actual usage PRs as well

edwardsoo-ss commented 7 years ago

@llinder updated producer interceptor to create span as child of current span managed by the kristofa brave lib, producer interceptor code runs in the same context so that wasn't a problem. However for some reason sometimes I will see 2 "sr" annotations e.g. https://tracingd.smartthingstlm.com/traces/437f35c7a4468fce I have tried using "ws" & "wr" instead but it still occurs (2 "wr").

I cannot pass the consumer received span to the server tracer because kafka client receives in batch, I think the consumer service may just have to know the envelope schema and do that in a non-transparent way.

edwardsoo-ss commented 7 years ago

@Bijnagte agreed, I will put some thoughts into how we can migrate (maybe another topic like you said)

edwardsoo-ss commented 7 years ago

bouncer -> kafka -> pusher -> paperboy & dove (no server spans): https://tracingd.smartthingstlm.com/traces/b55a0e715a1feea5

The kafka consumer code to join the trace from the kafka message envelope has become quite gross because of the batch receive issue. See https://git.vandevlab.com/iot/pusher/merge_requests/61

edwardsoo-ss commented 7 years ago

latest example: https://tracingd.smartthingstlm.com/traces/06636f5e85cf4a44

the consumer now returns TracedConsumerRecord which extends ConsumerRecord. consumer code can optionally check for tracing context and propagate them for spans caused by the kafka "wr" span.

tracing context injection/extraction in interceptors are now abstract, an implementation using a protobuf envelope message is provided as an example/default. "ws" & "wr" are still annotated and flushed in the interceptors.

llinder commented 7 years ago

Looking good!

Funny that the SQL query of all things has some rendering issues. I think I've seen issues like this running locally before so maybe its nothing.

Probably should be using "cs" and "sr" annotations though. This is the official way that Zipkin handles one way async spans like this so it has proper tests and is less likely to break the UI.

llinder commented 7 years ago

Looks like we have something amiss in the last sample trace.

image

The Server Receive/Send shouldn't be included with the Client Send/Receive. Also the timestamp seems to be absent on the SR/SS. This could be related to using the WS/WR annotations where the SR isn't correctly initialized in Pusher and the serverTracer.setServerSent() is never called.

edwardsoo-ss commented 7 years ago

@llinder you are right about "ws" "wr" causing issues. I switched back to "cs" "ss": https://tracingd.smartthingstlm.com/traces/550db9e6f6f26fd3

edwardsoo-ss commented 7 years ago

example of how traces look if we treat kafka as just a transport (i.e. not naming any endpoint/service as "kafka"): https://tracingd.smartthingstlm.com/traces/c004e3574a476446

looks like the sql query time is still a bit wonky sometimes, even with "cs" and "sr": https://tracingd.smartthingstlm.com/traces/77b257bccebda8e2

I also added some tests.

llinder commented 7 years ago

We might want to consider additional annotations to help searching but those can come later once we learn from real world use.

Great job on this by the way! Really looking forward to having this in production. Feel free to merge when you feel your ready.

edwardsoo-ss commented 7 years ago

thanks! :smile: