Financial-Times / tapper

Zipkin client for Elixir
69 stars 8 forks source link

Pluggable trace/span id generation for Datadog #20

Open bforchhammer opened 5 years ago

bforchhammer commented 5 years ago

Hi!

Thanks for providing the Tapper library! I have to say that I really like the architecture and how easy it is to understand and dive-in!

I'm currently looking into using tapper for collecting traces and spans to be sent to Datadog APM instead of Zipkin... and for the most part, it looks like it is relatively straightforward, i.e. a custom reporter which formats traces in a datadog-compatible way and then forwards them to the Datadog agent seems to work well enough.

The only problem I've stumbled on so far, is the format of trace and span IDs, because Datadog expects them to be of type int64 and does not accept those 128bit hex IDs which are currently used...

I'm assuming that the 128bit hex format is something that works well with Zipkin, but for datadog I would simply generate two random int64 numbers...

Would you be open to making the "generation of ids" pluggable? (e.g. via a IdGenerator behavior + configurable implementation?). I'd be happy to work on a respective PR if that's something that makes sense for Tapper.

ellispritchard commented 5 years ago

Hey @bforchhammer, thanks for the kind words!

Yes, this is something I have considered recently, when re-factoring the trace and span ids away from integers (there was too much repeated conversion to hex in real-world use).

The easiest way of achieving what you want is to just convert the existing 128-bit and 64-bit hex-encoded ids back to integers in your Datadog reporter implementation, discarding extraneous bits (this should be pretty performant, I had to work very hard to beat the core library code when encoding) e.g.

# where span is %Tapper.Protocol.Span{}
<<int_trace_id :: integer-size(64), _ :: bytes-size(8)>> = Base.decode16!(span.trace_id, case: :lower)
<<int_span_id :: integer-size(64)>> = Base.decode16!(span.span_id, case: :lower)

The main problem with changing the id generators to produce a different output, is that the existing contract for reporters expects the ids to be hex-encoded binaries (see lib/tapper/protocol.ex. Changing this would break the contract, and possibly break people's existing reporters. Otherwise the change in Tapper.TraceId et al is fairly straightforward, but should be made a compile-time option for performance reasons (i.e. generate the code that calls the module, rather than switching at run-time).

If you fancy having a go at a more extensive change allowing pluggable generators, please do, and I'd be happy to review, but my advice is that the quickest way to get what you want would be to convert the ids in your custom reporter, as shown above.

bforchhammer commented 5 years ago

The easiest way of achieving what you want is to just convert the existing 128-bit and 64-bit hex-encoded ids back to integers in your Datadog reporter implementation, discarding extraneous bits

Thanks for the idea; I wasn't sure whether it would be smart to discard the extraneous bits, since those IDs need to be consistent across multiple services (including non-elixir projects), but thinking about it some more I think it should work as long as I make sure to always convert all IDs that leave the system (i.e. including distributed tracing headers)... I'll try this way first before changing the id generators :wink:

ellispritchard commented 5 years ago

NB Tapper supports passing through 64-bit trace ids (which used to be the default for Zipkin, but some users ran out of these pretty quickly, e.g. Twitter!), so for received spans, i.e. spans started externally to Tapper, everything should work with 64-bit trace ids.

Tapper does not currently support generating 64-bit trace ids (as that is sort of legacy), but that is a very easy change: we'd just add a configuration option under the :tapper key to select 64-bit trace id generation rather than 128-bit (basically, use the SpanId.generate code), no other changes would be necessary. You'd then have end-to-end 64 bit ids, and your Datadog reporter still has to do the parse, but doesn't need to worry about different ids being passed to downstream services than Datadog.

ellispritchard commented 5 years ago

Yep, it's super simple, and only breaks one test in 64-bit mode ("the trace id is 128-bits"!). I'll probably just commit this, as it might be useful for other people:

https://github.com/Financial-Times/tapper/tree/support-64-bit-trace-id-generation

use

config :tapper, trace_id_64_bit: true