census-instrumentation / opencensus-specs

(re-)make span.kind a thing #54

Open codefromthecrypt opened 6 years ago

codefromthecrypt commented 6 years ago

As far as I know, one of the goals of census is to be used in place of instrumentation like zipkin's. A couple of years back, we learned from stackdriver that span.kind is extremely helpful to know. It lets you lightly map communication semantics, i.e. the intent of the library. For example, we know whether the caller intended to be a client or a message producer, or whether a receiver intended to be a server or a message consumer. Many libraries are built on this slightly higher-level information rather than just the direction of traffic.

Though I can't find anything on github, I think span.kind was intentionally taken off the table. I'd like to have that discussion here, as it impacts the viability of 3rd party instrumentation. Ideally folks are on-board with the following stable span kinds, representing remote communication intent. For example, CLIENT means I am acting as a client and sending a remote message.

If curious, here are the definitions from zipkin: https://github.com/openzipkin/zipkin-api/blob/master/zipkin2-api.yaml#L320 and here are the ones from opentracing: https://github.com/opentracing/specification/blob/master/semantic_conventions.md#modelling-special-circumstances

Even if Census clarifies things differently, it would be helpful. I'd highly recommend not relegating this to an attribute (tag in other tracing systems), because it is a very important part of how you instrument code.
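
To make the attribute-vs-field point concrete, here is a minimal sketch. The class `SpanKindSketch`, its nested `Span` type, and the `span.kind` attribute key are hypothetical names for illustration only, not the OpenCensus API; the four enum values mirror the zipkin definition linked above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class SpanKindSketch {
  /** Hypothetical kinds, mirroring the four values in the zipkin definition. */
  enum SpanKind { CLIENT, SERVER, PRODUCER, CONSUMER }

  /** Option A: kind is just another string attribute, so it has to be parsed on every read. */
  static SpanKind kindFromAttributes(Map<String, String> attributes) {
    String raw = attributes.get("span.kind"); // assumed key name
    if (raw == null) return null;
    try {
      return SpanKind.valueOf(raw.toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return null; // unknown or misspelled value is silently lost
    }
  }

  /** Option B: kind is a typed field, set once by the instrumentation. */
  static final class Span {
    final String name;
    final SpanKind kind; // null for a purely local span

    Span(String name, SpanKind kind) {
      this.name = name;
      this.kind = kind;
    }

    /** True when this span initiates remote communication. */
    boolean startsRemoteCall() {
      return kind == SpanKind.CLIENT || kind == SpanKind.PRODUCER;
    }
  }

  public static void main(String[] args) {
    Map<String, String> attrs = new HashMap<>();
    attrs.put("span.kind", "cleint"); // typo goes unnoticed until analysis time
    System.out.println(kindFromAttributes(attrs));

    Span span = new Span("get /users", SpanKind.CLIENT);
    System.out.println(span.kind + " " + span.startsRemoteCall());
  }
}
```

With the typed field, a misspelled kind fails at compile time in the instrumentation, whereas the attribute form only fails (quietly) when someone later tries to interpret it.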

The impact is at least clarifying what is currently murky, and also making this a viable library for what I believe its goal is (routine third party instrumentation).

See https://github.com/census-instrumentation/opencensus-erlang/pull/53/files#r171096697

codefromthecrypt commented 6 years ago

PS: since there isn't a lot of instrumentation on the java side yet, it might be unclear what this buys us. The thing is, most libraries know authoritatively what they are, even if they use different names than client+server. For example, in dubbo "provider" means server, which is fine. Here's dubbo instrumentation in zipkin (brave), where we are able to preserve the info the library tells us:

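    // Dubbo tells us which side we are on: provider side means server.
    RpcContext rpcContext = RpcContext.getContext();
    Kind kind = rpcContext.isProviderSide() ? Kind.SERVER : Kind.CLIENT;
    final Span span;
    if (kind.equals(Kind.CLIENT)) {
      // Client: start a span and propagate its context in the invocation attachments.
      span = tracer.nextSpan();
      injector.inject(span.context(), invocation.getAttachments());
    } else {
      // Server: join the caller's span if a context was extracted, else start a new one.
      TraceContextOrSamplingFlags extracted = extractor.extract(invocation.getAttachments());
      span = extracted.context() != null
          ? tracer.joinSpan(extracted.context())
          : tracer.nextSpan(extracted);
    }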

bogdandrutu commented 6 years ago

I proposed this here https://github.com/census-instrumentation/opencensus-proto/pull/51

bogdandrutu commented 6 years ago

In the initial PR @adriancole mentioned the need for CONSUMER/PRODUCER kinds. I'd like to understand where these are used, and he kindly pointed to this: https://github.com/openzipkin/brave/tree/master/instrumentation/kafka-clients

I have a few questions here that can help me understand why there is a need for these kinds:

I do not have that much experience with open source, and I may be naive in thinking that we can actually instrument Kafka.

codefromthecrypt commented 6 years ago

I don't want to go too far into the details of Kafka, except to say that what you mention (instrumenting end to end) is certainly not possible now.

The main thing is that when a user schedules a message, it is buffered into a batch, sent to a broker, and possibly received many times. The producer of that message acts very differently than a client, even if somewhere, on a thread disconnected from the user's call, the component that sends the batch behaves like a client.

Similarly on the consumer side, there is no thread context shared with the processor (e.g. a message may be processed later, or maybe never).

The semantics are different from the usual RPC semantics, at least at the application layer of abstraction, even if at some point there is usually a network call that is like an RPC carrying something that includes the message.
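
A rough sketch of that difference, in plain Java with no Kafka or tracing library involved. All names here (`MessagingSpanSketch`, `send`, `flush`, `onDelivery`) are hypothetical and only illustrate the shape: the producer span finishes when the message is buffered, and each delivery gets its own consumer span, so one producer span can have several children.

```java
import java.util.ArrayList;
import java.util.List;

class MessagingSpanSketch {
  enum Kind { PRODUCER, CONSUMER }

  /** Hypothetical span with only the fields this illustration needs. */
  static final class Span {
    final String id;
    final String parentId;
    final Kind kind;

    Span(String id, String parentId, Kind kind) {
      this.id = id;
      this.parentId = parentId;
      this.kind = kind;
    }
  }

  /** Hypothetical message carrying the producer span id, as propagated headers would. */
  static final class Message {
    final String payload;
    final String producerSpanId;

    Message(String payload, String producerSpanId) {
      this.payload = payload;
      this.producerSpanId = producerSpanId;
    }
  }

  final List<Span> reported = new ArrayList<>();
  final List<Message> batch = new ArrayList<>();
  private int nextId = 1;

  /** Producer side: the span covers scheduling the message, not the later network send. */
  void send(String payload) {
    Span span = new Span("span-" + nextId++, null, Kind.PRODUCER);
    batch.add(new Message(payload, span.id));
    reported.add(span); // finished as soon as the message is buffered
  }

  /** The batch leaves later in one network call, detached from any single send(). */
  List<Message> flush() {
    List<Message> flushed = new ArrayList<>(batch);
    batch.clear();
    return flushed;
  }

  /** Consumer side: every delivery gets its own span, parented to the producer span. */
  void onDelivery(Message message) {
    reported.add(new Span("span-" + nextId++, message.producerSpanId, Kind.CONSUMER));
  }

  public static void main(String[] args) {
    MessagingSpanSketch sketch = new MessagingSpanSketch();
    sketch.send("a");
    sketch.send("b");
    List<Message> flushed = sketch.flush();
    for (Message m : flushed) sketch.onDelivery(m);
    sketch.onDelivery(flushed.get(0)); // the same message delivered a second time
    for (Span s : sketch.reported) {
      System.out.println(s.kind + " " + s.id + " parent=" + s.parentId);
    }
  }
}
```

A CLIENT span, by contrast, would normally stay open until the response arrives and have exactly one SERVER counterpart, which is why reusing CLIENT/SERVER here would be misleading.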

codefromthecrypt commented 6 years ago

Besides batching, the other thing that will be a problem if we try to model at the lowest abstraction (the messaging driver) is fragmentation. For example, rabbitmq will fragment a message over amqp if it is "too big". If we decided to only model things at the lowest layer, we would have problems on the way back re-assembling the fragments. This is not an issue in normal span modeling. I think we would create the same complexity if we wanted to model RPC spans using only http/2 frames.

TL;DR: it is easier to have messaging producers and consumers instrumented at their own layer of abstraction, and not rely on the low-level wire protocol to infer this.