census-instrumentation / opencensus-specs

Apache License 2.0

Enhance OpenCensus’ span model to include Zipkin style endpoints #135

Status: Open. basvanbeek opened this issue 6 years ago.

basvanbeek commented 6 years ago

Zipkin’s span model has the notion of local and remote endpoints, which can hold a service name, an IPv4 address, an IPv6 address and a port. OpenCensus would benefit from adding a similar special case for this information to its span data model. It would map directly onto Zipkin exports, and it could also help other trace-processing systems identify connections between peers or detect a transparent proxy in between.
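A minimal sketch of what first-class endpoints on a span could look like, and how they would serialize into Zipkin v2 JSON (`localEndpoint`/`remoteEndpoint` with `serviceName`, `ipv4`, `ipv6`, `port`). The `Endpoint` and `Span` classes are hypothetical illustrations, not part of any OpenCensus library, and trace IDs, timestamps and tags are deliberately left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    # Hypothetical type mirroring the fields of a Zipkin v2 endpoint.
    service_name: Optional[str] = None
    ipv4: Optional[str] = None
    ipv6: Optional[str] = None
    port: Optional[int] = None

    def to_zipkin(self) -> dict:
        # Zipkin v2 JSON uses camelCase keys; unset fields are simply omitted.
        fields = {
            "serviceName": self.service_name,
            "ipv4": self.ipv4,
            "ipv6": self.ipv6,
            "port": self.port,
        }
        return {key: value for key, value in fields.items() if value is not None}


@dataclass
class Span:
    # Hypothetical span with first-class local and remote endpoints.
    name: str
    kind: str  # "CLIENT", "SERVER", "PRODUCER" or "CONSUMER" in Zipkin terms
    local_endpoint: Optional[Endpoint] = None
    remote_endpoint: Optional[Endpoint] = None

    def to_zipkin(self) -> dict:
        # IDs, timestamps and tags are left out to keep the sketch short.
        out = {"name": self.name, "kind": self.kind}
        if self.local_endpoint is not None:
            out["localEndpoint"] = self.local_endpoint.to_zipkin()
        if self.remote_endpoint is not None:
            out["remoteEndpoint"] = self.remote_endpoint.to_zipkin()
        return out


span = Span(
    name="get /users",
    kind="CLIENT",
    local_endpoint=Endpoint(service_name="frontend"),
    remote_endpoint=Endpoint(service_name="backend", ipv4="10.0.0.5", port=8080),
)
print(json.dumps(span.to_zipkin()))
```

Because the endpoints are typed fields rather than free-form attributes, a Zipkin exporter would not need to parse anything; it copies the fields across.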

Currently OpenCensus uses a global singleton tracer. In multi-homed services or elegant monoliths, where one might want to distinguish between the available transports and components, having endpoints only at the exporter’s end poses problems. With per-span local endpoints we can annotate the correct service or component entrypoint as soon as that information becomes available at call time.
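To illustrate the difference, here is a small sketch, assuming a hypothetical exporter-level default endpoint (the only option with a single global tracer) that per-span local endpoints can override field by field. `EXPORTER_DEFAULT`, `effective_local_endpoint` and the service names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    service_name: Optional[str] = None
    ipv4: Optional[str] = None
    port: Optional[int] = None


# Hypothetical exporter-level default: with only this, every span in the
# process reports the same local identity.
EXPORTER_DEFAULT = Endpoint(service_name="monolith", ipv4="10.0.0.7")


def effective_local_endpoint(span_local: Optional[Endpoint],
                             default: Endpoint = EXPORTER_DEFAULT) -> Endpoint:
    """Prefer the per-span local endpoint; fill unset fields from the default."""
    if span_local is None:
        return default
    return Endpoint(
        service_name=span_local.service_name or default.service_name,
        ipv4=span_local.ipv4 or default.ipv4,
        port=span_local.port if span_local.port is not None else default.port,
    )


# Two entrypoints in the same multi-homed process, annotated at call time.
http_side = effective_local_endpoint(Endpoint(service_name="billing-http", port=8080))
grpc_side = effective_local_endpoint(Endpoint(service_name="billing-grpc", port=9090))
# Background work with no entrypoint information falls back to the default.
background = effective_local_endpoint(None)
```

The merge keeps host-level facts (such as the IP) in one place while letting each transport name itself.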

Having remote endpoints per span gives us more detail about peers that are not instrumented, for example a downstream database or an upstream proxy or client.
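As a rough illustration of why this matters, the sketch below derives caller/callee dependency edges from spans that carry a remote service name. It is a simplified, hypothetical linker (not Zipkin's actual dependency linker), and the `Span` shape and service names are assumptions. Note that the database and the legacy client never emit spans, yet both still appear as edges.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Span:
    kind: str                              # "CLIENT" or "SERVER"
    local_service: str
    remote_service: Optional[str] = None   # hypothetical remote endpoint name


def dependency_links(spans: Iterable[Span]) -> Counter:
    """Count (caller, callee) edges using each span's remote endpoint."""
    links = Counter()
    for s in spans:
        if s.remote_service is None:
            continue
        if s.kind == "CLIENT":
            # We called the remote peer.
            links[(s.local_service, s.remote_service)] += 1
        elif s.kind == "SERVER":
            # The remote peer called us.
            links[(s.remote_service, s.local_service)] += 1
    return links


spans = [
    # The database emits no spans; only the client spans name it.
    Span("CLIENT", "orders", remote_service="postgres"),
    Span("CLIENT", "orders", remote_service="postgres"),
    # An uninstrumented upstream client is known only from the server span.
    Span("SERVER", "orders", remote_service="legacy-batch"),
]
links = dependency_links(spans)
```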

SergeyKanzhelev commented 6 years ago

In Application Insights we have the notion of Source and Target, which describe the remote endpoint for incoming and outgoing calls, as well as Cloud Context, which describes the local endpoint. They have different semantics and fields, though. When possible we send an app-id in headers to populate Source and Target, so an app-id field would be nice. We also don't use the IP address for the local endpoint.

So perhaps we can document an attribute prefix and its possible values in the OC specs instead of adding a separate data structure?
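A minimal sketch of the attribute-based alternative: an endpoint is flattened into span attributes under a documented prefix, and an exporter (for example a Zipkin one) rebuilds it. The prefix `endpoint.remote.` and the helper names are hypothetical; OpenTracing's `peer.*` tags follow a similar idea.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional, Union

AttrValue = Union[str, int]


@dataclass
class Endpoint:
    service_name: Optional[str] = None
    ipv4: Optional[str] = None
    ipv6: Optional[str] = None
    port: Optional[int] = None


def to_attributes(ep: Endpoint, prefix: str) -> Dict[str, AttrValue]:
    """Flatten an endpoint into prefixed span attributes, skipping unset fields."""
    return {
        f"{prefix}{f.name}": getattr(ep, f.name)
        for f in fields(ep)
        if getattr(ep, f.name) is not None
    }


def from_attributes(attrs: Dict[str, AttrValue], prefix: str) -> Endpoint:
    """Rebuild an endpoint from attributes, ignoring unrelated or unknown keys."""
    known = {f.name for f in fields(Endpoint)}
    values = {
        key[len(prefix):]: value
        for key, value in attrs.items()
        if key.startswith(prefix) and key[len(prefix):] in known
    }
    return Endpoint(**values)


REMOTE = "endpoint.remote."  # hypothetical documented prefix

attrs = {"http.method": "GET",
         **to_attributes(Endpoint(service_name="backend", port=8080), REMOTE)}
restored = from_attributes(attrs, REMOTE)
```

The trade-off is visible in `from_attributes`: nothing stops a caller from writing a misspelled key or a string port, so the convention only works if instrumentation follows the documented names.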

SergeyKanzhelev commented 6 years ago

A similar discussion in OpenTracing: https://github.com/opentracing/specification/pull/119#issuecomment-404337325 (if I understood the intent correctly)

bogdandrutu commented 6 years ago

Can you point me to the data model that Zipkin uses? Should this be treated as a first-class citizen, or can we use attributes as @SergeyKanzhelev proposed?

basvanbeek commented 6 years ago

Attributes can be used for almost anything. Will it be performant, convenient and provide a strong enough signal to show it has special meaning and actually be considered as important to instrument in middlewares? That's another question.

Zipkin's data model: https://zipkin.io/zipkin-api/#/

SergeyKanzhelev commented 6 years ago

it be performant, convenient

When you say performant, do you mean client-side code that will read this information, for example to use it as a tag for client-side metrics aggregation? If it is only set and never read, performance and convenience don't matter. Do you see any other scenarios where a strongly typed data model would make sense?

BTW, I'm not opposed to the idea completely. I have trouble describing endpoints as IP addresses, but serviceName matches what we have in the Application Insights data model.

special meaning and actually be considered as important to instrument

I think in many cases serviceName will be extracted by code that is separate from the middleware: from instance metadata in Azure and AWS, or from environment variables in Kubernetes (K8s). So that code will be a plugin that works with various middlewares. Has your experience been different?
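A sketch of such a middleware-independent plugin, assuming a chain of detectors where the first one that finds a name wins. The environment variable names `SERVICE_NAME` and `APP_LABEL` are hypothetical, and the real Azure/AWS instance-metadata lookups are left out; a cloud detector would simply be another function in the list.

```python
import os
from typing import Callable, Mapping, Optional, Sequence

Detector = Callable[[Mapping[str, str]], Optional[str]]


def from_explicit_env(env: Mapping[str, str]) -> Optional[str]:
    # Hypothetical variable an operator sets directly.
    return env.get("SERVICE_NAME")


def from_k8s_env(env: Mapping[str, str]) -> Optional[str]:
    # Hypothetical variable injected from the pod spec in Kubernetes.
    return env.get("APP_LABEL")


def resolve_service_name(detectors: Sequence[Detector],
                         env: Optional[Mapping[str, str]] = None,
                         fallback: str = "unknown_service") -> str:
    """Return the first name any detector finds; middleware never sees this logic."""
    env = os.environ if env is None else env
    for detect in detectors:
        name = detect(env)
        if name:
            return name
    return fallback


DETECTORS = [from_explicit_env, from_k8s_env]
```

Any middleware integration can then call `resolve_service_name(DETECTORS)` once at startup, which supports the point that service identity is often a process-level concern rather than a per-middleware one.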

codefromthecrypt commented 6 years ago

In Zipkin, there's an endpoint, which includes a service name (a label) and network context: https://github.com/openzipkin/zipkin-api/blob/master/zipkin.proto#L164

There have been past discussions about how to represent the activity of transparent proxies. That's one reason the "v1" format allowed multiple hosts in the same span. Looking ahead, I think developing the model further might mean adding "proxying-for" metadata alongside the current service. This would allow the same data to be aggregated in multiple ways.

For example, if your service is itself a proxy (such as a sidecar like linkerd), it could take a propagated service name, put it into the span, and allow aggregation and visualization either as if the proxy exists or as if it doesn't. If such a proxy receives an uninstrumented request, it could still fabricate a virtual service name.
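A minimal sketch of that sidecar behaviour, assuming a hypothetical `x-service-name` propagation header (neither Zipkin nor OpenCensus defines one) and a hypothetical `proxying_for` span field. When no name is propagated, the sidecar fabricates a virtual name from its configured route rather than from an IP, so the value stays low-cardinality.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical propagation header carrying the caller's service name.
SERVICE_HEADER = "x-service-name"


@dataclass
class ProxySpan:
    service: str                 # the proxy's own name, e.g. "mesh"
    proxying_for: Optional[str]  # the service the proxy acts on behalf of
    fabricated: bool = False     # True when no name was propagated


def proxy_span(headers: Mapping[str, str], own_service: str, route: str) -> ProxySpan:
    """Build a sidecar span attributed to the propagated caller when known."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    name = lowered.get(SERVICE_HEADER)
    if name:
        return ProxySpan(own_service, name)
    # Uninstrumented request: fabricate a virtual, still aggregatable name.
    return ProxySpan(own_service, f"virtual-{route}", fabricated=True)


instrumented = proxy_span({"X-Service-Name": "playback"}, own_service="mesh", route="playback-route")
uninstrumented = proxy_span({}, own_service="mesh", route="legacy-route")
```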

I don't want to go too deep here; the point is just that use cases can go beyond the local vs. remote simplification. Meanwhile, even simple service identification would be handy.

SergeyKanzhelev commented 6 years ago

Are you thinking of a RemoteServiceId field on Span, where RemoteServiceId can be a name, a UUID, or both?

codefromthecrypt commented 6 years ago

@SergeyKanzhelev I suppose what I mean is a name: a low-cardinality, aggregatable service name, so not, for example, a generated IP-based identifier or a UUID. For example, if my service is named playback but my IPC is handled through a sidecar, I would propagate my service name "playback" with my request. When the sidecar does my work for me, it attaches both its own service name (maybe "mesh") and my name "playback". Just an idea, but it has been on my mind a bit.
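With both names on the span, the same data can be aggregated two ways. The sketch below assumes a hypothetical `proxying_for` field: grouping by the reporting service shows the mesh's own load, while grouping by the logical service folds sidecar spans back into "playback" as if the mesh were invisible.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Span:
    service: str                        # who reported the span
    proxying_for: Optional[str] = None  # hypothetical "proxying-for" field


def by_reporting_service(spans: Iterable[Span]) -> Counter:
    # View that includes the proxy as its own node.
    return Counter(s.service for s in spans)


def by_logical_service(spans: Iterable[Span]) -> Counter:
    # View that hides the proxy and credits the service it acted for.
    return Counter(s.proxying_for or s.service for s in spans)


spans = [
    Span("mesh", proxying_for="playback"),
    Span("mesh", proxying_for="playback"),
    Span("playback"),                    # playback reporting for itself
    Span("mesh", proxying_for="search"),
]
```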

sjkaris commented 5 years ago

If I understand the problem statement correctly, the problem is that we want to be able to emit spans on behalf of another service, without losing the knowledge that this was done.

OpenCensus has the concepts of Node and Resource, with Resource being added as a field on Span by this PR. If a service wants to emit a Span on behalf of another, I believe it should set the other service as the Resource for that span and leave itself as the Node on the ExportTraceServiceRequest. The same paradigm fits the proxy use case: the proxy puts itself as the Node on the ExportTraceServiceRequest, and the service the span is for as the Resource on the span.
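A sketch of how a backend might interpret that convention. The classes are simplified, hypothetical mirrors of the proto's Node, Resource, Span and export-request messages (not the generated types), and the `service.name` label key is an assumption. The rule: a span's own resource wins, then the request-level resource, and otherwise the span describes the Node that sent it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    service_name: str                      # who is sending the data (e.g. the proxy)


@dataclass
class Resource:
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    name: str
    resource: Optional[Resource] = None    # whom the span describes, if set


@dataclass
class ExportRequest:
    node: Node
    spans: List[Span]
    resource: Optional[Resource] = None    # default for spans without one


SERVICE_LABEL = "service.name"  # hypothetical label key


def described_service(span: Span, req: ExportRequest) -> str:
    """Span resource first, then the request resource, then the sending node."""
    for res in (span.resource, req.resource):
        if res is not None and SERVICE_LABEL in res.labels:
            return res.labels[SERVICE_LABEL]
    return req.node.service_name


def emitted_on_behalf(span: Span, req: ExportRequest) -> bool:
    # True when the sender (Node) differs from the service the span describes.
    return described_service(span, req) != req.node.service_name


req = ExportRequest(
    node=Node("mesh"),
    spans=[
        Span("GET /play", Resource({SERVICE_LABEL: "playback"})),  # on behalf of playback
        Span("healthcheck"),                                       # the proxy's own span
    ],
)
```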

If we follow these conventions, then I believe no changes are needed to the current OpenCensus proto model, and this issue can be closed.

/cc @bogdandrutu