census-instrumentation / opencensus-specs

Apache License 2.0
188 stars 50 forks source link

Improve the not ideal span names #107

Open rakyll opened 6 years ago

rakyll commented 6 years ago

According to the current HTTP tracing specs, spans are named after path or route.

This ends up us creating a lot of empty span names, e.g. if we make an HTTP request to the root of a domain.

In the following case, I have a handler that makes three outgoing calls:

The tracing backend visualization doesn't seem to be very useful:

a

/cc @adriancole @bogdandrutu @ramonza

semistrict commented 6 years ago

I think we should add the value of the Host header. Even if this ends up being the hostname of the serving instance (and so not very useful) it is better than nothing. I would expect that in almost all load balancing scenarios, the Host header would be meaningful and low cardinality. (Not sure how bad high cardinality really is anyway - as @rakyll also mentioned the span name is mostly for display purposes).

codefromthecrypt commented 6 years ago

in zipkin route based name has method+space prefix. when there is no route you still get method. when there is a root route you still get "get /" which is ok. people also can overwrite span names. I wouldn't use hostname as that could be boundless cardinality in some sites that have ephemeral processes.

tsloughter commented 6 years ago

+1 to just method and path. I thought it was "Method:Path", which is what I've been using, so get:/, but space is fine too.

codefromthecrypt commented 6 years ago

ps we still prefer route as path can have variables. can of course special case safe bets like root path.

learning cardinality post factum is something fancy someone could do as well ;)

semistrict commented 6 years ago

I am confused why high cardinality is such a big concern. Why is low cardinality important? Is this due to a limitation/feature of some backends?

I am also not suggesting we use the hostname. I am suggesting we use the value of the Host header in the inbound and outbound requests. From my experience, the Host header is often used as a routing key ("virtual hosting") and so usually is preserved by load balancers, and so meaningful. Has this not been the case in your experience?

I don't think adding the method is enough. In a lot of cases, the method is not used semantically and will always be POST or always GET. The Host header is what tells you which service you're talking to.

Whatever we decide here, I do think it's important for the user to be able to override the choice somehow. We probably won't make the right decision for all cases and we don't want users to abandon the instrumentation because it isn't configurable enough. I think this discussion is evidence enough that this is a useful dimension in which the instrumentation should be configurable.

yurishkuro commented 6 years ago

Why is low cardinality important? Is this due to a limitation/feature of some backends?

Span names are often used in aggregations, or as labels for RPC metrics.

semistrict commented 6 years ago

That sounds useful. Is it possible to configure those aggregations to not use the Span name? As mentioned before, even with just the path we are already talking unbounded cardinality in many cases. If I were to guess, I would say that in most apps URL path has a higher cardinality than Host header (REST APIs) - so this is going to be a problem even if we just leave it as-is.

yurishkuro commented 6 years ago

Is it possible to configure those aggregations to not use the Span name?

Sure, but when running trace analysis, having only service to service relationships in the call graph is a lot less useful than knowing the actual operations. Using the path by default is almost always a poor choice for span name, especially given REST APIs proclivity to do things like GET /api/user/2387462837647823. However, if the service owner knows that URL paths are low-cardinality, then a configuration option to allow path as span name sgtm.

semistrict commented 6 years ago

Oh, I'm not suggesting we remove the path or even the method. I think we should have path, method and host header in the span name by default. Then allow the user to customize it if they need to as well.

codefromthecrypt commented 6 years ago

Ramon's opinion is to intentionally make high cardinality span names and have the backend ignore them (reassemble from attributes I guess)

Is there anyone else in favour of this approach?

semistrict commented 6 years ago

@adriancole that's not my intention at all. I want span names to be more descriptive. I believe that in most cases, adding the Host header will make them more descriptive while also not making them high-cardinality (since the host header should be the name of the service or load balancer, not an individual instance).

I'm only talking about cardinality at all because someone else brought it up as a concern and my point was that we already need to deal with high cardinality (because of the path).

rakyll commented 6 years ago

I think this is a visualization issue and we cannot do much to make the case better if tracing backends don't allow a pretty name to be visualized as the span name in their UI. In the end, there are two widely different concerns: span names making statistical sense and pretty representation of them in a UI.

Maybe we should address the visualization issues. Backend UIs can always utilize the HTTP attributes to make the span visualization richer.

I am feeling ok to allow users to customize span names. And, yea, we should include the method in the span name. This is already settled a while ago but we haven't implemented in Go and we should asap.

semistrict commented 6 years ago

@rakyll what's your position on host header in span name?

yurishkuro commented 6 years ago

My advice is to have default behavior that avoids high cardinality. That means no path or host in the span name. We typically default to just http method if nothing else is available, and rely on the UI to enrich the display based on span tags or on the user to provide better names. The potentially high cardinality generators should only be opt-in.

semistrict commented 6 years ago

So I think it comes down to whether we want the default span name to be guaranteed low cardinality. If this is the case, I am on board with @yurishkuro suggestion to remove path and only have method.

If we don't care that much about high cardinality we should have method, path and host header.

What doesn't make sense to me is the status quo of path and method since that is both high cardinality (due to path) and not very descriptive (due to not including the host).

@bogdandrutu I think you need to make the call here so we can move on to implementing

bogdandrutu commented 6 years ago

First to make sure we are all on the same page, all of these (method, path, host) will be attributes so the information will be available there.

Some other implementations that I know of use a format {service}.{method}. Where method is the HTTP method (GET, POST, etc.) and the service is configured by the user per each http server instance (for example for zpages we will set "opencensus"). The problem that I see with this is that client spans most likely will not have the same span name as the server.

I think I like @yurishkuro proposal, because even in the zpages we prefer low cardinality span names. Maybe something like http.{method}.