trace: allow access to spans that are not sampled

gbbr commented 5 years ago

In the current implementation and specs, a span is only sent to the exporter when sampled. This is causing problems in some APM systems. For example, at Datadog we compute stats and other events based on traces (such as percentile statistics). Without having access to all traces, it is not possible to make these computations accurately.

This is a specific example, but in my opinion, exporters should have access to everything. Perhaps a secondary argument (the SamplingDecision) or a new field on the exported span could solve this? Possibly the former suggestion would work best to give more importance to this decision.

basvanbeek commented 5 years ago

What if people using DataDog exporter would just set the OpenCensus sampler to always sample? And then if needed have the DataDog exporter decide on what can be discarded and what not?

gbbr commented 5 years ago

I guess you're right, that is a good workaround. It is nevertheless a workaround and will prevent people from using the OpenCensus sampling if they desire. But if this is the best we can do then it's fine by me. I guess you could leave the issue open as a proposal if that's ok?

codefromthecrypt commented 5 years ago

at least in java repo I also recall a setting "always record" or similar which doesnt necessarily affect sampling headers

On Wed, 7 Nov 2018, 19:02 Gabriel Aszalos, notifications@github.com wrote:

I guess you're right, that is a good workaround. It is nevertheless a workaround and will prevent people from using the OpenCensus sampling if they desire. But if this is the best we can do then it's fine by me. I guess you could leave the issue open as a proposal if that's ok?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/census-instrumentation/opencensus-specs/issues/210#issuecomment-436587410, or mute the thread https://github.com/notifications/unsubscribe-auth/AAD611y7N3TLX7UelgidD3RjZYlqNz3Kks5usr3TgaJpZM4YSLZl .

basvanbeek commented 5 years ago

The big question is... do you need additional items like annotations and attributes as well or is just an empty span with only SpanContext sufficient?

As a reference Zipkin has been working on the concept of "firehose" mode which deals with this kind of problem and some others. See: https://cwiki.apache.org/confluence/display/ZIPKIN/Firehose+mode

gbbr commented 5 years ago

The big question is... do you need additional items like annotations and attributes as well or is just an empty span with only SpanContext sufficient?

Yes, we would need the data too. I see what you're hinting at though; looking at the opencensus-go implementation I can see that spans aren't recording data when they aren't sampled, so this is clearly not an option given the current state. It would be a bigger change.

arguile- commented 5 years ago

Using AlwaysSample has a couple issues:

expensive
if multiple exporters are used it can overload the other exporter that is expecting sampled traces

Do those calculated metrics (for Datadog) really require more than just the outer span (without attributes)? The resource name, start, duration, status, etc. are all determined by the outermost span for the service under trace.

If OpenCensus could optionally export all spans, could Span.IsSampled() or Span.IsRecordingEvents() could be used to map the DataDog agent sampling priority to PriorityAutoKeep and PriorityAutoReject?

gbbr commented 5 years ago

Using AlwaysSample has a couple issues:

expensive

if multiple exporters are used it can overload the other exporter that is expecting sampled traces

This is true, but is currently the only way the exporter can become aware of all traces. It will be less expensive once https://github.com/DataDog/opencensus-go-exporter-datadog/pull/27 is merged because that will allow Datadog to do its own sampling as long as OC lets everything through.

I'd say that's a reasonable solution for now, and unless there is interest from other parties too into letting all spans, with data included, reach the exporters even when they aren't sampled, I'd say it's fine to close this issue.

gbbr commented 5 years ago

As a last point and line of thought, it could also be interesting to consider different sampling rules on a per-exporter basis, but I'm not sure how much of a real use case that is.

codefromthecrypt commented 5 years ago

Here's a more detailed brief on "firehose mode" in Brave (zipkin java tracer);

we have a collaborative sampledLocal decision. If anything needs the data, then it is recorded. There are no fine tuning knobs, like only record tags or whatever.. what we did was cheapen the intermediate object holding pending data so that it isn't eagerly parsed. If any "exporter" needs data either for every request or a specific one, then that sampledLocal flag becomes true and recording occurs irrespective of the remote sampling flag.

gbbr commented 5 years ago

Another idea would be to allow different sampling rules on a per exporter basis. That way you could choose exporters which receive all traces programmatically and let them handle them as they please. This change would still require keeping span data though, if any exporter has an all permissive sampling rule.

census-instrumentation / opencensus-specs

trace: allow access to spans that are not sampled #210