DataDog / dd-opentracing-cpp

Datadog Opentracing C++ Client
Apache License 2.0
40 stars 40 forks source link

Is there any way to populate the `X-B3-Sampled` header when using `sample_rate`? #117

Closed chrusty closed 4 years ago

chrusty commented 4 years ago

Hi! I've been trying out some tracing with nginx / datadog / scala services, and have got to the point where traces are working end-to-end (it is already giving some excellent information). The only problem is that with the default configuration for dd-opentracing-cpp 100% of requests are traced (and our APIs handle thousands of requests per second).

The Scala services are set to obey the X-B3-Sampled header (they'll only submit spans if this is set to 1).

I'd assumed that setting sample_rate to some rate (and disabling dd.priority.sampling) would work, but under this configuration the services behind nginx never see this header. Only by setting dd.priority.sampling = true (or omitting it altogether to assume the default) do the upstream services get this header, but at that point it is set on 100% of requests.

Am I going about this all wrong?

Here is the config I'm trying to use:

{
    "service": "nginx",
    "operation_name_override": "nginx.handle",
    "agent_host": "localhost",
    "agent_port": 8126,
    "sample_rate": 0.5,
    "dd.priority.sampling": false,
    "propagation_style_extract": ["B3"],
    "propagation_style_inject": ["B3"]
}
cgilmour commented 4 years ago

Hi @chrusty you're correct, that header doesn't get set when using sample_rate. It's only used when priority sampling is enabled. Priority sampling starts off at 100% and adjusts the rate based on feedback it receives from the datadog agent.

With nginx (and I'm assuming nginx-opentracing extension), you can control globally or per-location if tracing is enabled using the opentracing on directive. However, there's no trivial option for specifying a rate.

There's some planned work for adding "sampling rules" that allow for more control over the sampling rate. I don't have an ETA for that at the moment.

chrusty commented 4 years ago

Hi @cgilmour, thanks for your response. OK I guess that explains why the priority-sampling isn't working for me (because the datadog agents are all set to trace 100% because I don't want them throwing away sampled traces from the upstream services).

How would you feel about a new feature? I'm working on a fork with a couple of other people, something along these lines:

Hopefully we can come up with something.

chrusty commented 4 years ago

@cgilmour my colleague and I managed to get this working today. Here is a branch that adds sampling-rates to the headers and span-context.

https://github.com/depop/dd-opentracing-cpp/pull/1

It probably breaks a bunch of other stuff, just wanted to show you where we're going with it. Depending on how it works out for us we may look at adding a config flag to specifically control this behaviour. Let me know if you'd be interested in this, and we'll raise a proper PR.

cgilmour commented 4 years ago

The way it works with priority sampling enabled is:

The reason this wasn't done for rate sampling was to make sure there was still a way to handle unbound memory growth. The sample_rate is blindly applied to any traces, initiated or propagated. I'd be ok with changing the behavior so the sample_rate doesn't apply to propagated traces that already have a sampling decision. Eg: https://github.com/DataDog/dd-opentracing-cpp/blob/master/src/tracer.cpp#L129 would become

  if (span_context.getPropagatedSamplingPriority() == nullptr && sampler_->discard(span_context)) {

That could work with existing options.