DataDog / opentelemetry-mapping-go

Go modules that implement OpenTelemetry-to-Datadog mapping for all telemetry signals
Apache License 2.0

Spikes after dd-agent restarts #100

Closed piotr1212 closed 11 months ago

piotr1212 commented 1 year ago

We are seeing spikes in our graphs after we restart a Datadog Agent. We suspect it is related to the following code: https://github.com/DataDog/opentelemetry-mapping-go/blob/main/pkg/otlp/metrics/metrics_translator.go#L181-L183

When the agent is restarted, it has no previous value and can't calculate the delta. I'm not sure whether it should report 0 or not report at all, but reporting the actual cumulative count seems wrong.
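To illustrate the suspected mechanism, here is a minimal Go sketch. It is not the code from `metrics_translator.go`: `deltaTracker`, `diff` and `report` are hypothetical names, and the sketch assumes the state used for delta calculation lives only in memory, so a restart starts from an empty tracker. Counter resets are ignored for brevity.

```go
package main

import "fmt"

// deltaTracker is a hypothetical stand-in for the agent's in-memory state:
// it remembers the last cumulative value seen for each series.
type deltaTracker struct {
	prev map[string]float64
}

func newDeltaTracker() *deltaTracker {
	return &deltaTracker{prev: make(map[string]float64)}
}

// diff returns the increase since the previous point and ok=true.
// For the first point of a series there is nothing to subtract from,
// so it returns ok=false.
func (t *deltaTracker) diff(series string, cumulative float64) (delta float64, ok bool) {
	last, seen := t.prev[series]
	t.prev[series] = cumulative
	if !seen {
		return 0, false
	}
	return cumulative - last, true
}

// report mimics the behavior described in this issue: when no delta can be
// computed, it falls back to the raw cumulative value.
func report(t *deltaTracker, series string, cumulative float64) float64 {
	if d, ok := t.diff(series, cumulative); ok {
		return d
	}
	return cumulative
}

func main() {
	t := newDeltaTracker()
	report(t, "requests", 1000)              // first point after startup
	fmt.Println(report(t, "requests", 1010)) // delta between two known points

	t = newDeltaTracker()                    // simulated agent restart: state is lost
	fmt.Println(report(t, "requests", 1020)) // falls back to the full cumulative value
}
```

In this sketch the point after the simulated restart is submitted as the whole counter value instead of the increase of 10, which is the kind of spike we see in the graphs.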

piotr1212 commented 1 year ago

I've just upgraded to the 7.45 agent because of this fix (https://github.com/DataDog/datadog-agent/pull/15363), but that did not solve our issue.

piotr1212 commented 1 year ago

Patching out the else branch of that statement on 7.43 does solve the issue for us.

piotr1212 commented 1 year ago

Created support ticket #1237572

songy23 commented 1 year ago

Chatted with @mx-psi: this issue seems pretty similar to https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/22810 (also discussed in the CNCF opentelemetry-collector Slack thread https://cloud-native.slack.com/archives/C01N6P7KR6W/p1685109323323809). The Datadog translator code at https://github.com/DataDog/opentelemetry-mapping-go/blob/main/pkg/otlp/metrics/metrics_translator.go#L181-L183 is consistent with the behavior of the collector's cumulativetodeltaprocessor: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/0ae07ba41d8e19bf684ed3b793060299c4a2083d/processor/cumulativetodeltaprocessor/internal/tracking/tracker.go#L129

Alternatively, we can add a config option for the Datadog exporter and Agent that specifies whether to keep the first cumulative metric, drop it, or apply the current logic, but we want to investigate with the community first.
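A rough Go sketch of what such an option could look like, for discussion only. All names here (`firstPointMode`, `firstPointKeep`, `firstPointDrop`, `translator`, `consume`) are hypothetical and are not the exporter's or the Agent's actual API. Only the keep and drop choices are modeled; the "current logic" choice is left out because its details are not spelled out in this thread.

```go
package main

import "fmt"

// firstPointMode is a hypothetical setting that controls what happens to the
// first cumulative point of a series, when no delta can be computed yet.
type firstPointMode int

const (
	firstPointKeep firstPointMode = iota // submit the raw cumulative value
	firstPointDrop                       // submit nothing for the first point
)

// translator is a hypothetical, simplified cumulative-to-delta converter.
type translator struct {
	mode firstPointMode
	prev map[string]float64
}

func newTranslator(mode firstPointMode) *translator {
	return &translator{mode: mode, prev: make(map[string]float64)}
}

// consume returns the value to submit and whether anything should be
// submitted at all for this point.
func (tr *translator) consume(series string, cumulative float64) (float64, bool) {
	last, seen := tr.prev[series]
	tr.prev[series] = cumulative
	if seen {
		return cumulative - last, true
	}
	if tr.mode == firstPointKeep {
		return cumulative, true
	}
	return 0, false
}

func main() {
	for _, mode := range []firstPointMode{firstPointKeep, firstPointDrop} {
		tr := newTranslator(mode)
		for _, v := range []float64{1020, 1025} {
			if out, ok := tr.consume("requests", v); ok {
				fmt.Printf("mode=%d submit %v\n", mode, out)
			} else {
				fmt.Printf("mode=%d dropped first point\n", mode)
			}
		}
	}
}
```

With drop, the first point after a restart is only used to seed the state, so the graph shows a short gap rather than a spike. With keep, the full counter value is submitted, which matches the behavior reported in this issue.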

piotr1212 commented 1 year ago

Hey, amazing that the feature is added. Do you have any idea when the feature will be added to the agent? I assume it needs a config option now.

mx-psi commented 1 year ago

:wave: We don't have a fixed date but this will be available on Agent v7.48.0. In the meantime, my suggestion would be to start producing delta metrics at the source if possible; we have a guide for this here: https://docs.datadoghq.com/opentelemetry/guide/otlp_delta_temporality/

If you open a support case we can also discuss providing a beta before v7.48.0 if delta aggregation temporality does not work for you.

piotr1212 commented 11 months ago

This is implemented, closing the issue.