In #215, we reworked the aggregate transform to properly consider/handle multiple in-flight buckets and zero-value counters. While this brought aggregation behavior up to parity in terms of ensuring metrics were flushed at the right interval, it did highlight a particularly glaring issue: when aggregating correctly (in terms of matching the Datadog Agent), ADP emits more data overall than the Datadog Agent.
This is due to the fact that the Datadog Agent actually stores multiple data points per series/sketch. When the aggregate transform flushes, a given context (metric) might be present in two buckets, which means we'll flush two metrics -- one from each bucket -- and both of those will find their way to the Datadog Metrics destination and be sent off. In contrast, the Datadog Agent deduplicates these by merging them into a single metric with multiple data points -- one timestamp/value point from each original metric -- and then sends that single metric to the serializer and forwarder.
Since the output payload supports multiple data points per series/sketch, this means we end up sending more data overall, and there's no great way to merge things back together in the Datadog Metrics sink without a lot of extra sorting and temporary storage.
We should explore whether there's a simple data model change we can make to Metric to better support this, since this would allow us to get back to parity, in terms of output data volume, with the Datadog Agent... and it doesn't hurt that OTLP has the same data model, so we'd be more aligned overall.
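As a rough sketch of the kind of change described above -- a metric that carries multiple timestamp/value points, merged by context the way the Agent does before serialization -- the following is illustrative only: the names (`Metric`, `merge_by_context`, the string-based context key) are hypothetical and not ADP's actual types or API.

```rust
use std::collections::HashMap;

// Hypothetical metric shape: one context, many (timestamp, value) points,
// mirroring the Datadog series payload and the OTLP data model.
#[derive(Debug, Clone, PartialEq)]
struct Metric {
    context: String,         // metric name + tags, used as the dedup key
    points: Vec<(u64, f64)>, // one (timestamp, value) pair per flushed bucket
}

// Merge flushed metrics that share a context into a single metric with
// multiple data points, instead of emitting one metric per bucket.
fn merge_by_context(flushed: Vec<Metric>) -> Vec<Metric> {
    let mut merged: HashMap<String, Metric> = HashMap::new();
    for metric in flushed {
        merged
            .entry(metric.context.clone())
            .and_modify(|m| m.points.extend(metric.points.iter().copied()))
            .or_insert(metric);
    }
    let mut out: Vec<Metric> = merged.into_values().collect();
    // Keep points in timestamp order within each merged metric.
    for m in &mut out {
        m.points.sort_by_key(|&(ts, _)| ts);
    }
    out
}

fn main() {
    // Two buckets flushed the same context: two metrics, one point each.
    let flushed = vec![
        Metric { context: "requests.count|env:prod".into(), points: vec![(10, 1.0)] },
        Metric { context: "requests.count|env:prod".into(), points: vec![(20, 3.0)] },
    ];
    let merged = merge_by_context(flushed);
    assert_eq!(merged.len(), 1);
    assert_eq!(merged[0].points, vec![(10, 1.0), (20, 3.0)]);
    println!("merged into {} metric(s)", merged.len());
}
```

If `Metric` carried points this way end to end, the merge could happen at flush time in the aggregate transform itself, rather than requiring the sink to re-sort and buffer metrics after the fact.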
This might also be a chance to better optimize the aggregate transform in the process.