elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Missing metric name in tsid results in TSDB duplicate detection dropping data #99123

Open salvatore-campagna opened 1 year ago

salvatore-campagna commented 1 year ago

Elasticsearch Version

8.7 and above

Installed Plugins

No response

Java Version

bundled

OS Version

All

Problem Description

This issue has been observed while integrating Prometheus remote writes with the time series database (TSDB). It is a result of the batching that happens before metrics are written to TSDB. Given two batches (possibly with other batches in between), writing the second batch might fail due to TSDB duplicate detection if the first batch contained a document with the same set of dimensions (names and values) and the same timestamp. This happens because of the way we generate "the primary key" for documents stored in TSDB.

Each document is identified by the pair (_tsid, timestamp), so clients can easily generate batches that include metrics with the same _tsid and timestamp. Note also that clients might pre-aggregate metrics on the timestamp field, which increases the chance of duplicates because documents are then generated at specific points in time.

Example

Batch 1

...
{"@timestamp": "2021-04-28T18:00:00.000Z" , "region": "us-east-2", "host": "foo", "pod": "fb58e236-48af-11ee-be56-0242ac120002", "gauge": 10} 
...

Batch 2

...
{"@timestamp": "2021-04-28T18:00:00.000Z" , "region": "us-east-2", "host": "foo", "pod": "fb58e236-48af-11ee-be56-0242ac120002", "counter": 10} 
...

Here we have two metrics, gauge and counter, whose dimensions (names and values) are the same.

The _tsid for the first batch is the same as the _tsid of the second batch, something similar to:

region=us-east-2:host=foo:pod=fb58e236-48af-11ee-be56-0242ac120002

Also the timestamp is the same.

The result is that duplicate detection kicks in, rejecting the second batch as a duplicate.

Integrations are working around this issue by adding the metric name as an additional "dummy" dimension field. In that case the gauge and counter metrics have different names, which results in different _tsid values and therefore in two documents being written with a different _tsid (and the same timestamp). Adding this additional dimension (the metric name), however, means two documents are stored instead of one, which results in TSDB storage overhead.

This is considered a bug since we reject data as a result of incorrect duplicate detection. The two batches include data about two different metrics, even if they share the same set of dimensions (_tsid). The contract of metric protocols such as Prometheus remote write or OTLP metrics is that the metric name is part of the identity of a time series.

Steps to Reproduce

Just try to index two documents with the same dimensions and timestamp, each with a different set of metrics, for instance one with a counter and one with a gauge.
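A minimal reproduction sketch, assuming a standalone time series index created directly (the index name, mapping, and time-series boundaries below are illustrative, not taken from the original report):

PUT metrics-repro
{
  "settings": {
    "index.mode": "time_series",
    "index.routing_path": ["region", "host", "pod"],
    "index.time_series.start_time": "2021-04-28T00:00:00Z",
    "index.time_series.end_time": "2021-04-29T00:00:00Z"
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "region":     { "type": "keyword", "time_series_dimension": true },
      "host":       { "type": "keyword", "time_series_dimension": true },
      "pod":        { "type": "keyword", "time_series_dimension": true },
      "gauge":      { "type": "double",  "time_series_metric": "gauge" },
      "counter":    { "type": "long",    "time_series_metric": "counter" }
    }
  }
}

# The first document is indexed fine.
POST metrics-repro/_doc
{"@timestamp": "2021-04-28T18:00:00.000Z", "region": "us-east-2", "host": "foo", "pod": "fb58e236-48af-11ee-be56-0242ac120002", "gauge": 10}

# The second document carries a different metric but the same dimensions and timestamp,
# so the _id derived from (_tsid, timestamp) collides and indexing fails with a version conflict.
POST metrics-repro/_doc
{"@timestamp": "2021-04-28T18:00:00.000Z", "region": "us-east-2", "host": "foo", "pod": "fb58e236-48af-11ee-be56-0242ac120002", "counter": 10}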

Logs (if relevant)

No response

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

martijnvg commented 1 year ago

A solution that could work in the short term is introducing an additional component into the _id that is generated for time series documents. In this issue we mention that another timestamp could be used as that additional component of the _id. This would allow indexing documents with the same tsid and timestamp (but different metric fields) in the same data stream.

felixbarny commented 1 year ago

I think the solution here is to make the names of the metrics part of the _tsid. This makes more sense to me than relying on the _id, as the metric name is part of the identity of a time series. But currently, we don't treat it that way.

kkrik-es commented 1 year ago

Have we tried adding jitter to the timestamp, to avoid conflicts? This should also play well with downsampling; we'll have no extra dimension values while the metric values will fall into the same bucket and get aggregated.

felixbarny commented 1 year ago

Maybe we're mixing two different things here. This is not the issue about late arrivals and offsets that we've talked about in the context of aggregated trace metrics.

While we can think of workarounds for this issue, we do need to acknowledge that this is a bug that needs fixing. We're not taking the metric names into account in defining the time series when we need to.

More specifically, we're violating the first bullet point in the OpenTelemetry Timeseries Model specification:

In this low-level metrics data model, a Timeseries is defined by an entity consisting of several metadata properties:

  • Metric name
  • Attributes (dimensions)
  • Value type of the point (integer, floating point, etc)
  • Unit of measurement
kkrik-es commented 1 year ago

Iiuc, the solution above with the timestamp jitter should address the duplicate detection issue. It is just a workaround, though; I agree that we need to get the right fix in place here, although I'd rather we don't include more field names in the tsid, as this has performance implications.

Fwiw I'm not sure this is a violation of the model. The metric name is part of the metadata properties, just not part of the key - unless I'm mistaken about the definition of "metadata".

felixbarny commented 1 year ago

But what is the jitter we want to apply here? A random one? We don't necessarily know if there's a second batch coming.

felixbarny commented 1 year ago

Right now, the workaround is to manually add the metric names into the _tsid by creating a hash of the metric names via the fingerprint processor. See also https://github.com/elastic/integrations/blob/5261e0c6579e1bba856a129c0392de5749627914/packages/prometheus/data_stream/remote_write/elasticsearch/ingest_pipeline/default.yml.
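For reference, a condensed sketch of that approach (the pipeline name, the metrics object, and the metric_names_hash field are illustrative, not copied from the linked integration): collect the metric field names, hash them with the fingerprint processor, and map the resulting field as an extra keyword dimension.

PUT _ingest/pipeline/metric-names-fingerprint-workaround
{
  "description": "Workaround sketch: fold the metric names into the _tsid via a hashed dimension",
  "processors": [
    {
      "script": {
        "description": "Collect the metric field names (here assumed to live under a 'metrics' object) into a temporary field",
        "source": "def names = new ArrayList(ctx.metrics.keySet()); Collections.sort(names); ctx.metric_names_tmp = names;"
      }
    },
    {
      "fingerprint": {
        "fields": ["metric_names_tmp"],
        "target_field": "metric_names_hash"
      }
    },
    {
      "remove": {
        "field": "metric_names_tmp"
      }
    }
  ]
}

The metric_names_hash field then needs to be mapped as a keyword with time_series_dimension: true so that it becomes part of the _tsid.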

kkrik-es commented 1 year ago

Right, maybe something as simple as a random offset within [-50ms, 50ms]. If batches are sent every 10s or so, this should have a negligible impact when there's a single batch.
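As a rough sketch of that idea (purely illustrative; it assumes @timestamp arrives as an ISO-8601 string and does not answer the open question above about not knowing whether a second batch is coming), an ingest script processor could shift the timestamp by a random offset:

PUT _ingest/pipeline/timestamp-jitter-workaround
{
  "description": "Workaround sketch: add a random jitter in [-50ms, +50ms) to @timestamp to avoid (_tsid, timestamp) collisions",
  "processors": [
    {
      "script": {
        "source": "long jitterMs = (long) (Math.random() * 100) - 50; ZonedDateTime ts = ZonedDateTime.parse(ctx['@timestamp']); ctx['@timestamp'] = ts.plusNanos(jitterMs * 1000000L).toString();"
      }
    }
  ]
}

Offsets this small should still fall into the same downsampling bucket, as mentioned above.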

felixbarny commented 1 year ago

Here's another integration where the TSDB migration is blocked due to this issue: https://github.com/elastic/integrations/issues/6568#issuecomment-1741085205

salvatore-campagna commented 1 year ago

To be more precise, I think we are also breaking that contract because we don't take the metric type and the unit of measurement into account. This might be OK if some form of naming convention is used when defining the metric name, such as heap_memory_bytes_total versus heap_memory_kilobytes_total, because the name already includes the unit of measurement. This is also REQUIRED, in a way, for Elasticsearch because we do not support indexing two fields with the same name and different types.

Anyway, if an integration wants to use the same name for two metrics but with a different unit of measurement and/or a different value type, we are back to the same issue: two different metrics with the same _tsid. Take for instance the following example (written as if it were an Elasticsearch mapping, which normally would not work, as said above):

"disk_storage_total": {
   "type": long,
   "metric_type": "counter",
   "meta": {
      "unit": "bytes"
   }
},
"disk_storage_total": {
   "type": integer,
   "metric_type": "counter",
   "meta": {
      "unit": "kilobytes"
   }
}

In this case we would not be able to index those metrics. So there is probably no integration doing this, but I guess the fact that it is possible means we have to find a way to support it.

felixbarny commented 1 year ago

You're right that we technically violate that, at the moment, too. However, these kinds of conflicts are far less common compared to different documents with the same dimensions.

I discussed the issue regarding the different units with @axw the other day. He brought up that, when dealing with OTel data, we could map the unit to a well-known/special dimension. That way we'd be compliant with the OTel spec.
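As a sketch of that idea (the field names below are illustrative), the unit would simply become one more dimension in the mapping, so metrics that differ only in their unit end up in different time series:

"unit": {
  "type": "keyword",
  "time_series_dimension": true
},
"disk_storage_total": {
  "type": "long",
  "time_series_metric": "counter",
  "meta": {
    "unit": "bytes"
  }
}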

felixbarny commented 8 months ago

IINM, this is not resolved by https://github.com/elastic/elasticsearch/pull/98023

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)

felixbarny commented 3 weeks ago

There's also a similar issue with OTel metrics. At the moment we group metrics with the same set of dimensions (and the same unit) into a single document. However, there's no guarantee that all metrics are sent in the same batch, and the batches could be split up further within the OTel collector, for example via the batchprocessor.

Another use case where we recently ran into this bug is adding a _doc_count field for histogram metrics. Adding the doc count requires storing each metric in a separate document (as each histogram may have a different count). However, that would trigger this duplicate detection bug.
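For context, such a document would look roughly like the following (the request_latency field and the numbers are illustrative). Because each histogram carries its own _doc_count, it needs its own document, which can then collide with another metric document that shares its dimensions and timestamp:

{
  "@timestamp": "2021-04-28T18:00:00.000Z",
  "region": "us-east-2",
  "host": "foo",
  "_doc_count": 45,
  "request_latency": {
    "values": [0.1, 0.25, 0.5],
    "counts": [10, 20, 15]
  }
}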

axw commented 3 weeks ago

This is also affecting support for CloudWatch Metric Streams: https://github.com/elastic/integrations/pull/11239#issuecomment-2372815977

Basically the same as what Felix described above: metrics may be delivered in two separate requests to the backend, with the same timestamp & dimensions, but non-overlapping sets of metrics.