salvatore-campagna opened this issue 1 year ago
Pinging @elastic/es-analytics-geo (Team:Analytics)
A solution that could work in the short term is introducing an additional component into the _id generated for time series documents. In this issue, we mention that another timestamp could be used as that additional component. This would allow indexing a document with the same _tsid and timestamp (but different metric fields) in the same data stream.
I think the solution here is to make the names of the metrics part of the _tsid. This makes more sense to me than relying on the _id, as the metric name is part of the identity of a time series. But currently, we don't treat it that way.
Have we tried adding jitter to the timestamp, to avoid conflicts? This should also play well with downsampling; we'll have no extra dimension values while the metric values will fall into the same bucket and get aggregated.
Maybe we're mixing two different things here. This is not the issue about late arrivals and offsets that we've talked about in the context of aggregated trace metrics.
While we can think of workarounds for this issue, we do need to acknowledge that this is a bug that needs fixing. We're not taking the metric names into account in defining the time series when we need to.
More specifically, we're violating the first bullet point in the OpenTelemetry Timeseries Model specification:
In this low-level metrics data model, a Timeseries is defined by an entity consisting of several metadata properties:
- Metric name
- Attributes (dimensions)
- Value type of the point (integer, floating point, etc)
- Unit of measurement
Iiuc, the solution above with the timestamp jitter should address the duplicate detection issue. This is just a workaround, though; I agree that we need to get the right fix in place here, although I'd rather we don't include more field names in the _tsid, as this has performance implications.
Fwiw I'm not sure this is a violation of the model. The metric name is part of the metadata properties, just not part of the key - unless I'm mistaken about the definition of "metadata".
But what is the jitter we want to apply here? A random one? We don't necessarily know if there's a second batch coming.
Right now, the workaround is to manually add the metric names into the _tsid by creating a hash of the metric names via the fingerprint processor. See also https://github.com/elastic/integrations/blob/5261e0c6579e1bba856a129c0392de5749627914/packages/prometheus/data_stream/remote_write/elasticsearch/ingest_pipeline/default.yml.
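For reference, a minimal sketch of what that workaround looks like (field and pipeline names here are hypothetical; the real pipeline linked above is more involved). The fingerprint processor hashes the metric names into a dedicated field, which then has to be mapped with `time_series_dimension: true` so it becomes part of the _tsid:

```json
{
  "description": "Hash metric names into a field that is mapped as a time series dimension",
  "processors": [
    {
      "fingerprint": {
        "fields": ["prometheus.metric_names"],
        "target_field": "prometheus.metric_names_hash",
        "method": "SHA-256",
        "ignore_missing": true
      }
    }
  ]
}
```

The cost is exactly the storage overhead discussed below: documents that could otherwise share a _tsid are split by the hash.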
Right, maybe something as simple as a random within [-50ms, 50ms]. If the batches are sent every 10s or so, this should have a negligible impact when there's a single batch.
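As a sketch, such jitter could be applied in an ingest pipeline with a script processor. This is only an illustration of the idea, not a vetted implementation; Painless date handling details may need adjusting:

```json
{
  "script": {
    "description": "Add a random jitter in [-50ms, 50ms] to the event timestamp",
    "lang": "painless",
    "source": "long jitter = (long) (Math.random() * 101) - 50; ZonedDateTime ts = ZonedDateTime.parse(ctx['@timestamp']); ctx['@timestamp'] = ts.plus(jitter, ChronoUnit.MILLIS).toString();"
  }
}
```

Note this makes the stored timestamp slightly inaccurate by design, which is why it only works if the jitter is negligible relative to the collection interval.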
Here's another integration where the TSDB migration is blocked due to this issue: https://github.com/elastic/integrations/issues/6568#issuecomment-1741085205
To be more precise, I think we are also breaking that contract because we don't take into account the metric type and unit of measurement. I guess this might be ok if some form of naming convention is used when defining the metric name, such as heap_memory_bytes_total versus heap_memory_kilobytes_total, because the name then already includes the unit of measurement. This is also REQUIRED in a way by Elasticsearch, because we would not support indexing two fields with the same name and different types.
Anyway, if an integration wants to use the same name for two metrics but with a different unit of measurement and/or a different value type, we are back to the same issue: two different metrics with the same _tsid. Take for instance the following example (written as if it were an Elasticsearch mapping, which normally would not work, as said above):
"disk_storage_total": {
"type": long,
"metric_type": "counter",
"meta": {
"unit": "bytes"
}
},
"disk_storage_total": {
"type": integer,
"metric_type": "counter",
"meta": {
"unit": "kilobytes"
}
}
In this case we would not be able to index those metrics, so probably no integration is doing this, but I guess the fact that it is possible means we have to find a way to support it.
You're right that we technically violate that, at the moment, too. However, these kinds of conflicts are far less common compared to different documents with the same dimensions.
I've discussed the issue regarding the different units with @axw the other day. He brought up that we could map the unit to a well-known/special dimension when dealing with OTel data. That way we'd be compliant with the OTel spec.
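Sketching that idea as a mapping fragment (the field name is hypothetical), the unit would become one more dimension, so that two metrics differing only in unit get different _tsid values:

```json
"unit": {
  "type": "keyword",
  "time_series_dimension": true
}
```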
IINM, this is not resolved by https://github.com/elastic/elasticsearch/pull/98023
Pinging @elastic/es-storage-engine (Team:StorageEngine)
There's also a similar issue with OTel metrics. At the moment, we group metrics with the same set of dimensions (and the same unit) into a single document. However, there's no guarantee that all metrics are sent in the same batch, and the batches could be split up further within the OTel collector, for example via the batchprocessor.
Another use case where we recently ran into this bug is when we wanted to add a _doc_count field for histogram metrics. Adding the doc count necessitates that each metric is stored in a separate document (as each histogram may have a different count). However, that would trigger this duplicate detection bug.
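For context, this is roughly what such a document might look like (field names are illustrative). Since _doc_count is a document-level field, two histograms with different counts cannot share one document:

```json
{
  "@timestamp": "2023-10-01T00:00:00.000Z",
  "latency": {
    "values": [0.1, 0.5, 1.0],
    "counts": [3, 5, 2]
  },
  "_doc_count": 10
}
```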
This is also affecting support for CloudWatch Metric Streams: https://github.com/elastic/integrations/pull/11239#issuecomment-2372815977
Basically the same as what Felix described above: metrics may be delivered in two separate requests to the backend, with the same timestamp & dimensions, but non-overlapping sets of metrics.
Elasticsearch Version
8.7 and above
Installed Plugins
No response
Java Version
bundled
OS Version
All
Problem Description
This issue has been observed while integrating Prometheus remote writes into a time series database (TSDB). It happens as a result of batching before metrics are written to TSDB. If we have two (consecutive) batches (possibly with other batches in between), writing the second batch might fail due to TSDB duplicate detection in case the first one had the same set of dimensions (names and values) and the same timestamp. This happens because of the way we generate the "primary key" for documents stored in TSDB.
Each document is identified by the pair (_tsid, timestamp), so clients can generate batches that include metrics with the same _tsid and timestamp. Note also that clients might pre-aggregate metrics on the timestamp field, which increases the chances of duplication because documents are generated at specific points in time.

Example
Consider two batches, Batch 1 and Batch 2, each carrying one of two metrics, gauge and counter, whose dimensions (names and values) are the same. The _tsid for the first batch is then identical to the _tsid of the second batch, and the timestamp is the same too. The result is that duplicate detection kicks in, rejecting the second batch as a duplicate.
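As an illustration (field names and values are hypothetical), the two batches might carry documents like these, identical in dimensions and timestamp but with disjoint metric fields:

```
# Batch 1
{ "@timestamp": "2023-10-01T00:00:00Z", "host": "a", "region": "us-east", "gauge_metric": 42.0 }

# Batch 2 -- same dimensions and timestamp, different metric field
{ "@timestamp": "2023-10-01T00:00:00Z", "host": "a", "region": "us-east", "counter_metric": 7 }
```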
Integrations are working around this issue by adding the metric name as an additional "dummy" dimension field. In that case the gauge and counter metrics have different names, which results in different _tsid values and therefore in two documents being written with different _tsid (and the same timestamp). Adding this additional dimension (the metric name), however, results in two documents being stored instead of one, which incurs TSDB storage overhead.
This is considered a bug since we reject data as a result of incorrect duplicate detection. The two batches include data about two different metrics even if they share the same set of dimensions (_tsid). The contract of metric protocols such as Prometheus remote write or OTLP is that the metric name is part of the identity of a time series.
Steps to Reproduce
Just try to index two documents with the same dimensions, each with a different set of metrics, for instance one with a counter and one with a gauge.
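A minimal reproduction sketch in Kibana Dev Tools syntax (index, field, and value names are illustrative; untested):

```
PUT _index_template/tsdb-repro
{
  "index_patterns": ["metrics-repro-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["host"]
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "host": { "type": "keyword", "time_series_dimension": true },
        "gauge_metric": { "type": "double", "time_series_metric": "gauge" },
        "counter_metric": { "type": "long", "time_series_metric": "counter" }
      }
    }
  }
}

POST metrics-repro-test/_doc
{ "@timestamp": "2023-10-01T00:00:00Z", "host": "a", "gauge_metric": 1.0 }

POST metrics-repro-test/_doc
{ "@timestamp": "2023-10-01T00:00:00Z", "host": "a", "counter_metric": 2 }
```

The second request is expected to be rejected as a duplicate, even though it carries a different metric field.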
Logs (if relevant)
No response