Open BenB196 opened 2 weeks ago
Looking at the history of:
Added as part of the original PR #4253
It's not clear why this was added. I suspect it could be removed, which would resolve this issue.
Looking at this a bit more closely, I'm not actually sure if this is a "bug" or intended.
Using a more specific example:
"_source": {
"@timestamp": "2024-10-26T12:50:14.571Z",
"@version": "1",
"agent": {
"ephemeral_id": "012b62b9-8748-4257-b148-4e82191cfdd8",
"id": "d3c8a4ad-d4c1-41a6-bc4d-32942f79f522",
"name": "monitoring-fq875",
"type": "metricbeat",
"version": "8.14.3"
},
"data_stream": {
"dataset": "istio.istiod_metrics",
"namespace": "private.default.production",
"type": "metrics"
},
"ecs": {
"version": "8.6.0"
},
"elastic_agent": {
"id": "d3c8a4ad-d4c1-41a6-bc4d-32942f79f522",
"snapshot": false,
"version": "8.14.3"
},
"event": {
"agent_id_status": "auth_metadata_missing",
"dataset": "istio.istiod_metrics",
"duration": 7565596,
"ingested": "2024-10-26T12:50:25Z",
"kind": "metric",
"module": "istio"
},
"host": {
"architecture": "x86_64",
"containerized": true,
"hostname": "monitoring-fq875",
"id": "047f4adf0d834eaa883d97a880781760",
"name": "monitoring-fq875",
"os": {
"codename": "focal",
"family": "debian",
"kernel": "5.4.0-137-generic",
"name": "Ubuntu",
"platform": "ubuntu",
"type": "linux",
"version": "20.04.6 LTS (Focal Fossa)"
}
},
"istio": {
"istiod": {
"labels": {
"instance": "istiod.istio-system:15014",
"job": "prometheus",
"version": "1.23.1"
},
"labels_id": "rhdvqrHt7hTr7GH5lFq2mD31JGA=",
"metrics": {
"pilot_xds": {
"value": 5
}
}
}
},
"metricset": {
"period": 10000
},
"tags": "beats_input_raw_event"
}
[2024-10-26T12:50:25,557][WARN ][logstash.outputs.elasticsearch][elastic-agent][elastic_agent_elasticsearch_output] Failed action {:status=>409, :action=>["create", {:_id=>nil, :_index=>"metrics-istio.istiod_metrics-private.default.production", :routing=>nil}, {"prometheus"=>{"labels"=>{"version"=>"1.23.1", "instance"=>"istiod.istio-system:15014", "job"=>"prometheus"}, "pilot_xds"=>{"value"=>5}}, "event"=>{"module"=>"prometheus", "dataset"=>"istio.istiod_metrics", "duration"=>8044930}, "tags"=>["beats_input_raw_event"], "@timestamp"=>2024-10-26T12:50:14.571Z, "ecs"=>{"version"=>"8.0.0"}, "@version"=>"1", "agent"=>{"ephemeral_id"=>"012b62b9-8748-4257-b148-4e82191cfdd8", "version"=>"8.14.3", "id"=>"d3c8a4ad-d4c1-41a6-bc4d-32942f79f522", "name"=>"monitoring-fq875", "type"=>"metricbeat"}, "metricset"=>{"name"=>"collector", "period"=>10000}, "data_stream"=>{"namespace"=>"private.default.production", "dataset"=>"istio.istiod_metrics", "type"=>"metrics"}, "service"=>{"type"=>"prometheus", "address"=>"http://istiod.istio-system:15014/metrics"}, "host"=>{"hostname"=>"monitoring-fq875", "containerized"=>true, "architecture"=>"x86_64", "id"=>"047f4adf0d834eaa883d97a880781760", "name"=>"monitoring-fq875", "os"=>{"version"=>"20.04.6 LTS (Focal Fossa)", "name"=>"Ubuntu", "codename"=>"focal", "type"=>"linux", "platform"=>"ubuntu", "family"=>"debian", "kernel"=>"5.4.0-137-generic"}}, "elastic_agent"=>{"version"=>"8.14.3", "id"=>"d3c8a4ad-d4c1-41a6-bc4d-32942f79f522", "snapshot"=>false}}], :response=>{"create"=>{"status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[yFVUdPrnE3bEh5JiAAABksjglas][LEIgNgvoLGR0pnOGLdGZAmTvTk69cY8zMZwD0_9-b9Zq6XJvE2Y_RiZOv_8u@2024-10-26T12:50:14.571Z]: version conflict, document already exists (current version [1])", "index_uuid"=>"_UXCMxmYQSmzU4EePR_Gkw", "shard"=>"0", "index"=>".ds-metrics-istio.istiod_metrics-private.default.production-2024.10.24-000107"}}}}
These 2 events are almost identical; the only difference is the event.duration value:
"duration": 7565596,
-> "duration"=>8044930
It'd seem really weird to add duration as a TSDS dimension, but that seems to be the only difference between these 2 events. I'm not sure if this should really be a "bug" that gets fixed, or left as is.
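For context on why a differing event.duration does not prevent the conflict: in a TSDS, the document _id is derived from the dimension values and @timestamp, so two documents that agree on both collide regardless of their other fields. Below is a minimal Python sketch of that idea. The hash function, the `tsds_doc_id` helper, and the choice of labels_id as the only dimension are assumptions for illustration and not Elasticsearch's actual algorithm; it also assumes both events end up with the same labels_id.

```python
import hashlib

# Assumption: labels_id is the only dimension; the real data stream may define more.
DIMENSIONS = ("labels_id",)

def tsds_doc_id(doc: dict) -> str:
    """Illustrative stand-in for deriving a TSDS _id from dimensions and @timestamp."""
    dims = "|".join(f"{name}={doc[name]}" for name in DIMENSIONS)
    key = f"{dims}@{doc['@timestamp']}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Values taken from the two events above; only duration differs.
indexed = {
    "@timestamp": "2024-10-26T12:50:14.571Z",
    "labels_id": "rhdvqrHt7hTr7GH5lFq2mD31JGA=",
    "duration": 7565596,
}
rejected = {**indexed, "duration": 8044930}

# duration is not part of the key, so both events map to the same id.
print(tsds_doc_id(indexed) == tsds_doc_id(rejected))  # True
```

Under these assumptions, the second write is a "create" against an id that already exists, which matches the 409 version_conflict_engine_exception in the Logstash log.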
Integration Name
Istio [istio]
Dataset Name
istio.istiod_metrics
Integration Version
0.6.0
Agent Version
8.14.3
Agent Output Type
logstash
Elasticsearch Version
8.15.3
OS Version and Architecture
Container
Software/API Version
Istio 1.23.1
Error Message
No response
Event Original
No response
What did you do?
I recently was looking into an issue and noticed that Logstash was reporting a high number of document conflicts with Istio.
What did you see?
The Istio Metrics pipeline incorrectly overrides the `istio.istiod.labels.job` value and causes a high number of document conflicts.
Here are 2 events that were considered "duplicates", but in reality, no events exist in Elastic that would have matched this if the `job` label wasn't overwritten from its original value.
What did you expect to see?
I expect to see these documents properly ingested.
Anything else?
The issue appears to be that the Istio labels are used to generate a fingerprint:
https://github.com/elastic/integrations/blob/42826c851cd38df1cb229b31f184f68ee89f7a80/packages/istio/data_stream/istiod_metrics/elasticsearch/ingest_pipeline/default.yml#L31-L34
Which is then used as a TSDS dimension:
https://github.com/elastic/integrations/blob/42826c851cd38df1cb229b31f184f68ee89f7a80/packages/istio/data_stream/istiod_metrics/fields/fields.yml#L4-L7
The problem is that one of the key "dimension" labels, the `job` label, is always overwritten (before generating the fingerprint):
https://github.com/elastic/integrations/blob/42826c851cd38df1cb229b31f184f68ee89f7a80/packages/istio/data_stream/istiod_metrics/elasticsearch/ingest_pipeline/default.yml#L23-L26
It's not clear why this value is overwritten in the first place, but with the change to TSDS and dimensions, it now seems to cause a high number of Istiod Metrics to be dropped.
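The effect of overwriting `job` before fingerprinting can be sketched in Python. This is only an illustration under stated assumptions: the `fingerprint` and `process` helpers are hypothetical stand-ins for the ingest pipeline steps, the SHA-1/base64 encoding is a guess at the fingerprint format, the constant "prometheus" is taken from the sample document, and the original job values "job-a" and "job-b" are made up.

```python
import base64
import hashlib
import json

def fingerprint(labels: dict) -> str:
    """Hypothetical stand-in for the pipeline's fingerprint step over the labels."""
    canonical = json.dumps(labels, sort_keys=True).encode("utf-8")
    return base64.b64encode(hashlib.sha1(canonical).digest()).decode("ascii")

def process(labels: dict, overwrite_job: bool) -> str:
    """Optionally overwrite job (as the pipeline does), then fingerprint the labels."""
    labels = dict(labels)  # do not mutate the caller's dict
    if overwrite_job:
        labels["job"] = "prometheus"  # assumed constant, as seen in the sample document
    return fingerprint(labels)

# Two series that differ only in their (made-up) original job label.
series_a = {"instance": "istiod.istio-system:15014", "job": "job-a", "version": "1.23.1"}
series_b = {"instance": "istiod.istio-system:15014", "job": "job-b", "version": "1.23.1"}

# With the overwrite, both series collapse onto one fingerprint.
print(process(series_a, overwrite_job=True) == process(series_b, overwrite_job=True))    # True
# Without it, the fingerprints stay distinct.
print(process(series_a, overwrite_job=False) == process(series_b, overwrite_job=False))  # False
```

If the fingerprint is the dimension that separates these series, samples taken at the same @timestamp would then conflict, and all but the first would be rejected.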