Closed mashhurs closed 3 months ago
There is another potential cause for these 409 conflicts.
When integrations write to a TSDS-enabled index, the document id is defined as "a hash of the document’s dimensions and @timestamp".
The document's dimensions are defined in the integration. When events are sent at a frequency > 1 per millisecond and the dimensions are insufficient to disambiguate those events, a version conflict will arise. This has already been seen in the integrations for Elastic Agent and MySQL, and I suspect there are more that can cause the issue.
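To make the collision concrete, here is a minimal sketch. It is not Elasticsearch's actual _id algorithm; the `sketch_tsds_id` helper, the dimension list, and the events are hypothetical and only mirror the idea that the id depends on nothing but the dimensions and @timestamp.

```python
import hashlib
import json


def sketch_tsds_id(doc: dict, dimensions: list[str]) -> str:
    """Illustrative only: hash the dimension values plus @timestamp.

    NOT the real Elasticsearch algorithm; it only models an id that
    depends solely on the dimensions and @timestamp.
    """
    key = {name: doc.get(name) for name in dimensions}
    key["@timestamp"] = doc["@timestamp"]
    payload = json.dumps(key, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:20]


# Hypothetical dimensions and events (not taken from a real integration).
DIMENSIONS = ["agent.id", "system.network.name"]

event_a = {
    "@timestamp": "2024-08-26T21:48:48.268Z",
    "agent.id": "agent-1",
    "system.network.name": "utun0",
    "system.network.out.bytes": 100,
}
# Same dimensions and same millisecond; only a metric value differs.
event_b = {**event_a, "system.network.out.bytes": 250}
# A differing dimension value is enough to disambiguate the events.
event_c = {**event_a, "system.network.name": "en0"}

id_a = sketch_tsds_id(event_a, DIMENSIONS)
id_b = sketch_tsds_id(event_b, DIMENSIONS)
id_c = sketch_tsds_id(event_c, DIMENSIONS)

print("a and b collide:", id_a == id_b)
print("a and c collide:", id_a == id_c)
```

In this model the second `create` for the colliding id is the one that would be rejected with a 409, because the metric values never enter the hash.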
I had created a doc for the fingerprint case, but I'm not sure we have a recommended resolution for the TSDS case: https://github.com/mashhurs/logstash/blob/docs-409-issue/docs/static/troubleshoot/ts-elasticsearch.asciidoc
We mostly see document-level HTTP 409 responses in the following cases:
1. Customers are manually setting the document _id (not specific to agent -> LS)
They need to ensure _id uniqueness. If you see a document_id setting (possibly with a dynamic sprintf-style value such as "%{[@metadata][id]}") in the es-output plugin of the pipeline configs, this is most likely the cause. You can confirm it if the log shows an exact doc ID: {:_id=>supposed-to-be-unique-id, ...
2. [elastic-agent -> LS] Logstash is experiencing backpressure
When Logstash faces backpressure, it cannot acknowledge events back to elastic-agent; as a result, the agent times out and resends the events. However, the previously sent events have already been (fully or partially) indexed into ES. On the agent side, extending the timeout in the elastic-agent configuration may improve the situation. However, it is highly recommended to check the worker_utilization and queue_backpressure flow metrics: https://www.elastic.co/guide/en/logstash/current/node-stats-api.html#flow-stats. If ES is the slow component, ask the ES team for help.
3. [elastic-agent -> LS] Time series data stream (TSDS) based integration
The _id for TSDS documents is a hash value, calculated from the https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html#time-series-dimension[document's dimensions], which the integration defines, and @timestamp. Depending on the dimension granularity and how frequently ES receives documents, the _id might not be unique.
Based on the TSDS guide, you can tell whether an index is time_series by looking at its index template (index.mode: time_series). The index template also shows which fields are dimensions ("time_series_dimension": true). You may need to analyze whether the combination of dimensions + @timestamp is unique.
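The template check described above can be scripted. The following is a rough sketch that assumes you have already fetched the index template and extracted its settings and mappings as plain dicts. The function names and the sample template are hypothetical; a real template may nest these sections differently.

```python
def is_time_series(settings: dict) -> bool:
    """Return True if the settings declare index.mode: time_series.

    Handles both the nested form {"index": {"mode": ...}} and the
    flat form {"index.mode": ...}.
    """
    nested = settings.get("index", {})
    mode = nested.get("mode") if isinstance(nested, dict) else None
    return (mode or settings.get("index.mode")) == "time_series"


def find_dimensions(properties: dict, prefix: str = "") -> list[str]:
    """Collect dotted paths of fields mapped with time_series_dimension: true."""
    found = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("time_series_dimension") is True:
            found.append(path)
        # Object fields keep their children under "properties".
        found.extend(find_dimensions(spec.get("properties", {}), f"{path}."))
    return found


# Hypothetical template fragment, for illustration only.
sample_settings = {"index": {"mode": "time_series"}}
sample_properties = {
    "agent": {
        "properties": {
            "id": {"type": "keyword", "time_series_dimension": True},
        }
    },
    "system": {
        "properties": {
            "network": {
                "properties": {
                    "name": {"type": "keyword", "time_series_dimension": True},
                    "out": {"properties": {"bytes": {"type": "long"}}},
                }
            }
        }
    },
}

print(is_time_series(sample_settings))
print(find_dimensions(sample_properties))
```

With the list of dimension paths in hand, you can compare rejected events against already indexed ones on exactly those fields plus @timestamp.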
With this symptom, I haven't seen the _id set explicitly; it is mostly nil (see the error example below). A freshly reproduced example of the error:
[2024-08-26T15:11:43,587][WARN ][logstash.outputs.elasticsearch][main][5da2de1a6015f56bca61a3878c1e397ea47ce4da8031f78e07087f62e900c6c6] Failed action {:status=>409, :action=>["create", {:_id=>nil, :_index=>"metrics-system.network-default", :routing=>nil}, {"@timestamp"=>2024-08-26T21:48:48.268Z, "agent"=>{"id"=>"95e1ef4e-6e7a-4cfd-88e8-98df506ad305", "type"=>"metricbeat", "ephemeral_id"=>"9f7fc9b8-a93c-4252-8312-41668e9bb310", "name"=>"mashhurs-host, "version"=>"8.14.1"}, "system"=>{"network"=>{"out"=>{"packets"=>1, "dropped"=>0, "bytes"=>100, "errors"=>0}, "name"=>"utun0", "in"=>{"packets"=>0, "dropped"=>0, "bytes"=>0, "errors"=>0}}}, "service"=>{"type"=>"system"}, "metricset"=>{"name"=>"network", "period"=>1000}, "ecs"=>{"version"=>"8.0.0"}, "data_stream"=>{"namespace"=>"default", "type"=>"metrics", "dataset"=>"system.network"}, "tags"=>["beats_input_raw_event"], "event"=>{"duration"=>30177083, "dataset"=>"system.network", "module"=>"system"}, "@version"=>"1", "elastic_agent"=>{"id"=>"95e1ef4e-6e7a-4cfd-88e8-....", "snapshot"=>false, "version"=>"8.14.1"}}], :response=>{"create"=>{"status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[X2S_eqC1kCpFpb1lAAABkZCpuww][LI_yqguJnd4yG2gWc-sq5af-yVYaXIwj-eRIWXnMt35ym4EgBCfmRK6iivMF@2024-08-26T21:48:48.268Z]: version conflict, document already exists (current version [1])", "index_uuid"=>"YJE1kdgbTvuNeS23sbDVQg", "shard"=>"0", "index"=>".ds-metrics-system.network-default-2024.08.26-000003"}}}}
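When triaging many such lines, it can help to pull out just the fields that distinguish the cases above (status, _id, index, error type). The sketch below is a regex-based helper written against the log shape shown in this example only; `parse_failed_action` is a hypothetical name, and the sample line is an abbreviated copy of the log above, so other Logstash versions may format the line differently.

```python
import re

# Patterns written against the "Failed action" line shape shown above.
PATTERNS = {
    "status": re.compile(r":status=>(\d+)"),
    "doc_id": re.compile(r":_id=>([^,}]+)"),
    "index": re.compile(r':_index=>"([^"]+)"'),
    "error_type": re.compile(r'"error"=>\{"type"=>"([^"]+)"'),
}


def parse_failed_action(line: str) -> dict:
    """Extract status, _id, index and error type from a Failed action line."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        fields[name] = match.group(1).strip() if match else None
    if fields["status"] is not None:
        fields["status"] = int(fields["status"])
    if fields["doc_id"] is not None:
        # Ruby prints an unset id as nil; explicit ids may be quoted.
        stripped = fields["doc_id"].strip('"')
        fields["doc_id"] = None if stripped == "nil" else stripped
    return fields


# Abbreviated copy of the log line above (event body and reason shortened).
sample_line = (
    '[2024-08-26T15:11:43,587][WARN ][logstash.outputs.elasticsearch][main] '
    'Failed action {:status=>409, :action=>["create", {:_id=>nil, '
    ':_index=>"metrics-system.network-default", :routing=>nil}, '
    '{"@timestamp"=>2024-08-26T21:48:48.268Z}], :response=>{"create"=>'
    '{"status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", '
    '"reason"=>"version conflict, document already exists"}}}}'
)

parsed = parse_failed_action(sample_line)
print(parsed)
```

A `doc_id` of `None` together with a `metrics-*` index points toward the TSDS case, while an explicit id points toward case 1.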
4. Can be ignored
es-output has a silence_errors_in_log option that is useful specifically for these situations. Since the doc is already indexed, whether to ignore the error is up to you.
Example config:
output {
  elasticsearch {
    silence_errors_in_log => ["version_conflict_engine_exception"]
  }
}
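For the backpressure case (2), the two flow metrics can be read from the node stats API response. This is a minimal sketch that assumes the JSON returned by the pipelines stats endpoint (for example `curl -s localhost:9600/_node/stats/pipelines`, assuming the default API port) exposes a `flow` object per pipeline with `worker_utilization` and `queue_backpressure` entries keyed by time window. The sample payload is made up, and the exact fields vary by Logstash version, so check them against the linked flow-stats documentation.

```python
def flow_snapshot(node_stats: dict, window: str = "last_1_minute") -> dict:
    """Return worker_utilization and queue_backpressure for each pipeline.

    Missing metrics or windows come back as None instead of raising,
    since the available fields depend on the Logstash version.
    """
    snapshot = {}
    for pipeline_id, pipeline in node_stats.get("pipelines", {}).items():
        flow = pipeline.get("flow", {})
        snapshot[pipeline_id] = {
            metric: flow.get(metric, {}).get(window)
            for metric in ("worker_utilization", "queue_backpressure")
        }
    return snapshot


# Made-up payload in the assumed response shape (not real measurements).
sample_stats = {
    "pipelines": {
        "main": {
            "flow": {
                "worker_utilization": {"current": 98.2, "last_1_minute": 97.5},
                "queue_backpressure": {"current": 0.9, "last_1_minute": 0.8},
            }
        }
    }
}

snapshot = flow_snapshot(sample_stats)
print(snapshot)
```

Sustained high values for both metrics would be consistent with the agent timing out and resending events.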
Tell us about the issue
Description:
There are various situations where ES may reject an event with "document already exists". The purpose of this issue is to collect such cases and add short documentation (in whichever place is suitable: troubleshooting, the support docs, or es-output), as we are getting the same question over and over.
For example, the tenable_sc integration may have logs-tenable_sc.vulnerability-{version} & logs-tenable_sc.plugin-{version} ingest pipelines in which a fingerprint sets the _id.
Example log when Logstash receives a rejected event: