elastic / logstash-filter-elastic_integration

The Elastic Integrations filter for Logstash, which enables running Elastic Integrations inside of Logstash pipelines

[Bug]: internal versioning can not be used for optimistic concurrency control. #149

Closed BasicUser206 closed 2 months ago

BasicUser206 commented 3 months ago

Plugin Version

logstash-filter-elastic_integration (0.1.8)

Logstash Version

8.13.4

Java Version

Temurin-17.0.11+9

Host Info

Linux logstashhost 5.4.17-2136.331.7.el7uek.x86_64 #3 SMP Mon May 6 14:17:55 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux

What happened?

[2024-06-11T19:40:14,005][ERROR][logstash.outputs.elasticsearch][elasticagent][6c0af2785942041b35f5af6a32e78eba98f61ee97a5d1755bb436c594e3] Encountered a retryable error (will retry with exponential backoff) {:code=>400, :url=>"https://xxx.xxx.xxx.xxx:9200/_bulk?filter_path=errors,items.*.error,items.*.status", :content_length=>36643, :body=>"{\"error\":{\"root_cause\":[{\"type\":\"action_request_validation_exception\",\"reason\":\"Validation Failed: 1: internal versioning can not be used for optimistic concurrency control. Please use `if_seq_no` and `if_primary_term` instead;2: internal versioning can not be used for optimistic concurrency control. Please use `if_seq_no` and `if_primary_term` instead;

This error is repeated 125 times.

Current config:

input { 
  elastic_agent {
    port => 9613
    ssl_enabled => true
    ssl_certificate_authorities => ["****crt"]
    ssl_certificate => "****.crt"
    ssl_key => "****key"
    ssl_verify_mode => "none"
  }
}

filter {
  elastic_integration {
    username => "***"
    password => "***"
    ssl_enabled => true
    ssl_verification_mode => "none"
    hosts => ["https://****:9200", "https://****:9200", "https://***:9200", "https://***:9200"]
  }

  mutate {
    remove_field => [
      "[@version]",
      "[_version]",
      "[_version_type]"
    ]
  }

  fingerprint {
    method => "UUID"
    target => "_id"
  }
}

output {
  elasticsearch {
    template_name => "test7"
    data_stream => false
    manage_template => false
    ilm_enabled => true
    ilm_rollover_alias => "rollover-test7"
    ilm_pattern => "000001"
    ssl => true
    ssl_certificate_verification => false
    user => "****"
    password => "***"
    hosts => ["https://****:9200", "https://***:9200", "https://***:9200", "https://****:9200"]
    index => "test-%{+YYYY.MM.dd}"
    action => "index"
    document_id => "%{_id}"
  }
}

I had an open support ticket (01625720) and was advised this plugin is not supported. Elasticsearch, Logstash, Kibana, and Elastic Agent are all on version 8.13.4. This pipeline has never worked with elastic_integration enabled.

Broadly, I'm trying to get all of the field enrichment / formatting / translations handled. Winlogbeat processed all the logs on the client side; now, with Elastic Agent using Filebeat, that processing is part of the ingest pipeline (setting related.user, related.ip, winlog.logon.type, etc.). This is part one of my migration from Beats to Elastic Agent.

yaauie commented 2 months ago

High-level analysis:

It is possible that deleting the [@metadata][_ingest_document][version] and [@metadata][_ingest_document][version_type] fields prior to sending the event to Elasticsearch will allow the events to get through.

filter {
  mutate {
    remove_field => [ "[@metadata][_ingest_document][version]", "[@metadata][_ingest_document][version_type]" ]
  }
}

But Elastic integrations are meant to be "holistic" packages that contain directives for how Elastic Agent should collect, how Ingest Pipelines should transform, and how Elasticsearch should persist. When we start changing individual components (e.g., disabling data streams in the storage layer), we can create conflicts that are difficult to support.

BasicUser206 commented 2 months ago

Removing those two fields does work, and things are ingesting into the correct index. I haven't inspected everything, but the related.* fields are there and so is winlog.logon.type. It is pulling down the pipeline and running it.
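
For reference, a minimal sketch of how my filter section looks with the workaround in place (connection settings elided; the mutate is the removal suggested above, and it has to run after elastic_integration, which is what populates those [@metadata][_ingest_document] fields):

filter {
  elastic_integration {
    # ... hosts / credentials / ssl settings as in the original config ...
  }

  # workaround: drop the ingest-document versioning metadata before output
  mutate {
    remove_field => [
      "[@metadata][_ingest_document][version]",
      "[@metadata][_ingest_document][version_type]"
    ]
  }
}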

Would I be able to rely on this long term? Ideally, as an end user, it would be nice for elastic_integration to have a data_stream => true/false option to handle this, along the lines of the sketch below.
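
Purely to illustrate what I mean (this option is hypothetical; it does not exist in the plugin today):

filter {
  elastic_integration {
    hosts => ["https://****:9200"]
    # hypothetical option: when false, strip the data-stream-oriented
    # versioning metadata so events can target ordinary indices downstream
    data_stream => false
  }
}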

yaauie commented 2 months ago

Would I be able to rely on this long term?

I do not know. I personally would not.

The internals of the specific Elastic Integrations that run for the events your pipeline handles are internal to those integrations. The developers of an integration operate under the assumption that events will be stored in the manner their integration package specifies, so while we can observe that removing those fields allows the events to be ingested into your differently-configured index, we cannot guarantee that will hold true as the integration continues to evolve.

it would be nice for the elastic_integration to have a data_stream => true/false option

Whether or not the events being passed through the elastic_integration filter will end up being sent to an elasticsearch data stream is not a concern of the elastic_integration plugin. This plugin simply performs the transformation that is specified for the integration.

For context, I think the following bears repeating, with emphasis added:

[...] Elastic integrations are meant to be "holistic" packages that contain directives for how Elastic Agent should collect, how Ingest Pipelines should transform, and how Elasticsearch should persist. When we start changing individual components (e.g., disabling data streams in the storage layer), we can create conflicts that are difficult to support.

-- @yaauie in issue comment

BasicUser206 commented 2 months ago

I get what you're saying. However, I would refer to the current documentation.

https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html#create-a-sharding-strategy

The best way to prevent oversharding and other shard-related issues is to create a sharding strategy. A sharding strategy helps you determine and maintain the optimal number of shards for your cluster while limiting the size of those shards.

Unfortunately, there is no one-size-fits-all sharding strategy. A strategy that works in one environment may not scale in another. A good sharding strategy must account for your infrastructure, use case, and performance expectations.

I have a sharding strategy. It works, we are very happy with the performance we are getting. You're not allowing me to use my sharding strategy that's been developed over the years.

The current Elastic-managed sharding strategy contradicts the docs. Why are CPU, disk I/O, filesystem, etc. metrics being pulled out into different indices under different namespaces? Per the docs, this needlessly inflates shard count and overhead.

New users will likely love it; it will make their switch to Elasticsearch much easier. But what about us long-term customers?

TSDS seems like it's awesome for metrics. I love the potential space savings, and since I don't currently index metrics in the main stack (only in the monitoring cluster), I would 100% use TSDS for metrics. Per the docs, though, only metrics will see those space savings, so it's not worth it for me to redevelop my sharding strategy.

This is Logstash; let us advanced users use it how we want to. That's one reason we use Logstash in the first place. If Elastic Agent isn't going to send me a fully formed log ready for ingestion, give us a method to accomplish that ourselves and fit it to our current strategy.

I do very much appreciate your assistance getting that specific issue solved. I know this isn't your fault, so don't take it that way. I've been working on getting this resolved for at least a month, only to now find out it only supports data streams? It's a bit frustrating. If I can't rely on it, I'm back to where I was originally.

However, this is resolved. So, we can close the bug report.

yaauie commented 2 months ago

I get it. And I will be glad to pass along the feedback and to champion a resolution that works for your use-case.

The trouble is that I can't just magically add an option to this plugin that changes the behaviour of things outside of this plugin's control. I do not know the complete set of ramifications of using regular indices with data as-transformed by integrations that were defined with data streams in mind, and therefore can't define a feature flag to accommodate those things outside the scope of this plugin. I don't know if you'll end up with duplication in failure edge-cases, or whether your chosen sharding strategy will be better or worse for this particular integration.

What I do know is that the Integration is a Unit, and that I personally would be averse to the risk of bisecting that Unit in a way that is neither documented nor explicitly defined by those who maintain that Unit.

It is possible that doing so will have no ill effect, and that using the fragments of the Integration that work for you will just continue to work. I simply do not want to make promises that this plugin by definition cannot uphold.

BasicUser206 commented 2 months ago

Fair enough, thanks @yaauie.