elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ILM policy conflict with allocation filters #114540

Open srri opened 3 hours ago

srri commented 3 hours ago

Elasticsearch Version

8.13.3

Installed Plugins

No response

Java Version

bundled

OS Version

Elastic Cloud

Problem Description

We have a cluster (running on Elastic Cloud) with the following configuration:

We keep 10 days of documents in the hot tier before migrating them to cold, and we primarily use data streams. We have the requirement that while data is in hot it must remain mutable (mainly updating/deleting documents); once it is in cold, it no longer needs to be mutated. Because of this requirement, we cannot use the force-merge action in the hot phase, as each rollover would make indices read-only while still in hot. Conversely, the cold phase does not support the force-merge action. Elasticsearch support suggested that we add a warm phase with an allocation filter requiring the cold nodes and enable force-merge there, so that we can force-merge data once it is old.

This partially works, but we end up with indices that get stuck in ILM and require manual intervention.

This is the ILM policy for the data stream (Terraform):

resource "elasticstack_elasticsearch_index_lifecycle" "lifecycle-policy-events" {
  name = "lifecycle-policy-events"
  hot {
    min_age = "0ms"
    set_priority {
      priority = 0
    }
    rollover {
      max_primary_shard_size = "50gb"
      max_age                = "1d"
    }
  }

  warm {
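    # Warm phase: the allocation filter requires the cold nodes so the data can be
    # force-merged there once it has left hot (per support's suggestion above).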
    allocate {
      require = jsonencode({
        data = "cold"
      })
      number_of_replicas = 1
    }
    forcemerge {
      max_num_segments = 1
    }
    min_age = "10d"
  }

  cold {
    min_age = "11d"
  }

  frozen {
    min_age = "30d"
    searchable_snapshot {
      snapshot_repository = "found-snapshots"
    }
  }
  delete {
    min_age = "90d"
    delete {}
  }
}

When this executes, backing indices move into the warm phase once they are 10 days old and then get stuck (without error) on this step:

Waiting for [6] shards to be allocated to nodes matching the given filters
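
(For reference, that message is the step info reported by the ILM explain API for the stuck backing index; a call along these lines, using the backing index named further below:)

GET .ds-logs-events-elasticsearchidentificationevent-2024.09.29-001495/_ilm/explain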

Upon checking the settings of the stuck index, we see:

  "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            },
            "require": {
              "data": "cold"
            },
            "total_shards_per_node": "-1"
          }
        },
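
(That excerpt came from the index settings API, e.g.:)

GET .ds-logs-events-elasticsearchidentificationevent-2024.09.29-001495/_settings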

And upon checking the allocation explain API, I can see the following decisions for the two cold nodes:

      "roles": [
        "data_cold",
        "remote_cluster_client"
      ],
      "node_decision": "no",
      "weight_ranking": 4,
      "deciders": [
        {
          "decider": "awareness",
          "decision": "NO",
          "explanation": "there are [2] copies of this shard and [3] values for attribute [logical_availability_zone] ([zone-0, zone-1, zone-2] from nodes in the cluster and no forced awareness) so there may be at most [1] copies of this shard allocated to nodes with each value, but (including this copy) there would be [2] copies allocated to nodes with [node.attr.logical_availability_zone: zone-0]"
        },
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"
        }
      ]
-----
      "roles": [
        "data_cold",
        "remote_cluster_client"
      ],
      "node_decision": "no",
      "weight_ranking": 5,
      "deciders": [
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"
        }
      ]
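
(Those decisions are from the cluster allocation explain API; roughly the following request, with shard 0 chosen just as an example:)

GET _cluster/allocation/explain
{
  "index": ".ds-logs-events-elasticsearchidentificationevent-2024.09.29-001495",
  "shard": 0,
  "primary": true
}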

Ultimately, if I make the following call to the index settings API, the index is able to proceed:

PUT /.ds-logs-events-elasticsearchidentificationevent-2024.09.29-001495/_settings
{
  "index": {
    "routing": {
      "allocation": {
        "require": {
          "data": null
        }
      }
    }
  }
}

After that API call, the settings look like the following (with data_warm added to the tier preference automatically):

        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_warm,data_hot"
            },
            "total_shards_per_node": "-1"
          }
        },

But without this, it will remain stuck.

Is this a bug? It feels like a conflict, since the index is in the warm phase but is still retaining the:

              "_tier_preference": "data_hot"

Steps to Reproduce

See above.

Logs (if relevant)

No response

srri commented 3 hours ago

Also, would it make any difference if I allocated to 'data': 'hot' instead of 'data': 'cold' in the warm phase, since the data would still be residing there?
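
In other words, the allocate action in the warm phase would end up looking roughly like this (ILM policy JSON fragment, for illustration only):

"allocate": {
  "require": {
    "data": "hot"
  },
  "number_of_replicas": 1
}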