elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Guard data stream from double referencing an index when replacing a backing index #113090

Open gmarouli opened 1 month ago

gmarouli commented 1 month ago

Elasticsearch Version

8.x

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

Data streams offer a method for replacing a backing index: replaceBackingIndex.

This method finds the reference of a backing index and replaces it with a new index.

Current behaviour: The method does not check whether the new index is already referenced by the data stream. The other methods that add new backing indices to the data stream use validateDataStreamAlreadyContainsIndex to avoid double referencing an index.

Expected behaviour: A data stream should "guard" itself from double referencing an index. We should either return the same instance of the data stream or throw an error. Which of the two is appropriate needs to be investigated.
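To make the expected behaviour concrete, here is a minimal sketch of such a guard on a simplified model. `DataStreamSketch` is a hypothetical stand-in that tracks backing indices by name only; it is not the real Elasticsearch data stream class, and the inline `contains` check merely stands in for whatever validateDataStreamAlreadyContainsIndex does. The sketch picks the "throw an error" option; that choice is an assumption, since the issue leaves it open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a data stream (index names only).
final class DataStreamSketch {
    private final String name;
    private final List<String> indices;

    DataStreamSketch(String name, List<String> indices) {
        this.name = name;
        this.indices = List.copyOf(indices);
    }

    List<String> indices() {
        return indices;
    }

    // Returns a new instance in which `existing` is swapped for `replacement`.
    DataStreamSketch replaceBackingIndex(String existing, String replacement) {
        int position = indices.indexOf(existing);
        if (position == -1) {
            throw new IllegalArgumentException(
                "index [" + existing + "] is not part of data stream [" + name + "]");
        }
        // Guard: refuse to create a second reference to an index that is
        // already a backing index of this data stream.
        if (indices.contains(replacement)) {
            throw new IllegalArgumentException(
                "data stream [" + name + "] already contains index [" + replacement + "]");
        }
        List<String> updated = new ArrayList<>(indices);
        updated.set(position, replacement);
        return new DataStreamSketch(name, updated);
    }
}
```

The alternative mentioned above, returning the same instance, would replace the second `throw` with `return this;`. That would leave the old index in the list, so the caller would have to handle it separately.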

Steps to Reproduce

This is not the easiest reproduction path, and it might not be the only one, but it is one we are aware of:

  1. Start a cluster with data tiers, including a frozen tier. Also create an ILM policy, an index template, and a couple of settings:
PUT _index_template/my-template
{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "my-policy"
        },
        "number_of_shards": "1",
        "number_of_replicas": "0"
      }
    }
  },
  "index_patterns": [
    "my-*"
  ],
  "data_stream": {},
  "composed_of": []
}

PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_docs": "1"
          }
        }
      },
      "frozen": {
        "min_age": "5m",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "found-snapshots"
          }
        }
      }
    }
  }
}
  2. Index some data:
POST my-data/_doc/
{
      "@timestamp": "2024-08-18T11:12:00",
      "message": "this is a very important doc"
}
  3. Wait a bit for the index to move to the frozen tier:
GET _data_stream/my-data

{
  "data_streams": [
    {
      "name": "my-data",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": "partial-.ds-my-data-2024.09.18-000001",
          "index_uuid": "XXXXXX",
          "prefer_ilm": true,
          "ilm_policy": "my-policy",
          "managed_by": "Index Lifecycle Management"
        },
...
  4. Get the snapshot information needed to restore the first index:
GET my-data?filter_path=*.settings.index.store.snapshot.snapshot_name

{
  "partial-.ds-my-data-2024.09.18-000001": {
    "settings": {
      "index": {
        "store": {
          "snapshot": {
            "snapshot_name": "2024.09.18-.ds-my-data-2024.09.18-000001-my-policy-xxxxxxxxx"
          }
        }
      }
    }
  },
...
  5. Restore the index as a regular index:
POST _snapshot/found-snapshots/2024.09.18-.ds-my-data-2024.09.18-000001-my-policy-xxxxxxxxx/_restore
{
  "indices": ".ds-my-data-2024.09.18-000001",
  "feature_states": [
    "none"
  ],
  "include_aliases": false,
  "ignore_index_settings": "index.lifecycle.name"
}
  6. Remove the index's ILM metadata, as described in the documentation:
POST .ds-my-data-2024.09.18-000001/_ilm/remove

At this point the index is restored, but no ILM policy is applied to it:

GET /.ds-my-data-2024.09.18-000001/_ilm/explain

{
  "indices": {
    ".ds-my-data-2024.09.18-000001": {
      "index": ".ds-my-data-2024.09.18-000001",
      "managed": false
    }
  }
}
  7. Add the index back to the data stream:
POST /_data_stream/_modify
{
  "actions": [
    {
      "add_backing_index": {
        "index": ".ds-my-data-2024.09.18-000001",
        "data_stream": "my-data"
      }
    }
  ]
}

At this point the index is part of the data stream, but ILM cannot manage it because it has no ILM policy:

GET _data_stream/my-data

{
  "data_streams": [
    {
      "name": "my-data",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-my-data-2024.09.18-000001",
          "index_uuid": "YYYYYY",
          "prefer_ilm": true,
          "managed_by": "Unmanaged"
        },
        {
          "index_name": "partial-.ds-my-data-2024.09.18-000001",
          "index_uuid": "XXXXXX",
          "prefer_ilm": true,
          "ilm_policy": "my-policy",
          "managed_by": "Index Lifecycle Management"
        },
...
  8. Add an ILM policy to the index (it can be the original or a new one, as long as it has a frozen phase):
PUT .ds-my-data-2024.09.18-000001/_settings
{
 "index.lifecycle.name": "my-policy"
}

At this point, we can see that the index is managed by ILM again:

GET _data_stream/my-data

{
  "data_streams": [
    {
      "name": "my-data",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-my-data-2024.09.18-000001",
          "index_uuid": "YYYYYY",
          "ilm_policy": "my-policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": "partial-.ds-my-data-2024.09.18-000001",
          "index_uuid": "XXXXXX",
          "prefer_ilm": true,
          "ilm_policy": "my-policy",
          "managed_by": "Index Lifecycle Management"
        },
...

ILM will go through its steps, but when it is time to mount the searchable snapshot, it finds that this snapshot is already mounted. It then replaces .ds-my-data-2024.09.18-000001 with partial-.ds-my-data-2024.09.18-000001, which creates the double reference described above:

GET _data_stream/my-data

{
  "data_streams": [
    {
      "name": "my-data",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": "partial-.ds-my-data-2024.09.18-000001",
          "index_uuid": "XXXXXX",
          "prefer_ilm": true,
          "ilm_policy": "my-policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": "partial-.ds-my-data-2024.09.18-000001",
          "index_uuid": "XXXXXX",
          "prefer_ilm": true,
          "ilm_policy": "my-policy",
          "managed_by": "Index Lifecycle Management"
        },
...
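
A response like the one above can be checked for double references by scanning the `index_name` values for repeats. The following is a small sketch of that check; `DuplicateBackingIndexCheck` and `findDuplicates` are hypothetical names, and the input is assumed to be the list of `index_name` values already extracted from the `indices` array.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Elasticsearch.
final class DuplicateBackingIndexCheck {
    // Returns every index name that appears more than once, in first-seen order.
    static Set<String> findDuplicates(List<String> indexNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String indexName : indexNames) {
            // Set.add returns false when the name was already present.
            if (!seen.add(indexName)) {
                duplicates.add(indexName);
            }
        }
        return duplicates;
    }
}
```

For the output shown above, passing the two `partial-.ds-my-data-2024.09.18-000001` entries would report that name once as a duplicate.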

Logs (if relevant)

No response

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-data-management (Team:Data Management)