andreidan commented 3 years ago

When indices in a CCR environment are managed by ILM, the follower indices must be converted to regular indices. The unfollow action takes care of this and is injected automatically if the rollover, shrink, or searchable_snapshot action is configured. However, the leader index must have the index.lifecycle.indexing_complete setting set to true for the unfollow action to succeed. ILM manages the indexing_complete setting as part of the rollover action: after the index is rolled over, the setting on the managed index is set to true (as the next generation index is now the write index). This subtlety is not explicitly documented, and I think we should document it in the ILM docs. Also, the CCR docs should include a full guide on managing indices with ILM.
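
A minimal sketch of how the setting could be inspected or set by hand, assuming a leader index with the hypothetical name my-index-000001 that is not rolled over by ILM:

```
### On the leader cluster (my-index-000001 is a hypothetical index name)
### Check whether indexing_complete has been set
GET my-index-000001/_settings/index.lifecycle.indexing_complete

### Set it manually once no more writes are expected
PUT my-index-000001/_settings
{
  "index.lifecycle.indexing_complete": true
}
```

For indices rolled over by ILM, the second request should not be needed, since rollover sets the flag.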

elasticmachine commented 3 years ago

Pinging @elastic/es-docs (Team:Docs)

elasticmachine commented 3 years ago

Pinging @elastic/es-distributed (Team:Distributed)

elasticmachine commented 3 years ago

Pinging @elastic/es-core-features (Team:Core/Features)

inqueue commented 3 years ago

ILM policies and index templates need to be manually applied to follower clusters. This would be done anyway if configuring the follower for HA, though it would be helpful to explicitly mention it in the docs.

Leaf-Lin commented 2 years ago

I've written a step-by-step tutorial. Is it OK for @debadair to also look into converting this into a doc page?

CCR with searchable snapshot

Summary

CCR as it is designed today does not replicate searchable snapshot indices. Fortunately, cold/frozen indices are expected to be read-only, so the following procedure shows how to let data age through ILM policies and still be “replicated” across clusters.

The key points are:

- During the hot and warm phases, CCR replicates the leader index to a follower index on the follower cluster.
- Before the index moves to the cold or frozen phase on the follower cluster, CCR checks that the leader index has completed indexing. The leader index must have index.lifecycle.indexing_complete set to true (the rollover action sets this).
- On the follower cluster, the ILM cold/frozen phase should contain an additional unfollow action.
- Once the follower cluster has verified that the leader index has completed indexing, it unfollows the leader, then converts the index into a searchable snapshot and moves it into the cold/frozen phase.
- Once the leader cluster has verified that the follower is no longer following, it converts the index into a searchable snapshot index while moving it into the cold/frozen phase.
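
As a rough way to watch this sequence, the ILM explain API can be polled on the follower cluster. The index name below assumes the -copy naming used later in this tutorial:

```
### On the follower cluster
GET timeseries-000001-copy/_ilm/explain
```

The phase, action and step fields in the response show where the index currently sits, for example whether it is still inside the unfollow action.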

The end results are:

- The leader cluster will have searchable snapshots stored on the leader repository.
- The follower cluster will have searchable snapshots on the follower repository.
- In the cold/frozen phase, data will no longer be replicated between leader and follower (but these indices are expected to be read-only, so this should not be an issue).

Step-by-step Tutorial to set up CCR with Searchable Snapshots (Hot + Frozen Tier)

Step 1. Prerequisites

Ensure both of your clusters have the required data tiers and snapshot repository set up.
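
A quick sanity check for both prerequisites might look like this, assuming the repository names used later in this tutorial:

```
### On each cluster: list node roles to confirm hot and frozen nodes exist
GET _cat/nodes?v&h=name,node.role

### On the leader cluster (use my-follower-snapshots-repo on the follower)
GET _snapshot/my-leader-snapshots-repo
POST _snapshot/my-leader-snapshots-repo/_verify
```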

As of writing (8.0.1), CCR does not support data streams combined with a renaming follower pattern ^1. The following example was written for time series data without data streams, using a rollover alias, based on https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html#manage-time-series-data-without-data-streams

Step 2. Set up ILM policy with rollover and searchable snapshot

On the leader cluster, create an ILM policy containing rollover in the hot phase and searchable snapshot in the frozen phase.

The following policy means that 1 minute after index creation, the index (with the -000001 suffix) is rolled over into a new index with an incremented suffix (i.e. -000002). Both indices remain on the hot tier. Once the rolled-over index reaches an age of 5 minutes (for a rolled-over index, ILM measures min_age from the rollover time), ILM moves its data into the snapshot repository and mounts a partial index partial-<my-index>-000001 on the frozen node.

### On the leader cluster
PUT _ilm/policy/timeseries_policy
{
  "policy": {
    "phases": {
      "hot": {                                
        "actions": {
          "rollover": {
            "max_age": "1m"
          }
        }
      },
      "frozen": {
        "min_age": "5m",         
        "actions": {
          "searchable_snapshot" : {
            "snapshot_repository" : "my-leader-snapshots-repo"
          }                        
        }
      }
    }
  }
}

The 1 minute and 5 minute ages in the above policy are for testing purposes only. A realistic policy should have much longer retention.
Optionally, and for testing purposes only, lower the lifecycle poll interval from the default of 10 minutes to 10 seconds so that changes become visible in a shorter time span.
```
### Execute on both the leader and follower cluster
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "10s"
  }
}
```
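
Once testing is done, the override can be removed again. A sketch, relying on the cluster settings API convention that a null value resets a setting to its default:

```
### Execute on both the leader and follower cluster
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": null
  }
}
```
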
Step 3. Create an index template to apply the lifecycle policy

On the leader cluster, create an index template to apply the lifecycle policy. This is taken from https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html

### On the leader cluster
PUT _index_template/timeseries_template
{
  "index_patterns": ["timeseries-*"],                 
  "template": {
    "settings": {
      "index.lifecycle.name": "timeseries_policy",      
      "index.lifecycle.rollover_alias": "timeseries"    
    }
  }
}
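
To confirm that the template would apply the policy before any index exists, the simulate index API can be used. A sketch, in which timeseries-000001 is only resolved against the templates and not created:

```
### On the leader cluster
POST _index_template/_simulate_index/timeseries-000001
```

The resolved settings in the response should include index.lifecycle.name and index.lifecycle.rollover_alias.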

Step 4. Bootstrap the initial time series index with a write index alias.

To get things started, you need to bootstrap an initial index and designate it as the write index for the rollover alias specified in your index template. The name of this index must match the template’s index pattern and end with a number. On rollover, this value is incremented to generate a name for the new index.

For example, the following request creates an index called timeseries-000001 and makes it the write index for the timeseries alias.

### On the leader cluster
PUT timeseries-000001
{
  "aliases": {
    "timeseries": {
      "is_write_index": true
    }
  }
}
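
A possible check that the bootstrap worked, assuming the index and alias names above: the first request should list timeseries-000001 as the write index for the alias, and the second should show the index as managed by timeseries_policy.

```
### On the leader cluster
GET _alias/timeseries
GET timeseries-*/_ilm/explain
```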

Step 5. Set up ILM policy with rollover and searchable snapshot on the follower cluster

On the follower cluster, we need to set up an ILM policy similar (but not identical) to the one in step 2. Note that index age and ILM policies are tracked independently on each cluster. The additional action on the follower cluster is the unfollow action in the frozen phase.

### On the follower cluster
PUT _ilm/policy/timeseries_policy
{
  "policy": {
    "phases": {
      "hot": {                                
        "actions": {
          "rollover": {
            "max_age": "1m"
          }
        }
      },
      "frozen": {
        "min_age": "5m",                     
        "actions": {
          "unfollow" : {},
          "searchable_snapshot" : {
            "snapshot_repository" : "my-follower-snapshots-repo"
          }                        
        }
      }
    }
  }
}

Step 6. Create an index template to apply the lifecycle policy on the follower cluster

On the follower cluster, we need to set up an index template similar to step 3.

### On the follower cluster
PUT _index_template/timeseries_template
{
  "index_patterns": ["timeseries-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "timeseries_policy"     
    }
  }
}

Step 7. Set up remote cluster

On the follower cluster, set up a remote cluster pointing to the leader cluster. More details can be found in https://www.elastic.co/guide/en/elasticsearch/reference/current/ccr-getting-started-tutorial.html

### On the follower cluster
PUT _cluster/settings
{
  "persistent": {
    "cluster" : {
      "remote" : {
        "my_leader" : {  
          "seeds" : [
            "127.0.0.1:9300" 
          ]
        }
      }
    }
  }
}
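
Whether the follower can actually reach the leader can be checked with the remote cluster info API. A sketch, using the my_leader alias configured above:

```
### On the follower cluster
GET _remote/info
```

The my_leader entry in the response should report "connected": true before continuing with the next step.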

Step 8. Set up cross cluster replication

Configure the following settings to replicate timeseries data from the leader cluster to the follower cluster. The -copy suffix in the follow_index_pattern is optional.

### On the follower cluster
PUT /_ccr/auto_follow/timeseries_pattern
{
  "remote_cluster" : "my_leader",
  "leader_index_patterns" :
  [
    "timeseries*"
  ],
  "follow_index_pattern" : "{{leader_index}}-copy" 
}
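
To check that the pattern was stored and to look for auto-follow errors, something like the following could be run. This is a sketch using the pattern name from the request above:

```
### On the follower cluster
GET _ccr/auto_follow/timeseries_pattern
GET _ccr/stats
```

The auto_follow_stats section of the stats response includes counters for successfully followed indices and for failures.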

Step 9. Ingest data

Now we are ready to ingest data on the leader cluster. This can be done via Beats or Logstash or any other clients you have. Make sure to point to the alias timeseries for the data ingestion.

On the leader cluster

POST timeseries/_doc
{
  "message": "logged the request",
  "@timestamp": "1591890611"
}

Verify that data appears in the timeseries-000001 index on the leader cluster.
- GET timeseries/_search
Verify that data appears in the timeseries-000001-copy index on the follower cluster.
- GET timeseries/_search

...Continue putting data into timeseries…

After 1 min, data should appear in a new index timeseries-000002 on the leader cluster.
- GET timeseries/_search → Observe the name of the _index has now changed
After 1 min, data should appear in a new index timeseries-000002-copy on the follower cluster.
- GET timeseries/_search → Observe the name of the _index has now changed

...Continue putting data into timeseries…

After 5 mins, the timeseries-000001 index will become partial-timeseries-000001 (a searchable snapshot index) on the leader cluster.
- Use GET _cat/shards to observe this index now lives on the frozen tier
After 5 mins, the timeseries-000001-copy index will become partial-timeseries-000001-copy (a searchable snapshot index) on the follower cluster.
- Use GET _cat/shards to observe this index now lives on the frozen tier
Verify the number of docs and content are still identical in both indices.
- Use GET partial-timeseries-000001/_count on the leader cluster
- Use GET partial-timeseries-000001-copy/_count on the follower cluster
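
As an alternative to separate _count calls, the cat indices API gives a compact view of the mounted indices. A sketch, to be run once on each cluster:

```
### On the leader cluster and on the follower cluster
GET _cat/indices/partial-timeseries-*?v&h=index,health,docs.count
```

The docs.count values for partial-timeseries-000001 and partial-timeseries-000001-copy are expected to match.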

llermaly commented 1 year ago

@andreidan @Leaf-Lin

The unfollow is injected on the leader side, but if you want to use the delete phase you still have to add this setting to the follower.

https://support.elastic.dev/knowledge/view/d1d8aae7

Otherwise you will see this on the leader side:

[Screenshot: CleanShot 2023-05-19 at 11 29 07]

The follower can do ILM deletes without problems, but if you deleted the follower indices then this auto follow error should go away I think.

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-data-management (Team:Data Management)

elastic / elasticsearch

[DOCS] Document managing indices with ILM in the context of CCR #67668
