Handling system indices in CCR for disaster recovery

Leaf-Lin commented 2 years ago

Description

As with cross-cluster snapshot/restore, handling system indices in CCR for disaster recovery is currently sub-optimal.

It would be great if feature_states could take care of replicating system indices without corrupting the follower cluster state.

Some system indices contain a cluster-unique ID, so replicating them to another cluster may cause unexpected conflicts. As of writing, replicating indices such as .kibana or .security can be beneficial for maintaining the latest version of saved objects or user roles/permissions. But this requires users to manually delete the local copy before replicating, and things can get complicated during an upgrade.
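
For illustration, the manual workaround described above might look roughly like the sketch below. It only builds the two REST requests (delete the local copy, then create the follower with the CCR follow API) and does not send anything. The helper name `manual_ccr_steps`, the remote cluster alias `leader`, and the use of `.security` as a concrete index name are assumptions for the example; real system indices often sit behind aliases with versioned names, and deleting one may need elevated privileges.

```python
import json
from urllib.parse import quote


def manual_ccr_steps(index, remote_cluster):
    """Build the REST calls for the manual workaround as (method, path, body) tuples.

    Hypothetical helper for illustration only: it describes the requests,
    it does not talk to a cluster.
    """
    path = "/" + quote(index, safe="")

    # Step 1: remove the follower cluster's local copy of the index,
    # otherwise the follower index cannot be created under the same name.
    delete_local = ("DELETE", path, None)

    # Step 2: create the follower index with the CCR follow API.
    # `remote_cluster` is the alias under which the leader cluster is
    # registered on the follower (assumed to be configured already).
    follow = (
        "PUT",
        path + "/_ccr/follow",
        {"remote_cluster": remote_cluster, "leader_index": index},
    )
    return [delete_local, follow]


if __name__ == "__main__":
    for method, path, body in manual_ccr_steps(".security", "leader"):
        print(method, path, json.dumps(body) if body else "")
```

Each system index has to be handled this way one by one, which is the kind of per-index work a feature_states-based approach would ideally hide.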

Related: https://github.com/elastic/elasticsearch/issues/86121

elasticmachine commented 2 years ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)

bytebilly commented 2 years ago

Thanks Leaf for opening this issue.

System indices should actually be hidden behind a "feature", with their data handled opaquely. The fact that under the hood they are "regular-ish" indices should be considered an implementation detail and have no impact on the end user.

If the content of a system index is specific to a cluster, and Elasticsearch has no understanding of that data, I'm wondering whether a generic way to solve the problem exists, or whether each system index would need its own caveats and its own specific way to "replicate" itself.

cc @tvernum @rjernst

elastic/elasticsearch #86168