Master fail-over is not handled in AutoFollowCoordinator. After a new master is elected, the old master keeps polling the leader cluster for new indices matching the auto-follow patterns.
Steps to Reproduce
Create a leader cluster with some indices.
Create a follower cluster and configure some auto-follow patterns.
Trigger a master election in the follower cluster (the old master must not be restarted, because a restart clears this state).
Expected result
AutoFollowCoordinator is stopped on old master
AutoFollowCoordinator is running on newly elected master
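The expected behavior can be sketched as follows. This is a minimal model only, not the Elasticsearch implementation: the `Coordinator` class, its `clusterChanged(boolean)` method, and `isPolling()` are hypothetical stand-ins for a component that reacts to cluster state updates and polls only while its node is the elected master.

```java
// Hypothetical, simplified model of the expected fail-over behavior.
// None of these names are Elasticsearch APIs; they are assumptions for illustration.
final class AutoFollowFailoverSketch {

    /** Stand-in for a coordinator that polls the leader cluster only while master. */
    static final class Coordinator {
        private boolean polling;

        /** Called on every cluster state update with the local node's master status. */
        synchronized void clusterChanged(boolean localNodeIsMaster) {
            if (localNodeIsMaster && !polling) {
                polling = true;   // newly elected master: start polling
            } else if (!localNodeIsMaster && polling) {
                polling = false;  // lost mastership: stop polling
            }
        }

        synchronized boolean isPolling() {
            return polling;
        }
    }

    public static void main(String[] args) {
        Coordinator oldMaster = new Coordinator();
        Coordinator newMaster = new Coordinator();

        // Initial state: the first node is the elected master.
        oldMaster.clusterChanged(true);
        newMaster.clusterChanged(false);

        // Master election: mastership moves to the second node.
        oldMaster.clusterChanged(false);
        newMaster.clusterChanged(true);

        System.out.println("old master polling: " + oldMaster.isPolling());
        System.out.println("new master polling: " + newMaster.isPolling());
    }
}
```

In this model exactly one coordinator polls after the election. The bug described in this issue corresponds to the "lost mastership" branch never taking effect on the old master.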
Actual result
AutoFollowCoordinator is running simultaneously on both the old master and the newly elected master
Logs (if relevant)
Multiple repeating entries like the one below on a follower node that is no longer the elected master:
Error occured while cleaning followed leader indices
org.elasticsearch.cluster.NotMasterException: no longer master, failing [update_auto_follow_metadata]
The leader cluster would have multiple cluster state polling tasks running:
cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449263 - transport 1663660577559 07:56:17 13.5s 10.46.88.208 instance-0000000001
cluster:monitor/state mxxRAHr_Tiik33tXfancsw:322295710 - transport 1663660578646 07:56:18 12.4s 10.46.88.207 instance-0000000000
cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449305 mxxRAHr_Tiik33tXfancsw:322295710 transport 1663660578647 07:56:18 12.4s 10.46.88.208 instance-0000000001
cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449306 - transport 1663660578716 07:56:18 12.3s 10.46.88.208 instance-0000000001
cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449313 - transport 1663660579175 07:56:19 11.9s 10.46.88.208 instance-0000000001
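A rough way to quantify such a listing is to count the `cluster:monitor/state` tasks that have no parent task. The sketch below assumes the lines are `_cat/tasks`-style output whose first three whitespace-separated columns are action, task ID, and parent task ID (with `-` meaning no parent). That column layout is an assumption, and `PollTaskCountSketch` and `countTopLevelStateTasks` are hypothetical names. The count is only an indicator, since several auto-follow patterns can legitimately poll at the same time.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes columns are: action, task_id, parent_task_id, ...
final class PollTaskCountSketch {

    /** Counts cluster:monitor/state tasks whose parent column is "-" (no parent task). */
    static long countTopLevelStateTasks(List<String> lines) {
        return lines.stream()
            .map(line -> line.trim().split("\\s+"))
            .filter(cols -> cols.length >= 3)
            .filter(cols -> cols[0].equals("cluster:monitor/state"))
            .filter(cols -> cols[2].equals("-"))
            .count();
    }

    public static void main(String[] args) {
        // The five task lines quoted in this issue.
        List<String> lines = Arrays.asList(
            "cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449263 - transport 1663660577559 07:56:17 13.5s 10.46.88.208 instance-0000000001",
            "cluster:monitor/state mxxRAHr_Tiik33tXfancsw:322295710 - transport 1663660578646 07:56:18 12.4s 10.46.88.207 instance-0000000000",
            "cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449305 mxxRAHr_Tiik33tXfancsw:322295710 transport 1663660578647 07:56:18 12.4s 10.46.88.208 instance-0000000001",
            "cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449306 - transport 1663660578716 07:56:18 12.3s 10.46.88.208 instance-0000000001",
            "cluster:monitor/state mo5JrIs7Q9SXmV2gULkJ3Q:461449313 - transport 1663660579175 07:56:19 11.9s 10.46.88.208 instance-0000000001"
        );
        System.out.println(countTopLevelStateTasks(lines)); // prints 4
    }
}
```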
If a new matching index is created in the leader cluster, a duplicate PutFollowAction is issued (one from the old master and one from the newly elected master). One of them fails and records the following failure under recent_auto_follow_errors in GET /_ccr/stats:
{
"leader_index": "leader_cluster:my-index-1",
"timestamp": 1662034484876,
"auto_follow_exception": {
"type": "snapshot_restore_exception",
"reason": "[_ccr_leader_cluster:_latest_/_latest_] cannot restore index [my-index-1] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
}
}
Elasticsearch Version
All versions with the auto-follow feature
Workaround
Restart the old master