In the cold phase, the allocate action appears to run after the searchable_snapshot action. This means an index setting such as index.routing.allocation.total_shards_per_node (e.g. set at index creation time) can leave an index stuck in the cold phase, unable to complete its searchable snapshot action, if there are too few cold nodes to place every shard within the total_shards_per_node limit.
Steps to Reproduce
Set up a 4-node cluster: 3 hot nodes and 1 cold node.
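Before starting, the tier layout can be checked with the cat nodes API. In the node.role column, h marks a data_hot node and c a data_cold node; on ESS the nodes will usually list other role letters as well.

```console
# List nodes with their roles (h = data_hot, c = data_cold)
GET _cat/nodes?v&h=name,node.role
```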
# Set ILM poll interval to 10s for faster testing
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "10s"
  }
}
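# Hypothetical reconstruction of the remaining setup (the original requests were not
# preserved). The names test-policy, test-template and test, the "found-snapshots"
# repository (the ESS default) and the shard counts are assumptions, chosen so that
# 3 primaries with total_shards_per_node = 1 fit on 3 hot nodes but not on 1 cold node.
# min_docs: 0 lets ILM roll over an empty index, matching the 0 docs seen below.
PUT _ilm/policy/test-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "10s",
            "min_docs": 0
          }
        }
      },
      "cold": {
        "min_age": "0ms",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "found-snapshots"
          }
        }
      }
    }
  }
}
# 3 primaries, no replicas, at most 1 shard of the index per node
PUT _index_template/test-template
{
  "index_patterns": ["test-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 0,
      "index.routing.allocation.total_shards_per_node": 1,
      "index.lifecycle.name": "test-policy",
      "index.lifecycle.rollover_alias": "test"
    }
  }
}
# Bootstrap the first backing index behind the write alias
PUT test-000001
{
  "aliases": {
    "test": {
      "is_write_index": true
    }
  }
}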
# Confirm rollover after ~10s
GET _cat/indices/*test*?v
# Check allocation of our searchable snapshot
GET _cat/shards/restored-test-000001?v
index                shard prirep state      docs store dataset ip          node
restored-test-000001 0     p      STARTED    0    227b  227b    10.46.66.98 instance-0000000003
restored-test-000001 1     p      UNASSIGNED
restored-test-000001 2     p      UNASSIGNED
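To confirm that the per-node limit is what keeps shards 1 and 2 unassigned, the standard diagnostic APIs can be pointed at the mounted index. This is a sketch: the index name and shard number come from the output above, the responses were not captured here, and we would only expect (not guarantee) the allocation explanation to cite index.routing.allocation.total_shards_per_node.

```console
# Ask the allocator why shard 1 of the mounted index stays unassigned
GET _cluster/allocation/explain
{
  "index": "restored-test-000001",
  "shard": 1,
  "primary": true
}

# Check whether the mounted index carries the per-node limit
GET restored-test-000001/_settings/index.routing.allocation.total_shards_per_node

# See which ILM step the original and mounted indices are waiting in
GET *test-000001/_ilm/explain
```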
Elasticsearch Version
8.15.3
Installed Plugins
No response
Java Version
bundled
OS Version
ESS
Problem Description
Related to:
We can unstick the index by clearing its total_shards_per_node setting.
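Clearing the setting on the stuck mounted index might look like the request below (index name taken from the _cat/shards output). Setting a dynamic index setting to null resets it to its default, which for total_shards_per_node is -1, i.e. no per-node limit.

```console
# Remove the per-node shard limit from the mounted index
PUT restored-test-000001/_settings
{
  "index.routing.allocation.total_shards_per_node": null
}
```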
We can also work around the problem by adding an allocate action that sets total_shards_per_node in a warm phase.
Logs (if relevant)
No response
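The warm-phase workaround mentioned in this report could be expressed as the policy below. This is a sketch: the policy name test-policy, the rollover conditions and the found-snapshots repository are hypothetical, and the value -1 (unlimited) is an assumption; any limit the cold tier can satisfy would presumably work too. Because the warm phase runs before the cold phase, the limit is relaxed before the searchable snapshot is taken and mounted. Putting allocate in the cold phase itself would not help, since there it runs after searchable_snapshot.

```console
# PUT replaces the whole policy, so every phase is repeated here
PUT _ilm/policy/test-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "10s",
            "min_docs": 0
          }
        }
      },
      "warm": {
        "min_age": "0ms",
        "actions": {
          "allocate": {
            "total_shards_per_node": -1
          }
        }
      },
      "cold": {
        "min_age": "0ms",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "found-snapshots"
          }
        }
      }
    }
  }
}
```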