elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ES doesn't exhaust options for allocation leaving unassigned shards. #12273

Open johntdyer opened 9 years ago

johntdyer commented 9 years ago

Shard 5 will not get assigned after an upgrade from 1.5.0 to 1.6.0.

[root@ls2-es-lb ~]# curl -XGET "http://localhost:9200/_cluster/state/routing_table,routing_nodes/logstash-cdr-2015.05.18" | jq '.'
{
  "allocations": [],
  "routing_nodes": {
    "nodes": {
      "Ts0HJNFvSGy2JVd31VlotQ": [
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 1,
          "relocating_node": null,
          "node": "Ts0HJNFvSGy2JVd31VlotQ",
          "primary": false,
          "state": "STARTED"
        },
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 2,
          "relocating_node": null,
          "node": "Ts0HJNFvSGy2JVd31VlotQ",
          "primary": false,
          "state": "STARTED"
        }
      ],
      "6AS8BMZKQkivehCUWANRdQ": [
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 3,
          "relocating_node": null,
          "node": "6AS8BMZKQkivehCUWANRdQ",
          "primary": true,
          "state": "STARTED"
        },
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 1,
          "relocating_node": null,
          "node": "6AS8BMZKQkivehCUWANRdQ",
          "primary": true,
          "state": "STARTED"
        }
      ],
      "6fs0j8RWQ2esU7wgvAPcdg": [
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 4,
          "relocating_node": null,
          "node": "6fs0j8RWQ2esU7wgvAPcdg",
          "primary": false,
          "state": "STARTED"
        },
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 2,
          "relocating_node": null,
          "node": "6fs0j8RWQ2esU7wgvAPcdg",
          "primary": true,
          "state": "STARTED"
        }
      ],
      "srLX4NZDTIaHq9qBVsxcZw": [
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 0,
          "relocating_node": null,
          "node": "srLX4NZDTIaHq9qBVsxcZw",
          "primary": true,
          "state": "STARTED"
        },
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 3,
          "relocating_node": null,
          "node": "srLX4NZDTIaHq9qBVsxcZw",
          "primary": false,
          "state": "STARTED"
        }
      ],
      "DnCwjImuRFOsranelYuOaw": [
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 5,
          "relocating_node": null,
          "node": "DnCwjImuRFOsranelYuOaw",
          "primary": true,
          "state": "STARTED"
        }
      ],
      "3ZOu2V5xSX-BxL2Osd5l7A": [
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 4,
          "relocating_node": null,
          "node": "3ZOu2V5xSX-BxL2Osd5l7A",
          "primary": true,
          "state": "STARTED"
        },
        {
          "index": "logstash-cdr-2015.05.18",
          "shard": 0,
          "relocating_node": null,
          "node": "3ZOu2V5xSX-BxL2Osd5l7A",
          "primary": false,
          "state": "STARTED"
        }
      ]
    },
    "unassigned": [
      {
        "index": "logstash-cdr-2015.05.18",
        "shard": 5,
        "relocating_node": null,
        "node": null,
        "primary": false,
        "state": "UNASSIGNED"
      }
    ]
  },
  "routing_table": {
    "indices": {
      "logstash-cdr-2015.05.18": {
        "shards": {
          "2": [
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 2,
              "relocating_node": null,
              "node": "6fs0j8RWQ2esU7wgvAPcdg",
              "primary": true,
              "state": "STARTED"
            },
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 2,
              "relocating_node": null,
              "node": "Ts0HJNFvSGy2JVd31VlotQ",
              "primary": false,
              "state": "STARTED"
            }
          ],
          "5": [
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 5,
              "relocating_node": null,
              "node": "DnCwjImuRFOsranelYuOaw",
              "primary": true,
              "state": "STARTED"
            },
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 5,
              "relocating_node": null,
              "node": null,
              "primary": false,
              "state": "UNASSIGNED"
            }
          ],
          "1": [
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 1,
              "relocating_node": null,
              "node": "6AS8BMZKQkivehCUWANRdQ",
              "primary": true,
              "state": "STARTED"
            },
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 1,
              "relocating_node": null,
              "node": "Ts0HJNFvSGy2JVd31VlotQ",
              "primary": false,
              "state": "STARTED"
            }
          ],
          "3": [
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 3,
              "relocating_node": null,
              "node": "srLX4NZDTIaHq9qBVsxcZw",
              "primary": false,
              "state": "STARTED"
            },
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 3,
              "relocating_node": null,
              "node": "6AS8BMZKQkivehCUWANRdQ",
              "primary": true,
              "state": "STARTED"
            }
          ],
          "0": [
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 0,
              "relocating_node": null,
              "node": "3ZOu2V5xSX-BxL2Osd5l7A",
              "primary": false,
              "state": "STARTED"
            },
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 0,
              "relocating_node": null,
              "node": "srLX4NZDTIaHq9qBVsxcZw",
              "primary": true,
              "state": "STARTED"
            }
          ],
          "4": [
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 4,
              "relocating_node": null,
              "node": "3ZOu2V5xSX-BxL2Osd5l7A",
              "primary": true,
              "state": "STARTED"
            },
            {
              "index": "logstash-cdr-2015.05.18",
              "shard": 4,
              "relocating_node": null,
              "node": "6fs0j8RWQ2esU7wgvAPcdg",
              "primary": false,
              "state": "STARTED"
            }
          ]
        }
      }
    }
  },
  "cluster_name": "tropo-es"
}

I tried to force a reroute with the following script, but it didn't work:

for h in 3ZOu2V5xSX-BxL2Osd5l7A srLX4NZDTIaHq9qBVsxcZw 6fs0j8RWQ2esU7wgvAPcdg 6AS8BMZKQkivehCUWANRdQ DnCwjImuRFOsranelYuOaw Ts0HJNFvSGy2JVd31VlotQ; do
  curl -sw "%{http_code}" -XPOST -d '{ "commands" : [ { "allocate" : { "shard": 5, "index": "logstash-cdr-2015.05.18", "node" : "'"$h"'"  } } ] }'   'http://ls2-es-lb.int.tropo.com:9200/_cluster/reroute?pretty'  | jq '.'
done

# jdyer at JOHNDYE-M-F9G6 in ~/Projects/logstash-input-stomp on git:master o [13:37:32]
$ for h in 3ZOu2V5xSX-BxL2Osd5l7A srLX4NZDTIaHq9qBVsxcZw 6fs0j8RWQ2esU7wgvAPcdg 6AS8BMZKQkivehCUWANRdQ DnCwjImuRFOsranelYuOaw Ts0HJNFvSGy2JVd31VlotQ; do
for>   curl -sw "%{http_code}" -XPOST -d '{ "commands" : [ { "allocate" : { "shard": 5, "index": "logstash-cdr-2015.05.18", "node" : "'"$h"'"  } } ] }'   'http://ls2-es-lb.int.tropo.com:9200/_cluster/reroute?pretty'  | jq '.'
for> done
{
  "error": "ElasticsearchIllegalArgumentException[[allocate] allocation of [logstash-cdr-2015.05.18][5] on node [ls2-es1.int.tropo.com][3ZOu2V5xSX-BxL2Osd5l7A][ls2-es1][inet[/10.1.0.103:9300]]{master=false} is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][NO(too many shards for this index on node [2], limit: [2])][YES(target node version [1.6.0] is same or newer than source node version [1.6.0])][YES(enough disk for shard on node, free: [468.2gb])][YES(shard not primary or relocation disabled)]]",
  "status": 400
}
400
{
  "error": "ElasticsearchIllegalArgumentException[[allocate] allocation of [logstash-cdr-2015.05.18][5] on node [ls2-es2.int.tropo.com][srLX4NZDTIaHq9qBVsxcZw][ls2-es2][inet[/10.1.0.102:9300]]{master=false} is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][NO(too many shards for this index on node [2], limit: [2])][YES(target node version [1.6.0] is same or newer than source node version [1.6.0])][YES(enough disk for shard on node, free: [469.7gb])][YES(shard not primary or relocation disabled)]]",
  "status": 400
}
400
{
  "error": "ElasticsearchIllegalArgumentException[[allocate] allocation of [logstash-cdr-2015.05.18][5] on node [ls2-es3.int.tropo.com][6fs0j8RWQ2esU7wgvAPcdg][ls2-es3][inet[/10.1.0.101:9300]]{master=false} is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][NO(too many shards for this index on node [2], limit: [2])][YES(target node version [1.6.0] is same or newer than source node version [1.6.0])][YES(enough disk for shard on node, free: [472.2gb])][YES(shard not primary or relocation disabled)]]",
  "status": 400
}
400
{
  "error": "ElasticsearchIllegalArgumentException[[allocate] allocation of [logstash-cdr-2015.05.18][5] on node [ls2-es4.int.tropo.com][6AS8BMZKQkivehCUWANRdQ][ls2-es4][inet[/10.1.0.104:9300]]{master=false} is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][NO(too many shards for this index on node [2], limit: [2])][YES(target node version [1.6.0] is same or newer than source node version [1.6.0])][YES(enough disk for shard on node, free: [481gb])][YES(shard not primary or relocation disabled)]]",
  "status": 400
}
400
{
  "error": "ElasticsearchIllegalArgumentException[[allocate] allocation of [logstash-cdr-2015.05.18][5] on node [ls2-es5.int.tropo.com][DnCwjImuRFOsranelYuOaw][ls2-es5][inet[/10.1.0.55:9300]]{master=false} is not allowed, reason: [NO(shard cannot be allocated on same node [DnCwjImuRFOsranelYuOaw] it already exists on)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(shard count under limit [2] of total shards per node)][YES(target node version [1.6.0] is same or newer than source node version [1.6.0])][YES(enough disk for shard on node, free: [466.9gb])][YES(shard not primary or relocation disabled)]]",
  "status": 400
}
400
{
  "error": "ElasticsearchIllegalArgumentException[[allocate] allocation of [logstash-cdr-2015.05.18][5] on node [ls2-es6.int.tropo.com][Ts0HJNFvSGy2JVd31VlotQ][ls2-es6.int.tropo.com][inet[/10.1.0.106:9300]]{master=false} is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][NO(too many shards for this index on node [2], limit: [2])][YES(target node version [1.6.0] is same or newer than source node version [1.6.0])][YES(enough disk for shard on node, free: [483.3gb])][YES(shard not primary or relocation disabled)]]",
  "status": 400
}
400

This is the only unassigned shard since the restart, and I am not sure how to get the cluster back to green. Any advice?

Thanks

dakrone commented 9 years ago

@johntdyer you can see why the shard won't allocate to the node in the output of the reroute command, specifically this line:

NO(too many shards for this index on node [2], limit: [2])

This comes from the index.routing.allocation.total_shards_per_node setting (looks like it's been set to 2).
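
You can confirm the current value with the index settings API, for example (host and index name taken from this thread):

curl -XGET 'http://localhost:9200/logstash-cdr-2015.05.18/_settings?pretty'
# look for index.routing.allocation.total_shards_per_node in the response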

In the future however, issues like this should be opened on the discussion forums instead of as issues here.

johntdyer commented 9 years ago

Lee,

 Why is this only affecting this shard? All the other shards were reassigned after the rolling restart. My problem seems limited to just this single shard of this single index.

John

dakrone commented 9 years ago

Why is this only affecting this shard? All the other shards were reassigned after the rolling restart.

It looks like each of your nodes already has the maximum 2 shards for this index (the setting from above). This shard happened to be last and thus won't be allocated. You need to increase the total_shards_per_node setting, or add another node.
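
A quick way to see that per-node distribution is the cat shards API, for example:

curl 'http://localhost:9200/_cat/shards/logstash-cdr-2015.05.18?v'
# one line per shard copy: shard number, p/r, state, and the node it is on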

johntdyer commented 9 years ago

@dakrone - I am sorry for my naivety on this, but I am still confused... I can see under routing_table.indices.logstash-cdr-2015.05.18.shards that shard 5 is not assigned:

{
  "index": "logstash-cdr-2015.05.18",
  "shard": 5,
  "relocating_node": null,
  "node": "DnCwjImuRFOsranelYuOaw",
  "primary": true,
  "state": "STARTED"
},
{
  "index": "logstash-cdr-2015.05.18",
  "shard": 5,
  "relocating_node": null,
  "node": null,
  "primary": false,
  "state": "UNASSIGNED"
}

and it is still not clear to me why this is only happening with this one index, and why it only happened after the upgrade from 1.5.0 to 1.6.0...

dakrone commented 9 years ago

Digging a little deeper here:

Explanation

You have six nodes. Each of these nodes has two shards of this index on it, except for one:

| Node | Shard 1 | Shard 2 |
| --- | --- | --- |
| Ts0HJNFvSGy2JVd31VlotQ | `logstash-cdr-2015.05.18[1][r]` | `logstash-cdr-2015.05.18[2][r]` |
| 6AS8BMZKQkivehCUWANRdQ | `logstash-cdr-2015.05.18[3][p]` | `logstash-cdr-2015.05.18[1][p]` |
| 6fs0j8RWQ2esU7wgvAPcdg | `logstash-cdr-2015.05.18[4][r]` | `logstash-cdr-2015.05.18[2][p]` |
| srLX4NZDTIaHq9qBVsxcZw | `logstash-cdr-2015.05.18[0][p]` | `logstash-cdr-2015.05.18[3][r]` |
| DnCwjImuRFOsranelYuOaw | `logstash-cdr-2015.05.18[5][p]` | |
| 3ZOu2V5xSX-BxL2Osd5l7A | `logstash-cdr-2015.05.18[4][p]` | `logstash-cdr-2015.05.18[0][r]` |

The unassigned shard is `logstash-cdr-2015.05.18[5][r]`.

Normally this would be assigned to DnCwjImuRFOsranelYuOaw; however, you can see in the output of the reroute command why it is not:

{
  "error": "ElasticsearchIllegalArgumentException[[allocate] allocation of [logstash-cdr-2015.05.18][5] on node [ls2-es5.int.tropo.com][DnCwjImuRFOsranelYuOaw][ls2-es5][inet[/10.1.0.55:9300]]{master=false} is not allowed, reason: [NO(shard cannot be allocated on same node [DnCwjImuRFOsranelYuOaw] it already exists on)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(shard count under limit [2] of total shards per node)][YES(target node version [1.6.0] is same or newer than source node version [1.6.0])][YES(enough disk for shard on node, free: [466.9gb])][YES(shard not primary or relocation disabled)]]",
  "status": 400
}

Specifically this line:

NO(shard cannot be allocated on same node [DnCwjImuRFOsranelYuOaw] it already exists on)

This is because the primary for that shard already exists on the "DnCw" node, so ES cannot assign the replica to that same node.

Additionally, Elasticsearch will not rebalance the shards on the other nodes until all UNASSIGNED shards are assigned, so they will not move.

_Elasticsearch is stuck waiting for room to allocate the unassigned shard, because the only node that has room already holds a copy of it._ So from the perspective of Elasticsearch, the shard cannot be allocated anywhere, which is why it is unassigned.


Workarounds

For a temporary workaround, there are two options (see the sketch after this list):

1. Increase index.routing.allocation.total_shards_per_node for this index from 2 to 3. Elasticsearch should then be able to assign the replica to one of the other nodes and rebalance to even out the number of shards per node. Once the shard has been allocated, you can lower the setting back to 2.

2. Swap the unassigned replica's place with a shard of a different number by moving shards manually. ES will then be able to allocate the unassigned shard, because it is no longer the exact same shard being allocated onto the node that already holds a copy of it.
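
A minimal sketch of the first option, assuming the limit is the index-level setting named above and using the host and index name from this thread (1.x accepts a request body without a Content-Type header):

curl -XPUT 'http://localhost:9200/logstash-cdr-2015.05.18/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 3
}'

# once the replica has been assigned and the cluster is green, restore the original limit
curl -XPUT 'http://localhost:9200/logstash-cdr-2015.05.18/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'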

Why?

Why did this happen with the 1.6.0 upgrade? It is a by-product of the nodes being restarted, and bad luck in how the shards of this index happened to be allocated afterwards.

Further action

I think we can consider this ticket a bug report for this behavior, as we should try as hard as possible to prevent it!

clintongormley commented 9 years ago

The total_shards_per_node setting is documented (in master: https://www.elastic.co/guide/en/elasticsearch/reference/master/allocation-total-shards.html) to sometimes cause unassigned shards.

It's a hard limit in a process which, for the most part, relies on heuristics and, as such, is a bit of a hack. I'd prefer to remove the setting and instead solve the problem by trying harder to spread out shards from the same index. See #12279 for more on this.

nik9000 commented 9 years ago

That is a dangerous one to remove. It's one of the most important parts of keeping the Wikimedia cluster up and running smoothly. The hard limit it provides is useful because putting two enwiki shards next to each other will bring the node down.

clintongormley commented 9 years ago

@nik9000 I'm only proposing removing it if we support a better option that doesn't suffer from the same issues.

robert-blankenship commented 6 years ago

@clintongormley I was able to repro this using an index with a single shard (configured to require box_type="hot") and a cluster with a single valid node (box_type="hot"). I used index.routing.allocation.require.total_shards_per_node=1... The shard was basically stuck in the UNASSIGNED state indefinitely and the index was red (version 5.4.0). I also had a dedicated master node (data disabled on it) and 2 other nodes with box_type="warm".

TLDR: Removing the index.routing.allocation.require.total_shards_per_node=1 requirement fixed it, even though the configuration should have been valid because my index only had 1 shard.

EDIT: PEBCAK (https://en.wiktionary.org/wiki/PEBCAK). The actual property name is index.routing.allocation.total_shards_per_node.

ywelsch commented 6 years ago

@robert-blankenship what did the allocation explain API say when the shard was unassigned?

It looks to me like you used a setting that does not exist as such. The setting is index.routing.allocation.total_shards_per_node, not index.routing.allocation.require.total_shards_per_node. What you specified was a require clause (see allocation filtering) with a custom attribute named total_shards_per_node (coincidentally the same name as the total_shards_per_node setting), which means that only nodes whose custom total_shards_per_node attribute is set to 1 should hold a shard of this index.

The problem you had looks to me unrelated to the original issue here.
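
To illustrate the difference, a sketch against a hypothetical index named my_index (the allocation explain API is available from 5.0 onward):

# what was applied: an allocation filter requiring a custom node attribute
# named "total_shards_per_node" to have the value 1
PUT my_index/_settings
{
  "index.routing.allocation.require.total_shards_per_node": 1
}

# what was intended: the per-node shard-count limit for the index
PUT my_index/_settings
{
  "index.routing.allocation.total_shards_per_node": 1
}

# the allocation explain API reports which decider is blocking an unassigned shard
GET _cluster/allocation/explain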

robert-blankenship commented 6 years ago

You're right, thanks @ywelsch !

DaveCTurner commented 6 years ago

Relates https://discuss.elastic.co/t/replica-shards-do-not-get-assigned-not-consistent-even-though-they-can/123968/4

danielkasen commented 6 years ago

Curious if there is any movement on this as I just ran into the issue in 6.3.2

elasticmachine commented 6 years ago

Pinging @elastic/es-distributed

ywelsch commented 6 years ago

Curious if there is any movement on this as I just ran into the issue in 6.3.2

This is a fundamental problem with the current shard balancer and something that cannot be easily addressed in the current implementation. The current implementation uses an incremental approach to balancing that focuses on speed, but can sometimes end up in local minima. Our general advice is to avoid over-constraining the allocation settings. That said, we're considering alternatives to the current balancer, but these are all still at the research stage.

ramunasd commented 1 year ago

I wonder if this is still an issue after 8 years of being reported and known.

Just tested with recent ES version 8.10.2, completely default settings.

It is easily reproducible on a 3-node cluster. It almost always fails to allocate the last shard when using these index settings:

PUT test_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "routing": {
      "allocation": {
        "total_shards_per_node": 2
      }
    }
  }
}

[Screenshot: resulting shard allocation of test_index across the three nodes]

It's clearly visible that a correct allocation is possible - just move shard n1 to es02 and then shard n2 to es01.
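
For reference, a sketch of expressing that manual move with the reroute API. The from_node values below are placeholders (es03 is assumed to be the third node's name); check _cat/shards for where each copy currently sits, and note that whether each individual move is accepted still depends on the per-node limit at that moment:

POST _cluster/reroute
{
  "commands": [
    { "move": { "index": "test_index", "shard": 1, "from_node": "es03", "to_node": "es02" } },
    { "move": { "index": "test_index", "shard": 2, "from_node": "es03", "to_node": "es01" } }
  ]
}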

elasticsearchmachine commented 7 months ago

Pinging @elastic/es-distributed (Team:Distributed)