ES shard allocation bug

LuPan92 commented 4 months ago

Elasticsearch Version

Version: 7.17.18, Build: default/tar/8682172c2130b9a411b1bd5ff37c9792367de6b0/2024-02-02T12:04:59.691750271Z, JVM: 11.0.20

Installed Plugins

No response

Java Version

11.0.20

OS Version

Linux bsa5295 3.10.0-1160.108.1.el7.x86_64 #1 SMP Thu Jan 25 16:17:31 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

When path.data on an Elasticsearch data node lists more than 20 paths, all shards of the same index are allocated to a single path. This causes disk I/O skew when writing.

Steps to Reproduce

My test steps are as follows:

elasticsearch.yml

cluster.name: ISOP_1720490318878
http.port: 19399
network.host: bsa5295
node.name: bsa5295
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
node.master: true
node.data: true
path.logs: /home/worker/elasticsearch/logs
path.data: /home/sdf/elasticsearch/data,/home/sdg/elasticsearch/data,/home/sdh/elasticsearch/data,/home/sdi/elasticsearch/data,/home/sdb/elasticsearch/data,/home/sdc/elasticsearch/data,/home/sdd/elasticsearch/data,/home/sde/elasticsearch/data,/home/sdj/elasticsearch/data,/home/sdk/elasticsearch/data,/home/sdf/elasticsearch_1/data,/home/sdg/elasticsearch_1/data,/home/sdh/elasticsearch_1/data,/home/sdi/elasticsearch_1/data,/home/sdb/elasticsearch_1/data,/home/sdc/elasticsearch_1/data,/home/sdd/elasticsearch_1/data,/home/sde/elasticsearch_1/data,/home/sdj/elasticsearch_1/data,/home/sdk/elasticsearch_1/data,/home/sdf/elasticsearch_2/data,/home/sdg/elasticsearch_2/data
transport.tcp.port: 9300
gateway.expected_nodes: 1
action.auto_create_index: .watches,.triggered_watches,.watcher-history-*,.kibana*,.security,.monitoring*
discovery.seed_hosts: [bsa5295]
cluster.initial_master_nodes: [bsa5295]
thread_pool.write.queue_size: 2000
indices.recovery.max_bytes_per_sec: 200mb
cluster.routing.allocation.node_concurrent_recoveries: 10
cluster.max_shards_per_node: 5000
cluster.routing.allocation.same_shard.host: true
cluster.routing.allocation.disk.watermark.low: 90%
cluster.routing.allocation.disk.watermark.high: 95%
cluster.fault_detection.follower_check.timeout: 180s
cluster.fault_detection.follower_check.retry_count: 10
cluster.fault_detection.follower_check.interval: 10s
cluster.publish.timeout: 1800s
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 10%
xpack.ml.enabled: false
cluster.election.duration: 30s
cluster.join.timeout: 360s
node.processors: 80

Create index my_index1

curl -X PUT "bsa5295:19399/my_index1" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 25,
    "number_of_replicas": 0
  }
}'

View index uuid

[worker@bsa5295 ~]$ curl bsa5295:19399/_cat/indices | grep my_index1
green open  my_index1                                 fI4auV0lRtmxeYN8XrXf8g 25 0         0      0   5.5kb   5.5kb

View the path corresponding to the shard
You can see that all shards are allocated under /home/sdj/elasticsearch
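
The shard-to-path mapping can also be checked directly on disk. The following is a minimal sketch, assuming the Elasticsearch 7.x on-disk layout `<data path>/nodes/0/indices/<index uuid>/<shard id>/`; `count_shards_per_path` is a hypothetical helper name, not an Elasticsearch tool.

```shell
# Count how many shard directories of one index live under each data path.
# Assumes the Elasticsearch 7.x layout:
#   <data path>/nodes/0/indices/<index uuid>/<shard id>/
# count_shards_per_path is a hypothetical helper, not part of Elasticsearch.
count_shards_per_path() {
  index_uuid=$1
  shift
  for data_path in "$@"; do
    index_dir="$data_path/nodes/0/indices/$index_uuid"
    count=0
    if [ -d "$index_dir" ]; then
      for entry in "$index_dir"/*; do
        # Shard directories are named by their numeric shard id;
        # anything else (such as _state) is skipped.
        case "${entry##*/}" in
          ''|*[!0-9]*) ;;
          *)
            if [ -d "$entry" ]; then
              count=$((count + 1))
            fi
            ;;
        esac
      done
    fi
    # One line per data path: "<shard count> <data path>"
    printf '%s %s\n' "$count" "$data_path"
  done
}
```

It could be called with the index uuid shown above and any subset of the configured paths, for example `count_shards_per_path fI4auV0lRtmxeYN8XrXf8g /home/sdf/elasticsearch/data /home/sdj/elasticsearch/data`.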
Expected behavior:

When path.data on a data node lists multiple paths, we expect the shards of a single index to be distributed nearly evenly across those paths.
Instead, almost all shards end up on the same path, so only a few disks' I/O capacity is used at any one time when writing to or querying the index.

Logs (if relevant)

No response

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-distributed (Team:Distributed)

mhl-b commented 4 months ago

Does this answer your question?

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/important-settings.html#_multiple_data_paths

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.

Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.

LuPan92 commented 4 months ago

I checked the disk usage of each path in path.data. The high disk usage watermark we configured has not yet been reached. The disk usage of each path is as follows: [WeCom screenshot_fbab5f0c-2483-4b70-bc19-85ba7b50b571, image not included]
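
A per-path usage check like this one can be approximated from the shell. The following is a rough sketch that reads the usage percentage from `df -P`, which is only an approximation of how Elasticsearch itself evaluates the watermark; `paths_over_watermark` is a hypothetical helper name.

```shell
# Print every path whose filesystem usage is at or above a threshold (in %).
# Usage figures come from df, not from Elasticsearch, so treat the result
# as an approximation. paths_over_watermark is a hypothetical helper.
paths_over_watermark() {
  threshold=$1
  shift
  for data_path in "$@"; do
    # Second line of `df -P` output, fifth column: capacity such as "42%".
    used=$(df -P "$data_path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    # Skip paths whose capacity could not be read as a number.
    case "$used" in
      ''|*[!0-9]*) continue ;;
    esac
    if [ "$used" -ge "$threshold" ]; then
      printf '%s %s%%\n' "$data_path" "$used"
    fi
  done
}
```

With the high watermark from the configuration above, `paths_over_watermark 95 /home/sdf/elasticsearch/data /home/sdj/elasticsearch/data` would print only the paths at or above 95% usage, and nothing if none has reached it.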

Addendum: when I reduce path.data to fewer than 20 paths, the problem magically disappears.
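
One way to reason about a threshold near 20 paths is a toy model. The sketch below is an illustration under stated assumptions, not Elasticsearch source code and not something confirmed in this thread. It assumes that each new shard is estimated to need 5% (1/20) of the free space summed over all data paths, that only paths with more free space than this estimate are eligible, that eligible paths are ranked by how few shards of the index they already hold, and that the path with the most free space is used when no path is eligible. `simulate_allocation` is a hypothetical name.

```shell
# Toy model, NOT Elasticsearch source code. Assumptions:
#  - estimated shard size = 5% (1/20) of the free space summed over all paths
#  - a path is eligible only if its free space exceeds that estimate
#  - among eligible paths, the one holding the fewest shards of the index wins
#  - if no path is eligible, the path with the most free space is used
# Usage: simulate_allocation <shard count> <free space of path 1> <path 2> ...
# Prints the number of shards placed on each path, in input order.
simulate_allocation() {
  shards=$1
  shift
  printf '%s\n' "$*" | awk -v shards="$shards" '{
    n = NF; total = 0
    for (i = 1; i <= n; i++) { free[i] = $i + 0; count[i] = 0; total += free[i] }
    estimate = total / 20
    for (s = 0; s < shards; s++) {
      best = 0; fallback = 1
      for (i = 1; i <= n; i++) {
        if (free[i] > free[fallback]) fallback = i
        if (free[i] > estimate && (best == 0 || count[i] < count[best])) best = i
      }
      if (best == 0) best = fallback
      count[best]++
    }
    line = count[1]
    for (i = 2; i <= n; i++) line = line " " count[i]
    print line
  }'
}
```

Under these assumptions, paths with roughly equal free space stop being eligible once there are 20 or more of them, because 5% of the total is then at least as large as any single path's free space. Every shard then falls back to the same path. With fewer paths, for example `simulate_allocation 4 100 100 100 100`, the shards spread out evenly.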

mhl-b commented 4 months ago

I'm not sure what the disk I/O skew is in your case; you might need to check your disk performance. As for all shards going to the same path, that is documented and expected behaviour. See the first paragraph at the link provided above, which reads:

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.

LuPan92 commented 4 months ago

Expected behavior:

When path.data on a data node lists multiple paths, we expect the shards of a single index to be distributed nearly evenly across those paths.
Instead, almost all shards end up on the same path, so only a few disks' I/O capacity is used at any one time when writing to or querying the index.

mhl-b commented 4 months ago

Thanks for your interest in Elasticsearch. We are closing this issue because the multiple data paths feature is deprecated and we are not going to fix it.

LuPan92 commented 1 month ago

I have found the answer to this question and written a detailed blog post: "The most hidden shard allocation problem in ES" (ES 最隐藏的 shard 分配问题).
