Closed LuPan92 closed 4 months ago
Pinging @elastic/es-distributed (Team:Distributed)
Does this answer your question?
If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.
Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.
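To see why the documented "no balancing across data paths" behaviour can put every shard of an index on one path, here is an illustrative sketch. This is not Elasticsearch's actual implementation; it is a hypothetical greedy strategy that always picks the path with the most reported free space, which shows how identical free-space readings (a brand-new index, size estimates of zero) send every shard to the same path:

```python
# Illustrative sketch only -- NOT Elasticsearch's real shard-path selection code.
# A node that always places a new shard on the data path with the most free bytes.
def select_path(free_bytes: dict[str, int]) -> str:
    # Pick the path with the most reported free space; ties resolve to the
    # first path in iteration order, so the winner never changes.
    return max(free_bytes, key=free_bytes.get)

# 22 hypothetical data paths with identical free space, mirroring the report.
paths = {f"/home/sd{chr(ord('a') + i)}/elasticsearch": 500_000_000_000
         for i in range(22)}

placements = []
for shard in range(10):          # 10 new shards of one index
    chosen = select_path(paths)  # size estimates are 0, so free space never changes
    placements.append(chosen)

# Every shard landed on the same path.
print(len(set(placements)))  # 1
```

Under this (assumed) strategy, nothing ever breaks the tie, so the skew the reporter observed is the expected outcome rather than a balancing bug.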
I checked the disk usage of each path in path.data. The high disk usage watermark
we configured has not yet been reached. The disk usage of each path is as follows:
Supplement: When I reduce the number of paths in path.data to fewer than 20, the problem magically disappears.
When path.data on an Elasticsearch data node contains more than 20 entries, all shards of the same index are allocated to a single path, causing disk I/O skew during writes.
Not sure what causes the disk I/O skew in your case; you might need to check your disk performance. As for all shards going to the same path, that is documented and expected behaviour. See the first paragraph of the link provided above:
If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.
expected behavior:
When path.data on the data node is configured with multiple paths, we expect the shards of a single index to be distributed nearly evenly across the paths. Almost all shards ending up on the same path does not meet our expectations, because writes and queries against the index can then use only a fraction of the available disk I/O resources.
Thanks for your interest in Elasticsearch. We are closing this issue because the multiple data path feature is deprecated and we are not going to fix it.
I have found the answer to this question and have written a detailed blog post: ES 最隐藏的 shard 分配问题 ("The most hidden shard allocation problem in ES").
Elasticsearch Version
Version: 7.17.18, Build: default/tar/8682172c2130b9a411b1bd5ff37c9792367de6b0/2024-02-02T12:04:59.691750271Z, JVM: 11.0.20
Installed Plugins
No response
Java Version
11.0.20
OS Version
Linux bsa5295 3.10.0-1160.108.1.el7.x86_64 #1 SMP Thu Jan 25 16:17:31 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Problem Description
When path.data on an Elasticsearch data node contains more than 20 entries, all shards of the same index are allocated to a single path, causing disk I/O skew during writes.
Steps to Reproduce
My test steps are as follows:
elasticsearch.yml
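The original elasticsearch.yml content was not captured in this thread. A minimal sketch of the relevant setting, with hypothetical mount points standing in for the reporter's 20+ disks, might look like:

```yaml
# Hypothetical multi-path layout; the actual issue used more than 20 mounts.
path.data:
  - /home/sda/elasticsearch
  - /home/sdb/elasticsearch
  - /home/sdc/elasticsearch
  # ... one entry per disk, 21+ entries in total ...
  - /home/sdv/elasticsearch
```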
Create index my_index1
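The exact request is not shown in the thread; a typical way to create the index (the shard and replica counts here are assumptions) is:

```
PUT my_index1
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}
```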
View index uuid
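One way to look up the index UUID is the cat indices API, restricted to the relevant columns:

```
GET _cat/indices/my_index1?v&h=index,uuid
```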
View the path corresponding to the shard
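The commands used to map shards to paths were not captured. Below is a self-contained sketch that simulates two data-path mounts under a temporary directory with a made-up index UUID; on a real node you would point `find` at each entry of path.data and substitute the UUID from the previous step:

```shell
# Simulate two data paths holding shard directories for a hypothetical index UUID.
root=$(mktemp -d)
uuid="AbCdEfGh123"   # hypothetical; substitute the real index UUID
mkdir -p "$root/sdi/elasticsearch/nodes/0/indices/$uuid/0" \
         "$root/sdj/elasticsearch/nodes/0/indices/$uuid/1"

# Locate every shard copy of the index across the simulated paths.
# On a real node, replace "$root" with your actual path.data entries.
find "$root" -type d -path "*/indices/$uuid/*"
```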
You can see that all shards are allocated under /home/sdj/elasticsearch.
expected behavior:
Logs (if relevant)
No response