datto / es-disk-rebalance

MIT License

Comparison to built in rebalance? #1

Closed phillbaker closed 4 years ago

phillbaker commented 4 years ago

I believe that ES will rebalance based on disk usage when a node breaches the high watermark threshold (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/disk-allocator.html#disk-allocator).

Controls the high watermark. It defaults to 90%, meaning that Elasticsearch will attempt to relocate shards away from a node whose disk usage is above 90%.

This script seems like it would be useful in cases where that threshold could not be adjusted (e.g. AWS Elasticsearch I believe). Are there other cases where this script takes action differently than the built in re-allocator?
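For context, on a self-managed cluster that threshold can be changed through the cluster settings API. A minimal sketch using `requests` (the endpoint and percentage are placeholders, not values from this repo):

```python
import requests

# Hypothetical endpoint; adjust for your cluster.
ES_URL = "http://localhost:9200"

# Transiently raise the high watermark so the disk-based allocator
# only relocates shards once usage passes 92% (the default is 90%).
resp = requests.put(
    f"{ES_URL}/_cluster/settings",
    json={"transient": {"cluster.routing.allocation.disk.watermark.high": "92%"}},
)
resp.raise_for_status()
print(resp.json())
```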

johnseekins commented 4 years ago

From the readme:

Elasticsearch's built-in rebalancing tries to balance index count, which can end with certain nodes loaded with very large shards while other nodes hold small ones. This tool swaps large shards with small shards to balance out the disk usage.

What we ran into was that single large shards on an instance were causing high-watermark problems, so moving that single shard to another, less busy host before ES attempted its internal rebalancing made for a more consistent experience. It's essentially additional disk rebalancing on top of the built-in tool.
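A single move of that kind can be issued directly with the cluster reroute API. A minimal sketch (the endpoint, index name, shard number, and node names are all placeholders):

```python
import requests

ES_URL = "http://localhost:9200"  # hypothetical endpoint

# Move one oversized shard off a nearly full node onto a quieter one.
# Index name, shard number, and node names are placeholders.
reroute = {
    "commands": [
        {
            "move": {
                "index": "logs-2020.10.08",
                "shard": 3,
                "from_node": "node-full",
                "to_node": "node-quiet",
            }
        }
    ]
}
resp = requests.post(f"{ES_URL}/_cluster/reroute", json=reroute)
resp.raise_for_status()
```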

phillbaker commented 4 years ago

Thanks - what types of issues did you see with the internal rebalancing (after hitting the high watermark)?


mhoydis13 commented 4 years ago

One of our clusters is very large, nearly 2PB total stored on disk. It hosts a variety of indices with widely variable shard sizes: some are megabytes, some are tens of gigabytes. The native Elasticsearch balancer aims only to level out the number of shards per node; it does not take shard size into account. So you might have 100 shards on each node, but one node has terabytes to store while another has only megabytes.

This script searches for opportunities to swap large shards with small shards, aiming for each node to carry an equal storage burden while also sharing the same shard-count burden. It works pretty well! We run it daily.
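Roughly the idea (not the script's exact algorithm, just a sketch against the `_cat` APIs; the endpoint is a placeholder): find the node using the most disk and the node using the least, then pair its largest shard with the other's smallest as a swap candidate.

```python
import requests

ES_URL = "http://localhost:9200"  # hypothetical endpoint

# Disk used by indices on each node, in bytes.
alloc = requests.get(f"{ES_URL}/_cat/allocation?format=json&bytes=b").json()
usage = {
    a["node"]: int(a["disk.indices"])
    for a in alloc
    if a.get("node") != "UNASSIGNED" and a.get("disk.indices")
}
busiest = max(usage, key=usage.get)
quietest = min(usage, key=usage.get)

# Started shards and their on-disk sizes, in bytes.
shards = requests.get(f"{ES_URL}/_cat/shards?format=json&bytes=b").json()
started = [s for s in shards if s["state"] == "STARTED" and s.get("store")]

# Pair the largest shard on the fullest node with the smallest shard
# on the emptiest node as a swap candidate.
big = max((s for s in started if s["node"] == busiest), key=lambda s: int(s["store"]))
small = min((s for s in started if s["node"] == quietest), key=lambda s: int(s["store"]))

print(f"swap candidate: {big['index']}[{big['shard']}] on {busiest} "
      f"<-> {small['index']}[{small['shard']}] on {quietest}")
```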

phillbaker commented 4 years ago

Sorry for the confusion here; understood that the default shard allocator seeks to balance shard count across nodes. My question was about the high watermark threshold, which engages the allocator based on disk usage. I think I've found the answer to my question.

In the case that the high watermark is breached on a node, the allocator shifts shards until the node returns to below the watermark. This improves the average disk usage across the cluster, but it's certainly a long way from optimal. With many shards of variable size, this difference can be dramatic as pointed out above.

These advantages need to be balanced against the work/pressure rebalancing may put on a cluster. The included options for shard and node percentage limit the reallocations; however, there may be other safeguards worth considering.
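For example (not part of this script, just an illustration of one possible safeguard), recovery bandwidth could be throttled for the duration of a run and then reset; endpoint and limits are placeholders:

```python
import requests

ES_URL = "http://localhost:9200"  # hypothetical endpoint

def set_recovery_limit(limit):
    """Set (or, with None, reset) the cluster-wide recovery bandwidth cap."""
    requests.put(
        f"{ES_URL}/_cluster/settings",
        json={"transient": {"indices.recovery.max_bytes_per_sec": limit}},
    ).raise_for_status()

set_recovery_limit("20mb")   # slow shard relocations down during the run
# ... run the rebalance moves here ...
set_recovery_limit(None)     # null resets the setting to its default
```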

Thanks for the work on this script and open sourcing it.