elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.12k stars 24.83k forks

Make shard balancing aware of forced awareness attributes #73498

Open dakrone opened 3 years ago

dakrone commented 3 years ago

In order to reduce DTS costs for cross-zone data transfer, we should investigate whether it would be beneficial to make shard balancing aware of forced awareness attributes.

Currently shard balancing only deals with attributes or forced awareness in a yes/no manner (i.e., can this shard go to node X? yes or no). Implementing this would mean that BalancedShardsAllocator would have to know about node attributes, and prefer to relocate shards to nodes with a matching attribute value.

See this image:

[image: 1B6A17EA-6E5A-4678-BE87-4253303F546C]

It would be beneficial if the BalancedShardsAllocator knew to prefer the scenario on the left, where no inter-zone transfers occur, rather than the scenario on the right.
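
A rough sketch of that preference, written in Python for brevity rather than in the allocator's Java. The `Node` record and the `pick_target` helper are hypothetical names made up for illustration; nothing here is Elasticsearch API, and this is not how BalancedShardsAllocator is implemented. It assumes the yes/no deciders have already filtered the candidate list, and only shows the extra "prefer a matching attribute value" step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical model of a data node; not an Elasticsearch class.
    name: str
    zone: str          # value of the awareness attribute on this node
    shard_count: int   # number of shards currently held by this node


def pick_target(source: Node, candidates: List[Node]) -> Optional[Node]:
    """Pick a relocation target among nodes the deciders already allowed.

    Nodes in the same zone as the source sort first, so an in-zone move is
    preferred over a cross-zone one; ties are broken by the lowest shard
    count, then by name to keep the choice deterministic.
    """
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda n: (n.zone != source.zone, n.shard_count, n.name),
    )


source = Node("node-a1", "zone-a", 5)
candidates = [
    Node("node-b1", "zone-b", 1),  # emptier, but needs a cross-zone transfer
    Node("node-a2", "zone-a", 3),  # same zone as the source
]
print(pick_target(source, candidates).name)  # node-a2
```

In this sketch the zone match always wins over load, which is the simplest possible policy; a real balancer would presumably need to trade the two off against each other.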

elasticmachine commented 3 years ago

Pinging @elastic/es-distributed (Team:Distributed)

henningandersen commented 3 years ago

I tentatively think this should already be handled here, but it would be good to validate that we have a test for it.

DaveCTurner commented 3 years ago

I think that helps only if the number of shards is a multiple of the number of zones, which is not that common. In other cases you can get stuff like this happening: [image]

Leaf-Lin commented 2 years ago

This appears to be resolved by https://github.com/elastic/elasticsearch/issues/73496, can we close this issue? @dakrone ?

dakrone commented 2 years ago

Relocation via snapshot only solves this in a "now it doesn't cost anything" manner; it doesn't fix the actual shard balancing behavior, so I don't think this can be closed yet.

DaveCTurner commented 2 years ago

My understanding was that this issue was all about the network costs of cross-zone transfers, so I'm not sure what else remains. Could you explain more clearly what needs to be addressed still?

dakrone commented 2 years ago

> Could you explain more clearly what needs to be addressed still?

The relocate-via-snapshot case only fixes the cost issue for a user on Cloud, right? So if we want to fix it for everyone, we'd have to fix the scenario you mentioned in https://github.com/elastic/elasticsearch/issues/73498#issuecomment-850286892 for regular non-snapshot relocation. Unless I am misunderstanding something?

DaveCTurner commented 2 years ago

That's true, you need to be running in an environment that supports relocate-via-snapshot to get the benefits, but I don't think we have any plans to address this in other environments so I'd rather close this to indicate that we're considering this to be done (for now at least).

dakrone commented 2 years ago

That sounds fine with me, as long as we're specific/descriptive about it.