apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Rebalance ephemeral Solr cloud after a rolling restart #615

Closed HoustonPutman closed 8 months ago

HoustonPutman commented 10 months ago

When the Solr Operator performs a rolling restart of a SolrCloud that uses ephemeral data, it moves the replicas off each node before restarting it. As a result, at least one node has no replicas by the end of the rolling restart: the last node to be restarted comes back empty, and no later restart moves replicas onto it. This leaves the cluster unbalanced.
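To make the imbalance concrete, here is a minimal simulation sketch. It is not the operator's code: the `rolling_restart` function, the node names, and the "move each replica to the least-loaded other node" placement rule are all assumptions made for illustration.

```python
def rolling_restart(placement):
    """Simulate a rolling restart of nodes that hold ephemeral data.

    `placement` maps node name -> replica count. Before a node restarts,
    all of its replicas are moved to other nodes. The target choice
    (least-loaded other node) is a simplifying assumption, not Solr's
    actual placement logic.
    """
    counts = dict(placement)
    for node in list(counts):
        to_move = counts[node]
        counts[node] = 0  # the node restarts with no data
        others = [n for n in counts if n != node]
        if to_move and not others:
            raise ValueError("need at least two nodes to move replicas")
        for _ in range(to_move):
            target = min(others, key=lambda n: counts[n])
            counts[target] += 1
    return counts


if __name__ == "__main__":
    before = {"solr-0": 4, "solr-1": 4, "solr-2": 4}
    print(rolling_restart(before))
    # prints {'solr-0': 6, 'solr-1': 6, 'solr-2': 0}
```

Under these assumptions no replicas are lost, but the node restarted last always ends up empty, which is the state described above.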

The solution is to use Solr's new Balance Replicas feature (introduced in 9.3), as we already do for Scale Up operations, once all nodes are live again after being restarted. Now that we have locked cluster operations, this should be much easier.
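A rough sketch of the gating step, under stated assumptions: the `balance_request` helper and the `node_status` map are hypothetical, and the path and the `nodes` / `async` body fields follow my reading of the Balance Replicas V2 API in the Solr reference guide, so check them against the guide before relying on them. The sketch only builds the request and does not send it.

```python
import json

# Assumed V2 endpoint for Balance Replicas (Solr 9.3+); verify against the ref guide.
BALANCE_PATH = "/api/cluster/replicas/balance"


def balance_request(node_status, async_id=None):
    """Build a balance request once the rolling restart has finished.

    `node_status` (hypothetical) maps Solr node name -> True when that
    node has been restarted and is live again. Returns None while any
    node is still pending, otherwise a (path, json_body) tuple.
    """
    if not node_status or not all(node_status.values()):
        return None  # restart still in progress: do not balance yet
    body = {"nodes": sorted(node_status)}
    if async_id is not None:
        body["async"] = async_id  # run as an async request to poll later
    return BALANCE_PATH, json.dumps(body)


if __name__ == "__main__":
    status = {"solr-0:8983_solr": True, "solr-1:8983_solr": True}
    print(balance_request(status, async_id="rebalance-after-restart"))
```

Returning `None` until every node is live keeps the balance from running mid-restart, when it would place replicas on nodes that are about to be emptied again.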