When the Solr Operator does a rolling restart of a SolrCloud that uses ephemeral data, it moves replicas off of each node before restarting it. Therefore, by the end of the rolling restart, at least one node (the last one to be restarted) will have no replicas: its replicas were moved away before its restart, and no other node was evacuated after it came back up, so nothing was ever moved onto it. This results in an unbalanced cluster.
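A toy simulation makes the imbalance concrete. This is a sketch, not the operator's logic: it assumes each evacuated replica goes to the least-loaded other node (a hypothetical placement rule; real Solr uses its configured placement plugin), and that a restarted node comes back live but empty.

```python
def rolling_restart(placement: dict) -> dict:
    """Simulate evacuating each node, in order, before restarting it."""
    nodes = list(placement)
    state = {n: list(replicas) for n, replicas in placement.items()}
    for node in nodes:
        # Move every replica off the node that is about to restart, to the
        # least-loaded other node (hypothetical placement rule).
        while state[node]:
            replica = state[node].pop()
            target = min((n for n in nodes if n != node), key=lambda n: len(state[n]))
            state[target].append(replica)
        # The node restarts here and rejoins empty; nothing moves back to it
        # unless a later node is evacuated onto it.
    return state


result = rolling_restart({"a": ["a1", "a2"], "b": ["b1", "b2"], "c": ["c1", "c2"]})
print({n: len(r) for n, r in result.items()})
```

With three nodes of two replicas each, the earlier nodes absorb the later evacuations, but nothing is ever moved onto `c`, the last node restarted, so it finishes with zero replicas.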
The solution is to use Solr's new Balance Replicas feature (introduced in Solr 9.3), as we already do for Scale Up operations, once all nodes are live again after being restarted. Now that we have locked cluster operations, this should be much easier.
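As a rough illustration of the request the operator would issue after the last pod is live, here is a Python sketch that only builds (does not send) the call. It assumes the V2 endpoint `POST /api/cluster/replicas/balance` with a JSON body containing `nodes` and `waitForFinalState`, as described in the Solr 9.3 Reference Guide; verify the path and parameters against your Solr version. The helper name `balance_replicas_request` is hypothetical, and the operator itself is written in Go, so this is not its actual implementation.

```python
import json
import urllib.request
from typing import List, Optional


def balance_replicas_request(solr_base_url: str,
                             nodes: Optional[List[str]] = None,
                             wait_for_final_state: bool = True) -> urllib.request.Request:
    """Build (but do not send) a Balance Replicas call (hypothetical helper)."""
    body = {"waitForFinalState": wait_for_final_state}
    if nodes:
        # Assumed: omitting "nodes" lets Solr balance across all live nodes.
        body["nodes"] = nodes
    return urllib.request.Request(
        solr_base_url.rstrip("/") + "/api/cluster/replicas/balance",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(balance_replicas_request("http://solr:8983"))`, issued only after every restarted node has rejoined, so the previously emptied node is eligible to receive replicas.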