Currently, when a SolrCloud is scaled down (the `SolrCloud.spec.replicas` option decreases), replicas are left on the Solr pods being decommissioned.
This is problematic because the cluster state will be unhealthy until the SolrCloud is scaled back up and those pods are recreated.
When doing a rolling restart of SolrClouds with ephemeral data, the Solr operator will move data (replicas) off of a Solr pod before that Solr pod is deleted. This same logic can be used to ensure that the clusterState of Solr is healthy as the cluster is scaled down.
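To make the "move replicas off before deleting the pod" step concrete, here is a minimal Go sketch of how the operator could build the Solr Collections API REPLACENODE request for a pod that is about to be removed. The base URL and node names are hypothetical placeholders; the `action`, `sourceNode`, and `async` parameters are the standard REPLACENODE parameters, with `async` allowing the operator to poll for completion rather than block.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildReplaceNodeURL builds a Solr Collections API REPLACENODE request
// that asks Solr to move all replicas off sourceNode. The operator would
// issue this before deleting the pod, then poll the async request status
// until the data has been moved.
func buildReplaceNodeURL(baseURL, sourceNode, asyncID string) string {
	params := url.Values{}
	params.Set("action", "REPLACENODE")
	params.Set("sourceNode", sourceNode)
	params.Set("async", asyncID)
	return fmt.Sprintf("%s/solr/admin/collections?%s", baseURL, params.Encode())
}

func main() {
	// Hypothetical names following the SolrCloud pod/service conventions.
	u := buildReplaceNodeURL(
		"http://example-solrcloud-common.default",
		"example-solrcloud-2.example-solrcloud-headless.default:8983_solr",
		"replace-node-2",
	)
	fmt.Println(u)
}
```

Note that no `targetNode` is given, so Solr chooses where the replicas land, which is exactly the limitation discussed below for multi-pod scale-downs.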
Right now, the safest way of ensuring this is to scale down one pod at a time.
The current Solr REPLACENODE API does not accept a list of nodes to put the new replicas on.
Therefore, if we were trying to remove the last two pods in the cluster at the same time, we couldn't ensure that the replicas of one decommissioned pod don't end up on the other decommissioned pod.
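Because of that limitation, the scale-down has to be sequential: decommission the highest-ordinal pod, wait for REPLACENODE to finish, delete the pod, then move on to the next one. A minimal sketch of computing that ordered plan (the StatefulSet name and ordinal-based pod naming follow the usual Kubernetes conventions; `scaleDownPlan` is a hypothetical helper, not an existing operator function):

```go
package main

import "fmt"

// scaleDownPlan returns the pods to decommission, one at a time, highest
// ordinal first, when scaling a StatefulSet from `current` to `desired`
// replicas. Processing the plan strictly in order guarantees that
// REPLACENODE never places replicas on a pod that is itself about to be
// removed, since each pod is only drained after all higher-ordinal pods
// are already gone.
func scaleDownPlan(statefulSetName string, current, desired int) []string {
	var plan []string
	for i := current - 1; i >= desired; i-- {
		plan = append(plan, fmt.Sprintf("%s-%d", statefulSetName, i))
	}
	return plan
}

func main() {
	// Scaling from 5 pods to 3: drain and delete pod 4, then pod 3.
	fmt.Println(scaleDownPlan("example-solrcloud", 5, 3))
	// → [example-solrcloud-4 example-solrcloud-3]
}
```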
However, there is an exception when the cluster is scaling down to 0 pods, since there are no remaining nodes to move replicas to.
There are a couple of things we could do in this case:
1. Delete all collections
2. Leave the replicas there (if there is persistent storage and we are not deleting the PVCs of pods that have been scaled down)
I say that in the beginning we just use the second option, as I don't think scaling down to zero will be a common operation anyway, and we can always add deletion of all data at a later time.
Parent Ticket: https://github.com/apache/solr-operator/issues/536