apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Support managed scale up of SolrClouds #567

Closed HoustonPutman closed 1 year ago

HoustonPutman commented 1 year ago

Parent Ticket: https://github.com/apache/solr-operator/issues/536

Currently when a SolrCloud is scaled up (the SolrCloud.spec.replicas option increases), new Solr Pods are created and no replicas are placed on them until the user intervenes manually. It is unlikely that most people want (more) dedicated "compute" Solr nodes when scaling their cluster up. So what we want to do is move replicas to these newly-created Solr pods once they are up and running.

Unfortunately, as of Solr 9.0, the UtilizeNode API was removed from Solr along with Autoscaling, so we need a new API to accomplish this. The BalanceReplicas API is currently under development so that it can be used for this feature. Instead of just utilizing new nodes, the BalanceReplicas API would balance replicas across the entire cluster. It will also be able to balance across just a subset of nodes, but we likely don't need that feature here. It might be worth considering for users who span SolrClouds across multiple Kubernetes clusters.
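As a rough illustration of what calling such an API from the operator could look like, here is a minimal Go sketch that builds the HTTP request. The endpoint path `/api/cluster/replicas/balance` and the `nodes` and `waitForFinalState` fields are assumptions, since the API is still being developed and its final shape may differ; `NewBalanceRequest` and `balanceRequest` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// balanceRequest is an assumed request body. Leaving Nodes empty is meant
// to express "balance across the whole cluster".
type balanceRequest struct {
	Nodes             []string `json:"nodes,omitempty"`
	WaitForFinalState bool     `json:"waitForFinalState"`
}

// NewBalanceRequest builds (but does not send) a POST request asking Solr
// to balance replicas. The endpoint path is an assumption, not a confirmed API.
func NewBalanceRequest(baseURL string, nodes []string) (*http.Request, error) {
	body, err := json.Marshal(balanceRequest{Nodes: nodes, WaitForFinalState: true})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/cluster/replicas/balance", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

Passing `nil` for `nodes` would correspond to the whole-cluster case described above; passing a list of node names would correspond to the subset case.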

This scale-up operation would need to be a cluster operation, much like the scale-down operation. We need to make sure other cluster operations, such as scaling down and rolling restarts, don't happen at the same time.
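The mutual exclusion described here could be sketched as a simple lock that records which cluster operation currently owns the SolrCloud. This is only an illustration of the idea: `ClusterOp`, `OpLock`, and the operation names are hypothetical and not taken from the operator's code, and a real implementation would need to persist the lock (for example on the resource itself) rather than keep it in memory.

```go
package main

import "fmt"

// ClusterOp names a cluster-wide operation. These names are illustrative,
// not the operator's actual identifiers.
type ClusterOp string

const (
	OpScaleUp        ClusterOp = "ScaleUp"
	OpScaleDown      ClusterOp = "ScaleDown"
	OpRollingRestart ClusterOp = "RollingRestart"
)

// OpLock records which cluster operation, if any, currently owns the cluster.
type OpLock struct {
	current ClusterOp // empty string means no operation is running
}

// TryAcquire claims the lock for op. It fails if a different operation holds
// the lock; re-acquiring by the current holder succeeds, so repeated
// reconcile passes are harmless.
func (l *OpLock) TryAcquire(op ClusterOp) error {
	if l.current != "" && l.current != op {
		return fmt.Errorf("cannot start %s: %s is in progress", op, l.current)
	}
	l.current = op
	return nil
}

// Release clears the lock, but only if op is the operation that holds it.
func (l *OpLock) Release(op ClusterOp) {
	if l.current == op {
		l.current = ""
	}
}
```

With this shape, a scale-up would call `TryAcquire(OpScaleUp)` before moving any replicas, and a rolling restart attempted in the meantime would get an error until `Release(OpScaleUp)` is called.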

However, if a user tries to scale up and there is an issue creating new pods, such as a lack of resources, we need to make sure that the scale-up can be aborted so that the resources can be lowered. In that case, the scale-up would need to fail and let the rolling restart operation start. After the rolling restart operation, the scale-up can then be tried again.
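The abort-and-retry flow can be sketched as a small state transition: the stuck scale-up gives up the cluster, is queued for later, and resumes once the rolling restart finishes. The `State` type, its fields, and the method names below are hypothetical and chosen only for illustration; they are not the operator's data model.

```go
package main

// Op names a cluster operation (illustrative names only).
type Op string

const (
	ScaleUp        Op = "ScaleUp"
	RollingRestart Op = "RollingRestart"
)

// State is a hypothetical snapshot of cluster-operation bookkeeping.
type State struct {
	Current Op   // operation that owns the cluster, "" if none
	Queue   []Op // operations waiting to run, oldest first
}

// AbortScaleUpFor hands the cluster to next and queues the scale-up for a
// later retry. It is a no-op if a scale-up is not the current operation.
func (s State) AbortScaleUpFor(next Op) State {
	if s.Current != ScaleUp {
		return s
	}
	queue := append([]Op{}, s.Queue...) // copy so the input is not mutated
	queue = append(queue, ScaleUp)
	return State{Current: next, Queue: queue}
}

// Finish marks the current operation as done and starts the oldest queued
// operation, if there is one.
func (s State) Finish() State {
	if len(s.Queue) == 0 {
		return State{}
	}
	return State{Current: s.Queue[0], Queue: append([]Op{}, s.Queue[1:]...)}
}
```

For example, starting from `State{Current: ScaleUp}`, calling `AbortScaleUpFor(RollingRestart)` makes the rolling restart current with the scale-up queued, and a subsequent `Finish()` makes the scale-up current again.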