banzaicloud / koperator

Oh no! Yet another Apache Kafka operator for Kubernetes
Apache License 2.0

Kafka broker not coming back during scale-down #776

Closed ecojan closed 2 years ago

ecojan commented 2 years ago

Is your feature request related to a problem? Please describe.

During a scale-down operation, if a broker that is being removed is deleted in an uncontrolled fashion (while its data is still being drained to other brokers), the operator doesn't bring the broker back up to finish the drain and then remove it again. This slows down new replicas getting in sync, because they have to pull from two or, even worse, one in-sync replica. Even more concerning, during a K8s cluster rollout restart (due to VM upgrades, for example), if the Kafka pods don't come back up, this will result in offline partitions.

Describe the solution you'd like to see

If a broker is killed in an uncontrolled fashion while it is being drained, the broker should be brought back into the cluster and removed only after all of its replicas have been moved off.
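To make the requested behavior concrete, here is a minimal sketch of the decision I have in mind. It is not koperator's actual code: `BrokerState`, `BrokerAction`, and `DecideAction` are hypothetical names, and the sketch assumes the operator can learn how many partition replicas a broker still hosts (for example from cluster metadata) even while that broker's pod is down.

```go
package main

// BrokerAction is a hypothetical decision the reconciler takes for one broker.
type BrokerAction int

const (
	NoOp         BrokerAction = iota // nothing to do for this broker
	RecreatePod                      // pod is missing but the broker is still needed
	WaitForDrain                     // pod is running and replicas are still moving off
	RemoveBroker                     // drain finished, safe to remove the broker
)

// BrokerState is an assumed, simplified view of a broker during scale-down.
type BrokerState struct {
	MarkedForRemoval  bool // broker was dropped from the desired cluster spec
	PodRunning        bool // broker pod currently exists and is running
	RemainingReplicas int  // partition replicas still assigned to this broker
}

// DecideAction returns what the operator should do next for a single broker.
func DecideAction(s BrokerState) BrokerAction {
	if !s.MarkedForRemoval {
		// Regular broker: only act if its pod disappeared.
		if !s.PodRunning {
			return RecreatePod
		}
		return NoOp
	}
	// Broker is being scaled down.
	if s.RemainingReplicas == 0 {
		// Nothing left on the broker, so it can be removed for good.
		return RemoveBroker
	}
	if !s.PodRunning {
		// The case from this issue: killed mid-drain, so bring it back
		// to let the drain finish instead of abandoning its replicas.
		return RecreatePod
	}
	return WaitForDrain
}
```

With this logic, a broker that is marked for removal, has no running pod, and still hosts replicas yields `RecreatePod`; it only reaches `RemoveBroker` once `RemainingReplicas` drops to zero.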

bartam1 commented 2 years ago

Dear @ecojan! Could you please test this solution and review the PR? Thank you!

cc @amuraru

ecojan commented 2 years ago

Thank you @bartam1, I will pick it up and test it in a local environment. In the meantime, I will also review the PR. Let's continue the discussion there.