Closed 4n4nd closed 1 year ago
Hey @4n4nd, I haven't had a chance to look into this yet, but the tests are failing due to https://github.com/IBM/operator-for-redis-cluster/actions/runs/3873873803/jobs/6629153343#step:4:34. Looks like a linter problem?
It looks like all the tests pass. I will test this out ASAP! Thanks!
@4n4nd, do you have a reliable way of reproducing this issue?
I believe you can set tolerations on k8s to not evict pods even when the node is down (docs).
The bigger issue is that because one sanity check (the terminating-pod check) fails every time, the operator never performs the other checks. The operator is designed to take action only when one of the sanity checks calls for it, which is why it essentially gets stuck in a non-reconciling state.
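To illustrate the failure mode, here is a minimal sketch of a check runner that stops at the first check reporting a problem. It is not the operator's actual code: the `check` type, `runChecks`, and the check names are hypothetical, and the short-circuit behavior is an assumption based on the description above.

```go
package main

import "fmt"

// check is a hypothetical sanity check. run returns true when the check
// found a problem that it wants to act on.
type check struct {
	name string
	run  func() bool
}

// runChecks executes checks in order and stops after the first one that
// reports a problem. It returns the names of the checks that actually ran.
func runChecks(checks []check) []string {
	var ran []string
	for _, c := range checks {
		ran = append(ran, c.name)
		if c.run() {
			break // later checks are skipped for this pass
		}
	}
	return ran
}

func main() {
	checks := []check{
		// Assumed to trip on every pass while a pod stays stuck in terminating.
		{name: "terminating-pod", run: func() bool { return true }},
		{name: "other-check", run: func() bool { return false }},
	}
	fmt.Println(runChecks(checks))
}
```

Under this assumption, a check that trips on every pass starves every check placed after it, which matches the stuck state described above.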
The operator will now disassociate pods that are stuck in the terminating state for over a minute. It will also add the label "redis-operator.k8s.io/marked-for-termination" = "true", in case the pods need to be cleaned up later.

Resolves #84