Closed srteam2020 closed 2 years ago
Can you confirm it happens when deletePVC is set to true because otherwise it's expected
@cscetbon Thanks for the reply!
> Can you confirm it happens when deletePVC is set to true because otherwise it's expected
Yes we set deletePVC to true and the PVC is supposed to be deleted.
The PVC does not get deleted because the controller crashes at a particular point and cannot fulfill all the reconcile updates. We have read the source code very carefully to draw this conclusion. More concretely, the decommissioned pod is deleted and `podLastOperation` is still `StatusOngoing`. Although the controller can restart, it cannot make progress to delete the PVC from this inconsistent state.
We are currently trying to send a PR to fix it. A potential approach is to switch the update/delete order to avoid the inconsistent state.
This bug is hard to trigger, as it only manifests when the crash happens at a particular time. But once triggered, the controller will not be able to recover. We actually have an open-source tool that can reliably reproduce this bug (when deletePVC is set to true), which helped us diagnose the problem. Please let us know if you also want to reliably reproduce the bug, and we can help you with that.
Bug Report
We find that scaling down a single dc rack (by reducing `nodesPerRacks`) might end up in a dirty state (the pod is deleted but the pvc is still there) if the operator crashes in the middle of a reconcile and restarts. The accidental dirty state will also prevent the operator from handling any future user request.
More concretely, when scaling down the dc rack (statefulset), casskop will do the following:
1. Change `CR.podLastOperation.status` (previously `StatusOngoing`) to `StatusFinalizing` (but have not issued `Update` to k8s yet)
2. Resize the statefulset
3. Update `podLastOperation` (to `StatusFinalizing`)
4. Check `podLastOperation`:
   i. If `podLastOperation` is `StatusOngoing`, try to get the decommissioned pod. If there is an error, the operator directly returns the error. Otherwise, the operator continues its reconcile.
   ii. If `podLastOperation` is `StatusFinalizing`, try to get the decommissioned pod. If it encounters a `NotFound` error, delete the PVC and set `podLastOperation.status` to `StatusDone`
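
The steps above can be sketched as a small state machine. This is a simplified Python model and not casskop's actual Go code: the `Cluster` class, its fields, and the `reconcile_scale_down` and `reconcile_check` functions are hypothetical names introduced only for illustration, and `persisted_status` stands in for the `podLastOperation.status` value stored in the CR. The step numbers in the comments follow the numbering used in this report.

```python
from dataclasses import dataclass

ONGOING, FINALIZING, DONE = "Ongoing", "Finalizing", "Done"


class NotFound(Exception):
    """Stands in for a Kubernetes NotFound error."""


@dataclass
class Cluster:
    """Hypothetical stand-in for the state kept in the API server."""
    pod_exists: bool = True
    pvc_exists: bool = True
    persisted_status: str = ONGOING  # podLastOperation.status stored in the CR


def get_pod(cluster):
    # Raises NotFound once the decommissioned pod has been deleted.
    if not cluster.pod_exists:
        raise NotFound("decommissioned pod not found")


def reconcile_scale_down(cluster):
    # Step 1: change the status in memory only; no Update issued yet.
    in_memory_status = FINALIZING
    # Step 2: resize the statefulset, which deletes the decommissioned pod.
    cluster.pod_exists = False
    # Step 3: issue Update so the new status is persisted.
    cluster.persisted_status = in_memory_status


def reconcile_check(cluster):
    # Step 4: branch on the persisted status.
    if cluster.persisted_status == ONGOING:
        get_pod(cluster)  # 4.i: any error propagates to the caller
    elif cluster.persisted_status == FINALIZING:
        try:
            get_pod(cluster)
        except NotFound:
            # 4.ii: the pod is gone, so delete the PVC and finish.
            cluster.pvc_exists = False
            cluster.persisted_status = DONE


cluster = Cluster()
reconcile_scale_down(cluster)
reconcile_check(cluster)
print(cluster)
```

When nothing interrupts the run, the model ends with the PVC deleted and the status set to done, which is the expected outcome of a scale-down.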
Say we set `nodesPerRacks` from 2 to 1. The operator will run the above steps. If the operator pod crashes after step 2, the decommissioned pod will be deleted (as the statefulset is resized), but `podLastOperation` is still `StatusOngoing` (since step 3 is not executed yet). After the operator pod restarts, it will go to branch 4.i, and since the last pod is already deleted, there will be a `NotFound` error when trying to get the last pod. The operator simply ends this round of reconcile by returning the error and is never able to clean up the PVC or serve further user requests.
What did you do? Set `nodesPerRacks` from 2 to 1.
What did you expect to see? The pod and the pvc get deleted.
What did you see instead? Under which circumstances? The pod is deleted but the pvc is still there. And any further user operation is refused by the operator.
Environment
casskop version: f87c8e05c1a2896732fc5f3a174f1eb99e936907 (master branch)
Kubernetes version information: 1.18.9
Cassandra version: 3.11
Possible Solution
A potential solution is to directly issue `Update` after changing `CR.podLastOperation.status` to `StatusFinalizing` in step 1, so that even if the operator crashes in the middle of a reconcile, it should still be able to resize the statefulset and delete the pvc, and move to `StatusDone` eventually.
Additional context
We are willing to send a PR to help fix this issue.
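
To illustrate why the ordering proposed under Possible Solution matters, here is a minimal sketch that compares the two orderings when a crash is injected right after the statefulset is resized. It is a toy Python model, not casskop's Go code: `State`, `scale_down`, `resume`, `run`, and the `persist_first` flag are hypothetical names, and the model assumes, as the proposal does, that an operator restarting with `StatusFinalizing` already persisted can still resize the statefulset and reach the PVC cleanup.

```python
ONGOING, FINALIZING, DONE = "Ongoing", "Finalizing", "Done"


class Crash(Exception):
    """Simulated operator crash in the middle of a reconcile."""


class State:
    """Hypothetical stand-in for what survives an operator restart."""

    def __init__(self):
        self.pod_exists = True
        self.pvc_exists = True
        self.status = ONGOING  # persisted podLastOperation.status


def scale_down(state, persist_first, crash_after_resize=False):
    if persist_first:
        state.status = FINALIZING  # proposed: Update right after step 1
    state.pod_exists = False       # step 2: resize the statefulset
    if crash_after_resize:
        raise Crash()
    state.status = FINALIZING      # original step 3: Update


def resume(state):
    """One reconcile after restart; returns an error string or None."""
    if state.status == ONGOING:
        # Branch 4.i: a missing pod is returned as an error.
        return None if state.pod_exists else "NotFound"
    if state.status == FINALIZING:
        state.pod_exists = False   # resizing again is harmless in this model
        state.pvc_exists = False   # branch 4.ii: pod gone, delete the PVC
        state.status = DONE
    return None


def run(persist_first):
    state = State()
    try:
        scale_down(state, persist_first, crash_after_resize=True)
    except Crash:
        pass  # the operator restarts with only the persisted state
    error = resume(state)
    return state.status, state.pvc_exists, error


original = run(persist_first=False)
proposed = run(persist_first=True)
print("original order:", original)
print("proposed order:", proposed)
```

In this model the original ordering leaves the status at ongoing with the PVC still present and a `NotFound` error on every later reconcile, while persisting the status first lets the restarted operator finish the cleanup.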