Canonical Kubernetes is an opinionated and CNCF conformant Kubernetes operated by Snaps and Charms, which come together to bring simplified operations and an enhanced security posture on any infrastructure.
GNU General Public License v3.0
Cleanup on failed `k8s bootstrap` and `k8s join-cluster` attempts #521
The bootstrap control plane, bootstrap worker and join control plane hooks are refactored. Each hook now always defers a function that checks the result of the hook.
In the case of `preRemove`, we simply log the error and proceed. Otherwise, the node would be removed by the microcluster peers but not from the underlying dqlite database, breaking the cluster.
In the case of `k8s bootstrap`, remove all configs, stop the control plane services, then use `ResetClusterMember`. This resets the microcluster state. Note that k8sd runs this automatically; no manual action from the client is required.
In the case of `k8s join-cluster` for worker nodes, similarly revert the configs, then use `ResetClusterMember`.
In the case of `k8s join-cluster` for control plane nodes: when the `postJoin` hook runs, the node has already joined microcluster. Therefore, we need to revert the configs, but also make sure to use `DeleteClusterMember`, so that the failed node is removed from the cluster before resetting.
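The deferred check and the role-dependent cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the k8sd code: the `cluster` interface, the `recorder` type, `runHook`, `nodeRole` and `errSetup` are hypothetical names, and the real `ResetClusterMember` and `DeleteClusterMember` signatures in microcluster may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeRole distinguishes the three hook paths described in the PR.
type nodeRole int

const (
	bootstrapNode nodeRole = iota
	workerJoin
	controlPlaneJoin
)

// cluster is a hypothetical stand-in for the microcluster operations.
// The real signatures may differ.
type cluster interface {
	ResetClusterMember(name string) error
	DeleteClusterMember(name string) error
}

// recorder implements cluster and records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) ResetClusterMember(name string) error {
	r.calls = append(r.calls, "reset:"+name)
	return nil
}

func (r *recorder) DeleteClusterMember(name string) error {
	r.calls = append(r.calls, "delete:"+name)
	return nil
}

// errSetup simulates a hook that fails part-way through.
var errSetup = errors.New("failed to write config")

// runHook runs setup and, in a deferred function, inspects the named
// return value. On failure it cleans up according to the node role.
func runHook(c cluster, role nodeRole, name string, setup func() error) (rerr error) {
	defer func() {
		if rerr == nil {
			return // hook succeeded, nothing to clean up
		}
		// Reverting configs and stopping services would happen here.
		if role == controlPlaneJoin {
			// The node has already joined, so remove it from the cluster first.
			if err := c.DeleteClusterMember(name); err != nil {
				rerr = fmt.Errorf("%w; delete member: %v", rerr, err)
			}
		}
		// All three paths end by resetting the local cluster state.
		if err := c.ResetClusterMember(name); err != nil {
			rerr = fmt.Errorf("%w; reset member: %v", rerr, err)
		}
	}()
	return setup()
}

func main() {
	rec := &recorder{}
	err := runHook(rec, controlPlaneJoin, "node-2", func() error { return errSetup })
	fmt.Println(err, rec.calls)
}
```

Using a named return value lets the deferred function see the hook's final error without every failure branch having to call the cleanup explicitly.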
Notes
The wait for a node to no longer be Pending before removal has been moved into the `k8s remove-node` command, so that it no longer delays the completion of the `k8s join-cluster` command.
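As an illustration of what such a wait on the removal side might look like, here is a small polling sketch. It assumes a hypothetical `statusFunc` that reports whether a node is still pending. The function names, polling interval and timeout are illustrative only and are not taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// statusFunc is a hypothetical lookup of whether a node is still pending.
type statusFunc func(name string) (pending bool, err error)

// waitNotPending polls status until the node is no longer pending,
// the status lookup fails, or ctx is cancelled or times out.
func waitNotPending(ctx context.Context, name string, status statusFunc, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		pending, err := status(name)
		if err != nil {
			return fmt.Errorf("check status of %q: %w", name, err)
		}
		if !pending {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("node %q still pending: %w", name, ctx.Err())
		case <-ticker.C:
			// Poll again on the next tick.
		}
	}
}

// demo simulates a node that is pending for the first two checks.
func demo() (checks int, err error) {
	status := func(string) (bool, error) {
		checks++
		return checks < 3, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err = waitNotPending(ctx, "node-2", status, 10*time.Millisecond)
	return checks, err
}

func main() {
	checks, err := demo()
	if err != nil {
		fmt.Println("remove aborted:", err)
		return
	}
	fmt.Println("node no longer pending after", checks, "checks")
}
```

Bounding the wait with a context keeps the removal command from blocking indefinitely if the node never leaves the pending state.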
Merge after #520