Today we had an issue with nodes not being in a good state, with volumes not even mounting, and the solution to the problem was to get rid of the "poisoned" node.
It turns out that doing that is not as straightforward as with BOSH 😅
It'd be great to have this operational aspect of hush-house documented somewhere.
The tl;dr:
1. Remove the instance from the instance group in GCP.
2. Delete the instance.
3. Wait for the pod to go away and for a scale-up to be triggered.
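
As a starting point for that documentation, here is a rough sketch of how those steps might look on the command line. It is not a verified runbook: it assumes the node belongs to a managed instance group (so `abandon-instances` is the way to remove it from the group), and the node, group and zone names are hypothetical placeholders. By default it only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
set -eu

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: remove_poisoned_node NODE GROUP ZONE
# NODE, GROUP and ZONE are placeholders for the real instance name,
# its instance group, and its GCP zone.
remove_poisoned_node() {
  node="$1"
  group="$2"
  zone="$3"

  # 1. Remove the instance from the instance group
  #    (assumption: a managed instance group, hence abandon-instances).
  run gcloud compute instance-groups managed abandon-instances "$group" \
    --instances="$node" --zone="$zone"

  # 2. Delete the instance itself.
  run gcloud compute instances delete "$node" --zone="$zone" --quiet

  # 3. List pods still scheduled on the node; re-run until the list is empty.
  run kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$node"
}

# Dry run with made-up names, just to show the commands:
remove_poisoned_node example-node example-group example-zone
```

Once no pods remain on the old node, `kubectl get nodes` should eventually show a replacement node if a scale-up was triggered.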