Open stuartwdouglas opened 1 month ago
IIRC this used to work prior to the change to a pull model. As part of the runner state machine, there were timeouts for readiness after which a runner was rejected by the controller and a new runner scheduled.
The runners do time out and restart, AFAIK. The issue is that if the new runner fails as well, the 'deploy' operation just hangs. At some point the controller needs to decide that the deployment just isn't working and abort, keeping the old deployment if it exists.
Ah, I see what you're saying 👍
At present, if you deploy something that ends up in CrashLoopBackOff, FTL will wait forever. We need to be able to handle failed deployments without hanging.