When launching multiple VMs and one fails, only kill that one.

CloudBindle / youxia

A VM provisioner and fleet management tool.

GNU General Public License v3.0


Another issue I've discovered: workers that provisioned successfully may have enough time to pull a job from the queue before they are reaped. That can lead to a job queue that is draining while no work gets done, because the entire fleet is killed when one or two nodes fail to provision. Instead of killing the fleet at the end, would it be possible to do it at the beginning, as soon as a failure on one node is detected? Ideally, only the failed node would be reaped, but I realize that might be difficult to do (it would probably involve parsing the text output of Ansible).
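To make the "parse the Ansible output" idea concrete, here is a minimal sketch of how the failed nodes could be picked out of a playbook run. It is not youxia code: the function name `failed_hosts` is hypothetical, and it assumes the run's stdout is captured as plain text (no ANSI colour codes) and ends with the standard `PLAY RECAP` section, where each host gets a line of `ok=`, `changed=`, `unreachable=` and `failed=` counters.

```python
import re

# One host line of an Ansible "PLAY RECAP" section, for example:
#   10.0.0.5   : ok=7   changed=2   unreachable=0   failed=1
# Newer Ansible versions append more counters (skipped=, rescued=, ...);
# the pattern only anchors at the start, so those are ignored.
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)\s+"
    r"unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def failed_hosts(ansible_output):
    """Return the hosts whose recap line reports failed or unreachable > 0.

    Hypothetical helper: assumes `ansible_output` is the uncoloured stdout
    of an ansible-playbook run.
    """
    failed = []
    in_recap = False
    for line in ansible_output.splitlines():
        if line.startswith("PLAY RECAP"):
            # Everything after this banner is one line per host.
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match and (int(match["failed"]) > 0 or int(match["unreachable"]) > 0):
            failed.append(match["host"])
    return failed
```

With a list like this, the deployer could reap only the returned hosts and leave the rest of the fleet running. Treating `unreachable` the same as `failed` is a design choice in this sketch; a node that Ansible could not reach was never provisioned, so it seems reasonable to reap it as well.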


When launching multiple VMs and one fails, only kill that one. #49